measurement
F1 scores for hallucination detection methods are consistently higher on HaluBench than on RAGTruth, suggesting that RAGTruth is a more difficult benchmark.
Authors
Sources
- Detecting hallucinations with LLM-as-a-judge: Prompt ... - Datadog www.datadoghq.com via serper
Referenced by nodes (3)
- hallucination detection concept
- F1 score concept
- RAGTruth concept