Relations (1)
related (score 2.58), strongly supporting, 5 facts
RAGTruth is a benchmark designed specifically for evaluating hallucination detection methods [1], [2]. It is used to measure and compare the performance of hallucination detection techniques such as Datadog's LLM-as-a-judge approach and RL4HS [3], [4], [5].
Facts (5)
Sources
Detecting hallucinations with LLM-as-a-judge: Prompt ... - Datadog (datadoghq.com, 3 facts)
Reference: RAGTruth is a human-labeled benchmark for hallucination detection that covers three tasks: question answering, summarization, and data-to-text translation.
Measurement: F1 scores for hallucination detection methods are consistently higher on HaluBench than on RAGTruth, suggesting that RAGTruth is a more difficult benchmark.
Claim: The Datadog hallucination detection method showed the smallest drop in F1 scores between HaluBench and RAGTruth, suggesting robustness as hallucinations become harder to detect.
Awesome-Hallucination-Detection-and-Mitigation - GitHub (github.com, 1 fact)
Reference: Niu et al. (2024) published 'RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models' in the proceedings of ACL 2024.
EdinburghNLP/awesome-hallucination-detection - GitHub (github.com, 1 fact)
Measurement: On the RAGTruth dataset, which covers QA, summarization, and data-to-text tasks, the RL4HS framework improves fine-grained hallucination detection compared to chain-of-thought-based and supervised baselines.
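The F1 comparisons above can be made concrete with a small sketch. This is not the RAGTruth evaluation code; it is a minimal, hypothetical illustration of response-level F1 for a hallucination detector, where label 1 marks a hallucinated response and 0 a faithful one (the example labels are invented).

```python
# Minimal sketch (assumed setup, not the official RAGTruth scorer):
# binary F1 treating "hallucinated" (1) as the positive class.

def f1_score(y_true, y_pred):
    """Return F1 over binary labels, with 1 = hallucinated as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Toy example: 4 responses; the detector flags two, one correctly.
gold = [1, 0, 1, 0]   # human labels
pred = [1, 1, 0, 0]   # detector output
print(f1_score(gold, pred))  # 0.5
```

A drop in this score when moving from HaluBench to RAGTruth, as the Datadog fact above describes, indicates that the detector misses more hallucinations or raises more false alarms on the harder benchmark.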