Relations (1)
related (score 2.58), strongly supporting, 5 facts
RAGTruth is a benchmark designed specifically for evaluating hallucination detection methods [1], [2]. It is used to measure and compare the performance of hallucination detection techniques such as Datadog's LLM-as-a-judge approach and RL4HS [3], [4], [5].
Facts (5)
Sources
Detecting hallucinations with LLM-as-a-judge: Prompt ... - Datadog (datadoghq.com, 3 facts)
Reference: RAGTruth is a human-labeled benchmark for hallucination detection that covers three tasks: question answering, summarization, and data-to-text translation.
Measurement: F1 scores for hallucination detection methods are consistently higher on HaluBench than on RAGTruth, suggesting that RAGTruth is a more difficult benchmark.
Claim: The Datadog hallucination detection method showed the smallest drop in F1 scores between HaluBench and RAGTruth, suggesting robustness as hallucinations become harder to detect.
Awesome-Hallucination-Detection-and-Mitigation - GitHub (github.com, 1 fact)
Reference: Niu et al. (2024) published 'RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models' in the proceedings of ACL 2024.
EdinburghNLP/awesome-hallucination-detection - GitHub (github.com, 1 fact)
Measurement: On the RAGTruth dataset, which covers QA, summarization, and data-to-text tasks, the RL4HS framework improves fine-grained hallucination detection compared to chain-of-thought-based and supervised baselines.
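The F1 comparisons above can be made concrete with a small sketch. This is not the RAGTruth evaluation code; it is a minimal, hypothetical illustration of response-level F1 for a hallucination detector, where label 1 marks a hallucinated response and 0 a faithful one (the example labels are invented).

```python
# Minimal sketch (assumed setup, not the official RAGTruth scorer):
# binary F1 treating "hallucinated" (1) as the positive class.

def f1_score(y_true, y_pred):
    """Return F1 over binary labels, with 1 = hallucinated as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Toy example: 4 responses; the detector flags two, one correctly.
gold = [1, 0, 1, 0]   # human labels
pred = [1, 1, 0, 0]   # detector output
print(f1_score(gold, pred))  # 0.5
```

A drop in this score when moving from HaluBench to RAGTruth, as the Datadog fact above describes, indicates that the detector misses more hallucinations or raises more false alarms on the harder benchmark.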