Relations (1)

related 2.00 — strongly supporting 3 facts

Hallucination detection is a critical evaluation task for summarization models, as evidenced by benchmarks like RAGTruth [1] and the Hallucination Leaderboard [2], which use datasets such as CNN/DM [3] to measure the accuracy and factual consistency of generated summaries.
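As a rough illustration of what "measuring factual consistency" means in practice, the sketch below implements a naive lexical grounding check: it flags the fraction of summary words that never appear in the source article. This is a hypothetical toy heuristic, not the method used by RAGTruth (which relies on human labels) or the Hallucination Leaderboard (which uses task-specific metrics and model-based judges); all names in the code are illustrative.

```python
import re


def ungrounded_fraction(article: str, summary: str) -> float:
    """Fraction of summary word tokens that never appear in the article.

    A crude proxy for hallucination: a high value means most of the
    summary's vocabulary has no support in the source text.
    """
    def tokenize(text: str) -> list[str]:
        return re.findall(r"[a-z0-9']+", text.lower())

    article_vocab = set(tokenize(article))
    summary_tokens = tokenize(summary)
    if not summary_tokens:
        return 0.0
    missing = [t for t in summary_tokens if t not in article_vocab]
    return len(missing) / len(summary_tokens)


article = "The city council approved the new park budget on Tuesday."
faithful = "The council approved the park budget."
hallucinated = "The mayor vetoed the stadium proposal."

print(ungrounded_fraction(article, faithful))      # low: fully grounded
print(ungrounded_fraction(article, hallucinated))  # high: mostly ungrounded
```

Real evaluations are far more sophisticated (entailment models, LLM judges, human annotation), since lexical overlap cannot catch paraphrased fabrications or penalize legitimate abstraction; the sketch only shows the shape of the task.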

Facts (3)

Sources
The Hallucinations Leaderboard, an Open Effort to Measure ... huggingface.co Hugging Face 2 facts
claim: The Hallucination Leaderboard includes tasks across several categories: Closed-book Open-domain QA (NQ Open, TriviaQA, TruthfulQA), Summarisation (XSum, CNN/DM), Reading Comprehension (RACE, SQuADv2), Instruction Following (MemoTrap, IFEval), Fact-Checking (FEVER), Hallucination Detection (FaithDial, True-False, HaluEval), and Self-Consistency (SelfCheckGPT).
reference: The CNN/DM (CNN/Daily Mail) dataset consists of news articles paired with multi-sentence summaries, used to evaluate a model's ability to generate summaries that accurately reflect article content while avoiding incorrect or irrelevant information.
Detecting hallucinations with LLM-as-a-judge: Prompt ... - Datadog datadoghq.com Aritra Biswas, Noé Vernier · Datadog 1 fact
reference: RAGTruth is a human-labeled benchmark for hallucination detection that covers three tasks: question answering, summarization, and data-to-text translation.