Sources
Re-evaluating Hallucination Detection in LLMs (arXiv, arxiv.org)
Claim: The authors demonstrate that prevailing overlap-based metrics systematically overestimate hallucination detection performance in question-answering tasks, which leads to illusory progress in the field.
EdinburghNLP/awesome-hallucination-detection (GitHub, github.com)
Claim: ROUGE-based evaluation systematically overestimates hallucination detection performance in question-answering tasks (see the first sketch below).
Detecting hallucinations with LLM-as-a-judge: Prompt ... (Datadog, datadoghq.com)
Reference: RAGTruth is a human-labeled benchmark for hallucination detection covering three tasks: question answering, summarization, and data-to-text writing (see the second sketch below).
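
To make the first two claims concrete, here is a minimal sketch of overlap-based hallucination detection using the `rouge_score` package: an answer is flagged as hallucinated when its ROUGE-L F-measure against a reference answer falls below a threshold. The threshold and the example answers are illustrative assumptions, not drawn from the cited sources; the sketch shows the failure mode behind the overestimation claim, where a faithful paraphrase is misflagged while a high-overlap answer containing a false claim passes.

```python
# Minimal sketch of overlap-based hallucination detection (assumed setup,
# not the exact protocol of the cited paper). Requires: pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def flag_hallucination(reference: str, answer: str, threshold: float = 0.5) -> bool:
    """Flag an answer as hallucinated when its lexical overlap with the
    reference answer is low. The 0.5 threshold is illustrative."""
    score = scorer.score(reference, answer)["rougeL"].fmeasure
    return score < threshold

reference = "Marie Curie won Nobel Prizes in physics and chemistry."

# A faithful paraphrase with little word overlap is wrongly flagged ...
print(flag_hallucination(reference, "She received the Nobel Prize twice, in two sciences."))  # True

# ... while an answer that copies the reference wording but adds a false
# claim ("and medicine") slips through with a high ROUGE-L score.
print(flag_hallucination(reference, "Marie Curie won Nobel Prizes in physics, chemistry, and medicine."))  # False
```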
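
And here is a minimal sketch of the LLM-as-a-judge approach the Datadog source discusses, assuming OpenAI's chat completions API. The model name, prompt wording, and label set are illustrative assumptions, not taken from the blog post or from RAGTruth.

```python
# Minimal LLM-as-a-judge hallucination check (illustrative prompt and model,
# not the exact prompt from the Datadog post). Requires: pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are a hallucination judge. Given a context and an answer,
reply with exactly one word: SUPPORTED if every claim in the answer is backed
by the context, or HALLUCINATED otherwise.

Context:
{context}

Answer:
{answer}
"""

def judge(context: str, answer: str, model: str = "gpt-4o-mini") -> str:
    """Return the judge model's one-word verdict for an answer."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(context=context, answer=answer),
        }],
    )
    return response.choices[0].message.content.strip()

print(judge(
    context="RAGTruth covers question answering, summarization, and data-to-text writing.",
    answer="RAGTruth covers four tasks, including translation.",
))  # expected: HALLUCINATED
```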