Claim
Reference-based metrics such as ROUGE align poorly with human judgments when used to detect hallucinations in Question Answering: because they score surface n-gram overlap rather than factual accuracy, they consistently reward fluent but factually incorrect responses.
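As a rough illustration (not taken from the cited paper), the sketch below computes a minimal ROUGE-L F1 via longest-common-subsequence overlap, assuming simple whitespace tokenization; the answer strings are hypothetical. A hallucinated answer that copies the reference's wording scores far higher than a correct paraphrase:

```python
# Minimal sketch of ROUGE-L (LCS-based F1), assuming whitespace tokenization.
# The reference/answer strings below are hypothetical examples, not drawn
# from the cited paper.

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a, 1):
        for j, tb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ta == tb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "Marie Curie won the Nobel Prize in Physics in 1903"
hallucinated = "Marie Curie won the Nobel Prize in Physics in 1911"   # fluent, wrong year
paraphrase = "She received the 1903 Physics Nobel together with Pierre Curie"  # correct, reworded

print(f"hallucinated answer: {rouge_l_f1(reference, hallucinated):.2f}")  # 0.90: rewarded despite the error
print(f"correct paraphrase:  {rouge_l_f1(reference, paraphrase):.2f}")    # 0.20: penalized despite being right
```

The factually wrong answer scores near-perfectly because it shares almost every token with the reference, which is exactly the misalignment the claim describes.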
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs (arXiv, arxiv.org)
Referenced by nodes (3)
- hallucination detection (concept)
- ROUGE (concept)
- Question Answering (concept)