perspective
The authors of 'Re-evaluating Hallucination Detection in LLMs' argue that ROUGE is a poor proxy for human judgment in evaluating hallucination detection because its design for lexical overlap does not inherently capture factual correctness.
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org via serper
Referenced by nodes (3)
- factual correctness concept
- hallucination detection concept
- ROUGE concept