claim
ROUGE, a lexical-overlap metric, exhibits high recall but extremely low precision when used for hallucination detection, which leads to misleading performance estimates.
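As a rough illustration of what the claim measures, the sketch below treats a thresholded ROUGE-L score as a hallucination labeler and scores it against human judgments. Everything in it is an assumption made for illustration rather than the source's setup: the simplified whitespace-token ROUGE-L, the 0.5 threshold, the choice of "hallucinated" as the positive class, and the five hand-written toy examples, which are constructed to show the failure mode and are not evidence for it.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L F1 over lowercased whitespace tokens (no stemming)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)


def precision_recall(predicted, actual):
    """Precision and recall for boolean labels, positive class = True."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: (reference answer, model response, human says hallucinated)
examples = [
    ("Paris", "The capital of France is Paris", False),  # correct but verbose
    ("Paris", "Paris", False),
    ("Paris", "Lyon", True),
    ("1969", "It happened in 1969", False),              # correct but verbose
    ("Mount Everest", "K2", True),
]

THRESHOLD = 0.5  # assumed cutoff: below it, the response is flagged as hallucinated

flagged = [rouge_l_f1(ref, resp) < THRESHOLD for ref, resp, _ in examples]
human = [label for _, _, label in examples]
precision, recall = precision_recall(flagged, human)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy set every truly hallucinated response is flagged (recall 1.0), but the correct, longer-phrased answers are flagged as well (precision 0.5), because low lexical overlap with a short reference is read as an error.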
Authors
Sources
- The Illusion of Progress: Re-evaluating Hallucination Detection in ... arxiv.org via serper
Referenced by nodes (2)
- hallucination detection concept
- ROUGE concept