claim
The ROUGE metric suffers from critical failure modes that undermine its utility for hallucination detection, specifically sensitivity to response length, an inability to handle semantic equivalence, and susceptibility to false lexical matches.
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org via serper
Referenced by nodes (2)
- hallucination detection concept
- ROUGE concept