claim
Many hallucination detection methods for Large Language Models are evaluated with ROUGE, even though ROUGE measures lexical overlap and is therefore misaligned with the objective of detecting hallucinations.
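
The mismatch can be illustrated with a minimal sketch. It assumes a simplified ROUGE-L (F-measure over the longest common subsequence of lowercased, whitespace-split tokens) rather than any particular library's implementation. The function names `lcs_length` and `rouge_l_f1` and the example sentences are hypothetical and chosen only for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L F1 (assumption: lowercase, whitespace tokens)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentences, not taken from any dataset.
reference = "The Eiffel Tower is located in Paris"
hallucinated = "The Eiffel Tower is located in Rome"  # factually wrong
paraphrase = "Paris is home to the Eiffel Tower"      # factually right

score_hallucinated = rouge_l_f1(reference, hallucinated)
score_paraphrase = rouge_l_f1(reference, paraphrase)

print(f"hallucinated: {score_hallucinated:.2f}")
print(f"paraphrase:   {score_paraphrase:.2f}")
```

Under these assumptions the factually wrong answer shares six of seven tokens in order with the reference and scores 6/7, while the correct paraphrase shares only a three-token subsequence and scores 3/7. An overlap-based score therefore ranks the hallucination above the faithful answer in this example.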
Referenced by nodes (2)
- hallucination detection concept
- ROUGE concept