claim
Many hallucination detection methods for large language models (LLMs) are evaluated with ROUGE, even though ROUGE measures lexical overlap with a reference text and is therefore misaligned with the objective of detecting hallucinations.
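
The misalignment can be illustrated with a minimal sketch. It assumes a simplified ROUGE-1 F1 (clipped unigram overlap on lowercased, whitespace-split tokens) rather than any official ROUGE implementation, and the function name `rouge1_f1` and the three example sentences are hypothetical, chosen only for illustration. Under these assumptions, an answer with a factual error scores higher than a correct paraphrase.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 (an assumption, not an official implementation):
    clipped unigram overlap on lowercased, whitespace-split tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical examples: one factual error vs. one correct paraphrase.
reference = "the eiffel tower is in paris"
hallucinated = "the eiffel tower is in rome"      # wrong city, 5 of 6 tokens match
faithful = "it stands in the french capital"      # correct, only 2 of 6 tokens match

score_hallucinated = rouge1_f1(hallucinated, reference)
score_faithful = rouge1_f1(faithful, reference)

print(f"hallucinated: {score_hallucinated:.3f}")
print(f"faithful:     {score_faithful:.3f}")
```

In this toy case the hallucinated answer receives the higher score because it shares more surface tokens with the reference, which is the kind of behavior the claim describes. Real ROUGE implementations add stemming and other variants (ROUGE-2, ROUGE-L), but they remain overlap-based.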

Referenced by nodes (2)