claim
The ROUGE metric can fail outright in edge cases: surface differences in punctuation or casing (e.g., 'Lung.' vs 'lung') block an exact token match, and responses that are too short prevent ROUGE-2 and ROUGE-L scores from being computed.
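A minimal sketch of the two n-gram failure modes, assuming whitespace tokenization and exact string matching with no normalization. The function names `ngrams` and `rouge_n_f1` and the `normalize` flag are hypothetical and written for illustration; they are not the scoring code used in the cited source, and some ROUGE implementations lowercase and strip punctuation themselves, in which case the first failure would not appear. ROUGE-L is not covered here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if there are fewer than n tokens)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n, normalize=False):
    """Illustrative ROUGE-N F1. Returns None when either text has no n-grams."""

    def tokenize(text):
        if normalize:
            # Assumption: lowercase and drop every non-alphanumeric character.
            text = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
        return text.split()

    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    if not cand or not ref:
        return None  # too short: no n-grams of this order exist
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge_n_f1("Lung.", "lung", 1))                  # 0.0  (raw strings differ)
print(rouge_n_f1("Lung.", "lung", 1, normalize=True))  # 1.0  (match after normalization)
print(rouge_n_f1("Lung.", "lung", 2))                  # None (one token, no bigrams)
```

Returning None rather than 0.0 for the too-short case is a design choice in this sketch: it keeps "no bigrams to compare" distinguishable from "bigrams exist but none match".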
Authors
Sources
- Detecting and Evaluating Medical Hallucinations in Large Vision ... arxiv.org via serper
Referenced by nodes (1)
- ROUGE concept