claim
The BLEU metric accounts for significant length differences between generated text and ground truth through its brevity penalty, making it more versatile than ROUGE, but because it only scores n-gram overlap it remains a weak measure of factual correctness.
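The claim can be illustrated with a minimal sketch. The code below is a simplified single-reference BLEU (unigrams and bigrams, no smoothing) written from scratch for illustration; it is not the implementation used in the cited paper, and the function names and example sentences are hypothetical. It shows the brevity penalty lowering the score of a short candidate, while a candidate that reverses the clinical finding still scores high.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def brevity_penalty(cand_len, ref_len):
    """Standard BLEU brevity penalty: 1 if the candidate is longer than
    the reference, otherwise exp(1 - ref_len / cand_len)."""
    if cand_len == 0:
        return 0.0
    if cand_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / cand_len)


def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    (orders 1..max_n) times the brevity penalty. Whitespace tokenization
    and a single reference are assumptions of this sketch."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    geo_mean = math.exp(sum(log_precisions) / max_n)
    return brevity_penalty(len(cand), len(ref)) * geo_mean


# Hypothetical example sentences, not taken from the source.
reference = "the chest x-ray shows no evidence of pneumonia"
contradictory = "the chest x-ray shows clear evidence of pneumonia"
truncated = "the chest x-ray shows"

score_contradictory = bleu(contradictory, reference)  # about 0.79
score_truncated = bleu(truncated, reference)          # about 0.37
```

In this example the truncated candidate contains nothing false, yet the brevity penalty pulls its score to about 0.37, while the candidate that states the opposite finding scores about 0.79 because most of its n-grams overlap with the reference. That gap is the sense in which BLEU handles length but is weak on factual correctness.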
Authors
Sources
- Detecting and Evaluating Medical Hallucinations in Large Vision ... arxiv.org