claim
BERTScore mitigates some shortcomings of ROUGE and BLEU, but it does not intuitively reflect factual accuracy or the degree of hallucination in medical texts.
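As a rough illustration of the overlap-metric shortcoming the claim alludes to, the sketch below uses a toy unigram-overlap F1 (a simplified stand-in for ROUGE-1, not the official ROUGE, BLEU, or BERTScore implementation). The function name `unigram_f1` and the example sentences are hypothetical and written for this illustration only. It shows a factually inverted sentence scoring higher than a faithful paraphrase. BERTScore replaces exact token matching with embedding similarity, but according to the claim its score is still not a direct readout of factual correctness.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy unigram-overlap F1 (ROUGE-1-like); an illustrative assumption, not a library API."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, invented for this sketch.
reference = "the chest x-ray shows no evidence of pneumonia"
hallucinated = "the chest x-ray shows clear evidence of pneumonia"  # opposite finding
paraphrase = "lungs are clear without signs of infection"           # same finding

score_hallucinated = unigram_f1(hallucinated, reference)
score_paraphrase = unigram_f1(paraphrase, reference)

print(f"hallucinated: {score_hallucinated:.3f}")
print(f"paraphrase:   {score_paraphrase:.3f}")
```

In this toy case the sentence that reverses the clinical finding shares seven of eight tokens with the reference, while the faithful paraphrase shares only one, so surface overlap rewards the wrong output.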
Authors
Sources
- Detecting and Evaluating Medical Hallucinations in Large Vision ... (arxiv.org)