claim
Automated metrics like ROUGE, BLEU, and BERT-score, which are designed to compare model-generated text with expert-written examples, have significant limitations in healthcare because they focus on surface-level textual similarity rather than semantic nuances, contextual dependencies, and domain-specific knowledge.

Authors

Sources

Referenced by nodes (3)