Fact — claim — Knowledge Tree

Traditional n-gram overlap metrics such as ROUGE and BLEU do not adequately capture the clinical validity of text generated by medical LLMs.
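A minimal sketch of why surface overlap can mislead, assuming a simplified clipped n-gram precision (the quantity BLEU builds on, not a full BLEU or ROUGE implementation). The two sentences and the function names `ngrams` and `ngram_precision` are hypothetical illustrations and are not taken from the cited source. Dropping a single negation reverses the clinical meaning, yet the overlap scores stay high.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against a single reference.

    Simplified illustration only: whitespace tokenization, one reference,
    no brevity penalty, no smoothing.
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical example pair: the candidate omits "no", reversing the finding.
reference = "the patient has no evidence of pneumonia"
candidate = "the patient has evidence of pneumonia"

print(ngram_precision(candidate, reference, 1))  # 1.0
print(ngram_precision(candidate, reference, 2))  # 0.8
```

Every candidate word appears in the reference and four of the five candidate bigrams match, so the overlap scores are near perfect even though the candidate states the opposite clinical finding.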

Authors

Person: Not available
Organization: arXiv
A Comprehensive Benchmark and Evaluation Framework for Multi ...

Sources

A Comprehensive Benchmark and Evaluation Framework for Multi ... (arXiv, arxiv.org)

Referenced by nodes (2)

ROUGE concept
BLEU concept