claim
Traditional n-gram metrics such as ROUGE and BLEU are insufficient for capturing the clinical validity of text generated by medical LLMs.
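
One way to see why surface overlap can miss clinical meaning is a minimal sketch like the one below. It is not an implementation of ROUGE or BLEU: `unigram_f1` is a hypothetical helper that computes a simplified, ROUGE-1-style unigram F1 with whitespace tokenization, and both sentences are invented for illustration. The candidate drops a single negation, which reverses the clinical finding while leaving almost every token shared with the reference.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1-style F1 over whitespace tokens (illustrative only)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Invented example pair: the candidate omits "no", reversing the finding.
reference = "patient has no evidence of pneumonia"
candidate = "patient has evidence of pneumonia"

score = unigram_f1(reference, candidate)
print(f"{score:.2f}")  # prints 0.91
```

Under these assumptions the overlap score stays high even though the two sentences state opposite findings, which is the kind of gap the claim points to.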
