claim
The ROUGE metric can fail outright in edge cases: a surface difference in punctuation (e.g., 'Lung.' vs 'lung', which here also differ in capitalization) prevents a direct token match, and very short responses contain too few tokens for ROUGE-2 and ROUGE-L scores to be computed.
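
Both failure modes can be reproduced with a minimal sketch of ROUGE-N recall. The helper names `ngrams` and `rouge_n_recall` are hypothetical, and the whitespace tokenization, the optional normalization step, and the choice to return `None` for an undefined score are assumptions made for illustration, not the behavior of any particular ROUGE library.

```python
import string
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_recall(reference, candidate, n, normalize=False):
    """Illustrative ROUGE-N recall (hypothetical helper).

    Assumptions: whitespace tokenization; when normalize=True, text is
    lowercased and ASCII punctuation is stripped before matching.
    Returns None when either text has no n-grams of order n.
    """
    if normalize:
        table = str.maketrans("", "", string.punctuation)
        reference = reference.lower().translate(table)
        candidate = candidate.lower().translate(table)

    ref_ngrams = ngrams(reference.split(), n)
    cand_ngrams = ngrams(candidate.split(), n)

    # A one-word response has no bigrams, so ROUGE-2 is undefined here.
    if not ref_ngrams or not cand_ngrams:
        return None

    # Clipped overlap: each reference n-gram is matched at most as often
    # as it occurs in the candidate.
    overlap = sum((Counter(ref_ngrams) & Counter(cand_ngrams)).values())
    return overlap / len(ref_ngrams)


print(rouge_n_recall("lung", "Lung.", 1))                  # 0.0
print(rouge_n_recall("lung", "Lung.", 1, normalize=True))  # 1.0
print(rouge_n_recall("lung", "Lung.", 2))                  # None
```

Under these assumptions, the unnormalized comparison scores a correct answer as a complete miss, and the one-word response has no bigrams from which a ROUGE-2 score could be formed.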
