claim
Research by Honovich et al. (2022) and Kang et al. (2024) indicates that ROUGE, an evaluation metric that scores n-gram overlap between generated text and reference text, is poorly aligned with human judgments of the factual correctness of text produced by AI systems.
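A toy illustration of why overlap-based scoring can diverge from factual correctness follows. It is a minimal sketch, assuming a simplified ROUGE-1 F1 (lowercased word tokens, clipped unigram overlap, no stemming or stopword handling). It is not the official ROUGE implementation and not the evaluation setup used in the cited studies. The helper names `tokenize` and `rouge1_f1` and the example sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical helper: lowercase and keep word tokens; no stemming or stopword removal.
    return re.findall(r"\w+", text.lower())


def rouge1_f1(reference: str, candidate: str) -> float:
    # Hypothetical simplified ROUGE-1 F1, not the reference implementation.
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Clipped unigram overlap: each token counts at most min(ref, cand) times.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "The Eiffel Tower is in Paris."
wrong = "The Eiffel Tower is in Rome."             # factually incorrect
paraphrase = "Paris is home to the Eiffel Tower."  # factually correct

print(f"{rouge1_f1(reference, wrong):.3f}")       # 0.833
print(f"{rouge1_f1(reference, paraphrase):.3f}")  # 0.769
```

In this sketch the factually wrong sentence scores higher than the correct paraphrase, because it shares more surface tokens with the reference while changing the one token that carries the fact.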
