claim
Research by Honovich et al. (2022) and Kang et al. (2024) indicates that the ROUGE evaluation metric aligns poorly with human judgments of factual correctness in AI-generated text.
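The intuition behind this claim can be illustrated with a minimal sketch: ROUGE scores n-gram overlap, so a summary that negates the reference can still score highly. The function below is a hand-rolled ROUGE-1 F1 for illustration only (not the official ROUGE implementation), and the example sentences are invented, not drawn from the cited papers.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1, a simplified stand-in for ROUGE-1."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the drug was approved by the fda in 2020"
faithful = "the fda approved the drug in 2020"          # true paraphrase
contradiction = "the drug was not approved by the fda in 2020"  # inverts the claim

# The contradictory summary shares more surface tokens with the
# reference than the faithful paraphrase does, so it scores HIGHER
# despite reversing the claim's truth value.
print(rouge1_f1(reference, faithful))       # 0.875
print(rouge1_f1(reference, contradiction))  # ~0.947
```

A metric that rewards the contradiction over the faithful paraphrase cannot, by itself, track factual correctness, which is consistent with the misalignment the claim describes.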
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org)
Referenced by nodes (3)
- artificial intelligence concept
- factual correctness concept
- ROUGE concept