claim
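The ROUGE evaluation metric, which scores a generated text by its n-gram overlap with a reference, fails to recognize semantic equivalence between different wordings, such as 'elevation' and 'relief' in the context of topographic maps. A correct paraphrase therefore receives a lower score purely because of lexical mismatch.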
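To show how lexical mismatch lowers the score, here is a minimal sketch of ROUGE-N F1 computed from clipped n-gram overlap. It assumes a simplified tokenizer (lowercase, whitespace split); the official ROUGE toolkit and packages such as `rouge-score` use their own tokenization and optional stemming, and stemming would still not map 'elevation' to 'relief'. The helper names and example sentences are hypothetical illustrations, not taken from the cited source.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """ROUGE-N F1 from clipped n-gram overlap counts (hypothetical helper)."""
    def ngrams(tokens: list[str]) -> Counter:
        # Count every contiguous n-gram as a tuple of tokens.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref = ngrams(tokenize(reference))
    cand = ngrams(tokenize(candidate))
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the paraphrase differs only in 'elevation' vs 'relief'.
reference = "the map shows elevation with contour lines"
paraphrase = "the map shows relief with contour lines"
print(round(rouge_n_f1(reference, reference), 3))   # 1.0
print(round(rouge_n_f1(reference, paraphrase), 3))  # 0.857
```

Although the two sentences mean essentially the same thing, the single substituted word drops ROUGE-1 F1 from 1.0 to 6/7, and ROUGE-2 drops further (to 2/3) because the substitution breaks two bigrams.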
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org via serper
Referenced by nodes (1)
- ROUGE concept