claim
ROUGE-based evaluation systematically overestimates hallucination detection performance in Question Answering tasks.
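A minimal sketch of why ROUGE-derived correctness labels can be noisy in Question Answering, assuming the common setup in which a generated answer counts as "correct" when its ROUGE-L F1 against the reference exceeds a threshold. The function names, the 0.3 threshold, and the two example answers are hypothetical illustrations, not taken from the source.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over lowercased, whitespace-split tokens."""
    c = candidate.lower().split()
    r = reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def rouge_label(candidate, reference, threshold=0.3):
    """Label an answer via a ROUGE-L threshold (threshold is an assumed value)."""
    score = rouge_l_f1(candidate, reference)
    return "correct" if score >= threshold else "hallucinated"


# A factually correct but verbose answer scores low against a short reference.
verbose_correct = rouge_label("The capital city of France is Paris", "Paris")

# A factually wrong answer that copies the reference wording scores high.
overlapping_wrong = rouge_label(
    "Lyon is the capital of France", "Paris is the capital of France"
)

print(verbose_correct, overlapping_wrong)
```

In both cases the ROUGE-derived label disagrees with factual correctness, so a hallucination detector scored against such labels is measured against a noisy target.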
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection (GitHub, github.com)
Referenced by nodes (3)
- hallucination detection (concept)
- Question Answering (concept)
- ROUGE (concept)