claim
A large-scale human study of hallucinations in extreme summarization on XSum (BBC articles) found that extrinsic hallucinations are frequent, even in the gold reference summaries, and that textual entailment scores correlate better with human judgments of faithfulness and factuality than ROUGE, BERTScore, or QA-based metrics do.
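To make "correlates with human judgments" concrete, the sketch below computes a Spearman rank correlation between per-summary metric scores and per-summary human faithfulness ratings. It is an illustration only: the function names are hypothetical, the numbers are invented toy values rather than data from the study, and the choice of Spearman (instead of another correlation coefficient) is an assumption, not something the claim specifies.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented toy scores for five summaries (NOT results from the study).
human_faithfulness = [0.9, 0.2, 0.7, 0.1, 0.5]  # e.g. share of annotators saying "faithful"
entailment_score = [0.8, 0.3, 0.6, 0.2, 0.4]    # hypothetical entailment probabilities
rouge_score = [0.3, 0.4, 0.2, 0.5, 0.1]         # hypothetical ROUGE values

for name, scores in [("entailment", entailment_score), ("ROUGE", rouge_score)]:
    print(name, round(spearman(scores, human_faithfulness), 3))
```

A metric "correlates better" in this sense when its coefficient against the human ratings is higher than that of the competing metrics on the same set of summaries.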
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection - GitHub (github.com)
Referenced by nodes (4)
- ROUGE concept
- BERTScore concept
- extrinsic hallucination concept
- BBC entity