claim
A large-scale human study of hallucinations in extreme summarization using XSum (BBC articles) found that extrinsic hallucinations are frequent, even in gold summaries, and that textual entailment correlates best with human faithfulness and factuality compared to ROUGE, BERTScore, or QA-based metrics.

Authors

Sources

Referenced by nodes (4)