reference
The HalluQA benchmark evaluates AI models using human annotations of intrinsic and extrinsic hallucinated spans and of factuality, alongside automatic metrics such as ROUGE-1/2/L, BERTScore, textual entailment, and QA-based consistency, with Spearman correlation used to measure agreement between these metrics and human scores.
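Spearman correlation ranks both score lists and measures how well the orderings agree. A minimal pure-Python sketch (the metric and human scores below are invented for illustration, and ties are assumed absent):

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation coefficient, assuming no tied values."""
    n = len(xs)

    def ranks(vals):
        # Rank 1 = smallest value; position in the sorted order
        order = sorted(range(n), key=lambda i: vals[i])
        r = [0] * n
        for rank_pos, i in enumerate(order, start=1):
            r[i] = rank_pos
        return r

    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Standard closed form for untied data
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical automatic metric scores (e.g., ROUGE-L) for five outputs
metric_scores = [0.62, 0.41, 0.77, 0.55, 0.30]
# Hypothetical human factuality ratings for the same outputs (1-5 scale)
human_scores = [4, 2, 5, 3, 1]

print(spearman_rho(metric_scores, human_scores))  # → 1.0 (identical rankings)
```

A coefficient of 1.0 means the metric ranks outputs exactly as humans do; values near 0 indicate the metric's ordering carries little information about human judgments.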
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection (GitHub)
Referenced by nodes (1)
- BERTScore concept