SummaC
Facts (10)
Sources
EdinburghNLP/awesome-hallucination-detection (github.com), 6 facts
Hallucination Risk Metrics (HaRiM+), SummaC, SummaC-ZS, SummaC-Conv, and the Hallucination Risk Ratio (HRR) are used as evaluation metrics for the QReCC and XLSum datasets.
The SummaC benchmark uses balanced accuracy as its metric for evaluating factual consistency.
The SummaC (Summary Consistency) benchmark for inconsistency detection consists of six datasets: CoGenSumm, XSumFaith, Polytope, FactCC, SummEval, and FRANK.
SummaC is also a code and model repository for hallucination detection.
SummaC is a collection of benchmarks used for binary factual consistency evaluation.
SCALE is a metric proposed for hallucination detection that is compared against Q², ANLI, SummaC, F1, BLEURT, QuestEval, BARTScore, and BERTScore.
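The SummaC-ZS variant mentioned above can be summarized as a simple aggregation over an NLI entailment matrix: for each summary sentence, take the maximum entailment score over all document sentences, then average across summary sentences (Laban et al., 2022). The sketch below illustrates that aggregation; `nli_entailment` is a hypothetical lexical-overlap stand-in for a real NLI model, used only to keep the example self-contained.

```python
# Sketch of the SummaC-ZS (zero-shot) aggregation. In the actual system the
# pairwise scores come from an NLI model's entailment probabilities; here
# nli_entailment is a hypothetical stub based on word overlap.

def nli_entailment(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI entailment probability: the fraction of the
    hypothesis's words that also appear in the premise."""
    p_words = set(premise.lower().split())
    h_words = set(hypothesis.lower().split())
    if not h_words:
        return 0.0
    return len(h_words & p_words) / len(h_words)

def summac_zs(document_sentences: list[str], summary_sentences: list[str]) -> float:
    """SummaC-ZS aggregation: max over document sentences for each summary
    sentence, then mean over summary sentences."""
    per_sentence = [
        max(nli_entailment(doc_sent, sum_sent) for doc_sent in document_sentences)
        for sum_sent in summary_sentences
    ]
    return sum(per_sentence) / len(per_sentence)

doc = ["the cat sat on the mat", "it was raining outside"]
consistent = ["the cat sat on the mat"]
inconsistent = ["the dog barked loudly"]
print(summac_zs(doc, consistent))    # fully supported summary scores high
print(summac_zs(doc, inconsistent))  # unsupported summary scores low
```

SummaC-Conv replaces this hard max/mean with a learned convolution over the histogram of entailment scores, which is why the two variants are listed separately.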
vectara/hallucination-leaderboard (github.com), 2 facts
Key academic papers on factual consistency in summarization include: SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization; TRUE: Re-evaluating Factual Consistency Evaluation; TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models; AlignScore: Evaluating Factual Consistency with a Unified Alignment Function; MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents; TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization; RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models; and FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs.
The SummaC and TRUE papers are cited as relevant resources for hallucination detection in the Vectara hallucination-leaderboard GitHub repository.
Re-evaluating Hallucination Detection in LLMs (arxiv.org), Aug 13, 2025, 2 facts
Laban et al. (2022) introduced SummaC, a model that revisits Natural Language Inference (NLI) based models for detecting inconsistency in summarization tasks.
The study evaluated several alternative metrics for text evaluation, including BERTScore (Zhang et al., 2020), BLEU (Papineni et al., 2002), SummaC (Laban et al., 2022), and UniEval-fact (Zhong et al., 2022), benchmarking them against LLM-as-Judge labels.