summarization
Also known as: summarizing, summarization tasks, Summarisation
Facts (20)
Sources
Survey and analysis of hallucinations in large language models (frontiersin.org, Sep 29, 2025; 5 facts)
procedure: Negative prompting involves explicitly instructing a large language model to avoid hallucination, for example by stating "Do not include any information not present in the input text," which can reduce fabrication in summarization and QA tasks.
reference: Fabbri et al. (2022) introduced QAFactEval, an improved QA-based method for evaluating factual consistency in summarization, presented at the 2022 Conference of the North American Chapter of the Association for Computational Linguistics.
reference: CohS (Kazemi et al., 2023) and QAFactEval (Fabbri et al., 2022) are benchmarks that focus on factual consistency in summarization tasks.
claim: Negative prompting prevents speculative completions in summarization tasks and is highly feasible to implement.
claim: Layered hybrid mitigation pipelines show superior performance on factual question answering and summarization tasks while remaining implementable with free and open-source tools.
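The negative-prompting procedure described above can be sketched as a simple prompt template. This is a minimal illustration; the function name and exact wording are assumptions, not a fixed API:

```python
def build_summary_prompt(document: str) -> str:
    """Compose a summarization prompt with an explicit negative
    instruction discouraging fabrication (the 'negative prompting'
    technique). Wording is illustrative, not prescribed."""
    return (
        "Summarize the following text.\n"
        "Do not include any information not present in the input text.\n"
        "If something is unclear, omit it rather than guessing.\n\n"
        f"Text:\n{document}\n\nSummary:"
    )
```

The key design point is that the constraint is stated negatively ("do not include") and scoped to the input text, which is what distinguishes this from ordinary task instructions.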
vectara/hallucination-leaderboard (github.com; 4 facts)
claim: Hallucination in generative AI models is not limited to summarization tasks; it is a failure to follow instructions that would likely manifest in other generative tasks, such as writing emails.
claim: The Vectara hallucination leaderboard evaluates summarization rather than general "closed book" question answering, so the large language models evaluated need not memorize human knowledge, only have a solid grasp of the supported languages.
procedure: The Vectara hallucination leaderboard explicitly filters out model responses that refuse to summarize a document or that give one-to-two-word answers, to prevent models from gaming the evaluation.
procedure: The Vectara hallucination leaderboard evaluation is performed only on documents for which all models provided a summary, ensuring a consistent comparison set.
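The two filtering steps above (discarding refusals and one-to-two-word answers, then keeping only documents that every model summarized) can be sketched roughly as follows. The refusal markers and helper names are assumptions for illustration, not Vectara's actual implementation:

```python
def is_valid_summary(text: str) -> bool:
    """Discard refusals and one-to-two-word answers."""
    refusal_markers = ("i cannot", "i can't", "as an ai")
    t = text.strip().lower()
    if any(t.startswith(m) for m in refusal_markers):
        return False
    return len(t.split()) > 2


def common_eval_set(summaries: dict) -> set:
    """Keep only document ids for which every model produced a
    valid summary, so all models are scored on the same set.
    `summaries` maps model name -> {doc_id -> summary text}."""
    models = list(summaries)
    doc_ids = set(summaries[models[0]])
    return {
        d for d in doc_ids
        if all(d in summaries[m] and is_valid_summary(summaries[m][d])
               for m in models)
    }
```

Filtering before intersecting matters: a model that refused a hard document would otherwise be silently advantaged, since its failures would never be scored.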
The Hallucinations Leaderboard, an Open Effort to Measure ... (huggingface.co, Jan 29, 2024; 3 facts)
measurement: HaluEval includes 5,000 general user queries with ChatGPT responses and 30,000 task-specific examples across three tasks: question answering (HaluEval QA), knowledge-grounded dialogue (HaluEval Dialogue), and summarisation (HaluEval Summarisation).
claim: The Hallucinations Leaderboard includes tasks across several categories: Closed-book Open-domain QA (NQ Open, TriviaQA, TruthfulQA), Summarisation (XSum, CNN/DM), Reading Comprehension (RACE, SQuADv2), Instruction Following (MemoTrap, IFEval), Fact-Checking (FEVER), Hallucination Detection (FaithDial, True-False, HaluEval), and Self-Consistency (SelfCheckGPT).
reference: The CNN/DM (CNN/Daily Mail) dataset consists of news articles paired with multi-sentence summaries and is used to evaluate a model's ability to generate summaries that accurately reflect article content while avoiding incorrect or irrelevant information.
A survey on augmenting knowledge graphs (KGs) with large ... (link.springer.com, Nov 4, 2024; 2 facts)
reference: Encoder-decoder architectures, such as T5 or BART (Bidirectional and Auto-Regressive Transformers), use an encoder to create a context-rich representation of the input sequence, which the decoder then uses to generate an output sequence, making them flexible for tasks like translation, summarization, and question answering.
measurement: OpenAI's GPT-3 model contains 175 billion parameters and is known for high-quality text generation, translation, question answering, and summarization.
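As a rough illustration of the encoder-decoder split described above (a toy, not a real transformer; every name here is invented), the encoder condenses the input sequence into a context representation and the decoder generates output conditioned only on that representation:

```python
def encode(tokens: list) -> dict:
    """Toy 'encoder': map the input sequence to a context
    representation. Here it is just a bag-of-words count; real
    encoders (T5/BART) produce contextual vectors per token."""
    ctx = {}
    for t in tokens:
        ctx[t] = ctx.get(t, 0) + 1
    return ctx


def decode(context: dict, max_len: int = 3) -> list:
    """Toy 'decoder': generate an output sequence from the context
    alone, standing in for conditional generation. Emits the most
    frequent input tokens, a crude extractive 'summary'."""
    ranked = sorted(context, key=context.get, reverse=True)
    return ranked[:max_len]
```

The point of the structure is the interface: `decode` never sees the raw input, only what `encode` produced, which is what makes the same architecture reusable for translation, summarization, and QA.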
A framework to assess clinical safety and hallucination rates of LLMs ... (nature.com, May 13, 2025; 1 fact)
reference: Song et al. (2020) developed a method for summarizing medical conversations by identifying important utterances, published in the Proceedings of the 28th International Conference on Computational Linguistics.
Practices, opportunities and challenges in the fusion of knowledge ... (frontiersin.org; 1 fact)
reference: Luo et al. (2024) evaluated the factual consistency of summarization in the era of large language models, in the journal Expert Systems with Applications.
Construction of Knowledge Graphs: State and Challenges (arxiv.org; 1 fact)
claim: Summarization techniques speed up computation in Entity Resolution by dividing large blocks into sub-blocks with representatives, allowing a constant number of comparisons per new record.
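A minimal sketch of the sub-block-with-representative idea from the claim above, assuming a user-supplied similarity predicate (names and the size bound are illustrative). Each new record is compared against one representative per sub-block rather than against every record, and in practice the number of representatives is bounded so the per-record comparison count stays roughly constant:

```python
def assign_to_subblock(record, subblocks, max_size, similar):
    """Route a new record into a sub-block by comparing it only
    against each sub-block's representative (its first member).
    `subblocks` maps representative -> list of member records."""
    for rep in subblocks:  # one comparison per sub-block
        if similar(record, rep) and len(subblocks[rep]) < max_size:
            subblocks[rep].append(record)
            return rep
    # No suitable sub-block: the record founds a new one and
    # becomes its representative.
    subblocks[record] = [record]
    return record
```

The `max_size` cap is what keeps sub-blocks small enough that within-block pairwise matching remains cheap after blocking.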
Detecting hallucinations with LLM-as-a-judge: Prompt ... (datadoghq.com, Aug 25, 2025; 1 fact)
reference: RAGTruth is a human-labeled benchmark for hallucination detection that covers three tasks: question answering, summarization, and data-to-text writing.
Combining Knowledge Graphs and Large Language Models (arxiv.org, Jul 9, 2024; 1 fact)
claim: Current large language models have a wide range of applications, including question answering, code generation, text recognition, summarization, translation, and prediction.
KG-IRAG: A Knowledge Graph-Based Iterative Retrieval-Augmented ... (arxiv.org, Mar 18, 2025; 1 fact)
reference: Edge et al. (2024) published "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (arXiv preprint arXiv:2404.16130), which introduces a graph-based retrieval-augmented generation approach for summarization.