factuality
Also known as: factualness
Facts (21)
Sources
Awesome-Hallucination-Detection-and-Mitigation - GitHub github.com 5 facts
reference: The paper "Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity" by Wang et al. (2023) surveys the state of factuality in large language models, covering knowledge, retrieval, and domain-specificity.
reference: The paper "Fine-tuning Language Models for Factuality" by Tian et al. (2023) discusses fine-tuning strategies aimed specifically at improving the factuality of language models.
reference: The paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency" by Cheng et al. (2025) introduces an integrative decoding method based on implicit self-consistency.
reference: The paper "Factuality Enhanced Language Models for Open-Ended Text Generation" by Lee et al. (2022) proposes techniques to enhance the factuality of language models during open-ended text generation.
reference: The paper "Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation" by Chang et al. (2025) proposes monitoring the factuality of partial responses during generation.
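The self-consistency idea behind the decoding papers above can be illustrated with a much simpler cousin: sample several answers and keep the one the model agrees on most often. This is a minimal majority-vote sketch, not the integrative decoding algorithm itself; `sample_fn` is a hypothetical callable wrapping an LLM sampling call.

```python
from collections import Counter

def self_consistency_answer(sample_fn, prompt, n_samples=5):
    """Return the most frequent sampled answer plus its agreement
    ratio, a rough confidence signal for factuality.

    `sample_fn` is a hypothetical callable that returns one sampled
    answer string for `prompt` (e.g. a wrapped LLM API call).
    """
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples
```

Low agreement across samples is often used as a hallucination signal: answers the model cannot reproduce consistently are more likely to be confabulated.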
Survey and analysis of hallucinations in large language models frontiersin.org Sep 29, 2025 4 facts
claim: Mistral shows balanced behavior across the dimensions of Factuality, Coherence, Prompt Sensitivity, Model Variability, and Usability, indicating a mixed attribution of hallucination sources.
claim: The radar plot in Figure 4 of the study "Survey and analysis of hallucinations in large language models" visualizes the comparative performance of DeepSeek, Mistral, and LLaMA 2 across five behavioral dimensions: Factuality, Coherence, Prompt Sensitivity, Model Variability, and Usability.
claim: Factuality, as a metric for language model evaluation, reflects a model's ability to generate responses that are factually accurate and aligned with reference ground truth.
reference: Maynez et al. (2020) investigated faithfulness and factuality in abstractive summarization at the 58th Annual Meeting of the Association for Computational Linguistics.
EdinburghNLP/awesome-hallucination-detection - GitHub github.com 3 facts
reference: A metric for evaluating LLM factuality is the percentage of examples for which the model assigns the highest probability to the factual completion.
claim: The EdinburghNLP awesome-hallucination-detection repository provides a taxonomy of error types for AI systems, including comprehension, factualness, specificity, and inference.
reference: Evaluation metrics for AI systems include counts of correct and wrong answers, as well as failure counts categorized by comprehension, factualness, specificity, and inference.
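The highest-probability-completion metric mentioned above is straightforward to compute once per-completion model scores are available. A minimal sketch, with illustrative field names not tied to any specific benchmark:

```python
def factual_completion_accuracy(examples):
    """Fraction of examples where the factual completion receives the
    highest model score.

    Each example is a dict with:
      "scores":  mapping of candidate completion -> model log-probability
      "factual": the completion known to be correct
    (Field names are illustrative assumptions, not from any benchmark.)
    """
    hits = sum(
        1
        for ex in examples
        if max(ex["scores"], key=ex["scores"].get) == ex["factual"]
    )
    return hits / len(examples)
```

Because the metric only compares relative scores within each example, it works with unnormalized log-probabilities as well as normalized ones.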
LLM Hallucination Detection and Mitigation: State of the Art in 2026 zylos.ai Jan 27, 2026 2 facts
claim: The taxonomy of hallucination detection distinguishes between factuality, which is absolute correctness against real-world truth, and faithfulness, which is adherence to provided input or context.
reference: The paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency," published on arXiv, details an approach to improving the factuality of large language model outputs through self-consistency mechanisms.
Pascale Fung's Post - LLM Hallucination Benchmark linkedin.com 11 months ago 1 fact
claim: The HalluLens benchmark separates the evaluation of LLM hallucination from the evaluation of factuality to avoid conflating the two concepts.
Medical Hallucination in Foundation Models and Their ... medrxiv.org Mar 3, 2025 1 fact
claim: Reinforcement learning from knowledge feedback (RLKF) achieves superior factuality in AI models compared to decoding strategies or supervised fine-tuning.
Building Trustworthy NeuroSymbolic AI Systems - arXiv arxiv.org 1 fact
claim: Zhang et al. (2023) identified reliability in LLMs by examining tendencies regarding hallucination, truthfulness, factuality, honesty, calibration, robustness, and interpretability.
Unknown source 1 fact
claim: The response verification framework described in the paper "A Knowledge Graph-Based Hallucination Benchmark for Evaluating..." assesses the factuality of long-form text by identifying hallucinations in the output of large language models.
How Datadog solved hallucinations in LLM apps - LinkedIn linkedin.com Oct 1, 2025 1 fact
claim: Datadog's LLM-as-a-Judge feature allows users to create custom LLM-based evaluations to measure qualitative performance metrics such as helpfulness, factuality, and tone on LLM Observability production traces.
LLM Observability: How to Monitor AI When It Thinks in Tokens | TTMS ttms.com Feb 10, 2026 1 fact
claim: LLM monitoring systems can derive hallucination or correctness scores using automated evaluation pipelines, such as cross-checking model answers against a knowledge base or using an LLM-as-a-judge to score factuality.
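The knowledge-base cross-check mentioned above can be sketched as scoring an answer by how many of its extracted facts the knowledge base supports. This is a minimal illustration under the assumption that triple extraction from free text happens upstream; the function and field names are hypothetical, not from any monitoring product.

```python
def kb_support_score(answer_triples, knowledge_base):
    """Score an answer by the fraction of its extracted
    (subject, relation, object) triples found in a knowledge base.

    `answer_triples` is a list of triples extracted from the model's
    answer; `knowledge_base` is a set of known-true triples.
    """
    if not answer_triples:
        return 1.0  # nothing checkable, so nothing contradicted
    supported = sum(1 for t in answer_triples if t in knowledge_base)
    return supported / len(answer_triples)
```

A low score flags answers containing assertions the knowledge base cannot confirm, which a monitoring pipeline can surface for review alongside an LLM-as-a-judge factuality score.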
A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org Feb 23, 2026 1 fact
reference: The paper "Evaluating the factuality of large language models using large-scale knowledge graphs" is cited as a reference on evaluating the factuality of large language models.