The authors of the survey argue that mitigating hallucination is a systemic and collaborative issue, not solely a technical one, and that decentralized methods involving human feedback and community standards are essential.
Understanding whether hallucinations are caused by prompt formulation or intrinsic model behavior is essential for designing effective prompt engineering strategies, developing grounded architectures, and benchmarking Large Language Model reliability.
Wu et al. (2023) introduced 'HallucinationEval,' a unified framework designed for evaluating hallucinations in large language models.
The authors of the survey introduce an attribution framework that aims to disentangle the contributions of prompting and model behavior to hallucinated text, noting that a single erroneous output may result from some combination of unclear prompting, model architectural biases, and training data limitations.
Large Language Model (LLM) hallucination is defined as the generation of content that is not grounded in the input prompt or confirmed knowledge sources, despite the output appearing linguistically coherent.
Prompt design strongly influences hallucination rates in prompt-sensitive models such as LLaMA 2 and OpenChat.
Under the assumption of conditional independence, the analysis of hallucination events can be simplified to P(P, M|H) = P(P|H) * P(M|H), based on the work of Pearl (1988).
The authors of the survey introduce 'Prompt Sensitivity (PS)' as a concrete metric designed to systematically measure the effect of prompt changes on model hallucinations.
Hallucinations in Large Language Models are categorized into two primary sources: prompting-induced hallucinations caused by ill-structured or misleading prompts, and model-internal hallucinations caused by architecture, pretraining data distribution, or inference behavior.
Hallucination events in Large Language Models can be represented probabilistically as random events, where H denotes hallucination occurrence conditioned upon prompting strategy P and model characteristics M, expressed as P(P, M|H) = (P(H|P, M) * P(P, M)) / P(H).
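Under illustrative assumptions, this Bayes-rule decomposition can be estimated directly from logged hallucination events; the records, prompt-style names, and counts below are hypothetical, not data from the paper:

```python
# Hypothetical sketch: estimating P(P, M | H) from observed (prompt, model,
# hallucinated) records, i.e., the fraction of hallucination events that
# involved a given prompt-model pair.
events = [
    ("zero-shot", "model-A", True),
    ("zero-shot", "model-B", False),
    ("chain-of-thought", "model-A", False),
    ("chain-of-thought", "model-B", True),
    ("zero-shot", "model-A", True),
]

halluc = [e for e in events if e[2]]  # events where a hallucination occurred

def p_pm_given_h(prompt: str, model: str) -> float:
    """P(P, M | H): share of hallucination events with this prompt-model pair."""
    if not halluc:
        return 0.0
    return sum(1 for p, m, _ in halluc if p == prompt and m == model) / len(halluc)

print(p_pm_given_h("zero-shot", "model-A"))  # 2 of the 3 hallucination events
```

With real logs, the same counting yields P(P|H) and P(M|H), so the conditional-independence factorization in the survey can be checked empirically.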
Consistent hallucinations across different models suggest prompt-induced errors, while divergent hallucination patterns imply architecture-specific behaviors or training artifacts.
The paper 'Survey and analysis of hallucinations in large language models: attribution to prompting strategies or model behavior' was published in Frontiers in Artificial Intelligence on September 30, 2025, by authors Anh-Hoang D, Tran V, and Nguyen L-M.
Instruction-tuned models can still hallucinate, especially on long-context, ambiguous, or factual-recall tasks, as revealed by studies from OpenAI (2023a) and Bang and Madotto (2023).
Prompt engineering is a cost-effective, model-agnostic approach to reduce hallucinations at inference time without altering the underlying model parameters.
Weidinger et al. (2022) assert that the stakes of hallucination in high-risk domains such as medicine, law, and education are far higher than in open-domain tasks.
The authors of the 'Survey and analysis of hallucinations in large language models' define Prompt Sensitivity (PS) and Model Variability (MV) as metrics to quantify the contribution of prompts versus model-internal factors to hallucinations.
Hallucinations can be categorized into four attribution types based on Prompt Sensitivity (PS) and Model Variability (MV) scores: Prompt-dominant (high PS, low MV), Model-dominant (low PS, high MV), Mixed-origin (high PS, high MV), and Unclassified/noise (low PS, low MV).
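A minimal sketch of this four-way classification, assuming hypothetical PS/MV scores on a common scale and an illustrative threshold (the cutoff value is not taken from the paper):

```python
def classify_hallucination(ps: float, mv: float, threshold: float = 0.5) -> str:
    """Classify a hallucination by Prompt Sensitivity (PS) and Model Variability (MV).

    The 0.5 threshold is illustrative; the survey does not fix specific cutoffs here.
    """
    high_ps, high_mv = ps >= threshold, mv >= threshold
    if high_ps and not high_mv:
        return "prompt-dominant"
    if high_mv and not high_ps:
        return "model-dominant"
    if high_ps and high_mv:
        return "mixed-origin"
    return "unclassified/noise"

print(classify_hallucination(0.8, 0.2))  # prompt-dominant
```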
Retrieval-Augmented Generation (RAG) (Lewis et al., 2020), Grounded pretraining (Zhang et al., 2023), and contrastive decoding techniques (Li et al., 2022) have been explored to counter hallucinations by integrating external knowledge sources during inference or introducing architectural changes that enforce factuality.
Intrinsic factors within model architecture, training data quality, and sampling algorithms significantly contribute to hallucination problems in large language models.
Model Variability (MV) is a metric that measures the difference in hallucination rates across different models for a fixed prompt, where high MV indicates that hallucinations are primarily model-intrinsic.
Quantifying hallucinations in large language models involves using targeted metrics such as accuracy-based evaluations on question-answering tasks, entropy-based measures of semantic coherence, and consistency checking against external knowledge bases.
Hallucinations in Large Language Models negatively impact the reliability and efficiency of AI systems in high-impact domains such as medicine (Lee et al., 2023), law (Bommarito and Katz, 2022), journalism (Andrews et al., 2023), and scientific communication (Nakano et al., 2021; Liu et al., 2023).
Hallucinations in large language models arise from both prompt-dependent factors and model-intrinsic factors, which requires the use of tailored mitigation approaches.
Larger models tend to hallucinate with 'confident nonsense', and model scaling alone does not eliminate hallucination but can amplify it in certain contexts, according to Kadavath et al. (2022).
Yao et al. (2022) proposed the integration of symbolic and neural knowledge modules to mitigate hallucinations.
Mitigation strategies for large language model hallucinations at the modeling level include Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., 2022), retrieval fusion (Lewis et al., 2020), and instruction tuning (Wang et al., 2022).
Techniques such as Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., 2022) and Retrieval-Augmented Generation (RAG) (Lewis et al., 2020) are used to address model-level limitations regarding hallucinations.
Chain-of-Thought prompting and Instruction-based inputs are effective for mitigating hallucinations in Large Language Models but are insufficient in isolation.
Prompt tuning approaches, such as Chain-of-Thought prompting (Wei et al., 2022) and Self-Consistency decoding (Wang et al., 2022), aim to reduce hallucinations without altering the underlying model.
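As a sketch of the Self-Consistency idea, one can sample several answers to the same question and keep the majority vote; the sampler below is a hypothetical stand-in for a stochastic (temperature-sampled) LLM call:

```python
from collections import Counter

def self_consistency(sample_answer, question: str, n_samples: int = 5) -> str:
    """Sample multiple answers and return the majority vote (Self-Consistency sketch).

    `sample_answer` is a hypothetical stand-in for one stochastic LLM call.
    """
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Stubbed sampler cycling through canned answers; a real one would query an LLM.
samples = iter(["Paris", "Paris", "Lyon", "Paris", "Paris"])
print(self_consistency(lambda q: next(samples), "Capital of France?"))  # Paris
```

The intuition is that a hallucinated answer is less likely to recur consistently across independent reasoning paths than a well-supported one.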
Lewis et al. (2020) demonstrated that integrating knowledge retrieval into generation workflows, known as Retrieval-Augmented Generation (RAG), shows promising results in hallucination mitigation.
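A RAG pipeline of this kind can be sketched as a prompt builder that grounds the question in retrieved passages; the retriever interface and instruction wording below are illustrative assumptions, not the specific method of Lewis et al.:

```python
def build_rag_prompt(question: str, retrieve, k: int = 3) -> str:
    """Assemble a grounded prompt from retrieved passages (RAG-style sketch).

    `retrieve` is a hypothetical retriever returning ranked text passages.
    """
    passages = retrieve(question)[:k]
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the context below; say 'unknown' if the answer is absent.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

docs = ["The Eiffel Tower is in Paris.", "Paris is the capital of France."]
print(build_rag_prompt("Where is the Eiffel Tower?", lambda q: docs, k=2))
```

Constraining the answer to retrieved evidence is what lets RAG trade open-ended generation for verifiable grounding.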
A positive Joint Attribution Score (JAS) indicates that specific prompt-model combinations amplify hallucinations beyond what would be expected from individual prompt or model effects alone, suggesting the prompt and model jointly contribute to the error.
Attribution-based metrics, specifically PS and MV, provide a novel method for classifying and addressing the sources of hallucinations in large language models.
Bang and Madotto (2023) developed neural attribution predictors to identify whether a hallucination originates from the prompt or the model.
Zero-shot and few-shot prompting, popularized by GPT-3 (Brown et al., 2020), expose models to minimal task examples but are prone to hallucination when the task is not explicitly structured.
Mitigation strategies for large language model hallucinations at the prompting level include prompt calibration, system message design, and output verification loops.
Hallucination in Large Language Models refers to outputs that appear fluent and coherent but are factually incorrect, logically inconsistent, or entirely fabricated.
Positive Joint Attribution Score (JAS) values indicate joint amplification of hallucinations by prompts and models.
Prompt Sensitivity (PS) is a metric that measures the variation in output hallucination rates under different prompt styles for a fixed model, where high PS indicates that hallucinations are primarily prompt-induced.
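Assuming a prompts-by-models matrix of observed hallucination rates, PS and MV might be sketched as the spread of rates along each axis; the rate values below are hypothetical and the paper's exact aggregation may differ:

```python
# Hypothetical hallucination rates for (prompt style, model) pairs.
rates = {
    ("zero-shot", "model-A"): 0.42,
    ("zero-shot", "model-B"): 0.40,
    ("chain-of-thought", "model-A"): 0.15,
    ("chain-of-thought", "model-B"): 0.38,
}
prompts = sorted({p for p, _ in rates})
models = sorted({m for _, m in rates})

def prompt_sensitivity(model: str) -> float:
    """PS: variation in hallucination rate across prompts for a fixed model."""
    vals = [rates[(p, model)] for p in prompts]
    return max(vals) - min(vals)

def model_variability(prompt: str) -> float:
    """MV: variation in hallucination rate across models for a fixed prompt."""
    vals = [rates[(prompt, m)] for m in models]
    return max(vals) - min(vals)

print(prompt_sensitivity("model-A"))          # high PS: prompt-induced
print(model_variability("chain-of-thought"))  # high MV: model-intrinsic
```

Here max-minus-min stands in for whatever dispersion measure the paper uses; variance or standard deviation would slot in the same way.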
Self-Consistency decoding (Wang et al., 2022), ReAct prompting (Yao et al., 2022), and Instruct-tuning (Ouyang et al., 2022) reduce hallucination rates by influencing how a model organizes its internal generation paths, though these methods are heuristic and do not universally prevent hallucinations across all domains or tasks.
Structured prompt strategies, such as chain-of-thought (CoT) prompting, significantly reduce hallucinations in prompt-sensitive scenarios, although intrinsic model limitations persist in some cases.
The attribution framework categorizes hallucinations in Large Language Models into four types: prompt-dominant, model-dominant, mixed-origin, or unclassified.
Prompting methods, as researched by Wei et al. (2022), Zhou et al. (2022), and Yao et al. (2022), reduce hallucination by guiding reasoning and structure.
Li et al. (2022) proposed fine-tuning methods that incorporate retrieved factual context to reduce hallucinations.
Some hallucinations in Large Language Models persist regardless of prompting structure, indicating inherent model biases or training artifacts, as observed in the DeepSeek model.
The study uses a controlled multi-factor experiment that varies prompts systematically across models to attribute causes of hallucinations, distinguishing it from prior evaluations.
Hallucinations in Large Language Models (LLMs) are analyzed along two dimensions: prompt-level issues and model-level behaviors.
Recent studies by Ji et al. (2023) and Kazemi et al. (2023) categorize hallucinations into four types: intrinsic, extrinsic, factual, and logical.
Chain-of-Thought prompting can backfire by making hallucinations more elaborate if a model fundamentally lacks knowledge on a query, as the model may rationalize a falsehood in detail.
Mitigation strategies for hallucinations in large language models are categorized into two types: prompt-based interventions and model-based architectural or training improvements.
Zhang et al. (2023) found that grounded language model training reduces the occurrence of hallucinations.
HallucinationEval (Wu et al., 2023) provides a framework for measuring different types of hallucinations in large language models.
Hallucinations in Large Language Models create risks for misinformation, reduced user trust, and accountability gaps (Bommasani et al., 2021; Weidinger et al., 2022).
RealToxicityPrompts (Gehman et al., 2020) is a benchmark used to investigate how large language models hallucinate toxic or inappropriate content.
Hallucination in large language models is linked to pretraining biases and architectural limits, according to research by Kadavath et al. (2022), Bang and Madotto (2023), and Chen et al. (2023).
Mitigation of hallucinations in Large Language Models requires multi-layered, attribution-aware pipelines, as no single approach can entirely eliminate the phenomenon.
The authors of the paper 'Survey and analysis of hallucinations in large language models' conducted controlled experiments using open-source models and standardized prompts to classify hallucination origins as prompt-dominant, model-dominant, or mixed.
Grounded pretraining reduces hallucination during generation in large language models, though it requires significant data and compute resources.
Least-to-Most prompting (Zhou et al., 2022) mitigates hallucination in multi-hop reasoning tasks by decomposing complex queries into simpler steps.
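The decomposition idea can be sketched as follows, with `decompose` and `answer` as hypothetical stand-ins for LLM calls (the subquestions below are illustrative):

```python
def least_to_most(question: str, decompose, answer) -> str:
    """Least-to-Most sketch: split a multi-hop query into simpler subquestions,
    answering each in order while feeding earlier answers into later steps.

    `decompose` and `answer` are hypothetical stand-ins for LLM calls.
    """
    solved = []  # (subquestion, answer) pairs accumulated step by step
    for sub in decompose(question):
        solved.append((sub, answer(sub, solved)))
    return solved[-1][1]  # the final step answers the original question

# Stubbed decomposer/answerer; a real pipeline would prompt an LLM for both.
steps = {
    "Who wrote Hamlet?": "Shakespeare",
    "When was Shakespeare born?": "1564",
}
final = least_to_most(
    "When was the author of Hamlet born?",
    lambda q: ["Who wrote Hamlet?", "When was Shakespeare born?"],
    lambda sub, ctx: steps[sub],
)
print(final)  # 1564
```

By resolving "who" before "when", the model never has to bridge the full multi-hop gap in a single generation, which is where hallucinations tend to creep in.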
Hallucinations in Large Language Models occur when the probabilistic model incorrectly favors a hallucinatory output (y_halluc) over a factually correct response (y_fact), representing a mismatch between the model's internal probability distributions and real-world factual distributions.
There is currently no widely accepted metric or dataset that fully captures the multidimensional nature of hallucinations in Large Language Models.
The authors of the survey claim their work is the first to formalize a probabilistic attribution model for hallucinations, noting that prior surveys by Ji et al. (2023) and Chen et al. (2023) categorized causes generally but did not propose an attribution methodology.
If a hallucinated answer disappears when a question is asked more explicitly or via Chain-of-Thought, the cause is likely prompt-related; if the hallucination persists across all prompt variants, the cause likely lies in the model's internal behavior.
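This diagnostic can be expressed as a small decision rule over prompt-variant outcomes; the function and variant names below are illustrative, not from the paper:

```python
def diagnose(baseline_hallucinated: bool, variant_results: dict) -> str:
    """Heuristic from the survey: if a hallucination disappears under more
    explicit or chain-of-thought rephrasings, it is likely prompt-related;
    if it persists across every variant, it is likely model-internal.

    `variant_results` maps a prompt-variant name to whether the model
    still hallucinated under that variant.
    """
    if not baseline_hallucinated:
        return "no hallucination"
    if variant_results and all(variant_results.values()):
        return "model-internal"
    return "prompt-related"

print(diagnose(True, {"explicit": False, "chain-of-thought": False}))  # prompt-related
```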
The authors propose the Joint Attribution Score (JAS) metric to quantify prompt-model interaction effects in LLM hallucinations, defined as JAS = Cov(P, M) / (σ_P * σ_M), where σ_P and σ_M are the standard deviations of hallucination rates across all prompts and all models, respectively.