medical hallucination
Also known as: medical hallucinations, medical hallucination detection
Facts (57)
Sources
Medical Hallucination in Foundation Models and Their ... medrxiv.org Mar 3, 2025 26 facts
Claim: Medical hallucination is defined as any instance in which a foundation model generates misleading medical content.
Reference: Agarwal et al. (2024) propose a framework for classifying medical hallucinations that emphasizes the severity of errors and their root causes.
Claim: Medical hallucinations in Large Language Models (LLMs) pose serious risks because incorrect dosages, drug interactions, or diagnostic criteria can lead to life-threatening outcomes.
Claim: The detectability of medical hallucinations depends on the domain expertise of the audience and the quality of the prompting provided to the model; domain experts are more likely to identify subtle inaccuracies than non-experts, according to Asgari et al. (2024) and Liu et al. (2024).
Claim: The impact of medical hallucinations is severe because errors in clinical reasoning or misleading treatment recommendations can directly harm patients by delaying proper care or leading to inappropriate interventions, as documented by Miles-Jay et al. (2023), Xia et al. (2024), and Mehta and Devarakonda (2018).
Claim: Survey respondents identified limitations in training data and model architectures as key factors contributing to medical hallucinations in AI/LLM tools.
Claim: The authors of 'Medical Hallucination in Foundation Models and Their ...' contributed a taxonomy for understanding and addressing medical hallucinations, benchmarked models using a medical hallucination dataset and physician-annotated LLM responses to real medical cases, and conducted a multi-national clinician survey on experiences with medical hallucinations.
Claim: Medical hallucinations frequently use domain-specific terms and appear to present coherent logic, making them difficult to recognize without expert scrutiny, as reported by Asgari et al. (2024) and Liu et al. (2024).
Claim: Medical hallucinations in LLMs can manifest as incorrect diagnoses, the use of confusing or inappropriate medical terminology, or the presentation of contradictory findings within a patient’s case.
Claim: Medical hallucinations in LLMs manifest across various clinical tasks, including symptom diagnosis, patient management, and the interpretation of lab results and visual data.
Claim: The authors define medical hallucination in foundation models as a distinct concept from general hallucinations, characterized by unique risks within the healthcare domain.
Claim: Medical hallucinations arise within specialized tasks such as diagnostic reasoning, therapeutic planning, or interpretation of laboratory findings, where inaccuracies have immediate implications for patient care, according to Xu et al. (2024b), Miles-Jay et al. (2023), and Xia et al. (2024).
Claim: Medical hallucinations are defined as factually incorrect yet plausible outputs with medical relevance generated by AI/LLM tools.
Claim: Large Language Models (LLMs) exhibit systematic errors known as medical hallucinations, where the models generate incorrect or misleading medical information that can adversely affect clinical decision-making and patient outcomes.
Procedure: The authors conducted a survey aimed at individuals in the medical, research, and analytical fields to investigate perceptions and experiences regarding the use of AI and LLM tools, specifically concerning medical hallucinations.
Claim: Medical hallucinations in foundation models are driven by data quality issues, model limitations, and the complexities of the healthcare domain.
Claim: The primary challenge in medical annotation of LLM outputs is the nuanced distinction between bona fide medical hallucinations and less critical errors, such as temporal discrepancies in patient timelines.
Claim: The Med-HALT dataset is a publicly available resource for studying medical hallucinations in AI models.
Reference: Ahmad et al. (2023) propose a framework for classifying medical hallucinations that focuses on preserving clinician trust by identifying the types of misinformation that most erode confidence.
Claim: The authors introduce a taxonomy for medical hallucination in Large Language Models to provide a structured framework for categorizing AI-generated medical misinformation.
Account: The authors of 'Medical Hallucination in Foundation Models and Their ...' conducted a clinician survey to understand healthcare professionals' perceptions and experiences regarding AI/LLM adoption and the challenges of medical hallucinations in practice.
Claim: Medical hallucinations can undermine patient safety and erode trust in AI-assisted clinical systems, as noted by Miles-Jay et al. (2023), Xia et al. (2024), Mehta and Devarakonda (2018), Asgari et al. (2024), Liu et al. (2024), and Pal et al. (2023).
Claim: Medical hallucinations in foundation models are categorized into a taxonomy ranging from factual inaccuracies to complex reasoning errors.
Claim: Detection and mitigation strategies for medical hallucinations in foundation models include factual verification, consistency checks, uncertainty quantification, and prompt engineering.
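As a concrete illustration of the consistency-check and uncertainty-quantification strategies listed above, the sketch below flags an answer for review when repeated samples of the same prompt disagree. This is a minimal sketch, not the paper's method; the `generate` callable is a hypothetical stand-in for any LLM API sampled at nonzero temperature.

```python
import random
from collections import Counter

def consistency_check(generate, prompt, n_samples=5, threshold=0.6):
    """Sample the model several times and treat disagreement among the
    samples as a crude uncertainty signal: low agreement suggests the
    top answer may be a hallucination and should be reviewed."""
    samples = [generate(prompt).strip().lower() for _ in range(n_samples)]
    top_answer, count = Counter(samples).most_common(1)[0]
    agreement = count / n_samples
    return {"answer": top_answer,
            "agreement": agreement,
            "flag_for_review": agreement < threshold}

# Toy usage with a stub standing in for a real model call:
stub = lambda _prompt: random.choice(["warfarin", "warfarin", "heparin"])
print(consistency_check(stub, "Which anticoagulant is antagonized by vitamin K?"))
```

More refined variants of this idea cluster semantically equivalent answers before measuring agreement, rather than comparing raw strings as done here.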
Claim: The taxonomy of medical hallucinations in foundation models clusters errors into five main categories: factual errors, outdated references, spurious correlations, incomplete chains of reasoning, and fabricated sources or guidelines.
Claim: Medical hallucinations are more challenging to detect than general-purpose hallucinations because the language used often appears clinically valid while containing critical inaccuracies, as noted by Singhal et al. (2022) and Mohammadi et al. (2023).
Medical Hallucination in Foundation Models and Their Impact on ... medrxiv.org Nov 2, 2025 25 facts
Claim: Medical hallucinations in large language models manifest across various clinical tasks, including symptom diagnosis, patient management, the interpretation of lab results, and the interpretation of visual data.
Claim: The authors of the study define medical hallucination as a reasoning-driven failure mode of foundation models that is distinct from general hallucinations in both its origin and clinical consequence.
Claim: Medical hallucinations in LLMs pose serious risks because incorrect medical information, such as dosages, drug interactions, or diagnostic criteria, can lead to life-threatening outcomes.
Claim: The study evaluates medical-purpose LLMs, models specifically adapted or trained for medical and biomedical tasks, to test whether domain-specific training or fine-tuning effectively mitigates medical hallucinations compared to general-purpose models.
Claim: The authors explored structured prompting and reasoning scaffolds as mitigation strategies to assess their ability to reduce medical hallucination rates in LLMs.
Claim: Survey respondents in the study 'Medical Hallucination in Foundation Models and Their Impact on ...' identified limitations in training data and model architectures as key factors contributing to medical hallucinations.
Measurement: The survey conducted by the authors regarding AI/LLM tools and medical hallucinations spanned a 94-day period.
Claim: The authors of the study 'Medical Hallucination in Foundation Models and Their Impact on ...' define medical hallucination as any model-generated output that is factually incorrect, logically inconsistent, or unsupported by authoritative clinical evidence in ways that could alter clinical decisions.
Perspective: The authors of the study 'Medical Hallucination in Foundation Models and Their Impact on ...' argue that medical hallucination is a reasoning-driven failure mode rather than a knowledge deficit, and that safety emerges from sophisticated reasoning capabilities and broad knowledge integration rather than narrow optimization.
Claim: Medical hallucinations frequently occur in specialized tasks such as diagnostic reasoning, therapeutic planning, or the interpretation of laboratory findings, where inaccuracies have immediate implications for patient care.
Claim: The study's empirical evaluation, utilizing a physician-audited benchmark, indicates that most medical hallucinations in foundation models stem from failures in causal and temporal reasoning rather than missing medical knowledge.
Claim: Medical hallucinations differ from general-purpose hallucinations because they often use domain-specific terminology and appear clinically valid, making them difficult to detect without expert scrutiny.
Claim: The taxonomy presented in the study clusters medical hallucinations into five categories: factual errors, outdated references, spurious correlations, incomplete chains of reasoning, and fabricated sources or guidelines.
Claim: The primary challenge in evaluating medical hallucinations is the nuanced distinction between clinically significant factual inaccuracies, such as the omission of a major surgery, and less critical errors, such as stylistic or minor factual deviations.
Measurement: Structured prompting and retrieval-augmented generation can reduce medical hallucinations in foundation models by over 10%, according to the study's empirical evaluation.
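To make the retrieval-augmented generation mitigation just described concrete, here is a minimal sketch of grounding a prompt in retrieved evidence. The word-overlap retriever and the prompt wording are illustrative assumptions, not the study's actual pipeline.

```python
def build_grounded_prompt(question, corpus, k=2):
    """Rank reference snippets by word overlap with the question and
    instruct the model to answer only from the retrieved evidence.
    A real system would use a vector index over a curated clinical
    corpus; the overlap scorer here is purely illustrative."""
    q_words = set(question.lower().split())
    ranked = sorted(corpus,
                    key=lambda s: len(q_words & set(s.lower().split())),
                    reverse=True)
    evidence = "\n".join(f"- {s}" for s in ranked[:k])
    return ("Answer using ONLY the evidence below. If the evidence is "
            "insufficient, reply 'insufficient evidence'.\n"
            f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:")

corpus = [
    "Warfarin's anticoagulant effect is reduced by high vitamin K intake.",
    "Metformin is first-line therapy for type 2 diabetes.",
]
print(build_grounded_prompt("Does vitamin K intake affect warfarin?", corpus))
```

The explicit fallback instruction ("insufficient evidence") is one form of the structured prompting the study pairs with retrieval: it gives the model a sanctioned way to abstain instead of fabricating an answer.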
Measurement: In an evaluation of 11 foundation models (7 general-purpose, 4 medical-specialized) across seven medical hallucination tasks, general-purpose models achieved a median of 76.6% hallucination-free responses, while medical-specialized models achieved a median of 51.3%.
Measurement: The authors conducted a survey of 75 professionals, primarily holding MD and/or PhD degrees, to investigate perceptions and experiences regarding AI/LLM tools and medical hallucinations.
Claim: The impact of medical hallucinations is more severe than that of general hallucinations because errors in clinical reasoning or misleading treatment recommendations can directly harm patients by delaying proper care or leading to inappropriate interventions.
Claim: Summary consistency methods are used to detect medical hallucinations that arise when important clinical details are omitted, distorted, or fabricated during the summarization process.
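A minimal sketch of the summary-consistency idea, under the assumption that salient source facts can be represented by keywords: the check flags source facts that fail to resurface in the generated summary as candidate omission errors. Production systems typically use entailment (NLI) models rather than keyword matching.

```python
def summary_omission_check(source_facts, summary):
    """Return source facts whose keywords do not all appear in the
    summary; each is a candidate omission-type hallucination."""
    text = summary.lower()
    return [f for f in source_facts
            if not all(kw in text for kw in f["keywords"])]

facts = [
    {"text": "Patient underwent CABG in 2019", "keywords": ["cabg"]},
    {"text": "Documented penicillin allergy", "keywords": ["penicillin"]},
]
summary = "62-year-old with prior CABG, admitted for chest pain."
for fact in summary_omission_check(facts, summary):
    print("Possible omission:", fact["text"])  # flags the dropped allergy
```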
Claim: Medical hallucinations in foundation models manifest as misordered symptom progression, flawed diagnostic logic, or misplaced causal inference, and these errors persist even in large-scale models.
Claim: The authors propose a systematic framework for evaluating medical hallucinations in LLMs that aligns with the taxonomy presented in Table 2 of the paper.
Claim: A substantial proportion of survey respondents reported encountering medical hallucinations, defined as factually incorrect yet plausible outputs with medical relevance, in critical tasks such as literature reviews and clinical decision-making.
Measurement: A global survey of 70 clinicians across 15 specialties found that 91.8% had encountered medical hallucinations, and 84.7% considered them capable of causing patient harm.
Claim: Medical hallucinations in large language models are exacerbated by the complexity and specificity of medical knowledge, where subtle differences in terminology or reasoning can lead to significant misunderstandings.
Claim: The authors define 'medical hallucinations' as instances where an LLM produces incorrect, misleading, or unsupported medical information that could influence clinical judgment or patient outcomes.
A Comprehensive Benchmark for Detecting Medical Hallucinations ... researchgate.net 2 facts
Claim: MedHallu is the first benchmark specifically designed for medical hallucination detection in large language models.
Measurement: MedHallu comprises 10,000 samples for evaluating medical hallucination detection.
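A sketch of how a detector might be scored against MedHallu-style records. The field names (question, answer, label) are assumptions based on the benchmark's description as labeled question-answer pairs, not the released schema; consult the MedHallu release for the actual fields and loading code.

```python
def evaluate_detector(detect, records):
    """Score a hallucination detector against gold binary labels.
    `detect(question, answer)` returns True when it judges the
    answer hallucinated."""
    tp = fp = fn = 0
    for r in records:
        pred = detect(r["question"], r["answer"])
        gold = r["label"] == "hallucinated"
        tp += int(pred and gold)
        fp += int(pred and not gold)
        fn += int(not pred and gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical records mirroring the described structure:
records = [
    {"question": "Is metformin first-line for type 2 diabetes?",
     "answer": "Yes, guidelines recommend it as first-line therapy.",
     "label": "faithful"},
    {"question": "Is metformin first-line for type 2 diabetes?",
     "answer": "No, metformin is contraindicated in all diabetics.",
     "label": "hallucinated"},
]
print(evaluate_detector(lambda q, a: "no," in a.lower(), records))
```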
A Comprehensive Benchmark for Detecting Medical Hallucinations ... aclanthology.org 2 facts
Claim: MedHallu is a benchmark designed for detecting medical hallucinations in large language models, consisting of 10,000 high-quality question-answer pairs derived from PubMedQA.
Claim: Incorporating domain-specific knowledge and introducing a 'not sure' category as an answer option improves precision and F1 scores by up to 38% relative to baselines in medical hallucination detection.
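One plausible reading of how a 'not sure' option helps, sketched below under the assumption that abstentions are excluded from the precision denominator: the detector stops guessing on hard cases, so the decisions it does make are more often correct. The actual MedHallu evaluation protocol may differ.

```python
def precision_with_abstention(preds, golds):
    """Detector outputs one of {'hallucinated', 'faithful', 'not sure'}.
    'Not sure' answers are left out of the precision computation."""
    decided = [(p, g) for p, g in zip(preds, golds) if p != "not sure"]
    tp = sum(p == "hallucinated" == g for p, g in decided)
    fp = sum(p == "hallucinated" and g == "faithful" for p, g in decided)
    return tp / (tp + fp) if tp + fp else 0.0

golds = ["hallucinated", "faithful", "hallucinated", "faithful"]
print(precision_with_abstention(
    ["hallucinated", "hallucinated", "hallucinated", "faithful"], golds))  # ~0.67: one bad guess
print(precision_with_abstention(
    ["hallucinated", "not sure", "hallucinated", "faithful"], golds))      # 1.0: abstained instead
```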
EdinburghNLP/awesome-hallucination-detection - GitHub github.com 1 fact
Reference: The MedHallu benchmark, derived from PubMedQA, contains 10,000 question-answer pairs with deliberately planted plausible hallucinations to evaluate medical hallucination detection.
[PDF] MedHallu: A Comprehensive Benchmark for Detecting Medical ... aclanthology.org Nov 4, 2025 1 fact
Claim: MedHallu integrates a fine-grained categorization system for medical hallucination types.