Relations (1)

related 4.52 — strongly supporting 22 facts

Hallucinations and omissions are related as paired error types evaluated together in LLM clinical note generation experiments, with shared measurements showing their co-occurrence and comparative reductions, such as 1 major hallucination and 10 major omissions in Experiment 8 [1], and a 75% reduction in major hallucinations alongside a 58% reduction in major omissions from Experiment 3 to Experiment 8 [2]. The study defines hallucinations as unsupported text and omissions as missed details [3], and claims both may be intrinsic LLM properties [4].
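
The comparative reductions in this relation follow directly from the raw before/after counts reported in the facts below; a minimal Python sketch (my arithmetic, not code from the study) makes the calculation explicit:

```python
# Relative reductions from Experiment 3 to Experiment 8, using the
# before/after error counts reported in the facts below.
counts = {
    "major hallucinations": (4, 1),
    "major omissions": (24, 10),
    "minor omissions": (114, 74),
}
for error_type, (before, after) in counts.items():
    reduction = (before - after) / before
    print(f"{error_type}: {before} -> {after} ({reduction:.0%} reduction)")
# major hallucinations: 4 -> 1 (75% reduction)
# major omissions: 24 -> 10 (58% reduction)
# minor omissions: 114 -> 74 (35% reduction)
```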

Facts (22)

Sources
A framework to assess clinical safety and hallucination rates of LLMs ... (Nature, nature.com), 22 facts
claim: The researchers determined that the changes tested in Experiment 5 were not suitable for clinical safety evaluation because the resulting increase in hallucinations and omissions was too large to be considered useful.
claim: The researchers built CREOLA, an in-house platform designed to enable clinicians to identify and label relevant hallucinations and omissions in clinical text to inform future experiments and implement the researchers' framework at scale.
procedure: Experiment 15 evaluated the mitigation of errors in 'Bad SOAP' notes, which contained hallucinations and omissions, by applying the revised generation process from Experiment 14.
procedure: The study compared clinician-created notes with LLM-generated notes by using a framework to identify hallucinations and omissions in both sets of notes.
claim: Experiment 16 introduced a template-driven method for generating customized clinical outputs, but comparison with baseline results from Experiment 8 showed an increase in major hallucinations and minor omissions.
measurement: In the study on LLM clinical note generation, comparing Experiment 5 to Experiment 3 (which used structured prompts) resulted in an increase in major hallucinations from 4 to 25, minor hallucinations from 5 to 29, major omissions from 24 to 47, and minor omissions from 114 to 188.
procedure: The researchers classified clinical risk from major hallucinations and omissions using a framework inspired by protocols in medical device certifications.
measurement: In the study's experiments, omissions occurred at a rate of 3.45%, while hallucinations occurred at a rate of 1.47%.
claim: The study defines hallucinations as instances of text unsupported by the associated clinical documentation, and omissions as instances where relevant details present in the supporting evidence are missed.
measurement: In the clinical summarization task, Experiment 8 resulted in 1 major hallucination and 10 major omissions, while Experiment 11 resulted in 2 major hallucinations and 0 major omissions over 25 notes.
account: The researchers identified Experiments 8 and 11 as the best-performing experiments for LLM clinical note generation, having the fewest hallucinations and omissions, and subsequently analyzed them to determine the types of hallucinations produced and their typical sentence positions.
claim: The authors propose a multi-component framework that combines the assessment of hallucinations and omissions with an evaluation of their impact on clinical safety to serve as a governance and clinical safety assessment template for organizations.
claim: The framework developed by the researchers quantifies the clinical impact and implications of LLM omissions and hallucinations, which is a necessary step to meaningfully address clinical safety.
claim: Experiment 17 compared clinician-written notes against LLM-generated notes, finding that clinician-written notes contained slightly more hallucinations but fewer omissions than LLM-generated summaries.
measurement: Changing the prompt from Experiment 3 to Experiment 8 reduced the incidence of major hallucinations by 75% (from 4 to 1), major omissions by 58% (from 24 to 10), and minor omissions by 35% (from 114 to 74).
measurement: The study defines a percentage-based metric for the likelihood of hallucinations and omissions: 'Very High' likelihood represents error rates >90%, 'Very Low' likelihood represents error rates <1%, and 'Medium' likelihood covers a range of 10–60% to account for output variability and unpredictability (see the banding sketch after this list).
procedure: The annotation process for LLM outputs involves tasking volunteer doctors with classifying sub-sections of output for hallucinations or omissions based on a specific taxonomy and providing free-text explanations for their classifications (see the record sketch after this list).
measurement: In the study on LLM clinical note generation, iterative prompt improvements (Experiments 6 to 11) eliminated major omissions (decreasing from 61 to 0), reduced minor omissions by 58% (from 130 to 54), and lowered the total number of hallucinations by 25% (from 4 to 3).
claim: The study on LLM clinical note generation supports the theory that hallucinations and omissions may be intrinsic theoretical properties of current Large Language Models.
claim: Modifying the prompt from the baseline used in Experiment 1 to include a style update used in Experiment 8 resulted in a reduction of both major and minor omissions, though it caused a slight increase in minor hallucinations.
claim: In Experiment 5, incorporating a chain-of-thought prompt to extract facts from the transcript (atomisation) before generating the clinical note led to an increase in major hallucinations and omissions.
measurement: Hallucinations were classified as 'major' errors 44% of the time, whereas omissions were classified as 'major' errors 16.7% of the time.
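
The likelihood metric above names only three bands, leaving the 1–10% and 60–90% ranges unreported here; a minimal banding sketch that encodes exactly what the fact states, flagging the gaps rather than guessing the paper's intermediate bands:

```python
def likelihood_band(error_rate: float) -> str:
    """Map an error rate (as a fraction) to the likelihood bands reported above.
    Ranges not named in the fact (1-10% and 60-90%) are returned as
    'unreported' rather than guessed."""
    if error_rate > 0.90:
        return "Very High"
    if error_rate < 0.01:
        return "Very Low"
    if 0.10 <= error_rate <= 0.60:
        return "Medium"
    return "unreported"

# The study's observed rates both fall in the unreported 1-10% gap:
print(likelihood_band(0.0345))  # omissions at 3.45% -> 'unreported'
print(likelihood_band(0.0147))  # hallucinations at 1.47% -> 'unreported'
```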
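
The annotation procedure (volunteer doctors labelling sub-sections of output against a taxonomy, with free-text explanations) suggests a simple record structure; a hypothetical sketch, with field names of my own choosing rather than CREOLA's:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """One clinician judgement on a sub-section of LLM output.
    Field names are illustrative, not taken from CREOLA."""
    subsection: str   # span of the generated note being judged
    label: str        # taxonomy class, e.g. "major hallucination"
    explanation: str  # free-text justification from the annotator

annotations = [
    Annotation("Patient reports chest pain radiating to the left arm.",
               "major hallucination",
               "Not supported anywhere in the consultation transcript."),
    Annotation("",  # empty span: the error is an absence, not a statement
               "major omission",
               "Penicillin allergy in the transcript is missing from the note."),
]
```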