Relations (1)
related (score: 3.17) — strongly supporting, 8 facts
Hallucinations are considered a structural limitation of AI models, arising directly from the composition and quality of their training data, as described in [1] and [2]. The relationship is twofold: training data causes hallucinations through noise or under-representation of facts [3], [4], and it also serves as a primary lever for mitigation through improved curation and scaling [5], [6].
Facts (8)
Sources
Hallucination Causes: Why Language Models Fabricate Facts (mbrenndoerfer.com) — 3 facts
claim: Scaling up large language model size and training data simultaneously tends to reduce hallucinations regarding well-documented facts because larger models have greater capacity to memorize and recall high-frequency information.
claim: Hallucination in large language models is a structural issue originating from how training data is collected, how the optimization objective is constructed, the limitations of what knowledge the model can represent, and how the generation process converts probability distributions into words.
claim: For the long tail of entities and facts, increasing the volume of training data does not reduce hallucinations if the additional data contains noise levels similar to the existing training corpus.
On Hallucinations in Artificial Intelligence–Generated Content ... (jnm.snmjournals.org) — 1 fact
perspective: AI models are inherently probabilistic and rely on pattern recognition and statistical inference from training data without true understanding, making hallucinations an inevitable limitation of data-driven learning systems.
LLM Hallucinations: Causes, Consequences, Prevention - LLMs (llmmodels.org) — 1 fact
claim: Strategies to mitigate hallucinations in large language models include using high-quality training data, employing contrastive learning, implementing human oversight, and utilizing uncertainty estimation.
A Knowledge-Graph Based LLM Hallucination Evaluation Framework (themoonlight.io) — 1 fact
claim: The authors of the GraphEval framework focus on detecting hallucinations within a defined context rather than identifying discrepancies between LLM responses and broader training data.
Medical Hallucination in Foundation Models and Their ... (medrxiv.org) — 1 fact
claim: Enhancing data quality and curation is critical for reducing hallucinations in AI models because inaccuracies or inconsistencies in training data can propagate errors in model outputs.
vectara/hallucination-leaderboard - GitHub (github.com) — 1 fact
perspective: The author of the Vectara hallucination-leaderboard argues that testing models by providing a list of well-known facts is a poor method for detecting hallucinations because the model's training data is unknown, the definition of 'well known' is unclear, and most hallucinations arise from rare or conflicting information rather than common knowledge.
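Of the mitigation strategies listed among the facts above, uncertainty estimation is the most directly mechanizable. A minimal sketch, assuming access to the per-token probabilities a model assigned to its own output; the function names and the threshold value are hypothetical illustrations, not taken from any cited source:

```python
import math

def sequence_confidence(token_probs):
    """Mean log-probability of the generated tokens.

    Lower values mean the model was, on average, less certain
    about its own output, which correlates with a higher risk
    of hallucination."""
    if not token_probs:
        raise ValueError("empty sequence")
    return sum(math.log(p) for p in token_probs) / len(token_probs)

def flag_for_review(token_probs, threshold=-1.0):
    """Flag an answer for human oversight when its average
    log-probability falls below a threshold (the -1.0 here is
    an arbitrary placeholder; in practice it would be tuned on
    held-out labeled data)."""
    return sequence_confidence(token_probs) < threshold

# Confident generation: high per-token probabilities.
print(flag_for_review([0.9, 0.8, 0.95]))  # → False
# Uncertain generation: probability mass spread across alternatives.
print(flag_for_review([0.2, 0.3, 0.25]))  # → True
```

This is only the simplest form of the idea; real systems typically combine such token-level signals with sampling-based consistency checks or external fact verification, since a model can also be confidently wrong.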