Concept

Hallucination Evaluation Model

Also known as: hallucination evaluation

Facts (10)

Sources
A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... (arxiv.org, arXiv, Feb 23, 2026; 4 facts)
claim: The calibration dataset for hallucination evaluation captures the entity ID, entity statistics, entity type, question relation types, relation scores, and the overall question score for each question.
procedure: The supplementary context provided in the question template for hallucination evaluation varies by entity type: for humans, it is the lifespan (e.g., 1934-2023 or 1968-); for all other entity types, it is the entity type itself.
reference: The paper 'DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation' is a cited reference regarding hallucination evaluation datasets.
measurement: The authors created a calibration dataset for hallucination evaluation by evaluating 34 models (ranging from 1B to 685B parameters) across 10 runs of 150 questions, generating 51,000 data points.
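The calibration-dataset size reported above follows directly from the stated counts (34 models, 10 runs, 150 questions per run); a minimal arithmetic sanity check in Python:

```python
# Sanity-check the calibration dataset size: 34 models,
# each evaluated over 10 runs of 150 questions.
num_models = 34
num_runs = 10
questions_per_run = 150

data_points = num_models * num_runs * questions_per_run
print(data_points)  # 51000, matching the reported figure
```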
Detecting and Evaluating Medical Hallucinations in Large Vision ... (arxiv.org, arXiv, Jun 14, 2024; 2 facts)
reference: Med-HallMark is a benchmark designed for hallucination detection and evaluation within the medical multimodal domain, providing multi-tasking hallucination support, multifaceted hallucination data, and hierarchical hallucination categorization.
claim: Existing hallucination evaluation methods such as CHAIR and POPE are limited to object hallucinations in general domains and cannot accommodate the multi-layered complexities of hallucinations in the medical field.
On Hallucinations in Artificial Intelligence–Generated Content ... (jnm.snmjournals.org, The Journal of Nuclear Medicine; 1 fact)
claim: Effective detection and evaluation of hallucinations in artificial intelligence–generated content for nuclear medicine imaging require multifaceted frameworks, including image-based, dataset-based, and clinical task–based metrics, as well as automated detectors trained on hallucination-annotated datasets.
Survey and analysis of hallucinations in large language models (frontiersin.org, Frontiers, Sep 29, 2025; 1 fact)
reference: Research directions for hallucination evaluation include the development of integrated, multi-task, multilingual benchmarks with unified annotation schemas (Liu et al., 2023) and the use of attribution-aware metrics incorporating Prompt Sensitivity (PS) and Model Variability (MV).
vectara/hallucination-leaderboard - GitHub (github.com, Vectara; 1 fact)
claim: A hallucination evaluation model can serve as a valid proxy for human judges provided that its judgments correlate strongly with those of human raters.
EdinburghNLP/awesome-hallucination-detection - GitHub (github.com, GitHub; 1 fact)
reference: The Hallucination Evaluation Model is a resource available on HuggingFace for hallucination detection.