Relations (1)

related 2.58 — strongly supporting 5 facts

MedHallu is a benchmark designed to evaluate and improve hallucination detection in large language models, as evidenced by its controlled generation pipeline [1], its prompt templates [2], and its performance experiments [3], [4], [5] on this task.

Facts (5)

Sources
MedHallu: Benchmark for Medical LLM Hallucination Detection emergentmind.com Emergent Mind 1 fact
claim: The MedHallu benchmark exposes current limitations in large language model hallucination detection.
MedHallu - GitHub github.com GitHub 1 fact
measurement: Giving large language models a 'not sure' response option improves hallucination detection precision by up to 38% on the MedHallu benchmark.
MedHallu: A Comprehensive Benchmark for Detecting Medical ... researchgate.net ResearchGate 1 fact
reference: The MedHallu research paper includes the prompt templates used for its hallucination detection experiments in sections 2.5 and 4.4.
[Literature Review] MedHallu: A Comprehensive Benchmark for ... themoonlight.io The Moonlight 1 fact
claim: According to experiments conducted for the MedHallu benchmark, general-purpose large language models often outperform specialized medical models on hallucination detection tasks.
A Comprehensive Benchmark for Detecting Medical Hallucinations ... aclanthology.org Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, Ying Ding · ACL Anthology 1 fact
procedure: The MedHallu benchmark generates hallucinated answers through a controlled pipeline to create a dataset for binary hallucination detection.
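
The measurement above concerns precision on a binary detection task when the model may answer 'not sure'. A minimal sketch of how such a score could be computed follows. It assumes that abstentions are simply excluded before precision is calculated, which may not match the exact protocol used in the MedHallu experiments. The function names and the toy labels are hypothetical and are not taken from the benchmark.

```python
from typing import Optional, Sequence


def precision_with_abstention(
    labels: Sequence[int],
    predictions: Sequence[Optional[int]],
) -> Optional[float]:
    """Precision for the 'hallucinated' class (1), skipping abstentions.

    labels: 1 = hallucinated answer, 0 = faithful answer.
    predictions: 1, 0, or None, where None stands for 'not sure'.
    Assumption: abstained items are dropped rather than counted as errors.
    """
    true_pos = 0
    false_pos = 0
    for label, pred in zip(labels, predictions):
        if pred is None:
            continue  # 'not sure': excluded from the precision computation
        if pred == 1:
            if label == 1:
                true_pos += 1
            else:
                false_pos += 1
    flagged = true_pos + false_pos
    if flagged == 0:
        return None  # precision is undefined when nothing was flagged
    return true_pos / flagged


def coverage(predictions: Sequence[Optional[int]]) -> float:
    """Fraction of items on which the model gave a definite answer."""
    if not predictions:
        return 0.0
    answered = sum(1 for pred in predictions if pred is not None)
    return answered / len(predictions)


# Toy data, invented for illustration only.
toy_labels = [1, 0, 1, 0, 1]
forced_answers = [1, 1, 1, 0, 0]            # model must always answer
abstaining_answers = [1, None, 1, 0, None]  # model may say 'not sure'

forced_precision = precision_with_abstention(toy_labels, forced_answers)
abstain_precision = precision_with_abstention(toy_labels, abstaining_answers)
abstain_coverage = coverage(abstaining_answers)
```

Reporting coverage next to precision matters under this assumption, because a model can raise its precision by abstaining on more items.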