Relations (1)
related 2.58 — strongly supporting 5 facts
MedHallu is a benchmark designed to evaluate and improve hallucination detection in large language models, as evidenced by its controlled generation pipeline [1], its prompt templates [2], and its performance experiments [3], [4], [5] focused on this task.
Facts (5)
Sources
MedHallu: Benchmark for Medical LLM Hallucination Detection emergentmind.com 1 fact
claim: The MedHallu benchmark exposes current limitations in large language model hallucination detection.
MedHallu - GitHub github.com 1 fact
measurement: Giving large language models a 'not sure' response option improves hallucination detection precision by up to 38% on the MedHallu benchmark.
MedHallu: A Comprehensive Benchmark for Detecting Medical ... researchgate.net 1 fact
reference: The MedHallu research paper includes the prompt templates used for its hallucination detection experiments in sections 2.5 and 4.4.
[Literature Review] MedHallu: A Comprehensive Benchmark for ... themoonlight.io 1 fact
claim: General-purpose large language models often outperform specialized medical models on hallucination detection, according to experiments conducted for the MedHallu benchmark.
A Comprehensive Benchmark for Detecting Medical Hallucinations ... aclanthology.org 1 fact
procedure: The MedHallu benchmark generates hallucinated answers through a controlled pipeline to create a dataset for binary hallucination detection.
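
The facts above describe a binary detection task and an optional 'not sure' response. As a rough illustration of why abstaining can raise precision, the sketch below scores the "hallucinated" class while skipping abstained items. The label strings, the function name `precision_with_abstention`, and the toy data are all assumptions made for illustration; they are not taken from the MedHallu code or paper, and the paper's exact scoring under abstention may differ.

```python
from typing import Sequence

# Hypothetical label strings, chosen only for this illustration.
HALLUCINATED = "hallucinated"
FAITHFUL = "faithful"
NOT_SURE = "not sure"


def precision_with_abstention(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Precision for the 'hallucinated' class, skipping abstained items.

    Assumption: a 'not sure' prediction counts as neither a true positive
    nor a false positive.
    """
    true_pos = 0
    false_pos = 0
    for gold_label, pred_label in zip(gold, predicted):
        if pred_label == NOT_SURE:
            continue  # the model abstained, so this item is not scored
        if pred_label == HALLUCINATED:
            if gold_label == HALLUCINATED:
                true_pos += 1
            else:
                false_pos += 1
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0


# Invented toy data, not MedHallu results.
gold = [HALLUCINATED, FAITHFUL, HALLUCINATED, FAITHFUL]
forced_choice = [HALLUCINATED, HALLUCINATED, HALLUCINATED, FAITHFUL]
with_abstention = [HALLUCINATED, NOT_SURE, HALLUCINATED, FAITHFUL]

precision_forced = precision_with_abstention(gold, forced_choice)
precision_abstain = precision_with_abstention(gold, with_abstention)
```

In this toy case the model's one wrong "hallucinated" call becomes an abstention, so the false positive disappears from the precision denominator. Whether abstentions should also be penalized elsewhere (for example in recall or coverage) is a separate design choice that this sketch does not address.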