Relations (1)


Med-HALT is a benchmark designed specifically to evaluate and mitigate hallucinations in Large Language Models [1], [2], [3]. Researchers use the framework to assess models' inherent reasoning and memory-related inaccuracies through zero-shot querying and per-task sampling [4], [5].
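The "Base" zero-shot setting described by the facts below can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `query_model` is a hypothetical stand-in for whatever LLM API is being evaluated.

```python
def query_model(prompt: str) -> str:
    # Hypothetical placeholder: a real evaluation would call an LLM API here.
    return "model answer to: " + prompt

def zero_shot_eval(questions: list[str]) -> list[str]:
    # Pose each benchmark question directly, with no added context,
    # few-shot examples, or instructions (the zero-shot "Base" setting).
    return [query_model(q) for q in questions]
```

The point of the setting is that any hallucination observed is attributable to the model itself, since no mitigating context or instruction is supplied.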

Facts (5)

Sources
Medical Hallucination in Foundation Models and Their ... (medRxiv) — 3 facts
claim: Med-HALT is a framework designed to evaluate the multifaceted nature of medical hallucinations in Large Language Models by assessing both reasoning and memory-related inaccuracies.
procedure: The "Base" method for evaluating Large Language Models involves querying the models directly with questions from the Med-HALT benchmark, without additional context or instructions, to assess inherent hallucination tendencies in a zero-shot setting.
reference: The Med-HALT benchmark (Pal et al., 2023) is used to evaluate the effectiveness of various hallucination mitigation techniques on Large Language Models.
Medical Hallucination in Foundation Models and Their Impact on ... (medRxiv) — 1 fact
procedure: The authors evaluated the effectiveness of hallucination mitigation techniques on Large Language Models using the Med-HALT benchmark by sampling 50 examples from each of seven medical reasoning tasks, totaling 350 cases.
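The sampling procedure in the preceding fact (50 examples from each of seven tasks, 350 cases total) can be sketched as below. The task names and question pools are illustrative assumptions, not the benchmark's actual data.

```python
import random

def sample_cases(tasks: dict[str, list[str]], n_per_task: int = 50, seed: int = 0):
    # Draw a fixed-size random sample from each task's question pool.
    rng = random.Random(seed)  # fixed seed for a reproducible evaluation set
    cases = []
    for name, questions in tasks.items():
        cases.extend((name, q) for q in rng.sample(questions, n_per_task))
    return cases

# Hypothetical pools: seven tasks x 50 examples each -> 350 cases.
tasks = {f"task_{i}": [f"q{i}_{j}" for j in range(100)] for i in range(7)}
cases = sample_cases(tasks)
```

Fixing the per-task sample size keeps each of the seven reasoning tasks equally represented in the 350-case evaluation set.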
A framework to assess clinical safety and hallucination rates of LLMs ... (Nature) — 1 fact
reference: Med-HALT is a medical-domain hallucination test designed for large language models, introduced by Pal, Umapathi, and Sankarasubbu in 2023.