claim
The Med-HALT benchmark is limited to assessing the reasoning capabilities of Large Language Models (LLMs) in the medical domain, using a Question Answering (QA) format.
Authors
Sources
- A framework to assess clinical safety and hallucination rates of LLMs ... www.nature.com via serper
Referenced by nodes (2)
- Question Answering concept
- Med-HALT concept