measurement
The MedHallu dataset consists of 10,000 high-quality question-answer pairs derived from PubMedQA, with systematically generated hallucinated answers.
Sources
- MedHallu - GitHub (github.com)
- [2502.14302] MedHallu: A Comprehensive Benchmark for Detecting ... (arxiv.org)
- [Literature Review] MedHallu: A Comprehensive Benchmark for ... (www.themoonlight.io)