Fact — claim — Knowledge Tree

The MedHallu benchmark evaluates how effectively general-purpose large language models, such as GPT-4o, Qwen, and Gemma, detect medical hallucinations, and compares them with medically fine-tuned models.

Authors

Person: Not available
Organization: The Moonlight
[Literature Review] MedHallu: A Comprehensive Benchmark for ...

Sources

[Literature Review] MedHallu: A Comprehensive Benchmark for ... (The Moonlight, www.themoonlight.io)

Referenced by nodes (3)

Large Language Models concept
GPT-4 concept
MedHallu concept