measurement
The highest-performing models (gemini-2.0-thinking, gemini-2.0, and deepseek-r1) cluster in the high similarity-score range of 0.7 to 0.9, indicating that their outputs align closely in meaning with ground-truth medical information.
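As a rough illustration of what a similarity score in this range means, the sketch below computes cosine similarity between two embedding vectors and checks whether the result falls in the 0.7 to 0.9 band. This is only a sketch under stated assumptions: the source does not say which similarity metric or embedding model was used, so the choice of cosine similarity, the helper names `cosine_similarity` and `in_high_range`, and the toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def in_high_range(score, low=0.7, high=0.9):
    """True if the score lies in the band reported for the top models."""
    return low <= score <= high


# Toy vectors standing in for embeddings of a model answer and the
# ground-truth text; these are made-up values, not real data.
answer_vec = [0.9, 0.1, 0.4]
truth_vec = [0.5, 0.5, 0.5]

score = cosine_similarity(answer_vec, truth_vec)
print(f"similarity={score:.3f}, in high range: {in_high_range(score)}")
```

In practice the vectors would come from a sentence or biomedical embedding model rather than being written by hand.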
Sources
- Medical Hallucination in Foundation Models and Their ... (www.medrxiv.org)
Referenced by nodes (2)
- Gemini concept
- DeepSeek-R1 concept