measurement
The highest-performing models, including gemini-2.0-thinking, gemini-2.0, and deepseek-r1, cluster in the 0.7-0.9 similarity-score range, indicating strong semantic alignment between their outputs and ground-truth medical information.
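The source does not say how the similarity score is computed. As a rough illustration only, the sketch below assumes the score is cosine similarity between an embedding of the model output and an embedding of the ground-truth text. The toy vectors and the helper names `cosine_similarity` and `in_high_range` are hypothetical and are not taken from the original measurement.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def in_high_range(score, low=0.7, high=0.9):
    """True if the score falls in the 0.7-0.9 band named in the text."""
    return low <= score <= high


# Toy stand-ins for embeddings (assumption: real embeddings would come
# from a sentence-embedding model, which the source does not name).
model_output_vec = [0.9, 0.1, 0.4]
ground_truth_vec = [0.6, 0.5, 0.6]

score = cosine_similarity(model_output_vec, ground_truth_vec)
print(f"similarity = {score:.2f}, in high range: {in_high_range(score)}")
```

Under this assumption, a score near 1.0 would mean the two texts point in nearly the same direction in embedding space, while the 0.7-0.9 band reported above would mean close but not identical meaning.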
