claim
Large language models such as Med-Gemini and GPT-4 achieve the highest accuracy and F1 scores on the MedQA benchmark, but they do so with substantially larger parameter counts.
