claim
Larger language models such as Med-Gemini and GPT-4 achieve the highest accuracy and F1 scores on the MedQA benchmark, but they do so with substantially larger parameter counts.
Sources
- Bridging the Gap Between LLMs and Evolving Medical Knowledge (arxiv.org)