Table 1 in the paper 'Bridging the Gap Between LLMs and Evolving Medical Knowledge' compares state-of-the-art language models on the MEDQA benchmark. Reported accuracy, with each model's size in parentheses (B = billion parameters):

Med-Gemini (1800B): 91.1%
GPT-4 (1760B): 90.2%
Med-PaLM 2 (340B): 85.4%
AMG-RAG (8B): 73.9%
BioMedGPT (10B): 50.4%
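
As a minimal sketch, the figures stated above can be held in a small structure so that accuracy gaps and size ratios are computed rather than read off by eye. The names `TABLE_1`, `accuracy_gap`, and `size_ratio` are hypothetical helpers for illustration and do not come from the paper; the numbers are copied from the statement above.

```python
# Rows copied from the fact statement above:
# (model name, size in billions of parameters, MEDQA accuracy in percent).
# TABLE_1 and both helper functions are illustrative names, not from the paper.
TABLE_1 = [
    ("Med-Gemini", 1800, 91.1),
    ("GPT-4", 1760, 90.2),
    ("Med-PaLM 2", 340, 85.4),
    ("AMG-RAG", 8, 73.9),
    ("BioMedGPT", 10, 50.4),
]


def accuracy_gap(rows, model_a, model_b):
    """Accuracy of model_a minus accuracy of model_b, in percentage points."""
    accuracy = {name: acc for name, _, acc in rows}
    # Round to one decimal to match the precision of the reported figures.
    return round(accuracy[model_a] - accuracy[model_b], 1)


def size_ratio(rows, model_a, model_b):
    """How many times larger model_a is than model_b, by parameter count."""
    size = {name: params for name, params, _ in rows}
    return size[model_a] / size[model_b]


# Rank the models from highest to lowest reported accuracy.
ranked = sorted(TABLE_1, key=lambda row: row[2], reverse=True)

for name, params, acc in ranked:
    print(f"{name:<12} {params:>5}B {acc:5.1f}%")
```

With these figures, `accuracy_gap(TABLE_1, "Med-Gemini", "AMG-RAG")` evaluates to 17.2 percentage points, while `size_ratio(TABLE_1, "Med-Gemini", "AMG-RAG")` evaluates to 225.0, which is one way to read the size-versus-accuracy trade-off the table reports.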

Authors

Person: Not available
Organization: arXiv
Bridging the Gap Between LLMs and Evolving Medical Knowledge

Sources

Bridging the Gap Between LLMs and Evolving Medical Knowledge (arXiv, arxiv.org)

Referenced by nodes (3)

GPT-4 concept
AMG-RAG concept
MEDQA concept