Relations (1)
Facts (4)
Sources
Bridging the Gap Between LLMs and Evolving Medical Knowledge (arxiv.org), 4 facts
reference: Table 1 in the paper "Bridging the Gap Between LLMs and Evolving Medical Knowledge" compares state-of-the-art language models on the MedQA benchmark: Med-Gemini (1800B) achieved 91.1% accuracy, GPT-4 (1760B) 90.2%, Med-PaLM 2 (340B) 85.4%, AMG-RAG (8B) 73.9%, and BioMedGPT (10B) 50.4%.
claim: AMG-RAG, which has 8B parameters, delivers competitive results compared to much larger models such as Med-Gemini (1800B) and GPT-4 (1760B).
claim: Clinical experts and expert LLMs such as GPT-4 validated the correctness of the Medical Knowledge Graph used in the AMG-RAG system.
measurement: Expert LLMs such as GPT-4 achieved an accuracy of 9 out of 10 when validating the knowledge extracted for the AMG-RAG Medical Knowledge Graph.