Relations (1)

related (score 2.32) — strongly supporting, 4 facts

GPT-4 plays two roles: it serves as a performance baseline against which AMG-RAG is compared in [1] and [2], and it acts as an expert validator for the medical knowledge graph used by the AMG-RAG system, as described in [3] and [4].

Facts (4)

Sources
Bridging the Gap Between LLMs and Evolving Medical Knowledge (arXiv, arxiv.org) — 4 facts
reference: Table 1 in the paper 'Bridging the Gap Between LLMs and Evolving Medical Knowledge' compares state-of-the-art language models on the MEDQA benchmark: Med-Gemini (1800B) achieved 91.1% accuracy, GPT-4 (1760B) achieved 90.2%, Med-PaLM 2 (340B) achieved 85.4%, AMG-RAG (8B) achieved 73.9%, and BioMedGPT (10B) achieved 50.4%.
claim: AMG-RAG, with only 8B parameters, delivers competitive results compared to much larger models such as Med-Gemini (1800B) and GPT-4 (1760B).
claim: Clinical experts and expert LLMs such as GPT-4 validated the correctness of the Medical Knowledge Graph used in the AMG-RAG system.
measurement: Expert LLMs such as GPT-4 achieved an accuracy of 9/10 when validating knowledge extracted for the AMG-RAG Medical Knowledge Graph.