concept

AMG-RAG

Also known as: Agentic Medical Graph-RAG

Facts (53)

Sources
Bridging the Gap Between LLMs and Evolving Medical Knowledge (arXiv, Jun 29, 2025), 53 facts
[claim] In the context of managing acute coronary syndrome (ACS), the AMG-RAG model synthesizes evidence-based knowledge to justify the selection of Clopidogrel as the correct answer while explaining why alternatives like Nifedipine, Enoxaparin, Spironolactone, and Propranolol are not suitable.
[measurement] The AMG-RAG model achieves 95% accuracy in generating interpretable responses for hypertension management using Propranolol, which is rated 9.5/10 for direct relevance in cardiovascular treatment protocols.
[measurement] AMG-RAG achieves an F1 score of 74.1% on the MEDQA benchmark and an accuracy of 66.34% on the MEDMCQA benchmark.
[measurement] The AMG-RAG system configured with the PubMed-MKG and an 8B LLM backbone achieves an accuracy of 73.92% on the MEDQA benchmark, surpassing baseline models including Self-RAG (Asai et al., 2023), HyDE (Gao et al., 2022), GraphRAG (Edge et al., 2024), and MedRAG (Zhao et al., 2025).
[claim] In the AMG-RAG system, the Medical Knowledge Graph (MKG) created using PubMed data (PubMed-MKG) is more effective in enhancing system performance than the version created using Wikipedia data (Wiki-MKG), as demonstrated by ablation studies in Table 3.
[reference] Agentic Medical Graph-RAG (AMG-RAG) features autonomous Knowledge Graph (KG) evolution through Large Language Model (LLM) agents that extract entities and relations from live sources with provenance tracking; graph-conditioned retrieval that maps queries onto the Medical Knowledge Graph (MKG) to guide evidence selection; and reasoning over structured context, where the answer generator utilizes both textual passages and traversed subgraphs for transparent, multi-hop reasoning.
[reference] Agentic Medical Graph-RAG (AMG-RAG) is a framework that dynamically generates a confidence-scored Medical Knowledge Graph (MKG) tightly coupled to a Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) pipeline.
[claim] AMG-RAG is an advanced question-answering system that dynamically constructs a Medical Knowledge Graph (MKG) while integrating structured reasoning for medical question-answering tasks.
[procedure] The AMG-RAG pipeline follows a specific procedure: (1) question parsing, where an LLM agent extracts medical terms from the user query; (2) node exploration, where the system queries the knowledge graph for each term, using a confidence threshold to filter relationships; (3) knowledge traversal, supporting both breadth-first and depth-first strategies until a cumulative confidence threshold or document limit is reached; (4) chain-of-thought generation, which synthesizes reasoning traces for each entity by integrating information from connected nodes; and (5) answer synthesis, which aggregates reasoning traces to produce a final output with an associated confidence score.
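The five-step pipeline can be sketched against a toy in-memory graph. Everything here is an illustrative stand-in, not the paper's implementation: the entities, the confidence scores, and the stubbed LLM steps are assumptions; only the overall control flow (parse, explore, traverse with a confidence cutoff and document limit, then synthesize) mirrors the described procedure.

```python
from collections import deque

# Toy in-memory stand-in for the MKG: node -> list of (neighbor, confidence).
# Entities and scores are illustrative, not taken from the paper.
GRAPH = {
    "hypertension": [("propranolol", 9), ("nifedipine", 6)],
    "propranolol": [("beta blocker", 9)],
    "nifedipine": [("calcium channel blocker", 9)],
    "beta blocker": [],
    "calcium channel blocker": [],
}

def extract_terms(question):
    """Step 1: stand-in for the LLM-based Medical Entity Recognizer."""
    return [term for term in GRAPH if term in question.lower()]

def traverse(seeds, edge_threshold=8, max_docs=10, strategy="bfs"):
    """Steps 2-3: explore from each seed term, keeping only edges at or
    above the confidence threshold, until the document limit is reached."""
    frontier = deque(seeds)
    visited = set(seeds)
    evidence = []
    while frontier and len(evidence) < max_docs:
        node = frontier.popleft() if strategy == "bfs" else frontier.pop()
        evidence.append(node)
        for neighbor, conf in GRAPH.get(node, []):
            if conf >= edge_threshold and neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return evidence

def answer(question):
    """Steps 4-5: the real system has an LLM write a reasoning trace per
    entity and aggregate them; here we just return the evidence chain."""
    evidence = traverse(extract_terms(question))
    return {"evidence": evidence, "confidence": 0.9 if evidence else 0.0}

result = answer("What is a first-line therapy for hypertension?")
```

Note how the low-confidence Nifedipine edge (6/10) is filtered out at exploration time, so only high-reliability evidence reaches the synthesis step.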
[measurement] The AMG-RAG model, which has 8 billion parameters, achieves an F1 score of 74.1% on the MEDQA benchmark without requiring fine-tuning, surpassing the performance of the 70 billion parameter Meditron model.
[measurement] The AMG-RAG framework achieved an F1 score of 74.1% on the MEDQA benchmark and an accuracy of 66.34% on the MEDMCQA benchmark, outperforming comparable models and models 10 to 100 times larger.
[claim] The AMG-RAG system design combines Chain-of-Thought (CoT) reasoning with structured knowledge graph integration and retrieval mechanisms to maintain high accuracy across diverse datasets.
[claim] The AMG-RAG system stores the Medical Knowledge Graph (MKG) within a Neo4j database to leverage its graph query engine for efficient retrieval and analysis during inference.
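A Neo4j-backed MKG would be queried with parameterized Cypher at inference time. The sketch below only builds the query string, so it runs without a database; the `:Entity` label, `:RELATED_TO` relationship type, and property names are assumptions for illustration, not the paper's actual schema.

```python
def build_neighbor_query(term, min_confidence=8):
    """Return a parameterized Cypher query (plus its parameter map) for a
    confidence-filtered neighbor lookup around one medical entity."""
    cypher = (
        "MATCH (e:Entity {name: $name})-[r:RELATED_TO]-(n:Entity) "
        "WHERE r.confidence >= $min_conf "
        "RETURN n.name AS neighbor, r.confidence AS confidence, "
        "r.summary AS summary ORDER BY r.confidence DESC"
    )
    return cypher, {"name": term, "min_conf": min_confidence}

query, params = build_neighbor_query("Propranolol")
# With the official neo4j Python driver this would be executed as
# session.run(query, params); here we stop at query construction.
```

Using an undirected match (`-[r]-`) reflects the framework's bidirectional relationships, and the `WHERE` clause implements the confidence thresholding described elsewhere in these facts.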
[claim] The AMG-RAG system addresses challenges such as inaccurate knowledge updating, noisy retrieval results, and LLM hallucinations in healthcare applications by implementing a dynamic Medical Knowledge Graph (MKG) construction approach.
[claim] AMG-RAG demonstrates superior performance in rapidly evolving subfields like Neurology and Genetics due to real-time PubMed integration during inference, combined with structured reasoning and knowledge graph grounding.
[reference] The AMG-RAG model is designed to retrieve relevant content, structure key information, and formulate reasoning to guide answer selection when applied to the MEDQA dataset.
[claim] The AMG-RAG framework dynamically creates a Medical Knowledge Graph (MKG) that adapts to new queries and evidence, unlike traditional static knowledge bases.
[claim] The AMG-RAG system embeds confidence scoring mechanisms that explicitly model information uncertainty to provide transparent reliability assessments for medical information.
[procedure] The AMG-RAG knowledge graph creation process operates independently of the question-answering process, enabling continuous background updates of the Medical Knowledge Graph using search tools like PubMedSearch or WikiSearch.
[procedure] The AMG-RAG framework utilizes LLM-driven agents assisted by domain-specific search tools to generate graph entities enriched with metadata, confidence scores, and relevance indicators.
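One background-update step of this agentic enrichment might look like the following minimal sketch. The field names, the stubbed PubMedSearch tool, and the hard-coded LLM confidence score are all assumptions introduced for illustration; the real framework's schema and agent prompts are not given in these facts.

```python
from dataclasses import dataclass, field

@dataclass
class KGEntity:
    """An entity enriched with metadata, a confidence score, and
    provenance, roughly as described; field names are illustrative."""
    name: str
    summary: str
    confidence: int            # 1-10 reliability score from the LLM agent
    source: str                # provenance, e.g. a PubMed identifier
    relations: list = field(default_factory=list)

def pubmed_search_stub(term):
    """Stand-in for the PubMedSearch tool used during background updates."""
    return {"summary": f"Evidence summary for {term}.", "source": "PMID:stub"}

def update_entity(graph, term, llm_confidence=9):
    """One background-update step: search for the term, let the (stubbed)
    LLM score the evidence, and upsert the enriched entity."""
    hit = pubmed_search_stub(term)
    graph[term] = KGEntity(term, hit["summary"], llm_confidence, hit["source"])
    return graph

mkg = update_entity({}, "Clopidogrel")
```

Because this loop runs separately from question answering, the MKG can keep absorbing new evidence without adding latency to inference.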
[claim] The AMG-RAG framework assigns a confidence score to each edge in the Medical Knowledge Graph to indicate the reliability of each relationship.
[measurement] The AMG-RAG model demonstrated 93% query accuracy in connecting treatment protocols for Nitroglycerin and Propranolol, with a relevance rating of 9.0/10 for importance in cardiovascular multi-drug therapy.
[procedure] The Medical Entity Recognizer (MER) agent within the AMG-RAG framework identifies domain-specific terms in user queries to establish foundational nodes in the knowledge graph.
[claim] The applicability of the AMG-RAG system to non-medical tasks remains unexplored.
[claim] The Agentic Medical Graph-RAG (AMG-RAG) framework automates the construction and continuous updating of Medical Knowledge Graphs (MKGs) and integrates reasoning to retrieve external evidence for medical question answering.
[claim] Ablating either Chain-of-Thought (CoT) or Medical Knowledge Graph (MKG) integration in the AMG-RAG system causes a considerable degradation in accuracy and F1 score, demonstrating that structured multi-hop reasoning and medical knowledge grounding are indispensable for delivering accurate and evidence-based answers.
[reference] Table 1 in the paper 'Bridging the Gap Between LLMs and Evolving Medical Knowledge' compares state-of-the-art language models on the MEDQA benchmark, showing that Med-Gemini (1800B) achieved 91.1% accuracy, GPT-4 (1760B) achieved 90.2% accuracy, Med-PaLM 2 (340B) achieved 85.4% accuracy, AMG-RAG (8B) achieved 73.9% accuracy, and BioMedGPT (10B) achieved 50.4% accuracy.
[procedure] The AMG-RAG system constructs a Medical Knowledge Graph (MKG) dynamically for each question by integrating search items, contextual information, and relationships extracted from medical textbooks and search tools, specifically Wikipedia (Wiki-MKG) and PubMed (PubMed-MKG).
[claim] The study utilized GPT-4o-mini as the backbone for both the Medical Knowledge Graph (MKG) and AMG-RAG implementations, serving as the core component for reasoning, RAG, and structured knowledge integration.
[measurement] The AMG-RAG model achieved 90% accuracy in supporting effective multi-drug therapy reasoning for Nitroglycerin and Labetalol, with a relevance rating of 8.7/10 for acute hypertension protocols.
[claim] AMG-RAG outperforms other approaches of similar model size or models 10 to 100 times larger in accuracy and reasoning capabilities for medical question-answering tasks.
[measurement] The AMG-RAG system uses a confidence threshold of 8 on a 10-point scale to retain only high-reliability nodes and edges in the Medical Knowledge Graph, a value empirically determined to yield the best benchmark performance.
[claim] The AMG-RAG system relies on external search tools, which introduce latency during the initial creation of the Medical Knowledge Graph (MKG) when it is built from scratch.
[measurement] The AMG-RAG system built on the GPT-4o-mini LLM backbone with PubMed-MKG achieves an accuracy of 73.92% on the MEDQA benchmark, which is higher than the performance achieved when using LLaMA 3.1 or Mixtral backbones with the same retrieval pipeline.
[claim] The Medical Knowledge Graph (MKG) serves as the core knowledge source for the AMG-RAG inference pipeline.
[claim] AMG-RAG, which has 8B parameters, delivers competitive results compared to much larger models like Med-Gemini (1800B) and GPT-4 (1760B).
[code] The AMG-RAG source code and implementation are available at https://github.com/MrRezaeiUofT/AMG-RAG.
[measurement] The AMG-RAG model demonstrated effective differentiation of clinical uses for Lanolin and Fluorometholone eye drops with high interpretability and a relevance rating of 8.5/10.
[measurement] Removing search functionality from the AMG-RAG system drops accuracy to 67.16%, and removing Chain-of-Thought (CoT) reasoning drops accuracy to 66.69% on the MEDQA benchmark.
[procedure] The developers of AMG-RAG implement a confidence scoring mechanism in the Medical Knowledge Graph (MKG) to validate retrieved information and mitigate risks of inaccuracy and bias.
[measurement] On the MEDMCQA benchmark, AMG-RAG achieves an accuracy of 66.34%, outperforming Meditron-70B (66.0%), Codex 5-shot CoT (59.7%), VOD (58.3%), Flan-PaLM (57.6%), PaLM (54.5%), GAL (120B, 52.9%), PubmedBERT (40.0%), SciBERT (39.0%), BioBERT (38.0%), and BERT (35.0%).
[claim] AMG-RAG reduces latency during question answering by retrieving information from a pre-populated Medical Knowledge Graph instead of performing new searches.
[procedure] The AMG-RAG framework uses specialized medical search tools to retrieve contextual descriptions for each identified entity, which are then added to the knowledge graph to provide semantic context.
[reference] AMG-RAG combines dynamically synthesized Medical Knowledge Graphs (MKG) with multi-step reasoning, guided by confidence scores and adaptive traversal strategies, as described by Trivedi et al. (2022).
[claim] AMG-RAG utilizes tools such as PubMedSearch and WikiSearch to dynamically integrate domain-specific knowledge, which improves its ability to answer medical questions.
[claim] Clinical experts and expert LLMs like GPT-4 validated the correctness of the Medical Knowledge Graph used in the AMG-RAG system.
[claim] In the AMG-RAG system, the PubMed-MKG (Medical Knowledge Graph created via PubMedSearch) consistently outperforms the Wiki-MKG (Medical Knowledge Graph created via WikiSearch) on the MEDQA benchmark, likely due to the domain-specific nature of PubMed content.
[claim] By keeping the Medical Knowledge Graph continuously updated, AMG-RAG minimizes its dependency on computational resources and search tools during the test phase.
[procedure] The AMG-RAG system employs a dynamic Medical Knowledge Graph (MKG) construction method characterized by six key innovations: (1) Dynamic Node and Relationship Creation using semantic templates; (2) Bidirectional Relationships for flexible traversal; (3) Confidence-Based Relevance Scoring using textual annotations and quantitative scores; (4) Summarization with Reliability Indicators; (5) Thresholding for Quality Control; and (6) Integration with Neo4j for storage and querying.
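Innovations (2), (3), and (5) can be sketched together in a few lines: each edge carries a confidence score and summary, is stored in both directions for flexible traversal, and is dropped entirely below the quality threshold of 8/10. The dictionary layout and field names are illustrative assumptions, not the paper's data model.

```python
def add_relationship(graph, src, dst, rel, confidence, summary, threshold=8):
    """Annotate an edge with a confidence score and summary, store it
    bidirectionally, and discard it if it falls below the threshold."""
    if confidence < threshold:
        return graph  # (5) thresholding for quality control
    edge = {"rel": rel, "confidence": confidence, "summary": summary}
    graph.setdefault(src, {})[dst] = edge    # forward direction
    graph.setdefault(dst, {})[src] = edge    # (2) bidirectional traversal
    return graph

g = add_relationship({}, "Propranolol", "Hypertension", "treats", 9,
                     "Beta-blocker used in hypertension management.")
g = add_relationship(g, "Propranolol", "Lanolin", "related_to", 4,
                     "Low-confidence link that should be pruned.")
```

Thresholding at insertion time keeps low-reliability relationships out of the graph altogether, so inference-time traversal never has to re-check them.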
[measurement] The AMG-RAG model improved query relevance for multi-drug therapy in eye care by 19% when processing Fluorometholone and Ketotifen eye drops, with a relevance rating of 8.8/10.
[measurement] Expert LLMs such as GPT-4 rated the accuracy of knowledge extracted for the AMG-RAG Medical Knowledge Graph at 9/10 during validation.
[claim] The AMG-RAG system leverages explicit relationships in a graph-centric retrieval process to synthesize information across domains including drug interactions, clinical trials, patient histories, and guidelines.
[claim] The AMG-RAG framework integrates explicit relationships and structured knowledge representations to improve intelligent question-answering systems, ensuring robustness and scalability.