multi-hop knowledge base question answering
Also known as: Multi-Hop QA, multi-hop QA, multi-hop knowledge base question answering, Multi-hop question answering
Facts (38)
Sources
Large Language Models Meet Knowledge Graphs for Question ... arxiv.org Sep 22, 2025 27 facts
reference: The EFSUM method, proposed by Ko et al. in 2024, performs KG Fact Summarization and uses KG Helpfulness and Faithfulness Filters with GPT-3.5-Turbo, Flan-T5-XL, and Llama-2-7B-Chat models and dataset-inherent knowledge graphs (Freebase, Wikidata) for KGQA and Multi-hop QA, evaluated using Accuracy (Acc) on the WQSP and Mintaka datasets.
reference: Panda et al. (2024) published 'HOLMES: Hyper-relational knowledge graphs for multi-hop question answering using LLMs' in ACL, pages 13263–13282, which introduces the HOLMES framework for multi-hop question answering using hyper-relational knowledge graphs.
claim: Multi-hop question-answering involves decomposing complex questions into multiple single-hop questions, generating answers for each, and integrating those answers.
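The decompose-answer-integrate loop described above can be sketched as follows. This is a minimal toy, not any cited system's implementation: `decompose` and `answer_single_hop` stand in for LLM or KG-retrieval calls and are stubbed with a hard-coded lookup.

```python
# Toy knowledge base; a real system would query an LLM or a KG here.
KB = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}

def decompose(question):
    # A real system would prompt an LLM to split the question; in practice the
    # second sub-question is produced from the first hop's answer.
    return ["Who directed Inception?", "Where was Christopher Nolan born?"]

def answer_single_hop(sub_question, context):
    # Stubbed single-hop answering via dictionary lookup.
    return KB[sub_question]

def multi_hop_answer(question):
    context = []
    for sub_q in decompose(question):
        context.append(answer_single_hop(sub_q, context))
    return context[-1]  # the final hop's answer integrates the chain

print(multi_hop_answer("Where was the director of Inception born?"))
```

The loop makes the key property explicit: each hop's answer becomes context for the next, and the final answer is derived from the whole chain rather than a single retrieval.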
reference: The KELDaR method, proposed by Li et al. in 2024, employs a question decomposition tree and atomic knowledge graph retrieval using GPT-3.5-Turbo and GPT-4-Turbo models to perform KGQA and multi-hop QA tasks, evaluated using the EM metric on WQSP and CWQ datasets.
reference: Jing Zhang et al. (2022) proposed a subgraph retrieval enhanced model for multi-hop knowledge base question answering.
reference: FanOutQA (Zhu et al., 2024) is a multi-hop question-answering dataset that includes multi-hop questions requiring information from multiple documents.
reference: Qiao et al. (2024) published 'GraphLLM: A general framework for multi-hop question answering over knowledge graphs using large language models' in NLPCC, pages 136–148, detailing a framework for multi-hop reasoning.
reference: The KG2RAG method, proposed by Zhu et al. in 2025, utilizes graph-guided chunks expansion with the Llama-3-8B language model, incorporating dataset-inherent knowledge graphs to perform multi-hop QA tasks on the HotpotQA dataset, evaluated using F1, P, and R metrics.
reference: LongRAG, as described by Zhao et al. (2024a), utilizes domain-specific fine-tuning for RAG and CoT-guided filtering with models including ChatGLM3-6B, Qwen1.5-7B, Vicuna-v1.5-7B, Llama-3-8B, GPT-3.5-Turbo, and GLM-4, applied to Wikidata for KBQA and Multi-hop QA tasks.
reference: Saleh et al. (2024) published 'SG-RAG: Multi-hop question answering with large language models through knowledge graphs' in ICNLSP, pages 439–448, presenting a method for multi-hop QA using knowledge graphs.
reference: The LPKG method, proposed by Wang et al. in 2024, involves Planning LLM Tuning, Inference, and Execution using GPT-3.5-Turbo, CodeQwen1.5-7B-Chat, and Llama-3-8B-Instruct models with dataset-inherent knowledge graphs (Wikidata) and Wikidata15K for KGQA and Multi-hop QA, evaluated using EM, P, and R metrics on the HotpotQA, 2WikiMQA, Bamboogle, MuSiQue, and CLQA-Wiki datasets.
reference: HOLMES utilizes a context-aware hyper-relational knowledge graph, retrieved and pruned on the basis of an entity-document graph, to enhance large language models for generating answers in multi-hop question-answering.
claim: Hybrid methods for synthesizing LLMs and KGs support multi-doc, multi-modal, multi-hop, conversational, XQA, and temporal QA tasks.
claim: The combination of knowledge fusion, Retrieval-Augmented Generation (RAG), Chain-of-Thought (CoT) reasoning, and ranking-based refinement accelerates complex question decomposition for multi-hop Question Answering, enhances context understanding for conversational Question Answering, facilitates cross-modal interactions for multi-modal Question Answering, and improves the explainability of generated answers.
reference: The Oreo method, proposed by Hu et al. in 2022, uses knowledge interaction, injection, and knowledge graph random walks with RoBERTa-base and T5-base models to perform CBQA, OBQA, and multi-hop QA tasks, evaluated using accuracy on NQ, WQ, WQSP, TriviaQA, CWQ, and HotpotQA datasets.
claim: Multi-hop Question Answering involves decomposing complex questions and generating answers based on multi-step and iterative reasoning over a factual Knowledge Graph.
claim: Approaches using Knowledge Graphs as reasoning guidelines support multi-doc, multi-modal, multi-hop, XQA, and temporal QA tasks.
reference: CoT-RAG, as described by Li et al. (2025a), utilizes KG-driven CoT generation and knowledge-aware RAG with pseudo-program KGs, employing ERNIE-Speed-128K and GPT-4o-mini models for KGQA and multi-hop QA tasks.
reference: The KG-CoT method, proposed by Zhao et al. in 2024, uses chain-of-thought-based joint reasoning between knowledge graphs and LLMs (GPT-4, GPT-3.5-Turbo, Llama-7B, Llama-13B) to perform KBQA and multi-hop QA tasks, evaluated using Acc and Hit@K metrics on WQSP, CWQ, SQ, and WQ datasets.
claim: Fusing knowledge from LLMs and Knowledge Graphs augments question decomposition in multi-hop Question Answering, facilitating iterative reasoning to generate accurate final answers.
claim: Approaches that leverage retrieved factual evidence from knowledge graphs for refinement and validation are designed to augment Large Language Model capabilities in understanding user interactions and verifying intermediate reasoning for multi-hop question-answering (Chen et al., 2024b) and conversational question-answering (Xiong et al., 2024).
reference: Gu et al. (2024) introduced PokeMQA, a method for programmable knowledge editing for multi-hop question answering, published in the ACL proceedings.
reference: PIP-KAG, as described by Huang et al. (2025), uses parametric pruning for KAG with Llama-3-8B-Instruct on dataset-inherent knowledge graphs for KGQA and multi-hop QA tasks.
reference: MINTQA (He et al., 2024) is a multi-hop question-answering dataset designed to support the evaluation of Large Language Models on new and tail knowledge.
reference: GMeLLo integrates explicit knowledge from knowledge graphs with linguistic knowledge from large language models for multi-hop question-answering by introducing fact triple extraction, relation chain extraction, and query and answer generation.
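Two of the GMeLLo stages named above can be illustrated in miniature: extracting a fact triple from a sentence, and turning a relation chain into a graph query. GMeLLo itself prompts an LLM for these stages, so the regex and the SPARQL-like query string below are purely illustrative assumptions, not the paper's implementation.

```python
import re

def extract_triple(sentence):
    # Toy triple extraction: matches simple "X's R is Y." statements.
    # A real system would prompt an LLM instead of using a pattern.
    m = re.match(r"(.+?)'s (.+?) is (.+?)\.?$", sentence)
    return (m.group(1), m.group(2), m.group(3)) if m else None

def chain_to_query(entity, relations):
    # Express a relation chain as a SPARQL-like property-path query string.
    path = "/".join(f":{r.replace(' ', '_')}" for r in relations)
    return f"SELECT ?x WHERE {{ :{entity} {path} ?x }}"

print(extract_triple("France's capital is Paris."))
print(chain_to_query("France", ["capital", "mayor"]))
```

The point of the two stages is that once facts and the question's relation chain are in structured form, multi-hop answering reduces to executing a path query against the (possibly edited) graph.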
claim: Approaches using Knowledge Graphs as background knowledge support multi-doc, multi-modal, multi-hop, conversational, and XQA tasks.
claim: Joint reasoning over factual knowledge graphs and LLMs can mitigate challenges related to knowledge retrieval, conflicts across modalities and knowledge sources, and complex reasoning in multi-document, multi-modal, and multi-hop question answering.
LLM-KG4QA: Large Language Models and Knowledge Graphs for ... github.com 4 facts
reference: Research on integrating Large Language Models with Knowledge Graphs is categorized into several distinct approaches: Pre-training, Fine-Tuning, KG-Augmented Prompting, Retrieval-Augmented Generation (RAG), Graph RAG, KG RAG, Hybrid RAG, Spatial RAG, Offline/Online KG Guidelines, Agent-based KG Guidelines, KG-Driven Filtering and Validation, Visual Question Answering (VQA), Multi-Document QA, Multi-Hop QA, Conversational QA, Temporal QA, Multilingual QA, Index-based Optimization, and Natural Language to Graph Query Language (NL2GQL).
reference: The paper 'FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models' was published at ACL in 2024, utilizes the FanOutQA dataset, and is categorized under Multi-hop QA.
reference: The paper 'How Credible Is an Answer From Retrieval-Augmented LLMs? Investigation and Evaluation With Multi-Hop QA' published in ACL ARR in 2024 investigates the credibility of answers from retrieval-augmented LLMs.
reference: The paper 'MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge' was published on arXiv in 2024, utilizes the MINTQA dataset, and is categorized under Multi-hop QA.
Knowledge Graphs: Opportunities and Challenges - Springer Nature link.springer.com Apr 3, 2023 2 facts
claim: Saxena et al. (2020) proposed EmbedKGQA to perform multi-hop question answering over sparse knowledge graphs by using knowledge graph embeddings to mitigate sparsity: it creates embeddings for entities, computes an embedding of the given question, and combines the two to predict the answer.
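The combine-and-score step in EmbedKGQA can be sketched with ComplEx-style complex embeddings (EmbedKGQA uses the ComplEx scoring function over a topic-entity embedding, a question embedding, and each candidate answer embedding). The dimensions and the toy construction of the question embedding below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

def rand_complex(n):
    # Random complex embedding, as used in ComplEx-style KG embeddings.
    return rng.normal(size=n) + 1j * rng.normal(size=n)

def complex_score(head, question, tail):
    # ComplEx-style score Re(<h, q, conj(t)>): higher means a better answer.
    return float(np.real(np.sum(head * question * np.conj(tail))))

head = rand_complex(dim)  # embedding of the question's topic entity
candidates = {f"e{i}": rand_complex(dim) for i in range(5)}
candidates["answer"] = rand_complex(dim)

# Toy construction: a question embedding aligned with the true answer, so the
# true answer's score is guaranteed positive (sum of |h_i|^2 * |t_i|^2).
question = np.conj(head) * candidates["answer"]

scores = {e: complex_score(head, question, t) for e, t in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

In the real model the question embedding comes from a trained text encoder rather than this construction; the ranking step over candidate entities is the same.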
claim: Knowledge graph-based question-answering systems enable multi-hop question answering, allowing for the production of more complex and sophisticated answers by combining facts and concepts from knowledge graphs.
How to Improve Multi-Hop Reasoning With Knowledge Graphs and ... neo4j.com Jun 18, 2025 2 facts
perspective: Many multi-hop question-answering issues can be resolved by preprocessing data before ingestion and connecting it to a knowledge graph, rather than relying solely on query-time processing.
claim: Multi-hop question-answering tasks require a system to retrieve several documents and break a single question down into multiple sub-questions to derive an accurate answer.
Practices, opportunities and challenges in the fusion of knowledge ... frontiersin.org 1 fact
reference: Wang et al. (2024) developed LLM-KGMQA, a large language model-augmented multi-hop question-answering system based on knowledge graphs in the medical field.
Knowledge Graph Combined with Retrieval-Augmented Generation ... drpress.org Dec 2, 2025 1 fact
reference: The paper 'Multi-hop question answering under temporal knowledge editing' by Cheng K., Lin G., Fei H., et al. was published as an arXiv preprint (arXiv:2404.00492) in 2024.
A survey on augmenting knowledge graphs (KGs) with large ... link.springer.com Nov 4, 2024 1 fact
claim: MetaQA is a benchmark for evaluating multi-hop question answering over knowledge graphs by testing a model's ability to perform multi-step reasoning over structured data.