Large Language Models (LLMs) struggle with complex question-answering tasks due to limited reasoning capability, lack of up-to-date or domain-specific knowledge, and a tendency to generate hallucinated content.
The KG-CoT method (Zhao et al., 2024b) generates step-by-step reasoning paths from an external knowledge graph, enabling joint reasoning between the LLM and the KG and thereby strengthening question-answering performance.
The survey titled 'Large Language Models Meet Knowledge Graphs for Question Answering' introduces a structured taxonomy that categorizes state-of-the-art works on synthesizing Large Language Models (LLMs) and Knowledge Graphs (KGs) for Question Answering (QA).
Sequeda et al. (2024) published 'A benchmark to understand the role of knowledge graphs on large language model’s accuracy for question answering on enterprise SQL databases' in GRADES-NDA@SIGMOD/PODS, pages 1–12, which evaluates LLM accuracy on enterprise SQL databases using knowledge graphs.
The paper 'Large Language Models Meet Knowledge Graphs for Question Answering' provides details on evaluation metrics, benchmark datasets, and industrial and scientific applications for synthesizing Large Language Models and Knowledge Graphs for Question Answering.
Knowledge Graphs can serve as reasoning guidelines for LLMs in Question Answering tasks by providing structured real-world facts and reliable reasoning paths, which improves the explainability of generated answers.
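As an illustration of how reliable reasoning paths can be extracted from structured facts, the following is a minimal sketch: it assumes a toy KG represented as (head, relation, tail) triples and runs a breadth-first search between a question entity and a candidate answer entity. The facts and entity names are hypothetical, and real systems would query a large KG such as Wikidata.

```python
from collections import deque

def find_reasoning_paths(triples, start, goal, max_hops=3):
    """Breadth-first search for relation paths linking two entities in a KG."""
    graph = {}
    for head, rel, tail in triples:
        graph.setdefault(head, []).append((rel, tail))
    paths, queue = [], deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if len(path) // 2 >= max_hops:   # a path with k hops has length 2k + 1
            continue
        for rel, neighbor in graph.get(node, []):
            if neighbor in path:
                continue                 # avoid cycles
            new_path = path + [rel, neighbor]
            if neighbor == goal:
                paths.append(new_path)
            else:
                queue.append((neighbor, new_path))
    return paths

# Toy facts used only for illustration.
kg = [("Paris", "capital_of", "France"),
      ("Paris", "located_in", "France"),
      ("France", "located_in", "Europe")]
print(find_reasoning_paths(kg, "Paris", "Europe"))
```

Each returned path alternates entities and relations and could be verbalized into a prompt so that the generated answer can cite an explicit, explainable chain of facts.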
PG-RAG (Liang et al., 2024b) proposes dynamic and adaptable knowledge retrieval indexes based on Large Language Models to handle complex queries and improve the performance of Retrieval-Augmented Generation (RAG) systems in Question Answering tasks.
Li et al. (2025b) introduced a graph neural network-enhanced retrieval method for question answering in large language models, published in NAACL (pages 6612–6633).
Christmann and Weikum (2024) proposed a method for RAG-based question answering over heterogeneous data and text (arXiv:2412.07420).
PoG (Chen et al., 2024a) integrates reflection and self-correction mechanisms to adaptively explore reasoning paths over a knowledge graph via an LLM agent, augmenting the LLM in complex reasoning and question answering.
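The adaptive explore-and-backtrack loop can be sketched roughly as follows. This is not PoG's actual algorithm: `propose` stands in for the LLM agent's next-step decision (here stubbed to pick the first untried edge), and the graph is a hypothetical toy example whose first edge is a dead end, forcing a self-correction.

```python
def explore_with_reflection(graph, start, answer_check, propose, max_steps=10):
    """Adaptive path exploration with backtracking: a (stubbed) agent
    proposes the next untried edge; on a dead end it reflects,
    backtracks one hop, and tries an alternative."""
    path, tried = [start], {}
    for _ in range(max_steps):
        node = path[-1]
        if answer_check(node):
            return path
        options = [(r, t) for r, t in graph.get(node, [])
                   if r not in tried.setdefault(node, set())]
        choice = propose(node, options)
        if choice is None:              # dead end: backtrack one hop
            if len(path) == 1:
                return None
            path.pop()
            continue
        rel, nxt = choice
        tried[node].add(rel)
        path.append(nxt)
    return None

# Toy graph: edge ("r1", "A") leads nowhere, so the agent must backtrack.
graph = {"Q": [("r1", "A"), ("r2", "B")], "B": [("r3", "Ans")]}
path = explore_with_reflection(graph, "Q", lambda n: n == "Ans",
                               lambda node, opts: opts[0] if opts else None)
print(path)  # ['Q', 'B', 'Ans']
```

In an actual agent, `propose` would be an LLM call that reflects on the path so far before choosing (or refusing) the next relation.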
The evaluation metrics for synthesizing Large Language Models (LLMs) with Knowledge Graphs (KGs) for Question Answering (QA) are categorized into three types: Answer Quality (AnsQ), Retrieval Quality (RetQ), and Reasoning Quality (ReaQ).
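Answer Quality is commonly scored with exact match and token-level F1 against a gold answer; a minimal sketch follows, using a simplified normalization (lowercasing and punctuation stripping, without the article removal some benchmarks apply):

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation, split on whitespace (simplified preprocessing)."""
    return re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()

def exact_match(prediction, gold):
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    """Harmonic mean of token-overlap precision and recall."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris."))                # 1.0
print(token_f1("The Eiffel Tower", "eiffel tower"))  # ~0.8
```

Retrieval Quality and Reasoning Quality require additional signals (e.g., gold evidence sets or annotated reasoning paths) and are not sketched here.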
Leveraging Knowledge Graphs to augment Large Language Models can help overcome challenges such as hallucinations, limited reasoning capabilities, and knowledge conflicts in complex Question Answering scenarios.
Talmor et al. (2019) introduced 'CommonsenseQA', a question answering challenge specifically targeting commonsense knowledge.
SPOKE KG-RAG (Soman et al., 2024) implements a token-based optimized Knowledge Graph Retrieval-Augmented Generation framework that integrates explicit and implicit knowledge from Knowledge Graphs to enable cost-effective Question Answering.
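SPOKE KG-RAG's token-based optimization is not reproduced here, but the general idea of cost-effective context construction can be sketched as greedily packing the highest-scoring triples into a fixed token budget. The scores, triples, and the whitespace token counter below are illustrative assumptions; a real system would use the LLM's own tokenizer.

```python
def pack_triples(scored_triples, token_budget,
                 count_tokens=lambda s: len(s.split())):
    """Greedily pack the highest-scoring KG triples into a token budget;
    a simple stand-in for token-optimized context construction."""
    selected, used = [], 0
    for score, triple in sorted(scored_triples, key=lambda x: -x[0]):
        text = " ".join(triple) + "."
        cost = count_tokens(text)
        if used + cost <= token_budget:
            selected.append(text)
            used += cost
    return selected, used

# Hypothetical scored triples from a biomedical-style KG.
scored = [(0.9, ("aspirin", "treats", "headache")),
          (0.5, ("aspirin", "interacts_with", "warfarin")),
          (0.2, ("headache", "symptom_of", "flu"))]
context, used = pack_triples(scored, token_budget=7)
print(context, used)
```

Capping the packed context keeps prompt length, and therefore API cost, bounded regardless of how much the retriever returns.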
The survey on Large Language Models and Knowledge Graphs for Question Answering highlights alignments between recent methodologies and the challenges of complex question-answering tasks, while noting that taxonomies from different perspectives are non-exclusive and may overlap.
KG-Rank, proposed by Yang et al. (2024), uses re-ranking techniques based on relevance and redundancy scores to rank top triples from Knowledge Graphs, which are then combined with prompts to generate answers for Question Answering tasks.
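KG-Rank's exact scoring is not reproduced here; the following is a generic maximal-marginal-relevance-style sketch of re-ranking by relevance and redundancy, in which the `relevance` scores, the token-overlap `similarity`, and the verbalized triples are all illustrative assumptions:

```python
def jaccard(a, b):
    """Token-overlap similarity between two verbalized triples."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def mmr_rerank(candidates, relevance, similarity, k=3, lam=0.5):
    """Re-rank triples by balancing relevance to the question against
    redundancy with triples that have already been selected."""
    remaining, selected = list(candidates), []
    while remaining and len(selected) < k:
        def score(c):
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical verbalized triples with assumed relevance scores.
rel = {"Paris capital_of France": 0.9,
       "Paris is_capital_of France": 0.85,   # near-duplicate of the first
       "France located_in Europe": 0.6}
print(mmr_rerank(list(rel), rel, jaccard, k=2))
```

Note how the redundancy penalty demotes the near-duplicate triple in favor of a less relevant but novel fact, which is the point of scoring redundancy alongside relevance.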
Xinxin Zheng, Feihu Che, Jinyang Wu, Shuai Zhang, Shuai Nie, Kang Liu, and Jianhua Tao published the paper 'KS-LLM: Knowledge selection of large language models with evidence document for question answering' in 2024.
Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D Manning, and Jure Leskovec published the paper 'GreaseLM: Graph reasoning enhanced language models for question answering' in 2021.
Question answering (QA) is a fundamental component in artificial intelligence, natural language processing, information retrieval, and data management, with applications including text generation, chatbots, dialog generation, web search, entity linking, natural language querying, and fact-checking.
Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, and Wei Hu (2025) identify that previous surveys on synthesizing Large Language Models (LLMs) and Knowledge Graphs (KGs) for Question Answering (QA) have limitations in scope and task coverage, specifically noting that existing surveys focus on general knowledge-intensive tasks like extraction and construction, limit QA tasks to closed-domain scenarios, and approach the integration of LLMs, KGs, and search engines primarily from a user-centric perspective.
Hybrid methods for synthesizing LLMs and Knowledge Graphs for Question Answering utilize multiple roles for the Knowledge Graph, including background knowledge, reasoning guidelines, and refiner/validator.
QUASAR, proposed by Christmann and Weikum (2024), enhances RAG-based Question Answering by integrating unstructured text, structured tables, and Knowledge Graphs, while re-ranking and filtering relevant evidence.
KGQA (Ji et al., 2024) integrates Chain-of-Thought (CoT) prompting with graph retrieval to enhance retrieval quality and multi-hop reasoning capabilities of Large Language Models in Question Answering tasks.
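Combining CoT prompting with graph retrieval largely amounts to grounding the step-by-step instruction in retrieved triples; a hypothetical prompt template (not the paper's exact wording) might look like:

```python
def build_kg_cot_prompt(question, triples):
    """Assemble a prompt that grounds chain-of-thought reasoning
    in retrieved KG triples (illustrative template only)."""
    facts = "\n".join(f"- {h} {r} {t}" for h, r, t in triples)
    return ("Use only the knowledge-graph facts below to answer the question.\n"
            f"Facts:\n{facts}\n"
            f"Question: {question}\n"
            "Let's think step by step, citing one fact per step.")

prompt = build_kg_cot_prompt("Which continent is Paris in?",
                             [("Paris", "located_in", "France"),
                              ("France", "located_in", "Europe")])
print(prompt)
```

For multi-hop questions, each reasoning step can then be checked against a cited triple, which is what makes the graph-grounded variant more verifiable than free-form CoT.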
Michael Zhang and Eunsol Choi (2021) introduced SituatedQA, a question answering benchmark that incorporates extra-linguistic contexts into question answering.
Ma et al. (2025a) published 'Unifying large language models and knowledge graphs for question answering: Recent advances and opportunities' in EDBT, pages 1174–1177, which reviews the integration of LLMs and knowledge graphs for question answering.
Linders and Tomczak (2025) proposed a knowledge graph-extended retrieval augmented generation method for question answering (arXiv:2504.08893).
Remaining challenges in the synthesis of Large Language Models and Knowledge Graphs include efficient knowledge retrieval, dynamic knowledge integration, effective reasoning over knowledge at scale, and explainable and fairness-aware Question Answering.
The survey on Large Language Models and Knowledge Graphs for Question Answering places less emphasis on quantitative, experimental evaluation of the surveyed methodologies, because variations in implementation details, the diversity of benchmark datasets, and non-standardized evaluation metrics make direct numerical comparison unreliable.
Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, and Wei Hu (2025) conducted a literature review by retrieving research papers published since 2021 using Google Scholar and PaSa, utilizing search phrases such as 'knowledge graph and language model for question answering' and 'KG and LLM for QA', while extending the search scope for benchmark dataset papers to 2016.
Knowledge Graphs can act as refiners and validators for LLMs in Question Answering tasks, allowing LLMs to verify initial answers against factual knowledge and filter out inaccurate responses.
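The refiner/validator role can be sketched as a post-hoc check that every triple an LLM cites in support of its answer actually appears in the KG. This is a simplified exact-match check over hypothetical facts; real validators would also handle paraphrased entities and relations.

```python
def validate_answer(kg_facts, claimed_triples):
    """Accept an LLM's answer only if every supporting triple it cites
    is backed by the knowledge graph; otherwise report unsupported claims."""
    facts = set(kg_facts)
    unsupported = [t for t in claimed_triples if t not in facts]
    return len(unsupported) == 0, unsupported

# Hypothetical KG facts and LLM-cited evidence, one triple of which is wrong.
kg = {("Paris", "capital_of", "France")}
ok, bad = validate_answer(kg, [("Paris", "capital_of", "France"),
                               ("Paris", "capital_of", "Germany")])
print(ok, bad)  # False [('Paris', 'capital_of', 'Germany')]
```

When validation fails, the unsupported triples can be fed back to the LLM as a correction signal, filtering hallucinated claims before the answer is returned.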