Effective entity linking is a critical subproblem in integrating Large Language Models with knowledge graphs, as noted by Shen et al. (2021), owing to challenges such as lexical ambiguity, long-tail entities, and incomplete context in open-domain or multi-turn settings.
Knowledge-Driven Fine-Tuning is a research approach that incorporates structured knowledge from knowledge graphs during large language model (LLM) adaptation to improve generalization and knowledge-awareness.
Cao and Liu (2023) proposed ReLMKG, a method for reasoning with pre-trained language models and knowledge graphs for complex question answering, published in Applied Intelligence.
The BDMG framework (Du et al., 2024) uses a bi-directional multi-granularity generation approach that repeatedly builds sentence-level text from triple components and then composes these sentences into graph-level text.
There are three primary strategies for fusing Knowledge Graphs and Large Language Models: LLM-Enhanced KGs (LEK), KG-Enhanced LLMs (KEL), and Collaborative LLMs and KGs (LKC).
SAC-KG (Chen S. et al., 2024) uses large language models to construct million-scale, high-precision knowledge graphs.
Large Language Models demonstrate utility in performing key tasks for Knowledge Graphs, such as KG embedding, completion, construction, and question answering, which enhances the overall quality and applicability of Knowledge Graphs.
Wang et al. (2024) developed 'Llm-kgmqa', a large language model-augmented multi-hop question-answering system based on knowledge graphs in the medical field.
ReLMKG, proposed by Cao and Liu in 2023, uses a language model to encode complex questions and guides a graph neural network in message propagation and aggregation through outputs from different layers.
Knowledge graphs rely on structured data expressed as entities, relationships, and attributes using manually designed patterns, whereas Large Language Models derive knowledge from large-scale text corpora using unsupervised learning to create high-dimensional continuous vector spaces.
Anelli et al. (2021) introduced sparse feature factorization for recommender systems utilizing knowledge graphs in the Proceedings of the 15th ACM Conference on Recommender Systems.
Figure 11 illustrates the interaction between Large Language Models and Knowledge Graphs, while Figure 12 presents a framework for collaborative knowledge representation and reasoning.
H. Li, G. Appleby, and A. Suh published 'A preliminary roadmap for LLMs as assistants in exploring, analyzing, and visualizing knowledge graphs' as an arXiv preprint in 2024.
KG-CoT, proposed by Zhao et al. in 2024, uses a small-scale, step-by-step graph reasoning model to perform inference over knowledge graphs and generates reasoning paths that form high-confidence knowledge chains for large-scale LLMs.
Knowledge graphs are labor-intensive to construct, face scalability challenges as they grow, struggle to integrate with unstructured data sources, and have limited knowledge coverage.
Knowledge graphs derived from multiple sources often contain conflicting or redundant facts, such as contradictory treatments for the same disease or disagreements on causality in the biomedical domain, which makes it difficult for Large Language Models to determine which facts to trust or prioritize.
Approaches like K-BERT and BERT-MK face limitations including potential latency and conflicts when integrating knowledge graphs with language models.
Dynamic reasoning systems for knowledge graph question answering include DRLK (Zhang M. et al., 2022), which extracts hierarchical QA context features, and QA-GNN (Yasunaga et al., 2021), which performs joint reasoning by scoring knowledge graph relevance and updating representations through graph neural networks.
Contextual enhancement, when empowered by knowledge graphs, serves as a strategy to overcome knowledge bottlenecks in large language models and enables them to handle intricate tasks more effectively.
Knowledge graphs typically exist as static structured data, relying on manual design and rule-driven processes for updates, which results in long update cycles.
The study 'Practices, opportunities and challenges in the fusion of knowledge graphs and large language models' identifies three approaches for integrating knowledge graphs and Large Language Models: KG-enhanced LLMs (KEL), LLM-enhanced KGs (LEK), and collaborative LLMs and KGs (LKC).
The paper 'Kg-cot: chain-of-thought prompting of large language models over knowledge graphs for knowledge-aware question answering' was published in the Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24) in 2024.
The SimKGC model (Wang L. et al., 2022) enhances entity representations by employing contrastive learning with in-batch, pre-batch, and self-negatives.
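To make the in-batch negative strategy concrete, the following is a minimal sketch (not SimKGC's actual implementation) of an InfoNCE-style contrastive loss in which each head embedding's positive is its own tail embedding and every other tail in the batch serves as a negative; the function name and temperature value are illustrative assumptions.

```python
import numpy as np

def info_nce_in_batch(head_emb, tail_emb, temperature=0.05):
    """In-batch contrastive loss sketch: the i-th head's positive is the
    i-th tail; the remaining tails in the batch act as negatives."""
    # Cosine similarity between every head and every tail in the batch.
    h = head_emb / np.linalg.norm(head_emb, axis=1, keepdims=True)
    t = tail_emb / np.linalg.norm(tail_emb, axis=1, keepdims=True)
    logits = h @ t.T / temperature                  # shape: (batch, batch)
    # Numerically stable softmax cross-entropy; the diagonal entries
    # (matched head-tail pairs) are the positive class.
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diag(probs)).mean()
```

When head and tail embeddings of true pairs are close, the diagonal similarities dominate and the loss approaches zero; pre-batch and self-negatives would extend the set of columns beyond the current batch.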
P. Ke, H. Ji, Y. Ran, X. Cui, L. Wang, L. Song, et al. published 'Jointgt: Graph-text joint representation learning for text generation from knowledge graphs' as an arXiv preprint in 2021.
Shen et al. (2022) optimize semantic representations from language models and structural knowledge in knowledge graphs through a probabilistic loss.
Knowledge Graphs excel at symbolic reasoning and evolve as new knowledge is discovered, making them well-suited for providing domain-specific information.
Path learning in knowledge graphs treats connection paths between entities as the basis to capture both explicit information and implicit relationships.
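As an illustrative sketch of the idea (not any specific system's method), connection paths between two entities can be enumerated with a breadth-first search over relation edges; the function name, edge format, and hop limit below are assumptions for the example.

```python
from collections import deque

def find_paths(edges, start, goal, max_len=3):
    """Enumerate relation paths from start to goal via BFS, up to max_len hops.
    Each edge is a (head, relation, tail) triple."""
    adj = {}
    for h, r, t in edges:
        adj.setdefault(h, []).append((r, t))
    paths, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)      # record the relation sequence reached
            continue
        if len(path) >= max_len:
            continue
        for r, t in adj.get(node, []):
            queue.append((t, path + [r]))
    return paths
```

Both direct edges (explicit information) and multi-hop paths (implicit relationships) between the entity pair are returned, which is the raw material path-learning methods score or embed.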
Recent methods to bridge the semantic gap between knowledge graphs and natural language, such as joint graph-text embeddings, prompt-based schema alignment, and co-training frameworks, often require extensive tuning and are task-specific, lacking robust generalization, according to Peng et al. (2024).
Knowledge Graphs can be used to inject external knowledge during both the pre-training and inference phases of Large Language Models, offering an additional layer of factual grounding and improving interpretability.
Traditional knowledge graphs are static snapshots that lack mechanisms to represent temporal dependencies or model dynamic updates, which causes knowledge graph-enhanced large language models to struggle with reasoning over sequences of events, causal relationships, or time-sensitive information.
Knowledge graph error validation is the process of checking and confirming data within knowledge graphs to ensure accuracy and consistency.
Lukovnikov et al. (2019) investigated the use of pretrained transformers for simple question answering over knowledge graphs in a paper presented at the 18th International Semantic Web Conference in Auckland, New Zealand.
AgentTuning enables Large Language Models to interact with knowledge graphs as active environments, allowing models to identify task-relevant knowledge structures, plan multi-step actions, and dynamically query knowledge graph APIs.
Large language models can improve knowledge graphs by using semantic understanding and generation capabilities to extract knowledge, thereby increasing coverage and accuracy.
The authors of 'Practices, opportunities and challenges in the fusion of knowledge graphs and large language models' observe that most existing surveys focus primarily on the use of Knowledge Graphs to enhance Large Language Models (KEL).
Inconsistent answers from different system components, such as Knowledge Graphs and Large Language Models, degrade the perceived coherence of an AI system, which is particularly critical in sensitive applications like healthcare and finance.
Knowledge graph-to-text is a method that generates natural language text from structured knowledge graphs by leveraging models to map graph data into coherent, informative sentences.
Knowledge graphs, while structured and factual, often require natural language capabilities to achieve flexible interaction and knowledge understanding.
Large Language Models (LLMs) excel in reasoning and inference, while Knowledge Graphs (KGs) provide robust frameworks for knowledge representation due to their structured nature.
The fusion of Knowledge Graphs (KGs) and Large Language Models (LLMs) is categorized into three primary strategies: KG-enhanced LLMs (KEL), LLM-enhanced KGs (LEK), and collaborative LLMs and KGs (LKC).
The paper 'Knowledge solver: Teaching LLMs to search for domain knowledge from knowledge graphs' (arXiv:2309.03118) describes a method for teaching large language models to retrieve domain-specific knowledge from knowledge graphs.
GNP (Tian et al., 2024) bridges large language models and knowledge graphs through a technique called graph neural prompting.
Guo, Cao, and Yi (2022) created a medical question answering system that utilizes both large language models and knowledge graphs.
Multimodal integration in knowledge graphs improves accuracy but consumes a significant amount of resources.
Traditional knowledge graphs face significant challenges, specifically regarding data incompleteness and the under-utilization of available textual data.
In the field of education, knowledge graphs help organize and visualize complex learning content, while integration with large language models enables intelligent systems to provide precise learning guidance and personalized recommendations.
The structured format of knowledge graphs often fails to capture the richness and flexibility of natural language, creating a semantic gap that leads to poor retrieval of relevant knowledge and ineffective reasoning by Large Language Models.
Knowledge graph question answering (KGQA) systems leverage natural language processing techniques to transform natural language queries into structured graph queries.
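A minimal sketch of this transformation, assuming a hypothetical template-based approach (real KGQA systems use semantic parsing or neural models, and the predicates below are invented for illustration):

```python
import re

# Hypothetical pattern table: question template -> SPARQL-style query template.
PATTERNS = [
    (re.compile(r"who directed (.+)\?", re.I),
     'SELECT ?d WHERE {{ "{0}" :directedBy ?d . }}'),
    (re.compile(r"where was (.+) born\?", re.I),
     'SELECT ?p WHERE {{ "{0}" :birthPlace ?p . }}'),
]

def question_to_query(question):
    """Map a natural-language question to a structured graph query,
    or return None when no template matches."""
    for pattern, template in PATTERNS:
        m = pattern.match(question.strip())
        if m:
            return template.format(m.group(1))
    return None
```

For example, "Who directed Alien?" is rewritten into a SELECT query over a `:directedBy` edge; template tables like this are brittle, which is precisely the gap that LLM-based semantic parsing in KGQA aims to close.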
LLM4EA (Chen S. et al., 2024) aligns Knowledge Graphs using Large Language Model-generated annotations, employing active learning to reduce annotation space and a label refiner to correct noisy labels.
The paper 'Joint knowledge graph and large language model for fault diagnosis and its application in aviation assembly' by Liu, P., Qian, L., Zhao, X., and Tao, B. presents a joint approach using knowledge graphs and large language models for fault diagnosis in aviation assembly.
Large-scale Knowledge Graphs often exhibit limited representation in specialized domains such as medicine and law, where many entities and relations are missing or weakly connected, creating a coverage gap and structural sparsity that limits their usefulness in tasks requiring nuanced domain-specific reasoning.
Abu-Rasheed et al. (2024) proposed using knowledge graphs as factual background prompts for large language models, where the models fill text templates to provide accurate and easily understandable learning suggestions.
Most existing knowledge graphs are predominantly constructed from textual data and encode information using structured triples, failing to capture real-world knowledge that exists in multimodal formats like images, audio, and videos.
KG-CoT (Zhao et al., 2024) is constrained by the completeness of knowledge graphs, and local correctness in the system does not guarantee global logical consistency.
Collaborative approaches between Large Language Models and Knowledge Graphs aim to combine the advantages of both to create a unified model capable of performing well in both knowledge representation and reasoning.
Integrating Knowledge Graphs with Large Language Models allows LLMs to benefit from a foundation of explicit knowledge that is reliable and interpretable.
Large language models improve the output quality of knowledge graphs by generating more coherent and innovative content and help integrate and classify unstructured data.
The MADLINK model (Biswas et al., 2024) uses an attention-based encoder-decoder to combine knowledge graph structure with textual entity descriptions.
Joint training or optimization approaches train Large Language Models (LLMs) and Knowledge Graphs (KGs) together to align them into a unified representation space, allowing language and structured knowledge to mutually reinforce each other.
In the financial field, the combination of knowledge graphs and large language models provides technological support for financial risk control, fraud detection, and intelligent investment advisory services.
Knowledge graphs are composed of entities (primary objects or concepts represented as nodes), relationships (connections between entities specifying interactions), attributes (properties or characteristics of entities), triples (facts represented as subject-predicate-object), and an ontology (the schema or structure organizing the graph).
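The components above can be sketched as a minimal in-memory knowledge graph; this toy class (names and structure are assumptions for illustration, not a production store) keeps triples as subject-predicate-object facts and attributes as per-entity key-value pairs.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Minimal in-memory KG: triples (facts) plus per-entity attributes."""

    def __init__(self):
        self.triples = set()                 # (head, relation, tail) facts
        self.attributes = defaultdict(dict)  # entity -> {attribute: value}
        self.out_edges = defaultdict(list)   # head -> [(relation, tail)]

    def add_triple(self, head, relation, tail):
        self.triples.add((head, relation, tail))
        self.out_edges[head].append((relation, tail))

    def set_attribute(self, entity, key, value):
        self.attributes[entity][key] = value

    def neighbors(self, entity, relation=None):
        """Entities reachable from `entity`, optionally via one relation."""
        return [t for r, t in self.out_edges[entity]
                if relation is None or r == relation]
```

An ontology would sit on top of this structure, constraining which entity types each relation may connect; that schema layer is omitted here for brevity.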
Kim et al. (2020) integrate relation prediction and relevance ranking tasks with link prediction to improve the learning of relational attributes in knowledge graphs.
Constructing and maintaining high-quality knowledge graphs typically involves significant human effort, including data cleaning, entity alignment, relation labeling, and expert validation, which is particularly labor-intensive in domains requiring expert knowledge.
The integration of knowledge graphs and large language models has been successfully applied in five key fields: medical, industrial, education, financial, and legal.
Ibrahim et al. (2024) published a survey on augmenting knowledge graphs with large language models, covering models, evaluation metrics, benchmarks, and challenges.
Aligning knowledge graphs and Large Language Models is difficult because knowledge graphs use discrete structures that are hard to embed into the vectorized representations of Large Language Models, and Large Language Models' knowledge is difficult to map back to the discrete structures of knowledge graphs.
Zhang M. et al. (2024) proposed an LLM-enhanced embedding framework for knowledge graph error validation that uses graph structure information to identify suspicious triplet relations and then uses a language model for validation.
Knowledge Graph Reasoning (KGR) improves the reliability and relevance of LLM responses by autonomously integrating real-time knowledge from Knowledge Graphs.
In the medical domain, integrating knowledge graphs with large language models improves medical question answering by providing more accurate and contextually relevant answers to complex queries, as demonstrated by systems like MEG and LLM-KGMQA.
Inherent training data biases, domain adaptation challenges, and coverage gaps for long-tail relationships undermine the reliability of constructed knowledge graphs, particularly in professional domains where precision is required.
Manual verification and the use of domain-specific knowledge bases create scalability bottlenecks that limit the practical implementation of knowledge graphs.
Low precision and noisy data in knowledge graphs degrade the reliability of the knowledge graph itself and reduce the effectiveness of downstream KG-enhanced Large Language Models, which may propagate errors during inference, according to Yang et al. (2024a).
KG-Agent, proposed by Jiang J. et al. in 2024, utilizes programming languages to design multi-hop reasoning processes on knowledge graphs and synthesizes code-based instruction datasets for fine-tuning base LLMs.
The integration of symbolic logic from knowledge graphs with deep neural networks in large language models creates hybrid models where decisions emerge from entangled attention weights and vector operations, making reasoning paths difficult to trace.
GAP, proposed by Colas et al. in 2022, utilizes a masking structure to capture neighborhood information and introduces a novel type encoder that biases graph attention weights based on connection types.
AutoAlign (Zhang R. et al., 2023) performs entity alignment by constructing a predicate proximity graph to capture predicate similarity between Knowledge Graphs and uses the TransE model (Bordes et al., 2013) to compute entity embeddings, aligning entities into a shared vector space.
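TransE's core idea is that a true triple (h, r, t) should satisfy h + r ≈ t in the embedding space, so the L2 distance ||h + r − t|| serves as an implausibility score; the sketch below shows only this scoring function, not AutoAlign's full alignment pipeline.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE implausibility score: for a true triple, the head embedding
    translated by the relation embedding should land near the tail,
    so smaller distances indicate more plausible facts."""
    return np.linalg.norm(h + r - t)
```

With toy 2-D embeddings h = (1, 0), r = (0, 1), and t = (1, 1), the true triple scores 0, while a corrupted tail scores higher; training pushes true triples toward low scores and corrupted ones toward high scores.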
Collaborative reasoning models aim to leverage the structured, factual nature of knowledge graphs alongside the deep contextual understanding of Large Language Models to achieve more robust reasoning capabilities.
The GenKGC model (Xie et al., 2022) leverages pre-trained language models to convert the knowledge graph completion task into a sequence-to-sequence generation task.
Large Language Models (LLMs) often struggle with tasks requiring deep knowledge and complex reasoning due to limitations in their internal knowledge bases, a gap that can be bridged by integrating structured knowledge from Knowledge Graphs (KGs).
Knowledge graphs are classified into four types based on content patterns: encyclopedic (general knowledge), commonsense (everyday reasoning), domain-specific (specialized fields like medicine or finance), and multi-modal (combining structured facts with images, audio, or video).
QA-GNN (Yasunaga et al., 2021) utilizes Graph Neural Networks (GNNs) to reason over knowledge graphs while incorporating LLM-based semantic reasoning. The model uses relevance scoring to estimate the importance of knowledge graph nodes concerning a given question and applies GNN reasoning to integrate those nodes into the LLM's answer generation.
Generation-retrieval frameworks for knowledge graph question answering, such as ChatKBQA (Luo H. et al., 2023) and GoG (Xu et al., 2024), use a two-stage approach that generates logical forms or new triples before retrieving relevant knowledge graph elements.
ERNIE (Zhang et al., 2019) enhances natural language processing capabilities by integrating knowledge graphs.
Knowledge graphs may contain fuzzy or incomplete data, such as entities with inconsistent attributes, while Large Language Models provide context-sensitive knowledge that varies with training corpora and model architecture, leading to potential contradictions in reasoning paths or question-answering tasks, as noted by Zhang X. et al. (2022).
Multi-task learning approaches for knowledge graph completion, such as MT-DNN and LP-BERT, fail to resolve the fundamental scalability gap in large-scale knowledge graphs, where latency grows polynomially with graph density.
Real-time updating of knowledge graphs faces scale limitations because increasing data size and complexity requires significant computing and storage resources, which limits dynamic capabilities.
The paper 'Large language models and knowledge graphs: opportunities and challenges' by Pan, J. Z., Razniewski, S., Kalo, J.-C., Singhania, S., Chen, J., Dietze, S. et al. examines the opportunities and challenges associated with combining large language models and knowledge graphs.
ProLINK (Wang K. et al., 2024) is a pre-training and prompting framework designed for low-resource inductive reasoning on arbitrary knowledge graphs without requiring additional training.
Yang et al. (2024) published 'Give us the facts: enhancing large language models with knowledge graphs for fact-aware language modeling'.
LKPNR (Runfeng et al., 2023) combines multi-hop reasoning across knowledge graphs with LLM context understanding.
Li et al. (2021) introduced a breadth-first search (BFS) strategy with a relationship bias for knowledge graph linearization and employed multi-task learning with knowledge graph reconstruction.
Wang B. et al. (2021) employ Siamese networks to learn structured representations in knowledge graphs while avoiding combinatorial explosion.
The fusion of large language models (LLMs) and knowledge graphs (KGs) encounters representational conflicts between the implicit statistical patterns of LLMs and the explicit symbolic structures of KGs, which disrupts entity linking consistency.
Saxena et al. (2022) propose transforming knowledge graph link prediction into a sequence-to-sequence task, replacing traditional triple scoring methods with auto-regressive decoding.
The KG-BERT model (Yao et al., 2019) treats knowledge graph triples as textual sequences and encodes them using BERT-style architectures.
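The serialization step can be sketched as follows; the exact separator layout is illustrative (KG-BERT's real pipeline uses the tokenizer's segment handling rather than literal marker strings), but it conveys how a symbolic triple becomes a textual sequence a BERT-style encoder can score.

```python
def linearize_triple(head_text, relation_text, tail_text):
    """Serialize a (head, relation, tail) triple into one BERT-style
    input sequence; the encoder then classifies the sequence as a
    plausible or implausible fact. Marker placement is illustrative."""
    return f"[CLS] {head_text} [SEP] {relation_text} [SEP] {tail_text} [SEP]"
```

Using entity and relation names (or longer textual descriptions) in place of opaque IDs is what lets the pre-trained language model bring its lexical knowledge to bear on triple classification.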
Knowledge Tracing empowered by knowledge graphs allows large language models (LLMs) to track knowledge evolution, fill in knowledge gaps, and improve the accuracy of responses.
Biswas, Sack, and Alam (2024) introduced MADLINK, a method using attentive multihop and entity descriptions for link prediction in knowledge graphs, published in Semantic Web.
The paper 'Unifying large language models and knowledge graphs: a roadmap' by Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., Wu, X. provides a roadmap for unifying large language models and knowledge graphs.
The article 'Practices, opportunities and challenges in the fusion of knowledge graphs and large language models' was published in Frontiers in Computer Science in 2025.
JAKET (Yu et al., 2022) enables bidirectional enhancement between knowledge graphs and language models.
Knowledge graphs contain discrete, explicitly defined relationships, while Large Language Models contain implicit, distributed semantic relationships, creating consistency issues when the two are integrated.
KC-GenRe, proposed by Wang Y. et al. in 2024, transforms the knowledge graph completion re-ranking task into a candidate ranking problem solved by a generative LLM and uses a knowledge-enhanced constrained reasoning method to address omission issues.
Knowledge Graphs support applications such as question answering, recommendation systems, and web search by linking entities and relationships in a structured format.
KSL (Feng et al., 2023) empowers LLMs to search for essential knowledge from external knowledge graphs, transforming retrieval into a multi-hop decision-making process.
Entity Association Analysis with the aid of Knowledge Graphs provides a powerful means to identify and utilize entity associations, filling knowledge gaps and promoting more accurate and intelligent responses in Large Language Models.
BERT-MK (He et al., 2019) employs a dual-encoder system that embeds both entities and their neighboring context from knowledge graphs to improve factual consistency and entity disambiguation.
Knowledge graph reasoning leverages graph structures and logical rules to infer new information or relationships from existing knowledge.
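As a concrete illustration of rule-based inference (a deliberately simple sketch, not a full reasoner), the function below forward-chains a transitivity rule of the form (a, R, b) ∧ (b, R, c) → (a, R, c) until no new facts emerge; the function name and example relation are assumptions.

```python
def apply_transitivity(triples, relation):
    """Forward-chain the rule (a,R,b) and (b,R,c) imply (a,R,c)
    for one relation R, iterating until a fixpoint is reached."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(facts):
            if r1 != relation:
                continue
            for b2, r2, c in list(facts):
                if r2 == relation and b2 == b and (a, relation, c) not in facts:
                    facts.add((a, relation, c))   # newly inferred fact
                    changed = True
    return facts
```

Given (Paris, locatedIn, France) and (France, locatedIn, Europe), the rule derives (Paris, locatedIn, Europe); production systems apply many such rules, often compiled from an ontology, alongside embedding-based inference.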
The integration of knowledge graphs and Large Language Models faces key challenges including efficiency issues in real-time knowledge updating and representational consistency in cross-modal learning, due to inherent differences in their knowledge representation and processing methodologies.
Collaborative representations between Large Language Models and Knowledge Graphs are increasingly demanded in interactive settings like conversational decision support, where users expect both accurate facts and transparent reasoning traces.
Pre-trained transformer-based methods, such as the model by Lukovnikov et al. (2019) and ReLMKG (Cao and Liu, 2023), use language models to bridge semantic gaps between questions and knowledge graph structures.
The integration of Knowledge Graphs into Large Language Models can be categorized into three types according to where the enhancement takes effect: pre-training, reasoning (including supervised fine-tuning and alignment fine-tuning), and model interpretability.
Knowledge Graphs store factual knowledge in a structured manner, typically in the form of a 3-tuple containing a head entity, a relation, and a tail entity.
Liu et al. (2020) introduced 'K-BERT', a method for enabling language representation with knowledge graphs, in the Proceedings of the AAAI Conference on Artificial Intelligence.
The paper 'Two heads are better than one: Integrating knowledge from knowledge graphs and large language models for entity alignment' was published as an arXiv preprint (arXiv:2401.16960) in 2024.
Knowledge Graph Reasoning (KGR) helps counterbalance biases in LLM training data by relying on Knowledge Graphs as an objective source of factual information.
Failures in aligning Large Language Models and knowledge graphs can reduce system explainability and negatively impact user trust.
Knowledge graph-based retrofitting (KGR) incorporates knowledge graphs into large language models to verify responses and reduce hallucinations.
Hao et al. (2022) introduced 'Bertnet', a system for harvesting knowledge graphs with arbitrary relations from pre-trained language models.
In the industrial domain, the integration of knowledge graphs and large language models advances intelligent systems for quality testing, maintenance, fault diagnosis, and process optimization.
KGValidator, proposed by Boylan et al. in 2024, is a consistency and validation framework for knowledge graphs that uses generative models and supports any external knowledge source.
Sun et al. (2021a) proposed 'Jointlk', a method for joint reasoning with language models and knowledge graphs for commonsense question answering.
Jiang et al. (2024) developed 'KG-Agent', an efficient autonomous agent framework designed for complex reasoning over knowledge graphs.
Wang et al. (2024) introduced 'LLM as Prompter', a technique for low-resource inductive reasoning on arbitrary knowledge graphs.
KGFlex (Anelli et al., 2021) integrates Knowledge Graphs with a sparse factorization approach to analyze the dimensions of user decision-making and model user-item interactions.
Abu-Rasheed, Weber, and Fathi (2024) propose using knowledge graphs as context sources for large language model-based explanations of learning recommendations in their arXiv preprint arXiv:2403.03008.
Knowledge graphs provide structured and explicit knowledge representation, support enhanced reasoning and multi-hop queries, offer domain-specific precision, ensure consistency and reusability, and provide high explainability for transparent decision-making.
KGPT, proposed by Chen et al. in 2020, comprises a generative model for producing knowledge-enriched text and a pre-training paradigm on a large corpus of knowledge text crawled from the web.
The paper 'Llm-align: utilizing large language models for entity alignment in knowledge graphs' (arXiv:2412.04690) investigates the use of large language models for entity alignment tasks within knowledge graphs.
LLM-based knowledge graph completion methods, such as the sequence-to-sequence model GenKGC and the text-graph hybrid model MADLINK, require exhaustive text processing and candidate scoring, resulting in high computational costs for large knowledge graphs.
Knowledge graphs often undergo offline batch updates, preventing the timely inclusion of new knowledge in rapidly changing fields such as finance, news, and epidemics.
BERTRL, proposed by Zha et al. in 2022, leverages pre-trained language models and fine-tunes them using relation instances and reasoning paths as training samples.
Recent research integrates Large Language Models with Knowledge Graphs to address traditional Knowledge Graph limitations by incorporating text data and improving performance across various tasks.