concept

KG-IRAG

Facts (55)

Sources
KG-IRAG: A Knowledge Graph-Based Iterative Retrieval-Augmented ... arxiv.org arXiv Mar 18, 2025 47 facts
procedure: For Questions 2 and 3 in the evaluation, hallucination is detected if the LLM generates an answer indicating an abnormal event at an incorrect time, produces an answer not present in the provided data, or, in the case of the KG-IRAG system, if the second LLM (LLM2) fails to decide when to stop the exploration process.
measurement: The Irish Weather Dataset used to evaluate KG-IRAG contains data from January 2017 to December 2019, covering 25 stations across 15 counties in Ireland on an hourly basis, with 227,760 entities, 876,000 relations, and 219,000 records.
claim: The research on KG-IRAG utilized the Wolfpack computational cluster, which is supported by the School of Computer Science and Engineering at UNSW Sydney.
claim: KG-IRAG employs a step-by-step retrieval mechanism that guides Large Language Models (LLMs) in deciding when to stop exploration, improving response accuracy over traditional Retrieval-Augmented Generation (RAG) methods.
claim: In the KG-IRAG study, F1 Score and Hit Rate metrics are excluded for the Q1 dataset because it involves less temporal reasoning than the Q2 and Q3 datasets.
claim: KG-IRAG has three identified limitations: (1) the reasoning mechanism requires further enhancement for accurate and efficient data management; (2) imperfect retrieval of extra data remains a challenge that impacts performance; and (3) there is a potential need to balance the use of internal and external knowledge, as discussed by Wang et al. (2024).
reference: The KG-IRAG system is compared against two other RAG methods: a standard RAG method without exploration (where LLMs decide the data needed, retrieve it, and then process it) and KG-RAG, which utilizes a Chain of Exploration for Knowledge Graph retrieval as described by Sanmartin (2024).
procedure: The KG-IRAG evaluation process uses 'standard data,' defined as the minimal subset of information necessary to answer a query, ensuring that only relevant data is included in the input provided to the Large Language Models.
measurement: The Sydney Weather Dataset used to evaluate KG-IRAG contains data from January 2022 to mid-August 2024, collected every 30 minutes, with 332,433 entities, 559,673 relations, and 279,837 records.
reference: The LLMs utilized to benchmark the KG-IRAG model include Llama-3-8B-Instruct, GPT-3.5-turbo-0125, GPT-4o-mini-2024-07-18, and GPT-4o-2024-08-06, as referenced in Touvron et al. (2023) and Achiam et al. (2023).
claim: The authors of the KG-IRAG paper introduced three new datasets—weatherQA-Irish, weatherQA-Sydney, and trafficQA-TFNSW—designed to test the ability of Large Language Models to answer queries requiring the retrieval of temporal information and mathematical reasoning.
claim: Question 1 (Q1) in the KG-IRAG evaluation is designed to identify whether an abnormal event, such as rainfall or traffic congestion, occurred during a specific time slot, relying primarily on entity recognition and retrieval of static information from the knowledge graph.
claim: The KG-IRAG system addresses two limitations in current GraphRAG methods: (1) few methods address queries highly dependent on temporal reasoning, and (2) no existing temporal QA dataset requires consecutive retrieval of uncertain amounts of data from a temporal knowledge base.
procedure: The KG-IRAG system utilizes the same LLM1 and LLM2 models for both data retrieval and iterative reasoning tasks.
procedure: The KG-IRAG iterative RAG process proceeds as follows: (1) identify the starting time and location, (2) perform KG exploration to retrieve relevant triplets, (3) evaluate the retrieved data using LLM2 to determine if the problem is solved, (4) if unresolved, adjust search criteria by moving to a different time or location, and (5) continue retrieving new triplets until the answer is generated.
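The five-step loop above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the LLM1/LLM2 calls are replaced with simple stand-in functions, and the toy knowledge graph, `retrieve_triplets`, `is_sufficient`, and `kg_irag` names are all assumptions introduced for clarity.

```python
# Toy temporal KG: (location, hour) -> observed rain flag.
kg = {
    ("Sydney", 9): True,
    ("Sydney", 10): True,
    ("Sydney", 11): False,
}

def retrieve_triplets(location, time):
    """Step 2: look up triplets matching the current (location, time)."""
    rain = kg.get((location, time))
    return [] if rain is None else [(location, f"rain@{time}", rain)]

def is_sufficient(triplets):
    """Step 3: stand-in for LLM2's sufficiency judgment - here we
    stop once a rain-free slot has been found."""
    return any(obj is False for (_, _, obj) in triplets)

def kg_irag(location, start_time, max_steps=5):
    """Steps 1-5: iterate over time slots until the stand-in LLM2 stops."""
    gathered, time = [], start_time
    for _ in range(max_steps):
        gathered += retrieve_triplets(location, time)  # step 2: KG exploration
        if is_sufficient(gathered):                    # step 3: evaluate
            return gathered                            # step 5: answer ready
        time += 1                                      # step 4: adjust search
    return gathered

result = kg_irag("Sydney", 9)
```

The key point the sketch shows is that the amount of retrieved data is not fixed in advance: the loop keeps expanding the retrieved triplet set until the sufficiency check passes.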
measurement: The performance of KG-IRAG is evaluated using three datasets: Irish weather data, Sydney weather data, and Traffic Volume of Transport for New South Wales (TFNSW) data.
measurement: In the KG-IRAG experimental setup, the maximum early or late time for trip planning is set to 12 hours for weather datasets and 9 hours for the TFNSW dataset.
claim: The KG-IRAG research uses three datasets for experiments: weatherQA-Irish, weatherQA-Sydney, and trafficQA-TFNSW, which are designed to allow models to perform both entity-based retrieval and time-dependent reasoning.
procedure: The triplet retrieval process in the KG-IRAG framework involves: (1) LLM1 identifies the starting time and location, (2) the system searches the knowledge graph for triplets matching these initial conditions, (3) LLM1 provides a reasoning prompt to guide the search, (4) the system retrieves a set of triplets relevant to the query, (5) LLM2 assesses whether the retrieved triplets and reasoning prompt are sufficient to answer the query, and (6) if insufficient, the system iterates by moving to the next time slot or location to retrieve additional triplets.
procedure: In the KG-IRAG system, if data is insufficient to answer a query or is out of boundary during a knowledge graph lookup, the system returns 'no answer' as the result.
claim: KG-IRAG handles complex temporal queries by adjusting search parameters; for example, if a user wants to avoid rain during a trip and the system detects rainfall at 9:00 AM, the system retrieves data from earlier or later time slots to identify an optimal departure time.
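The rain-avoidance example above can be made concrete with a short sketch. Under stated assumptions (hourly rain flags, a hypothetical `latest_early_departure` helper, and the 12-hour shift bound from the experimental setup), the system scans earlier slots for the latest departure whose whole trip window is dry:

```python
# Hourly rain flags for the planning window; 9:00 and 10:00 are wet.
rain = {7: False, 8: False, 9: True, 10: True, 11: False}

def latest_early_departure(planned, trip_hours, rain, max_shift=12):
    """Return the latest slot <= planned whose whole trip window is dry,
    mirroring the 'shift earlier' search described above."""
    for start in range(planned, planned - max_shift - 1, -1):
        window = range(start, start + trip_hours)
        if all(rain.get(t, False) is False for t in window):
            return start
    return None  # out of boundary -> the system's 'no answer' case

best = latest_early_departure(planned=9, trip_hours=1, rain=rain)
```

Here the planned 9:00 departure is wet, so the search steps back one slot at a time and settles on 8:00; the symmetric "shift later" search for Q3 would iterate forward instead.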
procedure: For the KG-IRAG experimental datasets, 200 time slots are chosen randomly for each year to generate questions, where Question 1 asks if abnormal activity (rainfall or traffic jam) occurred, and Questions 2 and 3 ask for the latest time to head off early and the earliest time to head off late for a trip of the same length.
claim: Questions 2 (Q2) and 3 (Q3) in the KG-IRAG evaluation require the model to infer optimal departure times by reasoning over a temporal range, with Q2 asking for the latest time to leave early to avoid an abnormal event and Q3 focusing on the earliest time to leave late.
claim: The 'Late Stop' phenomenon in the KG-IRAG system occurs when the LLM decides to retrieve one or two extra rounds of data, leading to an information overload, though this rarely results in wrong answers because sufficient data is already included in the retrieval plan.
claim: The research on KG-IRAG is partially supported by the Technology Innovation Institute in Abu Dhabi, UAE.
claim: The KG-IRAG framework was evaluated using three new datasets: weatherQA-Irish, weatherQA-Sydney, and trafficQA-TFNSW, which are designed to test Large Language Models on time-sensitive and event-based queries requiring temporal reasoning and logical inference.
claim: In the KG-IRAG framework, locations and time points are modeled as entities, while attributes such as rain status and traffic volume are modeled as relations connecting those entities.
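A tiny sketch of that modelling choice: locations and time points become entity nodes, and an attribute such as rain status becomes the relation linking them. The triple layout and relation names (`has_rain_at`, `no_rain_at`) are illustrative assumptions, not the paper's schema.

```python
# Each fact is a (head entity, relation, tail entity) triple where both
# the location and the time point are entities, per the modelling above.
triples = [
    ("Dublin", "has_rain_at", "2017-01-01T09:00"),
    ("Dublin", "no_rain_at", "2017-01-01T10:00"),
]

def rain_status(kg, location, time):
    """Query the relation connecting a location entity to a time entity."""
    for head, rel, tail in kg:
        if head == location and tail == time:
            return rel
    return None

status = rain_status(triples, "Dublin", "2017-01-01T09:00")
```

Treating time as an entity (rather than an edge attribute) is what lets the iterative retriever "move" to adjacent time slots simply by following edges to neighbouring time nodes.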
measurement: The Traffic Volume of Transport for New South Wales (TFNSW) Dataset used to evaluate KG-IRAG contains hourly traffic volume data from 2015-2016, with 132,042 entities, 683,002 relations, and 683,002 records.
procedure: KG-IRAG evaluation comparisons are conducted by feeding standard data into Large Language Models in three formats: raw data (data frame), context-enhanced data, and Knowledge Graph (KG) triplet representations.
reference: The KG-IRAG paper includes a case study simulating an initial travel plan to the Sydney Opera House, which involves defining whether the initial plan is valid and determining an optimal time to adjust the trip.
claim: In baseline RAG systems, hallucinations often lead to the generation of wrong answers due to the use of insufficient data, which is considered more harmful than the extra data retrieval observed in KG-IRAG.
procedure: The answer generation process in KG-IRAG consists of: (1) combining triplets retrieved from various iterations into a coherent RAG Data Prompt, and (2) processing the sufficient triplet evidence together with the initial reasoning prompt to generate the final answer.
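Step (1) of that answer-generation process can be sketched as flattening the per-iteration triplet sets into a single prompt string. The paper does not specify its exact template, so the format, the `build_rag_prompt` name, and the round labels below are assumptions.

```python
def build_rag_prompt(iterations, reasoning_prompt):
    """Combine triplets from all retrieval rounds with the initial
    reasoning prompt into one RAG Data Prompt (step 1 above)."""
    lines = [reasoning_prompt, "Retrieved evidence:"]
    for i, triplets in enumerate(iterations, 1):
        for head, rel, tail in triplets:
            lines.append(f"[round {i}] ({head}, {rel}, {tail})")
    return "\n".join(lines)

prompt = build_rag_prompt(
    [[("Sydney", "has_rain_at", "09:00")], [("Sydney", "no_rain_at", "10:00")]],
    "Find the earliest dry hour for the trip.",
)
```

Step (2) would then pass this string to the answer-generating LLM; keeping the round markers preserves the exploration order for the model to reason over.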
reference: The KG-IRAG evaluation uses three question types: Q1, which is a fundamental entity recognition and retrieval task; and Q2 and Q3, which introduce logical reasoning by incorporating time-dependent queries to test iterative reasoning over time.
measurement: The KG-IRAG system exhibits a higher tendency for hallucination when processing datasets containing many numerical values, such as the trafficQA-TFNSW dataset.
claim: In the KG-IRAG knowledge graph construction, time, location, and event status (such as rainfall or traffic volume) are treated as key entities, with time specifically treated as an entity to facilitate retrieval and reasoning.
claim: The Standard Graph-RAG, KG-RAG, and KG-IRAG systems perform similarly on Question 1 because Large Language Models can reliably identify entity names and times in queries to retrieve correct data through a single Question-Answering interaction without requiring step-by-step reasoning.
procedure: The iterative reasoning process for temporal queries in KG-IRAG functions as follows: (1) after each retrieval of triplets, LLM2 evaluates if the current set of triplets combined with the reasoning prompt from LLM1 is sufficient to answer the query, (2) if LLM2 determines the data is insufficient, the system shifts to a different time or location to refine search parameters, and (3) this process repeats until LLM2 confirms the query can be resolved.
claim: The KG-IRAG framework includes a case study in Appendix A to explain the details of the 'iteration' process.
claim: The KG-IRAG (Knowledge Graph-Based Iterative Retrieval-Augmented Generation) system outperforms standard Graph-RAG and KG-RAG methods by using reasoning prompts to judge whether the current data is sufficient to generate an answer and to identify which time and entity to explore next based on abnormal events.
procedure: The KG-IRAG framework utilizes two LLMs, LLM1 and LLM2, to perform iterative retrieval-augmented generation. LLM1 identifies the initial plan (start time and location) and generates a reasoning prompt explaining the information required to answer the query. LLM2 evaluates whether the retrieved data, combined with the reasoning prompt, is sufficient to resolve the query or if further retrieval steps are required.
claim: The KG-IRAG design for questions Q2 and Q3 utilizes dynamic problem decomposition, requiring Large Language Models (LLMs) to perform time-based reasoning and handle temporal logic beyond standard entity recognition.
procedure: In the second stage of experiments, the KG-IRAG framework is compared against Graph-RAG and KG-RAG (Sanmartin, 2024) by evaluating generated answers against true answers using exact match, F1 Score, and Hit Rate metrics, while hallucinations are judged based on the answers generated by the LLMs under each framework.
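The three metrics named above can be sketched as follows. The paper's exact metric definitions are not reproduced here, so these are the common QA formulations (string exact match, token-level F1, and a containment-based hit rate) and should be read as assumptions:

```python
def exact_match(pred, gold):
    """1 if the normalized answer strings are identical, else 0."""
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    """Token-overlap F1 between predicted and gold answers."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(w), g.count(w)) for w in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

def hit_rate(preds, golds):
    """Fraction of predictions that contain the gold answer string."""
    hits = sum(g.lower() in p.lower() for p, g in zip(preds, golds))
    return hits / len(golds)

em = exact_match("8:00 AM", "8:00 am")
f1 = token_f1("leave at 8:00 AM", "8:00 AM")
```

In this example the verbose prediction fails exact match against a differently-worded gold but still earns partial F1 credit, which is why the two metrics are reported together.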
procedure: The first round of experiments in the KG-IRAG study tests four LLMs on three QA datasets (weatherQA-Irish, weatherQA-Sydney, and trafficQA-TFNSW) using three data formats: raw data (table) format, text data (transferred into text descriptions), and triplet format (KG structure). To minimize irrelevant information, input prompts are restricted to the questions and the minimum necessary data, with final answers compared against correct answers using exact match (EM) values.
procedure: In the KG-IRAG experimental setup, all datasets are converted into Knowledge Graphs (KGs) that capture location relationships and temporal records, with time treated as an entity to enhance retrieval capabilities.
claim: The KG-IRAG (Knowledge Graph-Based Iterative Retrieval-Augmented Generation) framework operates by transforming a larger temporal question into multiple fixed-time subproblems, which are solved via entity and temporal information extraction.
reference: The study evaluated three Retrieval-Augmented Generation (RAG) systems: 1) a Standard Graph-RAG system where LLMs determine necessary data, retrieve it, and provide an answer; 2) KG-RAG (Sanmartin, 2024), which uses a Chain of Exploration to retrieve data step-by-step over three steps; and 3) the proposed KG-IRAG system.
procedure: The KG-IRAG system evaluates weather-related queries by iteratively retrieving data and using an LLM to judge the presence of abnormal events like rain within specific time slots.
KG-IRAG with Iterative Knowledge Retrieval - arXiv arxiv.org arXiv Mar 18, 2025 4 facts
procedure: KG-IRAG incrementally gathers relevant data from external Knowledge Graphs through iterative retrieval steps, enabling step-by-step reasoning.
claim: KG-IRAG is designed for scenarios requiring reasoning alongside dynamic temporal data extraction, such as determining optimal travel times based on weather conditions or traffic patterns.
claim: The paper introduces three new datasets, weatherQA-Irish, weatherQA-Sydney, and trafficQA-TFNSW, to evaluate the performance of KG-IRAG.
claim: Experimental results indicate that KG-IRAG improves accuracy in complex reasoning tasks by integrating external knowledge with iterative, logic-based retrieval.
Large Language Models Meet Knowledge Graphs for Question ... arxiv.org arXiv Sep 22, 2025 3 facts
reference: KG-IRAG (Yang et al., 2025) enables large language models to incrementally retrieve knowledge and evaluate its sufficiency to answer time-sensitive and event-based queries involving temporal dependencies.
reference: Ruiyi Yang et al. (2025) proposed KG-IRAG, a knowledge graph-based iterative retrieval-augmented generation framework designed for temporal reasoning.
reference: KG-IRAG, as described by Yang et al. (2025), utilizes incremental retrieval and iterative reasoning with Llama-3-8B-Instruct, GPT-3.5-Turbo, GPT-4o-mini, and GPT-4o models on self-constructed knowledge graphs for temporal QA tasks.
Unknown source 1 fact
claim: KG-IRAG is a Knowledge Graph-Based Iterative Retrieval-Augmented Generation framework designed for temporal reasoning.