concept

Question Answering

Also known as: QA

synthesized from dimensions

Question Answering (QA) is a foundational capability within artificial intelligence and natural language processing, serving as a primary mechanism for information retrieval, fact-checking, and complex reasoning [fundamental component in]. Its significance is underscored by its prevalence in machine learning development: in one reported training system, QA accounts for approximately 40% of the training corpus, making it the dominant training task in that setting [dominant training task].

While Large Language Models (LLMs) have become the primary engines for contemporary QA, they frequently encounter limitations in reasoning, domain-specific accuracy, and factual consistency, often manifesting as hallucinations [LLM task limitations]. Traditional neural models, such as Word2Vec and GloVe, are similarly constrained by their inability to perform structured, multi-hop reasoning [traditional model limitations]. To overcome these hurdles, the field has shifted toward the synthesis of LLMs with Knowledge Graphs (KGs). This integration provides a structured semantic foundation in which the KG acts as a source of background knowledge, a reasoning guide, and a validator, improving factual correctness, explainability, and interpretability [Knowledge Graphs serve as; KG-LLM benefits].
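The validator role described above can be made concrete with a minimal sketch: an answer proposed by an LLM is accepted only if it matches a fact in the graph. The toy triple store, the entity names, and the `validate_answer` helper below are illustrative assumptions, not an API from any of the surveyed systems:

```python
# Minimal sketch of a KG acting as a refiner/validator for an LLM answer.
# The triple store and helper below are hypothetical, for illustration only.

KG_TRIPLES = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
}

def validate_answer(candidate: str, relation: str, obj: str) -> bool:
    """Accept a candidate answer only if it matches a stored KG fact."""
    return (candidate, relation, obj) in KG_TRIPLES

# An LLM might hallucinate "Lyon" as the capital of France;
# checking against the KG filters the inaccurate response out.
assert validate_answer("Paris", "capital_of", "France")
assert not validate_answer("Lyon", "capital_of", "France")
```

A production system would replace the in-memory set with queries against a real knowledge graph (e.g., via SPARQL), but the filtering logic is the same: answers unsupported by factual knowledge are rejected.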

Methodologies for implementing these hybrid systems are diverse, ranging from Retrieval-Augmented Generation (RAG) to specialized graph-enhanced architectures. Notable frameworks include KG-RAG [KG-RAG integration], GraphRAG [Microsoft’s GraphRAG implementation], and reasoning-focused approaches such as PoG [PoG reasoning], KG-CoT [KG-CoT reasoning], and QA-GNN [QA-GNN by Yasunaga et al.]. These systems leverage techniques such as graph augmentation, pruning, and multi-hop reasoning to process complex, domain-specific queries that standard models struggle to resolve [Graph reasoning enhanced language; graph augmentation and pruning].
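As a rough illustration of the multi-hop retrieval step these frameworks share, the sketch below walks a toy knowledge graph breadth-first and linearizes the discovered paths into textual context for an LLM prompt. The entities, relations, and helper names are invented for the example; real systems retrieve from large KGs and add pruning and re-ranking:

```python
from collections import deque

# Toy KG as an adjacency list: entity -> [(relation, neighbor), ...].
# Entities and relations here are illustrative placeholders.
KG = {
    "Marie Curie": [("discovered", "Polonium"), ("born_in", "Warsaw")],
    "Warsaw": [("capital_of", "Poland")],
    "Polonium": [("named_after", "Poland")],
}

def multi_hop_paths(start, max_hops=2):
    """Collect relation paths of up to max_hops from a seed entity (BFS)."""
    paths, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if path:
            paths.append(path)
        if len(path) < max_hops:
            for rel, nbr in KG.get(node, []):
                queue.append((nbr, path + [(node, rel, nbr)]))
    return paths

# Linearize the retrieved paths into a context string for the LLM prompt.
context = "\n".join(
    " -> ".join(f"{s} {r} {o}" for s, r, o in p)
    for p in multi_hop_paths("Marie Curie")
)
print(context)
```

The two-hop path "Marie Curie born_in Warsaw -> Warsaw capital_of Poland" is exactly the kind of structured evidence a plain text retriever tends to miss, which is why graph-enhanced retrieval helps on multi-hop questions.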

Evaluation remains a critical and evolving challenge in QA. Researchers have moved beyond simple overlap-based metrics such as ROUGE, which are criticized for systematically overestimating performance and creating "illusory progress" [hallucination detection bias]. Modern evaluation frameworks now categorize performance into three distinct domains: Answer Quality, Retrieval Quality, and Reasoning Quality [QA evaluation taxonomy]. A wide array of benchmarks is used to assess these qualities across different contexts, including Natural Questions [Natural Questions by Kwiatkowski et al.], HotpotQA [designed for diverse, explainable], MedMCQA [medical-choice question-answering dataset], and HaluEval for hallucination detection [QA benchmarks].
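The most common answer-quality metrics, Exact Match and token-overlap F1, can be sketched as follows (a generic implementation, not tied to any particular benchmark's normalization rules). The example also hints at the overlap-metric critique above: a fluent but unverified answer can still score well on token overlap:

```python
import re
from collections import Counter

def normalize(text: str) -> list:
    """Lowercase, strip punctuation, and tokenize (SQuAD-style normalization)."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer."""
    p, g = Counter(normalize(pred)), Counter(normalize(gold))
    overlap = sum((p & g).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(p.values()), overlap / sum(g.values())
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))          # 0.0
print(round(token_f1("The Eiffel Tower", "eiffel tower"), 2))   # 0.8
```

Because these scores reward surface overlap rather than factual grounding, evaluation frameworks increasingly pair them with retrieval-quality and reasoning-quality measures, or with LLM-as-judge protocols.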

Despite significant progress, the field continues to grapple with the scalability of reasoning, the necessity for dynamic knowledge integration, and the lack of standardized evaluation metrics across disparate research implementations [Remaining challenges in the]. Future development is expected to focus on bridging these gaps so that QA systems remain reliable, interpretable, and effective in high-stakes settings such as scientific discovery and medical diagnostics.

Model Perspectives (3)
openrouter/google/gemini-3.1-flash-lite-preview definitive 100% confidence
Question Answering (QA) is a foundational task in artificial intelligence, currently accounting for 40% of the training corpus in some systems [dominant training task]. While Large Language Models (LLMs) are used for QA, they often face significant challenges, including limited reasoning, hallucinations, and a lack of domain-specific knowledge [LLM task limitations]. Traditional neural models, such as Word2Vec and GloVe, are similarly limited by their inability to perform structured reasoning [traditional model limitations]. To address these deficiencies, current research focuses on synthesizing LLMs with Knowledge Graphs (KGs). This integration provides LLMs with structured real-world facts and reliable reasoning paths, which improves factual correctness, explainability, and interpretability [KG-LLM benefits]. Various frameworks have been developed to implement this, such as KG-RAG [KG-RAG integration], PoG [PoG reasoning], and KG-CoT [KG-CoT reasoning]. These approaches vary in their methodologies, ranging from Retrieval-Augmented Generation (RAG) to graph-neural-network-enhanced retrieval [graph-enhanced retrieval]. Evaluation of QA systems has become increasingly complex, with researchers highlighting that standard metrics like ROUGE and certain overlap-based methods may systematically overestimate performance, leading to 'illusory progress' [hallucination detection bias]. Consequently, the field has adopted diverse benchmarks, such as SimpleQuestions, FreebaseQA, Natural Questions, and MedHALT, to standardize assessment across commonsense, medical, and large-scale structured-data tasks [QA benchmarks]. Modern evaluation frameworks often categorize metrics into three specific domains: Answer Quality, Retrieval Quality, and Reasoning Quality [QA evaluation taxonomy].
openrouter/google/gemini-3.1-flash-lite-preview definitive 95% confidence
Question Answering (QA) is a foundational capability within artificial intelligence, natural language processing, and information retrieval, supporting diverse applications ranging from chatbots and fact-checking to tactical planning and scientific discovery [fundamental component in]. While Large Language Models (LLMs) like GPT-3 provide context-dependent answering capabilities [context-dependent question-answering capabilities], they often struggle with complex, domain-specific, or ambiguous queries [traditional methods for medical]. To address these limitations, recent research emphasizes the synthesis of LLMs with Knowledge Graphs (KGs) [comprehensive taxonomy of research]. This integration utilizes KGs as a structured semantic foundation that acts as a source of background knowledge, a reasoning guide, and a refiner or validator to mitigate hallucinations and improve factual consistency [Knowledge Graphs serve as; Knowledge Graphs can act]. Frameworks such as Retrieval-Augmented Generation (RAG) and its graph-enhanced variants (GraphRAG, KG-RAG) have become central to this evolution [Retrieval Augmented Generation (RAG); Microsoft’s GraphRAG implementation]. Methodologies for enhancing QA performance include:
- Graph-Based Reasoning: Techniques like GreaseLM [Graph reasoning enhanced language], KGQA [KGQA integrates Chain-of-Thought (CoT)], and GrapeQA [graph augmentation and pruning] leverage structured data to boost multi-hop reasoning.
- Architectural Enhancements: The use of encoder-decoder models (e.g., T5, BART) [Encoder-decoder architectures, such] and memory networks [large-scale simple question] supports complex sequence processing.
- Evaluation Benchmarks: Researchers utilize specialized datasets like MedMCQA [medical-choice question-answering dataset], HotpotQA [designed for diverse, explainable], and WebQuestionsSP [evaluating question answering over] to assess system performance.
Despite these advances, challenges persist regarding the scalability of reasoning, dynamic knowledge integration, and the lack of standardized evaluation metrics across different research implementations [Remaining challenges in the; underemphasizes quantitative and experimental].
openrouter/x-ai/grok-4.1-fast 92% confidence
Question Answering (QA) is a core AI task heavily emphasized in training systems, comprising 40% of the training corpus according to measurements reported in Nature [QA dominant in training]. It features numerous benchmarks for evaluation, including Natural Questions by Kwiatkowski et al., CommonsenseQA by Talmor et al., SimpleQuestions for KG QA, and FreebaseQA using Freebase. Advances often integrate Knowledge Graphs (KGs) with Large Language Models (LLMs), as in OntoRAG by Tiwari et al., QA-GNN by Yasunaga et al., KG-RAG for FMEA, and SPOKE KG-RAG by Soman et al. Domain-specific methods include medical QA via MEG by Cabello et al. and the MedDialogRubrics benchmark, plus RAG enhancements like Generate-on-Graph by Xu et al. Procedures generate Q&A pairs from knowledge sources with expert review [high-quality Q&A generation], while benchmarks like HaluEval and RAGTruth assess QA hallucinations.

Facts (110)

Sources
Large Language Models Meet Knowledge Graphs for Question ... arxiv.org arXiv Sep 22, 2025 30 facts
claim: Large Language Models (LLMs) struggle with complex question-answering tasks due to limited reasoning capability, lack of up-to-date or domain-specific knowledge, and a tendency to generate hallucinated content.
reference: The KG-CoT method (Zhao et al., 2024b) leverages external knowledge graphs to generate reasoning paths for joint reasoning of Large Language Models and knowledge graphs to enhance reasoning capabilities for question answering.
claim: The survey titled 'Large Language Models Meet Knowledge Graphs for Question Answering' introduces a structured taxonomy that categorizes state-of-the-art works on synthesizing Large Language Models (LLMs) and Knowledge Graphs (KGs) for Question Answering (QA).
reference: Sequeda et al. (2024) published 'A benchmark to understand the role of knowledge graphs on large language model’s accuracy for question answering on enterprise SQL databases' in GRADES-NDA@SIGMOD/PODS, pages 1–12, which evaluates LLM accuracy on enterprise SQL databases using knowledge graphs.
reference: The paper 'Large Language Models Meet Knowledge Graphs for Question Answering' provides details on evaluation metrics, benchmark datasets, and industrial and scientific applications for synthesizing Large Language Models and Knowledge Graphs for Question Answering.
claim: Knowledge Graphs can serve as reasoning guidelines for LLMs in Question Answering tasks by providing structured real-world facts and reliable reasoning paths, which improves the explainability of generated answers.
reference: PG-RAG (Liang et al., 2024b) proposes dynamic and adaptable knowledge retrieval indexes based on Large Language Models to handle complex queries and improve the performance of Retrieval-Augmented Generation (RAG) systems in Question Answering tasks.
reference: Li et al. (2025b) introduced a graph neural network-enhanced retrieval method for question answering in large language models, published in NAACL (pages 6612–6633).
reference: Christmann and Weikum (2024) proposed a method for RAG-based question answering over heterogeneous data and text, as detailed in their paper 'RAG-based question answering over heterogeneous data and text' (arXiv:2412.07420).
claim: PoG (Chen et al., 2024a) integrates reflection and self-correction mechanisms to adaptively explore reasoning paths over a knowledge graph via an LLM agent, augmenting the LLM in complex reasoning and question answering.
claim: The evaluation metrics for synthesizing Large Language Models (LLMs) with Knowledge Graphs (KGs) for Question Answering (QA) are categorized into three types: Answer Quality (AnsQ), Retrieval Quality (RetQ), and Reasoning Quality (ReaQ).
claim: Leveraging Knowledge Graphs to augment Large Language Models can help overcome challenges such as hallucinations, limited reasoning capabilities, and knowledge conflicts in complex Question Answering scenarios.
reference: Talmor et al. (2019) introduced 'CommonsenseQA', a question answering challenge specifically targeting commonsense knowledge.
reference: SPOKE KG-RAG (Soman et al., 2024) implements a token-based optimized Knowledge Graph Retrieval-Augmented Generation framework that integrates explicit and implicit knowledge from Knowledge Graphs to enable cost-effective Question Answering.
claim: The survey on Large Language Models and Knowledge Graphs for Question Answering highlights alignments between recent methodologies and the challenges of complex question-answering tasks, while noting that taxonomies from different perspectives are non-exclusive and may overlap.
reference: KG-Rank, proposed by Yang et al. (2024), uses re-ranking techniques based on relevance and redundancy scores to rank top triples from Knowledge Graphs, which are then combined with prompts to generate answers for Question Answering tasks.
claim: Xinxin Zheng, Feihu Che, Jinyang Wu, Shuai Zhang, Shuai Nie, Kang Liu, and Jianhua Tao published the paper 'KS-LLM: Knowledge selection of large language models with evidence document for question answering' in 2024.
claim: Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D Manning, and Jure Leskovec published the paper 'GreaseLM: Graph reasoning enhanced language models for question answering' in 2021.
claim: Question answering (QA) is a fundamental component in artificial intelligence, natural language processing, information retrieval, and data management, with applications including text generation, chatbots, dialog generation, web search, entity linking, natural language query, and fact-checking.
claim: Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, and Wei Hu (2025) identify that previous surveys on synthesizing Large Language Models (LLMs) and Knowledge Graphs (KGs) for Question Answering (QA) have limitations in scope and task coverage, specifically noting that existing surveys focus on general knowledge-intensive tasks like extraction and construction, limit QA tasks to closed-domain scenarios, and approach the integration of LLMs, KGs, and search engines primarily from a user-centric perspective.
claim: Hybrid methods for synthesizing LLMs and Knowledge Graphs for Question Answering utilize multiple roles for the Knowledge Graph, including background knowledge, reasoning guidelines, and refiner/validator.
reference: QUASAR, proposed by Christmann and Weikum (2024), enhances RAG-based Question Answering by integrating unstructured text, structured tables, and Knowledge Graphs, while re-ranking and filtering relevant evidence.
reference: KGQA (Ji et al., 2024) integrates Chain-of-Thought (CoT) prompting with graph retrieval to enhance retrieval quality and multi-hop reasoning capabilities of Large Language Models in Question Answering tasks.
reference: Michael Zhang and Eunsol Choi (2021) introduced SituatedQA, a method for incorporating extra-linguistic contexts into question answering.
reference: Ma et al. (2025a) published 'Unifying large language models and knowledge graphs for question answering: Recent advances and opportunities' in EDBT, pages 1174–1177, which reviews the integration of LLMs and knowledge graphs for question answering.
reference: Linders and Tomczak (2025) proposed a knowledge graph-extended retrieval augmented generation method for question answering (arXiv:2504.08893).
claim: Remaining challenges in the synthesis of Large Language Models and Knowledge Graphs include efficient knowledge retrieval, dynamic knowledge integration, effective reasoning over knowledge at scale, and explainable and fairness-aware Question Answering.
claim: The survey on Large Language Models and Knowledge Graphs for Question Answering underemphasizes quantitative and experimental evaluation of different methodologies due to variations in implementation details, the diversity of benchmark datasets, and non-standardized evaluation metrics.
procedure: Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, and Wei Hu (2025) conducted a literature review by retrieving research papers published since 2021 using Google Scholar and PaSa, utilizing search phrases such as 'knowledge graph and language model for question answering' and 'KG and LLM for QA', while extending the search scope for benchmark dataset papers to 2016.
claim: Knowledge Graphs can act as refiners and validators for LLMs in Question Answering tasks, allowing LLMs to verify initial answers against factual knowledge and filter out inaccurate responses.
A survey on augmenting knowledge graphs (KGs) with large ... link.springer.com Springer Nov 4, 2024 13 facts
claim: Benchmarks like SimpleQuestions and FreebaseQA provide standardized datasets and evaluation metrics for consistent and comparative assessment of LLMs integrated with knowledge graphs, covering tasks such as natural language understanding, question answering, commonsense reasoning, and knowledge graph completion.
reference: Talmor A, Herzig J, Lourie N, and Berant J authored 'Commonsenseqa: a question answering challenge targeting commonsense knowledge', published as an arXiv preprint in 2018 (arXiv:1811.00937).
reference: Zhang Y, Dai H, Kozareva Z, Smola A, and Song L published 'Variational reasoning for question answering with knowledge graph' in the Proceedings of the AAAI Conference on Artificial Intelligence in 2018.
reference: The SimpleQuestions benchmark evaluates simple question answering over knowledge graphs by testing the ability of models to answer straightforward, single-hop questions, providing a measure of basic query handling capabilities.
reference: Zhang, Dai, Kozareva, Smola, and Song authored 'Variational reasoning for question answering with knowledge graph', published in the Proceedings of the AAAI Conference on Artificial Intelligence in 2018 (Volume 32, Issue 1).
reference: The FreebaseQA benchmark evaluates question answering using the Freebase knowledge graph by testing the ability of models to answer questions through querying, providing a measure of their ability to handle large-scale structured data.
claim: BERT utilizes deep contextual understanding for question answering and named entity recognition (NER) task completion.
claim: Large Language Models (LLMs) provide context-dependent question-answering capabilities suitable for virtual assistants and customer support.
claim: LLMs facilitate KG-to-text generation and question-answering by generating human-like descriptions of facts stored within a knowledge graph.
reference: Encoder-decoder architectures, such as T5 or BART (Bidirectional and Auto-Regressive Transformers), use an encoder to create a context-rich representation of the input sequence, which the decoder then uses to generate an output sequence, making them flexible for tasks like translation, summarization, and question answering.
measurement: OpenAI's GPT-3 model contains 175 billion parameters and is known for high-quality text generation, translation, question answering, and summarization.
claim: WebQuestionsSP is a benchmark for evaluating question answering over knowledge graphs by testing a model's ability to answer questions by querying structured data.
claim: Effective prompting techniques require clarity, conciseness, relevance, and specificity, and are applied in question-answering, content generation, and interactive dialogue systems to increase the usefulness of generated answers.
Practices, opportunities and challenges in the fusion of knowledge ... frontiersin.org Frontiers 6 facts
claim: Large Language Models demonstrate utility in performing key tasks for Knowledge Graphs, such as KG embedding, completion, construction, and question answering, which enhances the overall quality and applicability of Knowledge Graphs.
reference: Xu et al. (2024) introduced 'Generate-on-Graph', a method that treats large language models as both an agent and a knowledge graph for incomplete knowledge graph question answering.
reference: Lukovnikov et al. (2019) investigated the use of pretrained transformers for simple question answering over knowledge graphs in a paper presented at the 18th International Semantic Web Conference in Auckland, New Zealand.
reference: Cabello et al. (2024) introduced MEG, a framework for medical knowledge-augmented large language models for question answering, in arXiv preprint arXiv:2411.03883.
reference: The paper 'Greaselm: graph reasoning enhanced language models for question answering' was published as an arXiv preprint in 2022.
claim: Knowledge Graphs support applications such as question answering, recommendation systems, and web search by linking entities and relationships in a structured format.
The construction and refined extraction techniques of knowledge ... nature.com Nature Feb 10, 2026 6 facts
measurement: The question answering task accounts for 40% of the training corpus, making it the dominant task in the training system.
reference: Singh, K. et al. published 'No one is perfect: analysing the performance of question answering components over the dbpedia knowledge graph' in J. Web Semant. 65, 100594 (2020).
procedure: The question answering task generates high-quality Q&A pairs by utilizing knowledge sources such as operational orders, equipment technical white papers, and historical campaign reviews, combined with manual expert reviews and automated knowledge extraction.
claim: Core features of the question answering task include authoritative clause references and multi-condition applicable rule descriptions.
claim: The task branch set module supports five core tasks: tactical planning, threat assessment, equipment configuration decision-making, instruction parsing, and question answering, by dividing input-output structures based on functional task types.
claim: The global situational module provides a description of battlefield environment features and force composition, while the task branch module defines input constraints and output specifications for tasks like question answering and tactical planning.
Bridging the Gap Between LLMs and Evolving Medical Knowledge arxiv.org arXiv Jun 29, 2025 5 facts
reference: Xiaofeng Huang, Jixin Zhang, Zisang Xu, Lu Ou, and Jianbin Tong published 'A knowledge graph based question answering method for medical domain' in 2021.
reference: KG-Rank (Huang et al., 2021) and KG-RAG (Sanmartin, 2024) harness ontologies to re-rank evidence or enforce logical constraints, which improves factual consistency in long-form Question Answering (QA).
claim: Traditional methods for medical or scientific Question Answering (QA) often fail because they are unable to capture intricate domain-specific relationships or handle ambiguous queries.
reference: The MedMCQA dataset is a multiple-choice question-answering dataset tailored for medical QA that offers a broad variety of question types, encompassing both foundational and clinical knowledge across diverse medical specialties.
claim: Retrieval Augmented Generation (RAG) is a framework designed to enhance Question Answering (QA) by integrating relevant external knowledge into the generation process.
LLM-KG4QA: Large Language Models and Knowledge Graphs for ... github.com GitHub 5 facts
reference: The paper 'XplainLLM: A Knowledge-Augmented Dataset for Reliable Grounded Explanations in LLMs' published in EMNLP in 2024 introduces the XplainLLM dataset for LLM and Knowledge Graph integration in question answering.
reference: The Docugami Knowledge Graph Retrieval Augmented Generation (KG-RAG) datasets were released in 2023 for LLM and Knowledge Graph integration in question answering.
reference: The paper 'Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities' by Chuangtao Ma, Yongrui Chen, Tianxing Wu, Arijit Khan, and Haofen Wang (2025) provides a comprehensive taxonomy of research integrating Large Language Models (LLMs) and Knowledge Graphs (KGs) for question answering.
reference: The paper 'A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases' published in GRADES-NDA in 2024 introduces the ChatData benchmark for LLM and Knowledge Graph integration in question answering.
reference: The paper 'Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering' published in SEMANTICS in 2023 introduces the LLM-KG-Bench benchmark for LLM and Knowledge Graph integration in question answering.
A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org arXiv Feb 23, 2026 5 facts
reference: The paper 'Natural questions: a benchmark for question answering research' introduces the Natural Questions dataset for evaluating question answering systems.
reference: The paper 'RealTime qa: what’s the answer right now?' introduces a benchmark for real-time question answering.
reference: The paper 'Dynamic-kgqa: a scalable framework for generating adaptive question answering datasets' introduces a framework for creating adaptive datasets for question answering.
reference: The paper 'Large-scale simple question answering with memory networks' discusses memory networks for large-scale question answering tasks.
reference: HotpotQA is a dataset designed for diverse, explainable multi-hop question answering.
Unknown source 4 facts
claim: The authors introduce the KG-enhanced RAG (KG-RAG) framework, which is designed for analytical and semantic question answering on Failure Mode and Effects Analysis (FMEA) data.
account: The authors of the LinkedIn article 'Enhancing LLMs with Knowledge Graphs: A Case Study' established a pipeline for question-answering and response validation.
claim: The KG-RAG framework integrates knowledge graphs to enable question answering (QA) on Failure Mode and Effects Analysis (FMEA) data.
claim: Retrieval-Augmented Generation (RAG) is well-suited for use cases that require knowledge-intensive question answering, code documentation, and engineering tasks.
Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org arXiv Aug 13, 2025 4 facts
claim: The authors of the paper 'Re-evaluating Hallucination Detection in LLMs' demonstrate that prevailing overlap-based metrics systematically overestimate hallucination detection performance in Question Answering tasks, which leads to illusory progress in the field.
reference: Kwiatkowski et al. (2019) developed 'Natural Questions', a benchmark designed for question answering research.
claim: LLM-as-Judge methods offer a more reliable alternative for factual evaluation in question-answering tasks because they show strong agreement with human judgments.
claim: Reference-based metrics like ROUGE show a clear misalignment with human judgments when identifying hallucinations in Question Answering tasks, as they consistently reward fluent yet factually incorrect responses.
Knowledge Graph Combined with Retrieval-Augmented Generation ... drpress.org Academic Journal of Science and Technology Dec 2, 2025 4 facts
reference: Yasunaga et al. introduced QA-GNN, a method for reasoning with language models and knowledge graphs for question answering, in an arXiv preprint in 2021.
reference: The paper 'Graph reasoning for question answering with triplet retrieval' by Li S, Gao Y, Jiang H, et al. was published as an arXiv preprint (arXiv: 2305.18742) in 2023.
reference: He et al. introduced G-retriever, a retrieval-augmented generation framework for textual graph understanding and question answering, in an arXiv preprint in 2024.
reference: Taunk et al. introduced GrapeQA, a method for graph augmentation and pruning to enhance question-answering, presented at the Companion Proceedings of the ACM Web Conference 2023.
Construction of Knowledge Graphs: State and Challenges - arXiv arxiv.org arXiv 3 facts
claim: Combining knowledge graphs with Large Language Models (LLMs) like ChatGPT improves factual correctness and explanations in question-answering, thereby promoting the quality and interpretability of AI decision-making.
reference: R. West, E. Gabrilovich, K. Murphy, S. Sun, R. Gupta, and D. Lin published 'Knowledge base completion via search-based question answering' in the proceedings of the 23rd International World Wide Web Conference (WWW '14) in 2014.
claim: The SAGA system supports live graph curation through a human-in-the-loop approach and powers question answering, entity summarization, and text annotation (NER) services.
KG-IRAG: A Knowledge Graph-Based Iterative Retrieval-Augmented ... arxiv.org arXiv Mar 18, 2025 3 facts
reference: Peiyun Wu, Xiaowang Zhang, and Zhiyong Feng authored the paper 'A survey of question answering over knowledge base', published in the proceedings of the 4th China Conference, CCKS 2019, in Hangzhou, China, August 24–27, 2019.
claim: Standard evaluation metrics for Question Answering (QA) systems include Exact Match (EM), F1 Score, and Hit Rate (HR).
reference: The research paper 'Factify5wqa: Fact verification through 5w question-answering' is available as arXiv preprint arXiv:2410.04236.
LLM-empowered knowledge graph construction: A survey - arXiv arxiv.org arXiv Oct 23, 2025 2 facts
reference: Yash Tiwari, Owais Ahmad Lone, and Mayukha Pal proposed OntoRAG, a system that enhances question-answering by automating ontology derivation from unstructured knowledge bases, as detailed in their 2025 arXiv preprint.
claim: Knowledge Graphs serve as a fundamental infrastructure for structured knowledge representation and reasoning, providing a unified semantic foundation for applications such as semantic search, question answering, and scientific discovery.
Unlocking the Potential of Generative AI through Neuro-Symbolic ... arxiv.org arXiv Feb 16, 2025 1 fact
reference: Komal Gupta, Tirthankar Ghosal, and Asif Ekbal authored 'A neuro-symbolic approach for question answering on research articles', published in the Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation in 2021.
A Comprehensive Review of Neuro-symbolic AI for Robustness ... link.springer.com Springer Dec 9, 2025 1 fact
claim: Traditional neural models such as Word2Vec and GloVe are limited in their interpretability and generalization for question answering because they lack the ability to perform structured reasoning.
Medical Hallucination in Foundation Models and Their ... medrxiv.org medRxiv Mar 3, 2025 1 fact
claim: Summary consistency verification methods evaluate whether a generated summary faithfully reflects the source content and are divided into question-answering (QA)-based and entailment-based approaches.
Knowledge Graph-RAG: Bridging the Gap Between LLMs ... - Medium medium.com Medium Apr 25, 2025 1 fact
claim: KG-RAG is an AI technique that enhances Large Language Models for Question Answering by integrating Knowledge Graphs without requiring additional training.
A Comprehensive Benchmark and Evaluation Framework for Multi ... arxiv.org arXiv Jan 6, 2026 1 fact
reference: MedDialogRubrics is a benchmark and evaluation framework designed to assess the multi-turn inquiry abilities of medical Large Language Models (LLMs) by focusing on fine-grained, human-aligned evaluation of the diagnostic process rather than just single-turn QA or final diagnosis accuracy.
Track: Poster Session 3 - aistats 2026 virtual.aistats.org Samuel Tesfazgi, Leonhard Sprandl, Sandra Hirche · AISTATS 1 fact
claim: The Shapley-value Guided Rationale Editor (SHARE) is adaptable for tasks including sentiment analysis, claim verification, and question answering, and can integrate with various language models.
[PDF] Large Language Models Meet Knowledge Graphs for Question ... aclanthology.org ACL Anthology Nov 4, 2025 1 fact
claim: GraphRAG and KG-RAG based question answering approaches incorporate modules including knowledge integration, knowledge fusion, and reasoning guidelines.
EdinburghNLP/awesome-hallucination-detection - GitHub github.com GitHub 1 fact
claim: ROUGE-based evaluation systematically overestimates hallucination detection performance in Question Answering tasks.
The Hallucinations Leaderboard, an Open Effort to Measure ... huggingface.co Hugging Face Jan 29, 2024 1 fact
measurement: HaluEval includes 5,000 general user queries with ChatGPT responses and 30,000 task-specific examples across three tasks: question answering (HaluEval QA), knowledge-grounded dialogue (HaluEval Dialogue), and summarisation (HaluEval Summarisation).
Construction of Knowledge Graphs: State and Challenges - arXiv arxiv.org arXiv Feb 22, 2023 1 fact
claim: Knowledge graphs are increasingly central to applications such as recommender systems and question answering, creating a growing need for generalized pipelines to construct and continuously update them.
Knowledge Graphs: Opportunities and Challenges - Springer Nature link.springer.com Springer Apr 3, 2023 1 fact
claim: Knowledge graphs are widely employed in AI systems such as recommender systems, question answering, and information retrieval, as well as in fields like education and medical care.
Neuro-Symbolic AI: Explainability, Challenges, and Future Trends arxiv.org arXiv Nov 7, 2024 1 fact
reference: Hu et al. (2022b) proposed a method for empowering language models by integrating knowledge graph reasoning for question answering tasks.
A framework to assess clinical safety and hallucination rates of LLMs ... nature.com Nature May 13, 2025 1 fact
claim: The MedHALT benchmark is limited to assessing the reasoning capabilities of Large Language Models over the medical domain in a Question Answering (QA) format.
Detecting hallucinations with LLM-as-a-judge: Prompt ... - Datadog datadoghq.com Aritra Biswas, Noé Vernier · Datadog Aug 25, 2025 1 fact
reference: RAGTruth is a human-labeled benchmark for hallucination detection that covers three tasks: question answering, summarization, and data-to-text translation.
Empowering GraphRAG with Knowledge Filtering and Integration arxiv.org arXiv Mar 18, 2025 1 fact
claim: The study compares QA performance between LLM-only models and LLM models using GraphRAG to determine the importance of retrieving external information.
KA-RAG: Integrating Knowledge Graphs and Agentic Retrieval ... semanticscholar.org Yuan Gao, Yuxuan Xu · Semantic Scholar 1 fact
claim: KA-RAG is a course-oriented question answering (QA) framework that integrates a structured knowledge graph with agentic retrieval-augmented generation.
A Survey of Incorporating Psychological Theories in LLMs - arXiv arxiv.org arXiv 1 fact
claim: Self-reflection is defined as introspection focused on the self-concept and has been used to guide Large Language Model enhancements in hallucination mitigation, translation, question-answering, and math reasoning.
Combining Knowledge Graphs and Large Language Models - arXiv arxiv.org arXiv Jul 9, 2024 1 fact
claim: Current Large Language Models have a wide range of applications including question answering, code generation, text recognition, summarization, translation, and prediction.
Combining large language models with enterprise knowledge graphs frontiersin.org Frontiers Aug 26, 2024 1 fact
procedure: Relation Extraction tasks are often rephrased as question-answering (Levy et al., 2017), which involves injecting latent knowledge contained in relation labels into prompt construction (Chen et al., 2022) and iteratively fine-tuning prompts to enhance the model's ability to focus on semantic cues (Son et al., 2022).
Efficient Knowledge Graph Construction and Retrieval from ... - arXiv arxiv.org arXiv Aug 7, 2025 1 fact
claim: Microsoft’s GraphRAG implementation improves question answering performance by constructing entity–relation graphs from retrieved passages and summarizing them into semantic communities.