Visual Question Answering
Facts (14)
Sources
A Comprehensive Review of Neuro-symbolic AI for Robustness ... link.springer.com Dec 9, 2025 6 facts
Reference: Kabir et al. (2024) published a comprehensive survey on datasets and algorithms for visual question answering.
Claim: Neural Logic Machines (NLM) and Neuro-Symbolic Concept Learners (NSCL) demonstrate the potential of generalized reasoning over structured and unstructured data, with applications ranging from array sorting to visual question answering.
Reference: Eiter, T., Higuera, N., Oetsch, J., and Pritz, M. developed a neuro-symbolic answer set programming (ASP) pipeline specifically for visual question answering, as published in Theory and Practice of Logic Programming in 2022.
Claim: NeurASP has been applied to visual question answering, where the neural component interprets an image and the symbolic component ensures the answer satisfies logical constraints, enabling the system to express uncertainty when multiple stable models exist.
Claim: Neuro-symbolic methods unify sensory data with linguistic and logical processing in applications including spatial-temporal analysis, mental image generation, and multimodal tasks like visual question answering (VQA).
Claim: The neuro-symbolic AI community is developing challenge tasks to address evaluation gaps, including systematic generalization tests, visual question answering, and the calibration of concepts and operations.
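The NeurASP fact above describes a pipeline in which a neural component produces probabilistic interpretations of an image and a symbolic component keeps only answers that satisfy logical constraints, reporting uncertainty when several stable models survive. The following is a minimal plain-Python sketch of that idea only; the object names, color concepts, probabilities, and constraint are illustrative assumptions, not the actual NeurASP program or API from the paper.

```python
from itertools import product

# Mock "neural" output: per-object probability distributions over colors.
# (Illustrative values; a real NeurASP pipeline would obtain these from a CNN.)
neural_probs = {
    "obj1": {"red": 0.6, "blue": 0.4},
    "obj2": {"red": 0.5, "blue": 0.5},
}

def satisfies_constraints(assignment):
    """Stand-in for a symbolic ASP rule: the two objects must differ in color."""
    return assignment["obj1"] != assignment["obj2"]

def stable_models(probs):
    """Enumerate joint concept assignments, keep those meeting the symbolic
    constraints (the analog of stable models), and weight each surviving
    assignment by the product of its neural probabilities."""
    objs = list(probs)
    models = []
    for values in product(*(probs[o] for o in objs)):
        assignment = dict(zip(objs, values))
        if satisfies_constraints(assignment):
            weight = 1.0
            for o in objs:
                weight *= probs[o][assignment[o]]
            models.append((assignment, weight))
    return models

models = stable_models(neural_probs)
# More than one surviving model -> the system expresses uncertainty
# rather than committing to a single answer.
uncertain = len(models) > 1
```

Here two assignments survive the constraint ({obj1: red, obj2: blue} and {obj1: blue, obj2: red}), so the sketch reports uncertainty instead of a single answer, mirroring the multiple-stable-models behavior described in the claim.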
Detecting and Evaluating Medical Hallucinations in Large Vision ... arxiv.org Jun 14, 2024 2 facts
Claim: The hallucination detection instruction pair data used for training MediHallDetector is divided into two parts: instructions for detecting hallucinations in Visual Question Answering (VQA) tasks and instructions for detecting hallucinations in Image Report Generation (IRG) tasks.
Procedure: In fine-grained single-dimension Visual Question Answering (VQA) scenarios, each Large Vision-Language Model (LVLM) response is labeled with a single hallucination category.
Combining Knowledge Graphs and Large Language Models - arXiv arxiv.org Jul 9, 2024 2 facts
Claim: Hybrid approaches combining LLMs and Knowledge Graphs demonstrate improved performance on tasks requiring semantic understanding, such as entity typing and visual question answering.
Reference: Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, and Marcus Rohrbach developed KRISP, a method for integrating implicit and symbolic knowledge for open-domain knowledge-based visual question answering.
Large Language Models Meet Knowledge Graphs for Question ... arxiv.org Sep 22, 2025 2 facts
Reference: The MMJG method, introduced by Wang et al. (2022), enhances visual question answering by using adaptive knowledge selection to jointly select knowledge from visual and text sources based on knowledge-aware attention and multi-modal guidance.
Claim: Multi-modal question answering (QA) involves answering questions over multi-modal data, with visual question answering (VQA) serving as a typical example.
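The MMJG reference above describes selecting knowledge jointly from visual and textual sources via knowledge-aware attention. The sketch below shows the generic attention pattern that description suggests: scoring pooled candidate knowledge embeddings against a question embedding and fusing them by their attention weights. This is an assumption-laden illustration of knowledge-aware attention in general, not MMJG's actual architecture, and all vectors and function names here are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_knowledge(query, visual_knowledge, text_knowledge):
    """Knowledge-aware attention over a pooled set of visual and textual
    knowledge embeddings: score each candidate fact against the question
    embedding, normalize the scores, and return the attention weights
    together with the attention-weighted fused knowledge vector."""
    candidates = visual_knowledge + text_knowledge
    scores = [dot(query, k) for k in candidates]
    weights = softmax(scores)
    fused = [sum(w * k[i] for w, k in zip(weights, candidates))
             for i in range(len(query))]
    return weights, fused

# Toy usage: a question embedding aligned with the first (visual) fact
# receives more attention mass than the unrelated textual fact.
weights, fused = select_knowledge([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])
```

Because the two knowledge sources are pooled before scoring, the same attention pass adaptively trades off visual against textual facts per question, which is the joint-selection behavior the reference attributes to MMJG.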
LLM-KG4QA: Large Language Models and Knowledge Graphs for ... github.com 1 fact
Reference: Research on integrating Large Language Models with Knowledge Graphs is categorized into several distinct approaches: Pre-training, Fine-Tuning, KG-Augmented Prompting, Retrieval-Augmented Generation (RAG), Graph RAG, KG RAG, Hybrid RAG, Spatial RAG, Offline/Online KG Guidelines, Agent-based KG Guidelines, KG-Driven Filtering and Validation, Visual Question Answering (VQA), Multi-Document QA, Multi-Hop QA, Conversational QA, Temporal QA, Multilingual QA, Index-based Optimization, and Natural Language to Graph Query Language (NL2GQL).
Neuro-Symbolic AI: Explainability, Challenges, and Future Trends arxiv.org Nov 7, 2024 1 fact
Reference: Neuro-symbolic AI research is categorized into several domains: mathematics and symbolic regression (e.g., Majumdar et al., 2023; Petersen et al., 2019), logic and knowledge processing including concept/rule learning (e.g., Aspis et al., 2022) and logical reasoning (e.g., Cunnington et al., 2023), and applications such as visual question answering (e.g., Mao et al., 2019), medical (e.g., Jain et al., 2023), communication (e.g., Thomas and Saad, 2023), programming (e.g., Hu et al., 2022a), recommendation systems (e.g., Carraro, 2023), and security (e.g., Wang et al., 2018).