concept

multi-modal question answering

Also known as: multi-modal QA

Facts (15)

Sources
Large Language Models Meet Knowledge Graphs for Question ... (arXiv, Sep 22, 2025), 12 facts
Claim: Approaches using knowledge graphs as refiners and validators support multi-modal QA tasks.
Reference: OMG-QA (Nan et al., 2024) is a multi-modal question-answering dataset that evaluates retrieval and reasoning capabilities in question answering across different modalities.
Claim: Hybrid methods for synthesizing LLMs and KGs support multi-document, multi-modal, multi-hop, conversational, XQA, and temporal QA tasks.
Reference: ScienceQA (Lu et al., 2022) is a multi-modal question-answering dataset offering multiple-choice questions across various science disciplines.
Claim: Combining knowledge fusion, Retrieval-Augmented Generation (RAG), Chain-of-Thought (CoT) reasoning, and ranking-based refinement accelerates complex question decomposition for multi-hop QA, enhances context understanding for conversational QA, facilitates cross-modal interactions for multi-modal QA, and improves the explainability of generated answers.
Claim: Approaches using knowledge graphs as reasoning guidelines support multi-document, multi-modal, multi-hop, XQA, and temporal QA tasks.
Claim: Multi-modal question answering (QA) involves answering questions over multi-modal data, with visual question answering (VQA) as a typical example.
Claim: Multi-modal QA performs question answering over data and knowledge spanning multiple modalities, such as text, audio, images, and video.
Claim: Cross-modal reasoning facilitates interaction and alignment for multi-modal QA.
Claim: Approaches using knowledge graphs as background knowledge support multi-document, multi-modal, multi-hop, conversational, and XQA tasks.
Reference: M3SciQA (Li et al., 2024a) is a multi-modal question-answering dataset supporting question answering over contexts spanning multiple documents.
Claim: Joint reasoning over factual knowledge graphs and LLMs can mitigate challenges in knowledge retrieval, conflicts across modalities and knowledge sources, and complex reasoning in multi-document, multi-modal, and multi-hop question answering.
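The "knowledge graphs as background knowledge" and "ranking-based refinement" patterns above can be sketched as a toy pipeline. This is a minimal illustration, not any system from the cited papers: the triples, entity sets, and function names are all invented for the example. Facts retrieved from a small KG are fused with entities detected in the visual modality, and candidate answers are ranked by how many supporting facts mention them.

```python
# Toy sketch of KG-augmented multi-modal QA (all data and names are illustrative).
from collections import Counter

# Tiny knowledge graph as (subject, relation, object) triples.
KG = [
    ("Eiffel Tower", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
    ("Louvre", "located_in", "Paris"),
]

def retrieve_facts(entities):
    """Return triples whose subject or object matches a detected entity."""
    return [t for t in KG if t[0] in entities or t[2] in entities]

def answer(question, image_entities):
    """Fuse visual entities with KG facts, then rank candidate answers."""
    facts = retrieve_facts(image_entities)
    # Candidates: KG objects linked to the visual entities but not among them.
    candidates = Counter(obj for _, _, obj in facts if obj not in image_entities)
    if not candidates:
        return None
    # Ranking-based refinement: keep the most-supported candidate.
    return candidates.most_common(1)[0][0]

# A photo of the Eiffel Tower, question "Which city is this landmark in?"
print(answer("Which city is this landmark in?", {"Eiffel Tower"}))  # Paris
```

In a real system the entity set would come from a vision model and the ranking from a trained scorer or an LLM, but the fusion-retrieve-rank shape is the same.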
LLM-KG4QA: Large Language Models and Knowledge Graphs for ... (GitHub), 3 facts
Reference: The paper 'M2QA: Multi-domain Multilingual Question Answering' (EMNLP 2024) introduces the M2QA dataset for multi-modal question answering.
Reference: The paper 'MRAMG-Bench: A BeyondText Benchmark for Multimodal Retrieval-Augmented Multimodal Generation' (arXiv, 2025) introduces the MRAMG benchmark for multi-modal question answering.
Reference: The paper 'M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models' (ACL Findings 2024) introduces the M3SciQA benchmark for multi-modal question answering.