Multimodal knowledge graphs
Also known as: multi-modal knowledge graph, multi-modal knowledge graphs, multimodal knowledge graph, Multimodal Knowledge Graph construction
Facts (19)
Sources
Practices, opportunities and challenges in the fusion of knowledge ... (frontiersin.org), 7 facts
Reference: Huang et al. (2022) developed a method for endowing language models with multimodal knowledge graph representations.
Reference: Early attempts at constructing multimodal knowledge graphs face challenges in modality alignment, semantic consistency, and large-scale deployment, as noted by Chen et al. (2023).
Reference: The paper 'Continual multimodal knowledge graph construction' (arXiv:2305.08698) addresses the challenges of constructing multimodal knowledge graphs in a continual learning setting.
Claim: Multi-modal knowledge graphs incorporate diverse data types, including text, images, and videos, to provide a holistic understanding of knowledge across multiple forms of media.
Reference: J. Lee, Y. Wang, J. Li, and M. Zhang published 'Multimodal reasoning with multimodal knowledge graph' as an arXiv preprint in 2024.
Reference: MRMKG (Lee et al., 2024) uses Relational Graph Attention Networks (RGAT) to encode multimodal knowledge graphs and includes a cross-modal module for image-text alignment.
Claim: The integration of multimodal knowledge graphs and language models aims to build intelligent systems capable of understanding and reasoning across text, images, audio, and sensor data.
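The claims above describe multimodal knowledge graphs as graphs whose nodes span text, images, and other media, linked by cross-modal edges. A minimal sketch of such a structure follows; all names, the `depicts` relation, and the example entities are illustrative, not taken from any cited system:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    """A KG node tagged with its modality ('text', 'image', 'audio', ...)."""
    id: str
    modality: str

@dataclass
class MultimodalKG:
    """Triples over modality-tagged nodes."""
    triples: set = field(default_factory=set)

    def add(self, head: Node, relation: str, tail: Node) -> None:
        self.triples.add((head, relation, tail))

    def cross_modal_edges(self):
        """Edges whose endpoints live in different modalities."""
        return [t for t in self.triples if t[0].modality != t[2].modality]

kg = MultimodalKG()
eiffel = Node("Eiffel_Tower", "text")
paris = Node("Paris", "text")
img = Node("img_001.jpg", "image")
kg.add(eiffel, "located_in", paris)   # a conventional text-to-text fact
kg.add(img, "depicts", eiffel)        # a cross-modal image-to-entity link

print(len(kg.cross_modal_edges()))  # → 1
```

The modality tag on each node is what distinguishes this from an ordinary KG: cross-modal edges can be enumerated and treated separately (e.g. for the image-text alignment that MRMKG's cross-modal module performs).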
LLM-empowered knowledge graph construction: A survey - arXiv (arxiv.org, Oct 23, 2025), 4 facts
Claim: Multimodal Knowledge Graph construction aims to integrate heterogeneous modalities, including text, images, audio, and video, into unified, structured representations to enable richer reasoning and cross-modal alignment.
Claim: Key challenges in the construction of multimodal knowledge graphs include modality heterogeneity, alignment noise, scalability, and robustness under missing or imbalanced modalities.
Reference: Junming Liu, Siyuan Meng, Yanting Gao, Song Mao, Pinlong Cai, Guohang Yan, Yirong Chen, Zilin Bian, Ding Wang, and Botian Shi authored the paper 'Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning,' which was published as an arXiv preprint in July 2025.
Reference: The VaLiK (Vision-align-to-Language integrated KG) framework, proposed by Liu et al. in 2025, cascades pretrained Vision-Language Models to translate visual features into textual form and uses a cross-modal verification module to filter noise and assemble multimodal knowledge graphs without manual annotation.
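The VaLiK pipeline described above (caption images with pretrained VLMs, verify the captions cross-modally, then assemble triples) can be sketched as follows. This is a toy illustration, not the authors' implementation: `caption_image` and `cross_modal_score` are stubs standing in for a pretrained captioner and a CLIP-style image-text similarity model, the entity extraction is deliberately naive, and the 0.8 threshold is made up:

```python
def caption_image(image_path: str) -> str:
    """Stub for VaLiK's cascaded Vision-Language Models; a real system
    would run a pretrained captioner on the image here."""
    stub = {"img_001.jpg": "the Eiffel Tower at night in Paris"}
    return stub.get(image_path, "")

def cross_modal_score(caption: str, image_path: str) -> float:
    """Stub for cross-modal verification (e.g. image-text similarity)."""
    return 0.92 if image_path == "img_001.jpg" else 0.1

def extract_triples(caption: str, image_path: str):
    """Toy extraction: link the image node to each capitalized word."""
    entities = [w for w in caption.split() if w[0].isupper()]
    return [(image_path, "depicts", e) for e in entities]

def build_mmkg(images, threshold=0.8):
    """Annotation-free construction: caption, verify, then extract."""
    triples = []
    for img in images:
        cap = caption_image(img)
        # verification filters noisy captions before they enter the KG
        if cap and cross_modal_score(cap, img) >= threshold:
            triples.extend(extract_triples(cap, img))
    return triples

print(build_mmkg(["img_001.jpg", "img_002.jpg"]))
# img_002.jpg fails verification, so only img_001.jpg contributes triples
```

The key design point matching the paper's description is the verification gate between captioning and assembly: visual content only enters the graph after it survives a cross-modal consistency check.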
Knowledge Graphs: Opportunities and Challenges - Springer Nature (link.springer.com, Apr 3, 2023), 3 facts
Claim: The construction of multi-modal knowledge graphs is complicated and inefficient because it requires the exploration of entities across different modalities, such as texts and images.
Claim: Multi-modal knowledge fusion aims to find equivalent entities by integrating their multi-modal features to generate a multi-modal knowledge graph.
Reference: Wang et al. (2020e) developed a knowledge discovery framework called COVID-KG to generate COVID-19-related drug repurposing reports by constructing multimedia knowledge graphs from images and texts.
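The fusion step described above, matching equivalent entities by integrating their multi-modal features, is commonly realized as a weighted combination of per-modality embedding similarities. A minimal sketch, with made-up toy vectors and an illustrative weight (real systems learn the weighting and use high-dimensional embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def fused_similarity(e1, e2, w_text=0.6):
    """Weighted fusion of text- and image-feature similarity.
    e1, e2 are dicts holding one embedding per modality."""
    return (w_text * cosine(e1["text"], e2["text"])
            + (1 - w_text) * cosine(e1["image"], e2["image"]))

# Toy embeddings: a and b are the same real-world entity seen in two
# different KGs; c is an unrelated entity.
a = {"text": [0.9, 0.1, 0.0],  "image": [0.8, 0.2, 0.1]}
b = {"text": [0.85, 0.15, 0.05], "image": [0.75, 0.25, 0.05]}
c = {"text": [0.0, 0.2, 0.9],  "image": [0.1, 0.1, 0.9]}

assert fused_similarity(a, b) > fused_similarity(a, c)  # a aligns with b, not c
```

Candidate pairs whose fused score exceeds a threshold are merged into a single node, which is how the integrated multi-modal knowledge graph is assembled from per-modality evidence.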
Construction of Knowledge Graphs: State and Challenges - arXiv (arxiv.org), 2 facts
Combining Knowledge Graphs and Large Language Models - arXiv (arxiv.org, Jul 9, 2024), 2 facts
Claim: Future research could explore the potential use of multimodal knowledge graphs when combined with Large Language Models to advance the field of multimodal models.
Reference: The research paper titled 'Knowphish: Large language models meet multimodal knowledge graphs for enhancing reference-based phishing detection' was authored by Yuexin Li, Chengyu Huang, Shumin Deng, Mei Lin Lock, Tri Cao, Nay Oo, Bryan Hooi, and Hoon Wei Lim in 2024 (arXiv:2403.02253).
The construction and refined extraction techniques of knowledge ... (nature.com, Feb 10, 2026), 1 fact
Claim: The current implementation of the knowledge graph framework is text-centric and does not yet constitute a fully multimodal knowledge graph, despite being designed to support multimodal fusion.