reference
The VaLiK (Vision-align-to-Language integrated KG) framework, proposed by Liu et al. in 2025, cascades pretrained Vision-Language Models to translate visual features into textual descriptions. A cross-modal verification module then filters noisy text, allowing multimodal knowledge graphs to be assembled without manual annotation.
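The pipeline above can be sketched as three stages: caption with a VLM, verify the caption against the image, and keep only verified triples. This is an illustrative sketch, not the authors' implementation; the function names, the similarity scoring, and the threshold are all assumptions standing in for real pretrained models (e.g. a captioner and a CLIP-like scorer).

```python
# Illustrative VaLiK-style pipeline. All helpers are stubs/assumptions,
# not the authors' actual API.

def caption_image(image):
    # Stage 1 (assumed): cascaded VLMs turn visual features into text.
    return f"a textual description of {image}"

def cross_modal_score(image, text):
    # Stage 2 (assumed): a cross-modal verifier scores image-text agreement;
    # a real system would use an embedding similarity, not string matching.
    return 0.9 if image in text else 0.1

def extract_triples(text, image_id):
    # Stage 3 (assumed): verified text is parsed into (head, relation, tail)
    # edges grounded in the source image.
    return [(image_id, "described_as", text)]

def build_mmkg(images, threshold=0.5):
    """Assemble a multimodal KG: caption, verify, keep triples that pass."""
    graph = []
    for img in images:
        text = caption_image(img)
        # Noise filtering: drop captions the verifier scores below threshold.
        if cross_modal_score(img, text) >= threshold:
            graph.extend(extract_triples(text, img))
    return graph

kg = build_mmkg(["cat.png", "dog.png"])
```

The key design point the framework relies on is the verification gate: because VLM captions can hallucinate, only image-text pairs that pass the cross-modal check contribute edges, which is what removes the need for manual annotation.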
Authors
Sources
- LLM-empowered knowledge graph construction: A survey (arXiv)
Referenced by nodes (1)
- multimodal knowledge graphs concept