SAGA
Facts (18)
Sources
Construction of Knowledge Graphs: State and Challenges (arXiv, arxiv.org) - 18 facts
Claim: SAGA's ingestion component requires predicate mappings from new data sources to the internal knowledge graph ontology; these mappings are mostly defined manually and stored as supplementary configuration files.
Claim: The SAGA internal data model extends standard RDF to capture one-hop relationships among entities, data provenance, and the trustworthiness of values.
Claim: The pipeline or toolset implementations for NELL and AI-KG, as well as five of the seven toolsets analyzed (including Amazon's AutoKnow and Apple's SAGA), are closed-source.
Reference: I.F. Ilyas et al. introduced 'Saga', a platform designed for the continuous construction and serving of knowledge at scale, in a 2022 arXiv preprint.
Claim: The SAGA toolset allows data to be reprocessed with the HoloClean tool for data repair.
Claim: SAGA is a closed-source toolset that supports multi-source data integration for both batch-style incremental knowledge graph construction and continuous knowledge graph updates.
Claim: DBpedia Live and SAGA are the only two projects analyzed that support continuous consumption of event streams.
Reference: The SAGA system maintains a 'Live Graph' that continuously integrates streaming data and references stable entities from a batch-based knowledge graph, using an inverted index and a key-value store for scalability and near-real-time query performance.
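The live-graph layering described above can be sketched as a small data structure; the class name, field layout, and tokenized index here are illustrative assumptions, not SAGA's actual design:

```python
from collections import defaultdict

class LiveGraph:
    """Toy sketch of a live graph layered over a stable batch-built graph.

    Hypothetical layout: a key-value store maps entity IDs to streaming
    facts, and an inverted index maps tokens to entity IDs for fast
    lookup. Live facts reference stable entities by shared ID.
    """

    def __init__(self, stable_graph):
        self.stable = stable_graph            # batch-built KG: id -> facts
        self.kv = {}                          # live facts: id -> [(pred, obj)]
        self.index = defaultdict(set)         # token -> entity ids

    def ingest(self, entity_id, predicate, obj):
        # Append a streaming fact and index its object tokens.
        self.kv.setdefault(entity_id, []).append((predicate, obj))
        for token in str(obj).lower().split():
            self.index[token].add(entity_id)

    def lookup(self, token):
        # Resolve matching live entities, merging stable and live facts.
        return {
            eid: self.stable.get(eid, []) + self.kv.get(eid, [])
            for eid in self.index.get(token.lower(), set())
        }
```

A query thus sees fresh streaming facts merged with the stable entity's facts without rebuilding the batch graph.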
Procedure: The SAGA system performs deduplication by grouping entities by type and using simple blocking to partition the data into smaller buckets; a matching model then computes similarity scores with machine-learning or rule-based methods, and correlation clustering finally determines which entities match.
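The three deduplication stages above can be sketched as follows; the blocking key, Jaccard matcher, and greedy clustering pass are simplified stand-ins for SAGA's actual (unpublished) components:

```python
def blocking_key(entity):
    # Toy blocking: bucket by entity type and first character of the name.
    return (entity["type"], entity["name"][0].lower())

def similarity(a, b):
    # Rule-based stand-in for an ML or rule-based matcher:
    # Jaccard similarity over name tokens.
    ta, tb = set(a["name"].lower().split()), set(b["name"].lower().split())
    return len(ta & tb) / len(ta | tb)

def deduplicate(entities, threshold=0.5):
    # 1) Blocking: partition entities into small buckets.
    blocks = {}
    for e in entities:
        blocks.setdefault(blocking_key(e), []).append(e)
    # 2) Matching and 3) greedy clustering (a cheap proxy for
    #    correlation clustering) within each bucket.
    clusters = []
    for bucket in blocks.values():
        bucket_clusters = []
        for e in bucket:
            for cluster in bucket_clusters:
                if all(similarity(e, m) >= threshold for m in cluster):
                    cluster.append(e)
                    break
            else:
                bucket_clusters.append([e])
        clusters.extend(bucket_clusters)
    return clusters
```

Blocking keeps the pairwise comparisons inside each bucket, which is what makes the quadratic matching step tractable at scale.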
Claim: The SAGA knowledge graph construction solution uses several truth discovery and source-reliability-based fusion methods for entity fusion.
Claim: The SAGA system tracks 'same-as' links to the original source entities to support debugging.
Claim: The DRKG, HKGB, and SAGA knowledge graph construction solutions use machine-learning-based link prediction on graph embeddings to find further knowledge for knowledge graph completion.
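Embedding-based link prediction of the kind mentioned above can be illustrated with a TransE-style score; the embeddings, relation names, and ranking helper here are invented for illustration and are not tied to any of the listed systems:

```python
import math

def transe_score(h, r, t):
    # TransE-style plausibility: lower ||h + r - t|| means a more
    # likely link. Embeddings are plain lists of floats here.
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def predict_tail(head, relation, candidates, ent_emb, rel_emb):
    # Rank candidate tail entities by ascending TransE distance.
    h, r = ent_emb[head], rel_emb[relation]
    return sorted(candidates, key=lambda c: transe_score(h, r, ent_emb[c]))
```

Completion then amounts to adding the top-ranked (head, relation, tail) triples that are not yet in the graph, usually after a confidence cutoff.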
Claim: The SAGA knowledge graph construction solution attempts to automatically detect potential errors or vandalism and quarantines them for human curation; approved changes are applied directly to the live graph before being propagated to the stable graph.
Claim: Knowledge graph pipelines that employ entity resolution often use sophisticated methods to address scalability, such as blocking (as in ArtistKG and SAGA) or machine-learning-based matchers (as in SAGA).
Procedure: Entity fusion in the SAGA system harmonizes conflicting entity attribute values based on truth discovery methods and source reliability to produce consistent entities.
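Reliability-weighted fusion of the kind described above can be sketched as a weighted vote; the function, weights, and default-trust value are illustrative assumptions rather than SAGA's actual truth-discovery method:

```python
from collections import defaultdict

def fuse_attribute(claims, source_reliability):
    """Pick one value per attribute by reliability-weighted voting.

    `claims` is a list of (source, value) pairs for a single attribute;
    `source_reliability` maps source name -> trust weight in [0, 1].
    Hypothetical sketch of truth-discovery-style conflict resolution.
    """
    votes = defaultdict(float)
    for source, value in claims:
        votes[value] += source_reliability.get(source, 0.1)  # default: low trust
    return max(votes, key=votes.get)
```

Full truth-discovery systems additionally re-estimate the source weights from how often each source agrees with the fused values, iterating until both converge.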
Claim: The SAGA knowledge graph construction system manages incremental integration by updating a stable knowledge graph in batches while simultaneously serving a live knowledge graph that prioritizes data freshness over certain quality assurance steps.
Claim: The SAGA system supports live graph curation through a human-in-the-loop approach and powers question answering, entity summarization, and text annotation (NER) services.
Procedure: SAGA supports source change detection and delta computation against the last snapshots, executing parallel batch jobs that integrate updated or new sources into the target graph based on the detected changes.
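Snapshot-based delta computation can be sketched as a set difference per source, with the per-source jobs fanned out in parallel; the function names and the thread-pool fan-out are illustrative choices, not SAGA's batch infrastructure:

```python
from concurrent.futures import ThreadPoolExecutor

def compute_delta(last_snapshot, current_snapshot):
    # Diff two source snapshots (sets of triples) into a delta:
    # triples to add to and remove from the target graph.
    added = current_snapshot - last_snapshot
    removed = last_snapshot - current_snapshot
    return added, removed

def integrate_sources(snapshots_old, snapshots_new, graph):
    # Compute deltas for each source in parallel, then apply them
    # to the target graph (a set of triples).
    with ThreadPoolExecutor() as pool:
        deltas = pool.map(
            lambda src: compute_delta(snapshots_old.get(src, set()),
                                      snapshots_new[src]),
            snapshots_new,
        )
    for added, removed in deltas:
        graph -= removed
        graph |= added
    return graph
```

A source missing from the old snapshots falls back to the empty set, so newly added sources are integrated by the same code path as updated ones.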