concept

distant supervision

Also known as: DS

Facts (21)

Sources

Combining large language models with enterprise knowledge graphs frontiersin.org Frontiers Aug 26, 2024 12 facts

procedureEarly distant supervision approaches to relation extraction use supervised methods to align positive and negative relation pairs for pre-training language models, and then apply few-shot learning to extract relations.

claimKnowledge Graph Embedding (KGE) relying solely on Distant Supervision (DS) is inadequate for predicting new types because weak annotations are limited to existing Knowledge Graph entities and relations.

claimWan et al. (2022) focused on mitigating distant supervision label noise and improving results in Named Entity Recognition.

procedureMulti-instance learning (MIL), as proposed by Zeng et al. (2015), addresses distant supervision noise in Relation Extraction (RE) by grouping sentences into bags that are labeled positive or negative with respect to a relation, which shifts the task from classifying single sentences to classifying bags.
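The bagging step described above can be sketched as follows. This is a minimal illustration and not the model of Zeng et al. (2015): the function name `build_bags`, the toy mentions, and the toy fact set are all hypothetical, and no neural network is involved. It only shows how sentences are grouped per entity pair and how each bag, rather than each sentence, receives a label.

```python
from collections import defaultdict

def build_bags(mentions, kb_facts, relation):
    """Group sentences by entity pair and label each bag for one relation.

    mentions: iterable of (head, tail, sentence) tuples (hypothetical input format).
    kb_facts: set of (head, relation, tail) triples from the knowledge base.
    A bag is positive if the KB holds the relation for its entity pair.
    """
    grouped = defaultdict(list)
    for head, tail, sentence in mentions:
        grouped[(head, tail)].append(sentence)
    return {
        pair: {
            "sentences": sentences,
            "label": (pair[0], relation, pair[1]) in kb_facts,
        }
        for pair, sentences in grouped.items()
    }

# Toy data, assumed for illustration only.
kb_facts = {("Marie Curie", "born_in", "Warsaw")}
mentions = [
    ("Marie Curie", "Warsaw", "Marie Curie was born in Warsaw."),
    ("Marie Curie", "Warsaw", "Marie Curie returned to Warsaw in 1913."),
    ("Marie Curie", "Paris", "Marie Curie worked in Paris."),
]
bags = build_bags(mentions, kb_facts, "born_in")
```

The positive bag for ("Marie Curie", "Warsaw") contains one sentence that does not express the relation. Labeling at the bag level means the model only needs some sentence in the bag to support the label, which is how bagging tolerates this kind of noise.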

procedureThe proposed knowledge graph expansion solution for enterprises involves three main components: (1) creating customized datasets via distant supervision, (2) using lightweight supervised representation learning, and (3) integrating human feedback for high-quality updates.

claimDistant supervision (DS) is an automated data labeling technique that aligns knowledge bases with raw corpora to produce annotated data, used to address the lack of large annotated corpora for relation extraction and named entity recognition.
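As a rough sketch of the alignment idea, the snippet below labels every sentence that mentions both entities of a knowledge-base pair with that pair's relation. The knowledge base, the corpus, and the function name `distant_label` are hypothetical, and plain substring matching stands in for the entity linking a real pipeline would need.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (head, tail) -> relation.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Amazon", "Seattle"): "headquartered_in",
}

def distant_label(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Return (sentence, head, tail, relation) for each sentence mentioning a KB pair."""
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            # Naive string match; an assumption made to keep the sketch short.
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled

corpus = [
    "Barack Obama was born in Hawaii in 1961.",
    "Barack Obama visited Hawaii last winter.",  # co-occurrence without the relation
    "Amazon opened a new office in Austin.",     # only one entity of the pair appears
]
labeled = distant_label(corpus, KB)
```

The second sentence receives the `born_in` label even though it does not express that relation, which is a small instance of the label noise that distant supervision can introduce.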

claimDistant supervision is useful when labeled data is scarce or expensive, but it can introduce incomplete and inaccurate labels into the training process.

procedureLiang et al. (2020) proposed a two-stage method for Named Entity Recognition using distant supervision: first, a Large Language Model (LLM) is fine-tuned on distant supervision labels; second, teacher-student self-training with pseudo soft labels is applied to improve performance.

referenceQian et al. (2020) proposed a method for disambiguating entity names that combines non-annotated examples, Distant Supervision (DS) to generate pseudo labels, and active learning to meet the data requirements of deep learning models. The active learning step ranks predictions by model confidence and asks users to label the top-ranked and bottom-ranked items.
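The confidence-ranking step can be illustrated with a short sketch. This is not the implementation of Qian et al. (2020); the function name `select_for_review`, the tuple layout, and the confidence scores are assumptions chosen for illustration.

```python
def select_for_review(predictions, k):
    """Pick the k most and k least confident predictions for human labeling.

    predictions: list of (mention_id, predicted_entity, confidence) tuples
    (a hypothetical format). Returns (top, bottom) without overlap.
    """
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    top = ranked[:k]
    # Start the bottom slice no earlier than index k so an item is never picked twice.
    bottom = ranked[max(k, len(ranked) - k):]
    return top, bottom

# Toy predictions with made-up confidence scores.
predictions = [
    ("m1", "E1", 0.97),
    ("m2", "E2", 0.51),
    ("m3", "E1", 0.12),
    ("m4", "E3", 0.88),
    ("m5", "E2", 0.33),
]
top, bottom = select_for_review(predictions, k=2)
```

Under this sketch, reviewing the top items checks whether confident pseudo labels are actually right, while reviewing the bottom items directs human effort to the cases the model finds hardest.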

claimDistant supervision (DS) methods for Named Entity Recognition (NER) involve tagging text corpora using external knowledge sources such as dictionaries, knowledge bases, or knowledge graphs.
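A minimal sketch of dictionary-based tagging is shown below, assuming a gazetteer keyed by token tuples and a BIO tagging scheme. Both choices, along with the function name `tag_with_gazetteer`, are assumptions for illustration and are not drawn from the source.

```python
def tag_with_gazetteer(tokens, gazetteer):
    """Assign BIO tags by greedy longest-match lookup in a gazetteer.

    tokens: list of token strings.
    gazetteer: dict mapping a tuple of tokens to an entity type (hypothetical format).
    """
    tags = ["O"] * len(tokens)
    max_len = max((len(entry) for entry in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so "New York" beats "New".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in gazetteer:
                label = gazetteer[span]
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags

# Toy gazetteer and sentence, assumed for illustration.
gazetteer = {("Amazon",): "ORG", ("New", "York"): "LOC"}
tokens = "Amazon opened an office in New York".split()
tags = tag_with_gazetteer(tokens, gazetteer)
```

Any entity missing from the gazetteer is tagged `O`, and an ambiguous string such as "Amazon" always receives the gazetteer's type regardless of context, so the resulting labels can be both incomplete and inaccurate.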

claimDistant Supervision (DS) principles struggle to accommodate the evolving nature of knowledge in free texts because text annotation is based on a static, pre-existing Knowledge Graph.

claimDistant Supervision (DS) can introduce errors in Knowledge Graph Extraction (KGE) because it relies on assumptions that do not always hold, particularly when the knowledge graph and the corpus do not align closely; this misalignment can lead to hallucinations, as noted by Riedel et al. (2010).

Construction of Knowledge Graphs: State and Challenges - arXiv arxiv.org arXiv 7 facts

procedureThe XI Pipeline uses distant supervision and an aggregated piecewise convolution network trained on existing knowledge graph relations for relation extraction.

referenceA.P. Aprosio, C. Giuliano, and A. Lavelli demonstrated a method for extending the coverage of DBpedia properties using distant supervision over Wikipedia in 2013.

referenceD. Zeng, K. Liu, Y. Chen, and J. Zhao published 'Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks' in the proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015) in Lisbon, Portugal, in September 2015.

referenceAutoKnow is a closed-source system used by Amazon to create a retail product knowledge graph by processing product catalogs and consumer shopping behavior logs using machine learning and distant supervision.

claimThe primary bottleneck for neural relation extraction is the availability of training data, which is often addressed using distant supervision by training models on statements from data sources like Wikipedia.

procedureDistant supervision is a common method for link prediction that involves linking knowledge graph entities to a text corpus using NLP approaches and identifying patterns between those entities within the text.
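One simple way to picture "identifying patterns between those entities" is to collect the text that appears between two linked entity mentions and count how often each pattern recurs. The sketch below assumes exact string matching in place of real entity linking, and the function name `between_patterns` and the toy data are hypothetical.

```python
from collections import Counter

def between_patterns(sentences, pairs):
    """Count the text found between the head and tail mention of known entity pairs.

    sentences: list of raw sentences.
    pairs: list of (head, tail) entity names taken from a knowledge graph.
    """
    counts = Counter()
    for sentence in sentences:
        for head, tail in pairs:
            h = sentence.find(head)
            t = sentence.find(tail)
            # Keep only sentences where the head mention ends before the tail starts.
            if h != -1 and t != -1 and h + len(head) <= t:
                counts[sentence[h + len(head):t].strip()] += 1
    return counts

# Toy corpus and entity pairs, assumed for illustration.
sentences = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Paris hosted a summit with France.",
]
pairs = [("Paris", "France"), ("Berlin", "Germany")]
patterns = between_patterns(sentences, pairs)
```

Patterns that recur across many entity pairs are more likely to express the underlying relation than patterns seen only once, which gives a crude signal for separating useful patterns from noise.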

claimThe AutoKnow system derives most of its training and validation data from product catalogs or customer behavior logs by applying distant supervision and, in some cases, utilizing crowdsourcing via Amazon Mechanical Turk.

Unlocking the Potential of Generative AI through Neuro-Symbolic ... arxiv.org arXiv Feb 16, 2025 1 fact

referenceBen Zhou, Kyle Richardson, Qiang Ning, Tushar Khot, Ashish Sabharwal, and Dan Roth authored 'Temporal reasoning on implicit events from distant supervision', published as an arXiv preprint (arXiv:2010.12753) in 2020.

Practices, opportunities and challenges in the fusion of knowledge ... frontiersin.org Frontiers 1 fact

claimAutomated or semi-automated knowledge graph construction methods, such as distant supervision or neural triple extraction, often introduce noisy or redundant triples and suffer from low precision in complex contexts.