Wikipedia
Facts (61)
Sources
Construction of Knowledge Graphs: State and Challenges - arXiv arxiv.org 31 facts
Reference: YAGO 4 constructs its knowledge graph by collecting data from Wikidata and forcing it into a taxonomy based on schema.org and Bioschemas, rather than using Wikipedia, WordNet, and GeoNames data.
Claim: Blevins et al. propose a link prediction approach that learns patterns from entire Wikipedia articles.
Claim: Lange et al. utilize Conditional Random Fields to learn patterns for link prediction based on Wikipedia abstracts.
Reference: The Nell2RDF extension transforms NELL data into RDF format and annotates extracted relations with provenance information regarding the source Wikipedia articles.
Reference: D. Milne and I.H. Witten developed a method for learning to link entities with Wikipedia, presented at the 17th ACM Conference on Information and Knowledge Management (CIKM 2008).
Reference: A.P. Aprosio, C. Giuliano, and A. Lavelli demonstrated a method for extending the coverage of DBpedia properties using distant supervision over Wikipedia in 2013.
Reference: D. Lange, C. Böhm, and F. Naumann published 'Extracting structured information from Wikipedia articles to populate infoboxes' in the Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM 2010) in 2010.
Reference: The authors of 'Construction of Knowledge Graphs: State and Challenges' recommend starting Knowledge Graph construction with large, curated 'premium' data sources such as Wikipedia and DBpedia.
Claim: Holistic entity linking approaches utilize background information beyond simple mention-entity similarity, such as the graph structure of Wikipedia links, to determine commonness and relatedness between entities.
Procedure: The DBpedia Extraction Framework (DIEF) executes multiple extractors to target specific aspects of Wikipedia articles, such as type information from infobox templates or abstract paragraphs, including a specific extractor that maps infobox data to the DBpedia ontology.
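To make the infobox-extraction idea concrete, here is a minimal sketch (not the actual DIEF code) that pulls parameter/value pairs out of a Wikipedia infobox template with the mwparserfromhell library and maps them to ontology properties; the INFOBOX_TO_ONTOLOGY mapping and the sample wikitext are illustrative assumptions.

```python
# Minimal sketch, not the actual DIEF implementation: extract parameter/value
# pairs from a Wikipedia infobox template and map them to ontology properties.
# Requires: pip install mwparserfromhell
import mwparserfromhell

# Hypothetical mapping from infobox parameter names to DBpedia-style properties.
INFOBOX_TO_ONTOLOGY = {
    "birth_date": "dbo:birthDate",
    "birth_place": "dbo:birthPlace",
    "occupation": "dbo:occupation",
}

def extract_infobox_facts(wikitext, subject_uri):
    """Yield (subject, property, value) triples from mapped infobox parameters."""
    for template in mwparserfromhell.parse(wikitext).filter_templates():
        if not str(template.name).strip().lower().startswith("infobox"):
            continue
        for param in template.params:
            key = str(param.name).strip().lower()
            prop = INFOBOX_TO_ONTOLOGY.get(key)
            if prop:
                yield (subject_uri, prop, param.value.strip_code().strip())

sample = "{{Infobox person|birth_date=1971|occupation=Musician}}"
print(list(extract_infobox_facts(sample, "dbr:Example_Person")))
```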
Claim: Tailored ontology matching approaches exist for knowledge graphs, such as mapping categories derived from Wikipedia to the WordNet taxonomy to achieve an enriched knowledge graph ontology.
Procedure: DBpedia Live provides a real-time knowledge graph by continuously extracting changed Wikipedia content, fetching and reprocessing modified articles to update, add, or delete values in the knowledge graph.
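The change-driven update loop can be pictured with a small polling sketch; this is an illustration in the spirit of DBpedia Live, not its actual implementation, and the reextract_page callback stands in for whatever extraction routine the caller uses.

```python
# Illustrative change-driven update loop: poll MediaWiki's recent-changes feed
# and re-extract any article modified since the last pass.
import time
import requests

API = "https://en.wikipedia.org/w/api.php"

def recently_changed_titles(since_iso):
    """Return titles edited since the given ISO timestamp (first API page only)."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcstart": since_iso,
        "rcdir": "newer",
        "rcnamespace": 0,   # main/article namespace
        "rclimit": 50,
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=30).json()
    return {rc["title"] for rc in data["query"]["recentchanges"]}

def update_loop(reextract_page, poll_seconds=60):
    """reextract_page(title) is the caller's extraction routine (hypothetical)."""
    last_pass = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    while True:
        time.sleep(poll_seconds)
        for title in recently_changed_titles(last_pass):
            reextract_page(title)   # update, add, or delete values for this page
        last_pass = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
```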
Claim: The DBpedia Extraction Framework (DIEF) extracts structured data from Wikipedia article dumps and has been forked by several other wiki-based knowledge graph projects.
Reference: A tutorial-style overview of knowledge graph construction and curation, with a focus on integrating data from textual and semi-structured sources like Wikipedia, is provided in reference [12].
Claim: Wikipedia's category system can be used to derive relevant classes for a knowledge graph through NLP-based 'category cleaning' techniques.
Reference: VisualSem is a multilingual and multi-modal knowledge graph that interlinks images, their descriptions (glosses), and other attributes from Wikipedia articles, WordNet concepts, and ImageNet images.
Reference: Frey et al. presented 'DBpedia FlexiFusion the best of Wikipedia > Wikidata > your data' at the International Semantic Web Conference in 2019.
Reference: H. Paulheim and S.P. Ponzetto published 'Extending DBpedia with Wikipedia List Pages' at the NLP-DBPEDIA workshop at the International Semantic Web Conference in 2013.
Claim: The primary bottleneck for neural relation extraction is the availability of training data, which is often addressed using distant supervision by training models on statements from data sources like Wikipedia.
Reference: E. Munoz, A. Hogan, and A. Mileo published 'Triplifying Wikipedia's tables' at the LD4IE workshop at the International Semantic Web Conference in 2013.
Claim: DBpedia, YAGO, and NELL integrate information from Wikipedia as a primary source for knowledge.
Reference: The Yet Another Great Ontology (YAGO) project initially constructed its knowledge graph by extracting information about Wikipedia entities and combining them with an ontology derived from WordNet.
Claim: In the context of Entity Linking, 'commonness' is defined as the probability that an entity mention links to the specific Wikipedia article of a candidate entity, while 'relatedness' measures the number of Wikipedia articles that link to both candidate entities.
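As a rough illustration of these two statistics, the sketch below computes commonness from mention-link counts and a Milne–Witten-style relatedness from shared incoming links; the exact normalization varies by system, and the example counts are invented.

```python
# Toy sketch of commonness and relatedness computed from Wikipedia link counts.
from math import log

def commonness(mention_link_counts, candidate):
    """P(candidate | mention): how often the mention's links point at this article."""
    total = sum(mention_link_counts.values())
    return mention_link_counts.get(candidate, 0) / total if total else 0.0

def relatedness(inlinks_a, inlinks_b, num_articles):
    """Milne-Witten-style relatedness from articles linking to both candidates."""
    shared = inlinks_a & inlinks_b
    if not shared:
        return 0.0
    a, b = len(inlinks_a), len(inlinks_b)
    return 1 - (log(max(a, b)) - log(len(shared))) / (log(num_articles) - log(min(a, b)))

# Example: the mention "Paris" overwhelmingly links to the French capital.
counts = {"Paris": 9500, "Paris,_Texas": 300, "Paris_Hilton": 200}
print(commonness(counts, "Paris"))  # 0.95
```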
Claim: Some knowledge graphs contain multilingual information by providing descriptive entity values in different languages, typically using translations taken directly from sources like Wikipedia or BabelNet rather than generating their own translations.
Reference: YAGO version 2 extended its knowledge graph by incorporating temporal information from Wikipedia edit timestamps and spatial knowledge from GeoNames.
Reference: O. Medelyan, I.H. Witten, and D. Milne published 'Topic Indexing with Wikipedia' in the proceedings of the first AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI'08) in 2008.
Reference: The article 'DBpedia and the live extraction of structured data from Wikipedia' by M. Morsey, J. Lehmann, S. Auer, C. Stadler, and S. Hellmann, published in Program in 2012, discusses live extraction of structured data for DBpedia.
Reference: YAGO 3 utilizes Wikipedia inter-language links to extend multilingual knowledge coverage.
Claim: The Wikipedia page for 'Richard D James' redirects to the Wikipedia page for 'Aphex Twin'.
Measurement: DBpedia performs monthly batch extractions consuming data from up to 140 wikis, including various language-specific Wikipedia versions, Wikidata, and Wikimedia Commons, with each release undergoing completeness and quality validation.
Claim: Utilizing disambiguated aliases from high-quality sources, such as Wikipedia redirects or DBpedia's dbo:alias property, increases the coverage of entity dictionaries.
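A small sketch of that idea, assuming redirect pairs (redirect title, target article) have already been parsed out of a Wikipedia dump; the pairs below are illustrative.

```python
# Build an entity alias dictionary from Wikipedia redirect pairs.
from collections import defaultdict

def build_alias_dictionary(redirect_pairs):
    """Map each canonical entity to the set of surface forms that redirect to it."""
    aliases = defaultdict(set)
    for alias, canonical in redirect_pairs:
        aliases[canonical].add(alias)
        aliases[canonical].add(canonical)  # the canonical title is a valid mention too
    return aliases

pairs = [("Richard D James", "Aphex Twin"), ("AFX", "Aphex Twin")]
print(build_alias_dictionary(pairs)["Aphex Twin"])
# {'Richard D James', 'AFX', 'Aphex Twin'}
```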
Patterns in the Transition From Founder-Leadership to Community ... arxiv.org Feb 5, 2026 6 facts
Claim: An analysis of Wikipedia's community-driven policy development process by Im et al. (2018) found that one-third of proposals are abandoned, suggesting that Wikipedia often fails to fully engage community members in its governance processes.
Reference: The paper 'Is to was: coordination and commemoration in posthumous activity on Wikipedia biographies' was published in the Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, pages 533–546.
Reference: The article 'Decentralization in Wikipedia governance', published in the Journal of Management Information Systems, volume 26, issue 1, pages 49–72, discusses the governance structure of Wikipedia.
Reference: The article 'The Evolution of Wikipedia's Norm Network', published in Future Internet (vol. 8, issue 4), analyzes the development of norms within Wikipedia.
Reference: The paper 'Deliberation and Resolution on Wikipedia: A Case Study of Requests for Comments', published in Proceedings of the ACM on Human-Computer Interaction, examines the process of deliberation and resolution on Wikipedia.
Account: The institutional analysis lens has been used to understand the community-managed encyclopedia Wikipedia, which transitioned from management by benevolent dictator Jimmy Wales to community management, as documented by Forte et al. (2009), Heaberlin and DeDeo (2016), and Viegas et al. (2007).
Knowledge Graphs: Opportunities and Challenges - Springer Nature link.springer.com Apr 3, 2023 4 facts
Reference: DBpedia is a knowledge graph that extracts semantically meaningful information from Wikipedia to create a structured ontological knowledge base.
Reference: YAGO is a knowledge base containing a large number of entities and relationships extracted from sources including Wikipedia and WordNet.
Reference: Wikidata is a cross-lingual, document-oriented knowledge graph that supports sites and services such as Wikipedia.
Reference: Rebele et al. (2016) published 'YAGO: A Multilingual Knowledge Base from Wikipedia, Wordnet, and GeoNames' in the International Semantic Web Conference proceedings, which details the construction of the YAGO knowledge base.
Hallucination Causes: Why Language Models Fabricate Facts mbrenndoerfer.com Mar 15, 2026 3 facts
Measurement: In web-scale training data, high-accuracy sources like Wikipedia (95% accuracy) and academic papers (88% accuracy) contribute approximately 7% of the total corpus, while lower-accuracy sources like SEO content (35% accuracy) and general blogs (65% accuracy) constitute nearly 40% of the total token volume.
Claim: The weighted factual accuracy of web training data is driven down by the high volume of low-accuracy source types, such as SEO content and social media, despite the presence of high-accuracy curated sources like Wikipedia and academic papers.
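A back-of-the-envelope calculation shows how this mix drags the weighted average down. The accuracy figures and the 7% / ~40% shares come from the facts above; the blended values and the remaining 53% 'other' share are filler assumptions so the mix sums to one, not numbers from the source.

```python
# Worked example of the weighted-accuracy effect described above.
corpus_mix = {
    # source type: (share of tokens, factual accuracy)
    "wikipedia_academic": (0.07, 0.92),   # blended 95% / 88% from the source
    "seo_and_blogs":      (0.40, 0.50),   # blended 35% / 65% from the source
    "other_web_text":     (0.53, 0.70),   # assumed filler value, not from the source
}

weighted_accuracy = sum(share * acc for share, acc in corpus_mix.values())
print(f"{weighted_accuracy:.2f}")  # ~0.64, far below Wikipedia's 0.95
```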
Claim: Large language models lack a concept of source reliability because standard pretraining objectives treat all training data sources, such as Wikipedia articles, peer-reviewed papers, and social media posts, with equal weight per token.
Building Trustworthy NeuroSymbolic AI Systems - arXiv arxiv.org 3 facts
Claim: Agarwal et al. (2023) employed CommonSense, WordNet, and Wikipedia knowledge graphs to generate paraphrases that held equivalent meanings but were perceived as distinct by AI agents.
Procedure: KnowLLMs (LLMs over KGs) train Large Language Models using knowledge graphs such as CommonSense, Wikipedia, and UMLS, with a training objective redefined as an autoregressive function coupled with pruning based on state-of-the-art KG embedding methods.
Claim: The ReACT framework employs Wikipedia to address spurious generation and explanations in Large Language Models, though it relies on a prompting method rather than a well-grounded domain-specific approach.
Business model: Open Source - Learning Loop learningloop.io 3 facts
Reference: The Wikipedia entry 'Business models for open-source software' provides an overview of various economic models used in open-source software ecosystems.
Account: Wikipedia, launched in 2001, is an online encyclopedia written and edited by a global community and sustained through donations.
Account: Wikipedia's open-source model disrupted the traditional encyclopedia market, compelling established publishers to modify their business models.
The construction and refined extraction techniques of knowledge ... nature.com Feb 10, 2026 2 facts
Claim: Rule-based methods like DBpedia extracted triples from Wikipedia infoboxes using predefined rules, which provided efficiency for fixed-format data but struggled with the complex semantics of natural language.
Claim: The REBEL framework generates entity-relation triples directly without predefined ontologies or rules, achieving 1.8 times the coverage of traditional approaches on Wikipedia.
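A hedged usage sketch with the publicly released REBEL checkpoint (Babelscape/rebel-large on Hugging Face); the marker-parsing step follows the linearization format described on that model card (head after <triplet>, tail after <subj>, relation after <obj>) and should be verified against the card before relying on it.

```python
# Hedged sketch: extract (head, relation, tail) triples with the public REBEL
# checkpoint. The special-token layout parsed below is taken from the model
# card and is an assumption to verify, not guaranteed here.
from transformers import pipeline

extractor = pipeline("text2text-generation", model="Babelscape/rebel-large",
                     tokenizer="Babelscape/rebel-large")

def extract_triples(sentence):
    # Decode from token ids so the <triplet>/<subj>/<obj> markers survive
    # (plain generated_text would drop them as special tokens).
    out = extractor(sentence, return_tensors=True, return_text=False)
    text = extractor.tokenizer.decode(out[0]["generated_token_ids"])
    triples, head, tail, rel, mode = [], "", "", "", None
    for tok in text.replace("<s>", "").replace("</s>", "").replace("<pad>", "").split():
        if tok == "<triplet>":
            if rel:
                triples.append((head.strip(), rel.strip(), tail.strip()))
            head, tail, rel, mode = "", "", "", "head"
        elif tok == "<subj>":
            mode = "tail"
        elif tok == "<obj>":
            mode = "rel"
        elif mode == "head":
            head += " " + tok
        elif mode == "tail":
            tail += " " + tok
        elif mode == "rel":
            rel += " " + tok
    if rel:
        triples.append((head.strip(), rel.strip(), tail.strip()))
    return triples

print(extract_triples("Punta Cana is a resort town in the Dominican Republic."))
```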
A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org Feb 23, 2026 2 facts
Procedure: In the entity-level filtering task, human participants compared LLM responses against Wikipedia entity descriptions to determine if they referred to the same entity, while ignoring fact-level hallucinations.
Procedure: The KGHaluBench Response Verification Module employs a two-layer framework: first, it checks the Large Language Model's response against the entity's Wikipedia description to ensure non-abstention and basic understanding; second, it verifies non-hallucinated responses at the fact level by comparing claims to Knowledge Graph triples.
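The two-layer logic can be sketched as follows; the string-overlap and substring checks are naive stand-ins for the benchmark's actual entity comparison and fact verification, and the example data is invented.

```python
# Purely illustrative sketch of the two-layer response verification described above.

def verify_response(response, entity_description, kg_triples):
    """Return ('abstained' | 'entity_mismatch' | 'ok', unsupported_claims)."""
    # Layer 1: non-abstention and basic entity-level understanding.
    if "i don't know" in response.lower():
        return "abstained", []
    # Naive stand-in for the entity comparison: require some lexical overlap
    # with the entity's Wikipedia description.
    overlap = set(response.lower().split()) & set(entity_description.lower().split())
    if len(overlap) < 3:
        return "entity_mismatch", []

    # Layer 2: check each sentence-level claim against the KG triples.
    claims = [s.strip() for s in response.split(".") if s.strip()]
    def supported(claim):
        return any(str(obj).lower() in claim.lower() for _, _, obj in kg_triples)
    unsupported = [c for c in claims if not supported(c)]
    return "ok", unsupported

triples = [("Aphex Twin", "birthPlace", "Limerick"), ("Aphex Twin", "genre", "electronic")]
desc = "Aphex Twin is the stage name of Richard D. James, an electronic musician."
resp = "Aphex Twin is an electronic musician. He was born in Dublin."
print(verify_response(resp, desc, triples))  # ('ok', ['He was born in Dublin'])
```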
What is open source software? oss-watch.ac.uk May 1, 2005 1 fact
Claim: Open content is defined as content that can be edited, changed, and added to by any reader, with Wikipedia serving as an example of an online open content encyclopedia.
The Hallucinations Leaderboard, an Open Effort to Measure ... huggingface.co Jan 29, 2024 1 fact
Procedure: The SelfCheckGPT benchmark requires an LLM to generate six Wikipedia passages for evaluation. The first passage is generated with a temperature of 0.0, and the remaining five are generated with a temperature of 1.0. The SelfCheckGPT-NLI method, using the 'potsawee/deberta-v3-large-mnli' NLI model, then assesses whether sentences in the first passage are supported by the other five; if a sentence is inconsistent, the instance is marked as hallucinated.
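A hedged sketch of the NLI-based consistency check (generation and prompting omitted); the model name comes from the fact above, while the 0.5 decision threshold and the index-1 fallback for the contradiction class are assumptions.

```python
# Hedged sketch of SelfCheckGPT-style NLI checking: score each sentence of the
# temperature-0.0 passage against the sampled passages and flag it if the
# average contradiction probability is high.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "potsawee/deberta-v3-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def contradiction_prob(premise, hypothesis):
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    labels = {name.lower(): idx for idx, name in model.config.id2label.items()}
    # Fall back to index 1 if the config lacks a named 'contradiction' label (assumption).
    return probs[labels.get("contradiction", 1)].item()

def sentence_is_hallucinated(sentence, sampled_passages, threshold=0.5):
    scores = [contradiction_prob(passage, sentence) for passage in sampled_passages]
    return sum(scores) / len(scores) > threshold
```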
What is open hardware? | Opensource.com opensource.com 1 fact
Reference: The Open Source Hardware Association website and the Wikipedia entry on open source hardware are recommended resources for learning about open hardware.
Medical Hallucination in Foundation Models and Their ... medrxiv.org Mar 3, 2025 1 fact
Claim: The authors of the Chain-of-Knowledge (CoK) framework validated its impact on reducing hallucinations using ProgramFC, a factual verification method based on Wikipedia, on single- and multi-step reasoning tasks.
Medical Hallucination in Foundation Models and Their Impact on ... medrxiv.org Nov 2, 2025 1 fact
Claim: The Chain-of-Knowledge (CoK) framework utilizes ProgramFC, a factual verification method based on Wikipedia, to validate factual accuracy on single- and multi-step reasoning tasks.
Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... cleanlab.ai Apr 7, 2025 1 fact
Reference: The Discrete Reasoning Over Paragraphs (DROP) dataset consists of Wikipedia passages and questions that require discrete operations like counting, sorting, and mathematical reasoning.
Governance of open source software: state of the art - Springer Nature link.springer.com Jun 9, 2007 1 fact
Claim: The Wikipedia community utilizes a governance structure that is modeled after GPL-like Open Source Software communities and licenses its content under GPL-like terms.