concept

BERT

Also known as: Bidirectional Encoder Representations from Transformers, BERTa

Facts (49)

Sources
A survey on augmenting knowledge graphs (KGs) with large ... link.springer.com Springer Nov 4, 2024 13 facts
claim: BERT introduced bidirectional training, which allows the model to analyze context from both sides to improve language understanding.
claim: Google's BERT model introduced bidirectional training for improved language understanding.
claim: OpenAI's GPT-3 is designed to create coherent, relevant text, while Google's BERT focuses on understanding words in their context for NLP tasks.
claim: Meta's RoBERTa model utilizes different pre-training strategies compared to BERT, resulting in better optimization and stronger performance across NLP benchmarks.
claim: Google developed LLMs including BERT (Bidirectional Encoder Representations from Transformers), T5 (Text-To-Text Transfer Transformer), PaLM (Pathways Language Model), Gemini, and LaMDA (Language Model for Dialogue Applications).
claim: BERT utilizes deep contextual understanding for question answering and named entity recognition (NER) task completion.
reference: Liu Z, Lin W, Shi Y, and Zhao J authored 'A Robustly Optimized BERT Pre-training Approach with Post-training', published in the proceedings of the China National Conference on Chinese Computational Linguistics in 2021.
claim: Models such as KEPLER and Pretrain-KGE use BERT-like LLMs to encode textual descriptions of entities and relationships into vector representations, which are then fine-tuned on KG-related tasks.
claim: OpenAI's GPT series, Google's BERT, T5, PaLM, and Gemini, and Meta's RoBERTa, OPT, and LLaMA are recognized as state-of-the-art LLMs.
claim: BERT and GPT-3 models have been employed to generate and optimize database queries, providing a user-friendly interface and enhancing query performance, as noted in [54].
reference: Encoder-only architectures, such as BERT, process the entire input sequence simultaneously to capture bidirectional context, making them useful for tasks like entity recognition or text classification.
measurement: Medium Language Models (LMs) are defined as models containing between one billion and ten billion parameters, with GPT-2 and BERT serving as examples.
claim: Vaswani et al. introduced transformer models in 2017, which serve as the foundation for modern LLMs such as BERT and GPT.
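Several facts above contrast BERT's bidirectional encoding with GPT-style left-to-right generation. The architectural difference largely reduces to the self-attention mask. A minimal NumPy sketch (the function name and shapes are illustrative, not taken from any cited paper):

```python
import numpy as np

def attention_masks(seq_len: int):
    """Illustrative attention masks (entry [i, j] == 1 means position j
    is visible to position i). BERT-style encoders attend
    bidirectionally, so every token sees the whole sequence; GPT-style
    decoders use a causal mask, so each token sees only itself and
    earlier positions."""
    bidirectional = np.ones((seq_len, seq_len), dtype=int)
    causal = np.tril(np.ones((seq_len, seq_len), dtype=int))
    return bidirectional, causal
```

For a 4-token input, the causal mask hides 6 of the 16 token pairs that the bidirectional mask exposes: exactly the "following words" that BERT can condition on and an autoregressive decoder cannot.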
Combining Knowledge Graphs and Large Language Models - arXiv arxiv.org arXiv Jul 9, 2024 10 facts
reference: KnowBERT includes a Knowledge Attention and Recontextualization (KAR) component within the BERT architecture that computes a knowledge-enhanced representation using entity links from a Knowledge Graph and passes it to the next transformer block.
reference: LUKE (Language Understanding with Knowledge-based Embeddings) is an extension of BERT that uses an entity-aware self-attention mechanism to treat words and entities as independent tokens, outputting contextualized representations.
claim: KnowBERT exhibits better performance on relation extraction, words in context, and entity typing tasks compared to standard BERT.
reference: ERNIE (Enhanced Language RepresentatioN with Informative Entities) fuses lexical, syntactic, and knowledge information by stacking a textual T-Encoder (which functions like BERT) with a knowledge K-Encoder to represent word tokens and entities in a unified feature space.
claim: In 2022, Zhen et al. classified knowledge enhancement approaches into explicit methods, which modify model inputs and employ external memories, and implicit methods, which focus on knowledge contained within the model from training, such as in BERT.
reference: K-BERT is a joint model that addresses the lack of domain-specific knowledge in BERT by injecting domain knowledge from Knowledge Graphs into sentences.
reference: BERT (Bidirectional Encoder Representations from Transformers) was released in 2018 as a transformer-based model capable of understanding contexts bidirectionally by considering both preceding and following words in input text.
claim: Yang et al. demonstrated that knowledge graph-enhanced pre-trained language models (KGPLMs), which inject a knowledge encoder module into pre-trained language models, consistently exhibit longer running times than vanilla LLMs like BERT across pre-training, fine-tuning, and inference stages.
claim: Examples of large language models include Google's BERT, Google's T5, and OpenAI's GPT series.
reference: KRISP uses a multimodal BERT-pretrained transformer to process question and image pairs in an implicit knowledge model, while a separate explicit knowledge model constructs a Knowledge Graph from question and image symbols to predict answers.
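K-BERT, described above, injects KG triples into the input sentence before encoding. As a rough illustration of the injection step only — the actual model builds a sentence tree with soft-position indices and a visibility matrix, which this flat sketch deliberately omits:

```python
def inject_knowledge(tokens, kg):
    """Toy version of the K-BERT injection idea: splice knowledge-graph
    triples into the token sequence right after the entity mention that
    heads them, so the encoder can attend over domain knowledge the
    sentence itself does not contain. `kg` maps an entity token to a
    list of (relation, object) pairs."""
    out = []
    for tok in tokens:
        out.append(tok)
        for rel, obj in kg.get(tok, []):
            out.extend([rel, obj])  # triple appears after its head entity
    return out

# inject_knowledge(["Paris", "is", "nice"],
#                  {"Paris": [("capital_of", "France")]})
# → ["Paris", "capital_of", "France", "is", "nice"]
```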
The construction and refined extraction techniques of knowledge ... nature.com Nature Feb 10, 2026 3 facts
claim: Pre-trained models like BERT optimize performance in Named Entity Recognition (NER) tasks, particularly in cross-lingual settings, while domain-specific fine-tuning enhances the recognition of specialized terminology.
reference: L. Sun et al. published 'RpBERT: A Text-Image Relation Propagation-Based BERT Model for Multimodal NER' in the Proceedings of the AAAI Conference on Artificial Intelligence, Volume 35, Issue 15, pages 13860–13868, in 2021.
reference: Y. Chang et al. published 'Chinese Named Entity Recognition Method Based on BERT' in the 2021 IEEE International Conference on Data Science and Computer Application (ICDSCA), pages 294–299, in 2021.
Practices, opportunities and challenges in the fusion of knowledge ... frontiersin.org Frontiers 3 facts
reference: The BERT–BiLSTM–CRF model integrates BERT, BiLSTM, and CRF modules to identify power equipment entities from Chinese technical documents and extract semantic relationships between those entities.
reference: Encoder-only models, such as BERT, RoBERTa, and ALBERT, utilize bidirectional attention and techniques like masked language modeling and next sentence prediction to perform tasks requiring deep text comprehension, including classification, entity recognition, and reading comprehension.
reference: The KG-BERT model (Yao et al., 2019) treats knowledge graph triples as textual sequences and encodes them using BERT-style architectures.
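The KG-BERT idea above — serializing a triple into a textual sequence an encoder can score — can be sketched in a few lines. The bracketed special tokens follow standard BERT conventions; the function name is invented for illustration:

```python
def linearize_triple(head: str, relation: str, tail: str) -> str:
    """Serialize a knowledge-graph triple into a single BERT-style
    input sequence. KG-BERT feeds such sequences to a BERT encoder and
    scores triple plausibility from the resulting [CLS]
    representation."""
    return f"[CLS] {head} [SEP] {relation} [SEP] {tail} [SEP]"

# linearize_triple("Barack Obama", "born in", "Honolulu")
# → "[CLS] Barack Obama [SEP] born in [SEP] Honolulu [SEP]"
```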
The Synergy of Symbolic and Connectionist AI in LLM-Empowered ... arxiv.org arXiv Jul 11, 2024 2 facts
reference: Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova introduced the BERT model for deep bidirectional transformer-based language understanding in a 2018 arXiv preprint.
reference: Transformer-based pre-trained language models are categorized into encoder-only models (e.g., BERT) for understanding and classifying text, decoder-only models (e.g., GPT) for generating coherent text, and encoder-decoder models (e.g., T5) for tasks requiring both comprehension and generation.
Knowledge Graph Combined with Retrieval-Augmented Generation ... drpress.org Academic Journal of Science and Technology Dec 2, 2025 2 facts
reference: The paper 'RoBERTa: A Robustly Optimized BERT Pretraining Approach' by Liu Y. et al. was published as an arXiv preprint (arXiv:1907.11692) in 2019.
reference: The paper 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding' by Devlin J, Chang M-W, Lee K, and Toutanova K was published in the Proceedings of NAACL-HLT in 2019.
Bridging the Gap Between LLMs and Evolving Medical Knowledge arxiv.org arXiv Jun 29, 2025 2 facts
reference: Jacob Devlin published 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding' in 2018.
measurement: On the MedMCQA benchmark, AMG-RAG achieves an accuracy of 66.34%, outperforming Meditron-70B (66.0%), Codex 5-shot CoT (59.7%), VOD (58.3%), Flan-PaLM (57.6%), PaLM (54.5%), GAL (120B, 52.9%), PubmedBERT (40.0%), SciBERT (39.0%), BioBERT (38.0%), and BERT (35.0%).
Large Language Models Meet Knowledge Graphs for Question ... arxiv.org arXiv Sep 22, 2025 2 facts
reference: The GAIL method, proposed by Zhang et al. in 2024, utilizes GAIL fine-tuning with Llama-2-7B and BERTa language models, incorporating the Freebase knowledge graph to perform KGQA tasks on the WQSP, CWQ, and GrailQA datasets, evaluated using EM, F1, and Hits@1 metrics.
reference: GoR, as described by Zhang et al. (2024b), optimizes node embeddings during graph indexing by leveraging GNN and BERT score-based objectives to address the complexity of creating vector indexes from long-range facts.
Medical Hallucination in Foundation Models and Their ... medrxiv.org medRxiv Mar 3, 2025 1 fact
claim: Pretrained Large Language Models such as GPT-3, GPT-4, PaLM, LLaMA, and BERT have demonstrated advancements due to the extensive datasets used in their training.
A framework to assess clinical safety and hallucination rates of LLMs ... nature.com Nature May 13, 2025 1 fact
reference: The BERTScore metric, detailed in 'BERTScore: Evaluating Text Generation with BERT' (arXiv:1904.09675, 2020), utilizes BERT embeddings to evaluate text generation quality.
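The greedy-matching core of BERTScore can be sketched as follows. This toy version omits IDF weighting and baseline rescaling, and takes precomputed (tokens × dim) embedding arrays rather than calling a BERT model:

```python
import numpy as np

def bertscore_f1(cand: np.ndarray, ref: np.ndarray) -> float:
    """Greedy-matching core of BERTScore: each candidate token is
    matched to its most similar reference token (precision) and vice
    versa (recall); F1 is their harmonic mean. Rows of `cand` and
    `ref` are contextual token embeddings, e.g. from BERT."""
    # Normalize rows so dot products become cosine similarities.
    c = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = c @ r.T                       # pairwise cosine similarity
    precision = sim.max(axis=1).mean()  # best match per candidate token
    recall = sim.max(axis=0).mean()     # best match per reference token
    return 2 * precision * recall / (precision + recall)
```

Identical embedding matrices score an F1 of 1.0, since every token matches itself with cosine similarity 1.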
bureado/awesome-software-supply-chain-security - GitHub github.com GitHub 1 fact
reference: SpecterOps/DeepPass2 is a multi-layer secrets detection tool that uses regex patterns, fine-tuned BERT, and LLM verification to identify structured tokens and context-dependent free-form passwords in documents.
The Synergy of Symbolic and Connectionist AI in LLM ... arxiv.org arXiv 1 fact
claim: Contemporary research in neuro-symbolic AI and large-scale pre-trained models, such as BERT, GPT, and hybrid reinforcement learning models, exemplifies the convergence of connectionist and symbolic paradigms.
Combining large language models with enterprise knowledge graphs frontiersin.org Frontiers Aug 26, 2024 1 fact
claim: Prompting with Large Language Models (like GPTs) can underperform in Named Entity Recognition compared to fine-tuned smaller Pre-trained Language Models (like BERT derivations), especially when more training data is available (Gutierrez et al., 2022; Keloth et al., 2024; Pecher et al., 2024; Törnberg, 2024).
Neuro-symbolic AI - Wikipedia en.wikipedia.org Wikipedia 1 fact
reference: The 'Symbolic' approach in neuro-symbolic integration is used by many neural models in natural language processing, such as BERT, RoBERTa, and GPT-3, where words or subword tokens serve as the ultimate input and output.
Efficient Knowledge Graph Construction and Retrieval from ... - arXiv arxiv.org arXiv Aug 7, 2025 1 fact
reference: Rodrigo Nogueira and Kyunghyun Cho published 'Passage Re-ranking with BERT' as an arXiv preprint in 2019.
Policymakers Overlook How Open Source AI Is Reshaping ... techpolicy.press Lucie-Aimée Kaffee, Shayne Longpre · Tech Policy Press Dec 9, 2025 1 fact
account: In the early 2020s, American companies dominated the open-source AI landscape, with over half of all open-weight model downloads associated with United States industry models such as BERT, CLIP, and T5.
Knowledge Graphs: Opportunities and Challenges - Springer Nature link.springer.com Springer Apr 3, 2023 1 fact
claim: Yao L, Mao C, and Luo Y published the paper 'KG-BERT: BERT for Knowledge Graph Completion' as an arXiv preprint in 2019.
Building Trustworthy NeuroSymbolic AI Systems - arXiv arxiv.org arXiv 1 fact
claim: Large Language Models (LLMs) are successors to foundational language models like BERT (Bidirectional Encoder Representations from Transformers) and represent a combination of feedforward neural networks and transformers.
A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org arXiv Feb 23, 2026 1 fact
reference: The FactCC benchmark, introduced by Kryscinski et al. in 2020, uses a BERT-based model to verify the factual overlap between a generated response and the source evidence.
Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org arXiv Aug 13, 2025 1 fact
reference: The paper 'BERTScore: Evaluating Text Generation with BERT' by Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi was published in the 8th International Conference on Learning Representations (ICLR 2020).