concept

self-attention mechanism

Also known as: self-attention

Facts (11)

Sources
A Survey on the Theory and Mechanism of Large Language Models arxiv.org arXiv Mar 12, 2026 6 facts
claim: Zhou et al. (2022) approach the emerging visual grouping phenomenon in Vision Transformers from the perspective of the information bottleneck, showing that the iterative solution to the information bottleneck objective can be expressed as self-attention.
claim: Nichani et al. (2025) demonstrated that a single-layer Transformer with self-attention and MLP can achieve perfect prediction accuracy when the number of self-attention parameters or MLP parameters scales almost linearly with the number of facts.
reference: The paper 'Are transformers with one layer self-attention using low-rank weight matrices universal approximators?' is an arXiv preprint (arXiv:2307.14023) cited in section 3.2.1 of 'A Survey on the Theory and Mechanism of Large Language Models'.
reference: The paper 'Theoretical limitations of self-attention in neural sequence models' (Transactions of the Association for Computational Linguistics 8) is cited in the survey 'A Survey on the Theory and Mechanism of Large Language Models' regarding self-attention limitations.
claim: Oymak et al. (2023) theoretically establish that softmax prompt attention is more expressive than self-attention or linear prompt attention in the context of mixture models.
claim: Tian et al. (2023) revealed that the self-attention mechanism exhibits a 'scan and snap' dynamic: the model initially distributes attention uniformly across all tokens, then gradually concentrates on distinctive tokens that are discriminative for predicting specific next tokens, while reducing attention on common tokens.
Self-consciousness, self-attention, and social interaction researchgate.net ResearchGate Feb 27, 2017 1 fact
measurement: The study titled "Self-consciousness, self-attention, and social interaction" conducted two experiments with a total of 128 female undergraduates to test the effects of self-focused attention on positive and negative social interactions.
The Synergy of Symbolic and Connectionist AI in LLM-Empowered ... arxiv.org arXiv Jul 11, 2024 1 fact
claim: Self-attention mechanisms and transformer architectures, proposed in the late 2010s, revolutionized sequence modeling for natural language processing by allowing models to focus on different parts of the input sequence when generating output.
Combining Knowledge Graphs and Large Language Models - arXiv arxiv.org arXiv Jul 9, 2024 1 fact
claim: Large Language Models (LLMs) are based on the transformer architecture, which excels in handling long sequences due to its self-attention mechanism.
A survey on augmenting knowledge graphs (KGs) with large ... link.springer.com Springer Nov 4, 2024 1 fact
claim: Transformer models utilize a self-attention mechanism to process text more efficiently and accurately.
Are you hallucinated? Insights into large language models sciencedirect.com ScienceDirect 1 fact
claim: Hallucinations in large language models are the logical consequence of the transformer architecture's essential mathematical operation, known as the self-attention mechanism.
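The mechanism these facts refer to, letting each output position mix information from every input position, can be sketched as scaled dot-product self-attention. This is a minimal single-head NumPy sketch, not any cited paper's implementation; the function name, weight shapes, and toy dimensions are illustrative assumptions:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq/Wk/Wv project tokens to queries, keys, values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token affinities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # each output mixes all tokens

# Toy example: 4 tokens, model dimension 8 (values chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualized vector per input token
```

The softmax row for each query is the "attention distribution" the claims above describe: it is uniform when affinities are equal and concentrates on a few tokens as the projections sharpen.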