concept

Transformer architecture

Facts (9)

Sources
A Survey on the Theory and Mechanism of Large Language Models (arXiv, Mar 12, 2026) · 4 facts
reference: The paper 'On limitations of the transformer architecture' discusses the constraints inherent in the transformer architecture.
claim: Researchers, including Yu et al. (2023a; 2024; b), Zhou et al. (2022), Wang et al. (2025b), Yang et al. (2022), and Ren et al. (2025), have attempted to understand the structure of the Transformer architecture from principled theoretical perspectives.
claim: Chen et al. (2024e) used gradient flow to analyze how a simplified Transformer architecture with two attention layers performs in-context learning, revealing the collaborative mechanism of its components.
reference: The paper 'Approximation rate of the transformer architecture for sequence modeling' was published in Advances in Neural Information Processing Systems 37, pages 68926–68955.
Combining Knowledge Graphs and Large Language Models (arXiv, Jul 9, 2024) · 2 facts
claim: Large Language Models (LLMs) are based on the transformer architecture, which excels in handling long sequences due to its self-attention mechanism.
formula: For Large Language Models based on the transformer architecture, the hidden state at step t is computed from the current token and all previous hidden states: h_t = f(x_t, h_{t-1}, ..., h_1).
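The formula above can be sketched as a single-head self-attention with a causal mask, in which each output state h_t depends only on x_t and the positions before it. This is a minimal NumPy illustration, not the implementation from any cited paper: the identity Q/K/V projections and the shapes are assumptions made purely for demonstration.

```python
import numpy as np

def causal_self_attention(X):
    """Minimal single-head self-attention with a causal mask.

    X: (T, d) array of token embeddings. Output row t is a weighted
    combination of rows 0..t of V, matching h_t = f(x_t, h_{t-1}, ..., h_1).
    Projections are identities for illustration (no learned weights).
    """
    T, d = X.shape
    Q, K, V = X, X, X                              # assumed identity projections
    scores = Q @ K.T / np.sqrt(d)                  # (T, T) similarity scores
    mask = np.triu(np.ones((T, T)), k=1)           # 1s strictly above the diagonal
    scores = np.where(mask == 1, -np.inf, scores)  # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # (T, d) contextualized states

H = causal_self_attention(np.random.randn(5, 8))   # 5 tokens, 8-dim embeddings
```

Because position 0 can only attend to itself, the first output row equals the first input row here; later rows mix in earlier positions, which is what lets the architecture capture long-range dependencies in one parallel step rather than a sequential recurrence.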
Practices, opportunities and challenges in the fusion of knowledge ... (Frontiers) · 1 fact
claim: The transformer architecture was created to address the limitations of Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks in managing long-range dependencies in sequential data.
The Synergy of Symbolic and Connectionist AI in LLM-Empowered ... (arXiv, Jul 11, 2024) · 1 fact
reference: Ashish Vaswani et al. introduced the Transformer architecture in the paper 'Attention Is All You Need', published in Advances in Neural Information Processing Systems (2017).
Are you hallucinated? Insights into large language models (ScienceDirect) · 1 fact
claim: Hallucinations in large language models are a logical consequence of the transformer architecture's core mathematical operation, the self-attention mechanism.