Claim
Nichani et al. (2025) demonstrated that a single-layer Transformer with self-attention and an MLP can achieve perfect prediction accuracy on factual recall when either the number of self-attention parameters or the number of MLP parameters scales almost linearly with the number of facts.
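As a rough illustration of the scaling in this claim (not the construction from Nichani et al. (2025)), the sketch below counts the parameters of a standard one-layer Transformer block. The choice of model dimension d on the order of sqrt(N), and the parameter-count formulas, are illustrative assumptions; under them both the self-attention and MLP parameter counts grow linearly with the number of facts N.

```python
import math

# A minimal sketch, assuming standard Transformer parameter shapes.
# The d ~ sqrt(N) choice below is an illustrative assumption that
# makes the d*d weight matrices scale linearly in N, matching the
# "almost linear in the number of facts" regime of the claim.

def attention_params(d_model: int) -> int:
    # Q, K, V, and output projections: four d x d weight matrices.
    return 4 * d_model * d_model

def mlp_params(d_model: int, d_hidden: int) -> int:
    # Two weight matrices of a standard Transformer MLP (biases omitted).
    return 2 * d_model * d_hidden

for num_facts in (10_000, 100_000, 1_000_000):
    d_model = math.isqrt(num_facts)   # illustrative: d ~ sqrt(N)
    d_hidden = 4 * d_model            # common 4x expansion factor
    attn = attention_params(d_model)
    mlp = mlp_params(d_model, d_hidden)
    print(f"N={num_facts:>9,}  d={d_model:>5}  "
          f"attn params={attn:>12,}  mlp params={mlp:>12,}  "
          f"attn/N={attn / num_facts:.1f}  mlp/N={mlp / num_facts:.1f}")
```

Under these assumptions the parameter-to-fact ratios stay constant (4.0 for attention, 8.0 for the MLP) as N grows, which is what "scales almost linearly with the number of facts" means here; the paper's actual bounds allow additional logarithmic factors.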
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, via Serper)
Referenced by nodes (2)
- Transformer concept
- self-attention mechanism concept