claim
Akyürek et al. (2022) showed by explicit construction that Transformer layers can implement a small set of primitive operations (move, multiply, divide, and affine transformation), and that chaining these primitives lets a Transformer carry out gradient descent.
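To make the composition concrete, the sketch below chains four primitives into one gradient-descent step for least-squares regression. It is an illustration only, not the construction from the paper: the helpers `mov`, `mul`, `div`, and `aff`, the dictionary-based `state`, and the function `gd_step` are all hypothetical names, and the primitives here act on plain NumPy arrays rather than on attention and feed-forward layers.

```python
import numpy as np

# Hypothetical primitives: each reads named entries from a "state" dict
# (a stand-in for the model's hidden state) and writes one new entry.

def mov(state, src, dst):
    """Copy the value stored under src into dst."""
    out = dict(state)
    out[dst] = np.array(state[src], copy=True)
    return out

def mul(state, a, b, dst):
    """Store the matrix/vector product of state[a] and state[b] in dst."""
    out = dict(state)
    out[dst] = np.dot(state[a], state[b])
    return out

def div(state, a, b, dst):
    """Store state[a] divided by the scalar state[b] in dst."""
    out = dict(state)
    out[dst] = state[a] / state[b]
    return out

def aff(state, terms, dst, bias=0.0):
    """Store an affine combination sum(coeff * state[name]) + bias in dst."""
    out = dict(state)
    out[dst] = sum(coeff * state[name] for name, coeff in terms) + bias
    return out

def gd_step(X, y, w, lr):
    """One step of w <- w - lr * (2/n) * X^T (X w - y), built from the primitives."""
    state = {"X": X, "Xt": X.T, "y": y, "w": w, "n": float(X.shape[0])}
    state = mul(state, "X", "w", "pred")                        # predictions X w
    state = aff(state, [("pred", 1.0), ("y", -1.0)], "resid")   # residual X w - y
    state = mul(state, "Xt", "resid", "gsum")                   # summed gradient direction
    state = div(state, "gsum", "n", "gmean")                    # average over n examples
    state = aff(state, [("w", 1.0), ("gmean", -2.0 * lr)], "w_new")
    state = mov(state, "w_new", "w")                            # write the update back
    return state["w"]
```

Calling `gd_step` repeatedly with a small learning rate should drive `w` toward the least-squares solution, which is the sense in which a fixed sequence of simple operations can realize an iterative learning algorithm.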
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (2)
- Transformers concept
- gradient descent concept