Claim
Zhang et al. (2024b) studied the training dynamics of a Transformer with a single linear attention layer trained on in-context linear regression tasks, and showed that the trained model converges to the global minimum of the objective function.
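To make the setting concrete, below is a minimal sketch of the setup described in the claim: a single linear self-attention layer (no softmax) trained to predict the query label of in-context linear regression prompts. The dimensions, initialization scale, optimizer (SGD rather than the gradient flow analyzed in the paper), and learning rate are illustrative assumptions, not the parameterization used by Zhang et al. (2024b).

```python
import torch

torch.manual_seed(0)
d, n_ctx, batch = 5, 20, 256  # feature dim, context length, tasks per batch (assumed values)

def sample_prompts():
    """Sample in-context linear regression prompts (x_1, y_1, ..., x_n, y_n, x_query)."""
    w = torch.randn(batch, d, 1)                     # a fresh task vector per prompt
    X = torch.randn(batch, n_ctx + 1, d)             # context inputs plus the query input
    y = (X @ w).squeeze(-1)                          # noiseless labels y_i = <w, x_i>
    E = torch.cat([X, y.unsqueeze(-1)], dim=-1)      # each token embeds (x_i, y_i) jointly
    E[:, -1, -1] = 0.0                               # the query's label slot is masked out
    return E, y[:, -1]                               # token matrix, target y_query

# Trainable key-query and projection-value matrices of one linear attention head
# (small random init; an assumption, not the paper's initialization).
W_kq = (0.01 * torch.randn(d + 1, d + 1)).requires_grad_()
W_pv = (0.01 * torch.randn(d + 1, d + 1)).requires_grad_()
opt = torch.optim.SGD([W_kq, W_pv], lr=0.01)

def predict(E):
    """Linear attention: raw attention scores are used directly, with no softmax."""
    scores = (E @ W_kq) @ E.transpose(1, 2) / n_ctx  # (batch, T, T) attention scores
    out = scores @ (E @ W_pv)                        # (batch, T, d+1) attended values
    return out[:, -1, -1]                            # read the label slot of the query token

for step in range(2000):
    E, y_q = sample_prompts()                        # fresh tasks each step (population-style training)
    loss = ((predict(E) - y_q) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")
```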
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, via serper)
Referenced by nodes (2)
- In-Context Learning concept
- Transformer concept