
Zhang et al. (2024b) analyzed the training dynamics of a Transformer with a single linear attention layer on in-context learning of linear regression tasks, and showed that training reaches a global minimum of the objective function.
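
To make the setting concrete, here is a minimal NumPy sketch of an in-context linear regression prompt and a reduced one-layer linear-attention predictor. It is not the construction or the training analysis from Zhang et al. (2024b). The reduced form `x_query @ Gamma @ (X.T @ y) / n`, the helper names, and the choice `Gamma = I` are all assumptions made for illustration. The sketch only evaluates the model class on random tasks with isotropic Gaussian inputs and does no training.

```python
import numpy as np


def linear_attention_predict(X, y, x_query, Gamma):
    """Reduced linear-attention prediction (assumed form, not the paper's exact
    parametrization): y_hat = x_query^T Gamma (1/n) sum_i y_i x_i."""
    n = X.shape[0]
    return x_query @ Gamma @ (X.T @ y) / n


def make_task(rng, n, d):
    """Sample one in-context linear regression prompt with noiseless labels."""
    w = rng.standard_normal(d)          # task-specific weight vector
    X = rng.standard_normal((n, d))     # in-context inputs, isotropic Gaussian
    y = X @ w                           # in-context labels
    x_query = rng.standard_normal(d)    # query input whose label is predicted
    return X, y, x_query, w


def mean_squared_error(Gamma, n, d, trials=200, seed=0):
    """Average squared prediction error over freshly sampled tasks."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        X, y, x_query, w = make_task(rng, n, d)
        prediction = linear_attention_predict(X, y, x_query, Gamma)
        errors.append((prediction - x_query @ w) ** 2)
    return float(np.mean(errors))


if __name__ == "__main__":
    d, n = 5, 2000
    # Gamma = I is an illustrative choice for isotropic inputs, not a claimed minimizer.
    print("identity Gamma:", mean_squared_error(np.eye(d), n, d))
    # Gamma = 0 always predicts 0 and serves as a trivial baseline.
    print("zero Gamma:    ", mean_squared_error(np.zeros((d, d)), n, d))
```

With many in-context examples, the identity choice should give a much smaller error than the zero baseline, because `(1/n) X.T @ X` concentrates around the identity for these inputs.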

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, arXiv)

Referenced by nodes (2)

In-Context Learning concept
Transformer concept