reference
The paper 'Training dynamics of multi-head softmax attention for in-context learning: emergence, convergence, and optimality' was published in the proceedings of the Thirty-Seventh Annual Conference on Learning Theory, page 4573.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- In-Context Learning concept