reference
The paper 'Transformers learn nonlinear features in context: nonconvex mean-field dynamics on the attention landscape' was published at the Forty-first International Conference on Machine Learning (ICML 2024) and is cited in Section 3.2.2 of 'A Survey on the Theory and Mechanism of Large Language Models'.
