reference
The paper 'Training dynamics of multi-head softmax attention for in-context learning: emergence, convergence, and optimality' was published in the Proceedings of The Thirty Seventh Annual Conference on Learning Theory, page 4573.