claim
Chen et al. (2024e) used gradient flow to analyze how a simplified Transformer with two attention layers performs in-context learning, revealing how the architecture's components work together.
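As a rough illustration (the loss and parameterization below are generic assumptions, not the paper's exact setup), a gradient flow analysis studies training in the continuous-time limit of gradient descent, tracking how the parameters \theta of the simplified two-attention-layer model evolve under

$$\frac{d\theta(t)}{dt} = -\nabla_\theta L\big(\theta(t)\big),$$

where L is an in-context learning objective, e.g. the squared error of the model's prediction for a query token given the in-context example pairs in the prompt. Characterizing the trajectories of this ODE is what allows one to separate the roles the two attention layers play during training.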
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, via serper)
Referenced by nodes (2)
- In-Context Learning concept
- Transformer architecture concept