reference
The paper 'Disentangling feature structure: A mathematically provable two-stage training dynamics in transformers' is an arXiv preprint (arXiv:2502.20681).
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- Transformers concept