The paper "Disentangling feature structure: A mathematically provable two-stage training dynamics in transformers" is an arXiv preprint (arXiv:2502.20681).
