claim
Test-time training (regression) is a model architecture design framework utilized to address the quadratic complexity of Transformers with respect to sequence length, as discussed by Sun et al. (2024), Yang et al. (2023c), von Oswald et al. (2025), Wang et al. (2025a), and Behrouz et al. (2024; 2025).
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models arxiv.org via serper
Referenced by nodes (1)
- Transformers concept