claim
Cheng et al. (2023) and Collins et al. (2024) studied the ability of Transformers to learn a broader class of nonlinear functions, extending the analysis beyond linear attention settings.
Sources
- A Survey on the Theory and Mechanism of Large Language Models arxiv.org via serper
Referenced by nodes (1)
- Transformers concept