reference
Jiang and Li (2024) derived Jackson-type approximation bounds for Transformers by introducing new complexity measures to construct approximation spaces, showing that Transformers approximate efficiently when the temporal dependencies of the target function exhibit a low-rank structure.
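For orientation, a Jackson-type bound generally has the shape sketched below. This is a generic illustration, not the statement proved by Jiang and Li (2024): the symbols $\mathcal{H}^{(m)}$ (a hypothesis class with complexity budget $m$), $C(f)$ (a complexity measure of the target $f$), and $\alpha$ (a rate exponent) are assumed here for illustration and are not taken from the paper.

```latex
% Generic Jackson-type estimate (illustrative notation, not the paper's):
% the best approximation error over a class of budget m decays at a rate
% controlled by a complexity measure C(f) of the target.
\inf_{\hat{f} \in \mathcal{H}^{(m)}} \bigl\| f - \hat{f} \bigr\|
  \;\le\; \frac{C(f)}{m^{\alpha}}, \qquad \alpha > 0 .
```

Under this reading, targets with small $C(f)$ are the ones approximated efficiently; in the result summarized above, a low-rank structure in the temporal dependencies presumably plays that role.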
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- Transformer concept