reference
Yun et al. (2019) demonstrated that Transformers are universal approximators of continuous sequence-to-sequence functions on compact domains: for any such function (permutation-equivariant without positional encodings, arbitrary with them), there exists a Transformer that approximates it to arbitrary precision. The width of each layer is a fixed constant, independent of the input dimension and sequence length, while the number of layers in their construction grows exponentially in the product of the token dimension and the sequence length.
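A minimal LaTeX sketch of the formal statement, paraphrasing Theorems 2 and 3 of Yun et al. (2019); the function-class notation and the distance $d_p$ follow that paper, and the constants come from its construction:

```latex
% Paraphrase of Yun et al. (2019), Theorems 2-3; notation follows the paper.
% T^{h,m,r}: Transformers with h attention heads of size m and feed-forward
% hidden width r. F_PE: continuous, permutation-equivariant sequence-to-
% sequence functions f : R^{d x n} -> R^{d x n} with compact support.
\[
  d_p(f, g) \;=\; \Bigl( \int \lVert f(X) - g(X) \rVert_p^p \, dX \Bigr)^{1/p}
\]
\[
  \forall f \in \mathcal{F}_{\mathrm{PE}},\ \forall \varepsilon > 0:\quad
  \exists\, g \in \mathcal{T}^{2,1,4}\ \text{such that}\ d_p(f, g) \le \varepsilon .
\]
% With positional encodings the permutation-equivariance restriction drops:
% any continuous f on a compact domain admits such a g in T_P^{2,1,4}.
% The per-layer sizes (h=2, m=1, r=4) are constants independent of d and n;
% the required depth grows exponentially in d*n in the paper's construction.
```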
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- Transformer models concept