Fact — reference — Knowledge Tree

Yun et al. (2019) demonstrated that for any continuous sequence-to-sequence function on a compact domain, there exists a Transformer model that can approximate it to arbitrary accuracy, provided the number of layers is allowed to scale exponentially in the model dimension or input sequence length, while the size of each layer remains independent of the dimension and length.
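
The statement above can be sketched formally as follows. This is a hedged reconstruction from memory of the theorem in Yun et al. (2019), not a quotation, and should be checked against the paper. The notation is an assumption introduced here and does not appear in the source text: X is an input of n tokens of dimension d, F_CD is the class of continuous sequence-to-sequence functions on a compact domain, and T_P^{h,m,r} is the class of Transformers with positional encodings whose layers have h attention heads of size m and a feed-forward hidden width r.

```latex
% Notation is assumed for illustration; verify against Yun et al. (2019).
% X \in [0,1]^{d \times n}  : input sequence, n tokens of dimension d
% \mathcal{F}_{CD}          : continuous functions [0,1]^{d \times n} \to \mathbb{R}^{d \times n}
% \mathcal{T}_P^{h,m,r}     : Transformers with positional encodings,
%                             h heads of size m, feed-forward hidden width r
\[
\forall\, \varepsilon > 0,\ \ 1 \le p < \infty,\ \ f \in \mathcal{F}_{CD}:
\qquad
\exists\, g \in \mathcal{T}_P^{2,1,4}
\quad \text{such that} \quad
d_p(f, g) := \left( \int \bigl\| f(X) - g(X) \bigr\|_p^p \, dX \right)^{1/p} \le \varepsilon .
\]
```

In this reading, the fixed superscripts (2, 1, 4) are what "the size of each layer remains independent of the dimension and length" refers to: the per-layer width is constant, and the approximation power comes from depth, which is where the exponential growth in the number of layers enters.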

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arXiv, arxiv.org)

Referenced by nodes (1)

Transformer models concept