Yun et al. (2019) demonstrated that Transformers are universal approximators of sequence-to-sequence functions: for any continuous sequence-to-sequence function on a compact domain, there exists a Transformer that approximates it to arbitrary accuracy. The construction requires a number of layers that scales exponentially in the model dimension or the input sequence length, while the size of each layer remains independent of both.
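
The claim can be written out as a rough formal sketch. The symbols below are notation assumed here for illustration and do not appear in the text above: d is the model dimension, n the sequence length, f the target function, g the approximating Transformer, and d_p an L^p-type distance between functions.

```latex
% Sketch of the approximation statement, under assumed notation.
% f : continuous sequence-to-sequence function on a compact domain D
% g : a Transformer network with fixed-size layers
Let $\mathbb{D} \subset \mathbb{R}^{d \times n}$ be compact and let
$f : \mathbb{D} \to \mathbb{R}^{d \times n}$ be continuous.
For every $\epsilon > 0$ and $1 \le p < \infty$ there exists a
Transformer $g$ such that
\[
  d_p(f, g) \;=\;
  \left( \int_{\mathbb{D}} \lVert f(X) - g(X) \rVert_p^p \, dX \right)^{1/p}
  \;\le\; \epsilon .
\]
% The width of each layer of g is fixed, independent of d and n;
% only the depth (number of layers) grows, exponentially in d or n.
```

Read this way, the result is an existence statement: it says a sufficiently deep Transformer can match f to within any tolerance, and it does not say that such a network is practical to build or that training would find it.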