claim
Merrill et al. (2024) showed that linear RNNs with diagonal transition matrices have expressive power comparable to that of Transformers, whereas allowing data-dependent, non-diagonal transition matrices lets linear RNNs exceed that expressiveness class.
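
One piece of intuition for the claim, offered as a toy sketch and not as the construction or proof used by Merrill et al.: diagonal matrices always commute, so a product of diagonal transitions cannot depend on token order, while non-diagonal transitions such as permutation matrices can. The example below assumes a simplified recurrence h_t = A(x_t) h_{t-1} with no additive input term. The token-to-matrix tables and the helper names (`matvec`, `final_state`) are hypothetical and chosen only for illustration.

```python
def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


def final_state(table, tokens, h0):
    """Run the simplified linear recurrence h_t = A(x_t) h_{t-1}.

    `table` maps each token to its transition matrix A(x_t), which is
    what makes the transition data-dependent. The additive input term
    of a full linear RNN is omitted here (an assumption of this sketch).
    """
    h = list(h0)
    for tok in tokens:
        h = matvec(table[tok], h)
    return h


# Hypothetical data-dependent DIAGONAL transitions: one scaling per token.
DIAGONAL = {
    "a": [[2, 0, 0], [0, 1, 0], [0, 0, 1]],
    "b": [[1, 0, 0], [0, 3, 0], [0, 0, 1]],
}

# Hypothetical data-dependent NON-DIAGONAL transitions: one permutation per token.
NON_DIAGONAL = {
    "a": [[0, 1, 0], [1, 0, 0], [0, 0, 1]],  # swap coordinates 0 and 1
    "b": [[1, 0, 0], [0, 0, 1], [0, 1, 0]],  # swap coordinates 1 and 2
}

h0 = [1, 2, 3]

# Diagonal transitions commute: both token orders give the same state.
diag_ab = final_state(DIAGONAL, "ab", h0)
diag_ba = final_state(DIAGONAL, "ba", h0)

# Permutation transitions do not commute: the state records the order.
perm_ab = final_state(NON_DIAGONAL, "ab", h0)
perm_ba = final_state(NON_DIAGONAL, "ba", h0)

print(diag_ab, diag_ba)  # [2, 6, 3] [2, 6, 3]
print(perm_ab, perm_ba)  # [2, 3, 1] [3, 1, 2]
```

In this toy setting the diagonal model cannot distinguish "ab" from "ba", whereas the permutation-based model can. A full linear RNN also has an input term, so this shows only why the transition product becomes order-sensitive once non-diagonal, data-dependent matrices are allowed.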
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- Transformers concept