Fact — claim — Knowledge Tree

Merrill et al. (2024) showed that linear RNNs with diagonal transition matrices have expressive power comparable to that of Transformers, whereas allowing data-dependent, non-diagonal transition matrices lets linear RNNs exceed that level of expressive power.
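
One intuition behind the claim can be sketched in a few lines of Python. This is a toy illustration only, not the construction or proof from Merrill et al.: the recurrence h_t = A(x_t) h_{t-1} drops the usual input term, and the matrices and the helper names (`matvec`, `run_linear_rnn`) are hypothetical choices made for this example. Under those assumptions, diagonal transitions commute, so the final state cannot depend on token order, while data-dependent permutation matrices (non-diagonal) do track order.

```python
def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def run_linear_rnn(transitions, h0):
    """Apply h_t = A(x_t) h_{t-1} for each transition matrix in sequence.

    Assumption for illustration: no additive input term.
    """
    h = list(h0)
    for a in transitions:
        h = matvec(a, h)
    return h


# Non-diagonal, data-dependent transitions: each "token" selects a permutation.
SWAP_01 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # swap coordinates 0 and 1
SWAP_12 = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]  # swap coordinates 1 and 2

# Diagonal, data-dependent transitions: each "token" selects a scaling.
DIAG_A = [[2, 0, 0], [0, 3, 0], [0, 0, 5]]
DIAG_B = [[7, 0, 0], [0, 1, 0], [0, 0, 4]]

h0 = [1, 2, 3]

# Same two tokens, fed in both orders.
perm_ab = run_linear_rnn([SWAP_01, SWAP_12], h0)
perm_ba = run_linear_rnn([SWAP_12, SWAP_01], h0)
diag_ab = run_linear_rnn([DIAG_A, DIAG_B], h0)
diag_ba = run_linear_rnn([DIAG_B, DIAG_A], h0)

print("permutation transitions:", perm_ab, perm_ba)  # order-sensitive
print("diagonal transitions:   ", diag_ab, diag_ba)  # order-insensitive
```

With permutation transitions the two orderings end in different states, which is the kind of order-dependent state tracking the non-diagonal case permits; with diagonal transitions both orderings give the same state. Real diagonal linear RNNs also add an input term at each step, so this sketch shows only why the transition product alone carries no order information.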

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arXiv, arxiv.org)

Referenced by nodes (1)

Transformers concept