claim
Merrill et al. (2024) showed that linear RNNs with diagonal transition matrices have expressive power comparable to that of Transformers, whereas allowing data-dependent, non-diagonal transition matrices lets linear RNNs exceed that class of expressive power.
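
The following is a minimal sketch of one intuition behind the claim. It is not the construction or proof used by Merrill et al. It assumes a simplified linear recurrence h_t = A(x_t) h_{t-1} with no additive input term and no nonlinearity, and all function and variable names are hypothetical. With input-dependent permutation matrices (non-diagonal), the final state depends on the order of the inputs, so the recurrence can track a composition of swaps. Diagonal matrices commute, so a product of diagonal transitions alone gives the same state for either order.

```python
def perm_matrix(perm):
    """Return the matrix M with M[perm[j]][j] = 1 (moves slot j to slot perm[j])."""
    n = len(perm)
    return [[1 if perm[j] == i else 0 for j in range(n)] for i in range(n)]


def diag_matrix(values):
    """Return a diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def run_linear_rnn(transitions, h0):
    """Apply h_t = A(x_t) h_{t-1} for each transition matrix in sequence.

    Assumption: the additive input term is omitted to isolate the effect
    of the transition matrices.
    """
    h = list(h0)
    for a in transitions:
        h = matvec(a, h)
    return h


# Non-diagonal, input-dependent transitions: each token selects a swap.
swap01 = perm_matrix((1, 0, 2))  # token "a": exchange slots 0 and 1
swap12 = perm_matrix((0, 2, 1))  # token "b": exchange slots 1 and 2

h0 = [1, 0, 0]  # one-hot state: the tracked item starts in slot 0
out_ab = run_linear_rnn([swap01, swap12], h0)  # input sequence "ab"
out_ba = run_linear_rnn([swap12, swap01], h0)  # input sequence "ba"

# Diagonal, input-dependent transitions: the product ignores input order.
d_a = diag_matrix((2, 3, 5))
d_b = diag_matrix((7, 11, 13))
diag_ab = run_linear_rnn([d_a, d_b], [1, 1, 1])
diag_ba = run_linear_rnn([d_b, d_a], [1, 1, 1])

print(out_ab, out_ba)    # different states: order is tracked
print(diag_ab, diag_ba)  # identical states: order is lost
```

This only shows that the transition product is order-sensitive in the non-diagonal case and order-insensitive in the diagonal case. A full diagonal linear RNN also has an input term, so the sketch should be read as an illustration rather than as a separation result.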

Authors

Sources

Referenced by nodes (1)