claim
Linear models face two inherent difficulties. First, a constant-size state cannot grow with sequence length, so information is lost on long inputs. Second, a compressed representation may fail when future patterns deviate from the prior encoded in the compression rule.
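
Both difficulties can be illustrated with a toy associative memory. The sketch below is an illustration under stated assumptions, not a model taken from the source: the state is a fixed-length vector, writes are linear updates `state <- decay * state + value * key`, and keys are one-hot vectors that wrap around once the number of items exceeds the state size. The names `one_hot`, `write`, `read`, `recall_errors`, and the `decay` parameter are all hypothetical.

```python
def one_hot(index, dim):
    """Hypothetical key: a unit vector with a single 1 at position index % dim."""
    return [1.0 if j == index % dim else 0.0 for j in range(dim)]


def write(state, key, value, decay=1.0):
    """Linear update: state <- decay * state + value * key.

    The state keeps the same length no matter how many items are written.
    A decay below 1 encodes a recency prior: older items fade.
    """
    return [decay * s + value * k for s, k in zip(state, key)]


def read(state, key):
    """Linear readout: dot product of the state with the query key."""
    return sum(s * k for s, k in zip(state, key))


def recall_errors(num_items, dim, decay=1.0):
    """Write num_items (key, value) pairs, then query every key.

    Returns the absolute recall error for each item, in write order.
    """
    keys = [one_hot(i, dim) for i in range(num_items)]
    values = [float(i + 1) for i in range(num_items)]
    state = [0.0] * dim
    for key, value in zip(keys, values):
        state = write(state, key, value, decay)
    return [abs(read(state, k) - v) for k, v in zip(keys, values)]


def mean(xs):
    return sum(xs) / len(xs)


if __name__ == "__main__":
    # Difficulty 1: a state of size 8 is exact up to 8 items, then degrades.
    for n in (4, 8, 16, 64):
        print("items:", n, "mean error:", mean(recall_errors(n, dim=8)))
    # Difficulty 2: a recency prior (decay < 1) loses early items
    # even when the state has spare capacity.
    print("per-item error with decay:", recall_errors(4, dim=8, decay=0.5))
```

In this toy setting, recall is exact while the number of items fits in the state and degrades once keys start sharing slots. With `decay=0.5`, the most recent item is recalled exactly while the earliest is attenuated, which is the failure mode to expect when a query needs information that the compression rule assumed was unimportant.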
