Claim
Wen et al. (2024) showed that generalized RNNs, even with chain-of-thought reasoning, cannot solve associative recall or other tasks that require precise retrieval from context unless they are augmented with retrieval (RAG) or followed by a Transformer layer.
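To make "associative recall" concrete, the following is a minimal Python sketch of the task format, assuming the common setup in which the input lists key-value pairs and ends with a query key. The function names, key range, and seed are illustrative assumptions and are not taken from Wen et al. (2024).

```python
import random


def make_recall_example(num_pairs, seed=0):
    """Build one associative-recall instance (hypothetical format).

    Returns the context as a list of (key, value) pairs, a query key
    drawn from that context, and the value the query should map to.
    """
    rng = random.Random(seed)
    keys = rng.sample(range(100, 1000), num_pairs)  # distinct keys
    context = [(key, rng.randrange(10)) for key in keys]
    query = rng.choice(keys)
    answer = dict(context)[query]
    return context, query, answer


def recall_by_lookup(context, query):
    """Reference solver that may re-read the whole context at query time."""
    for key, value in context:
        if key == query:
            return value
    raise KeyError(query)


context, query, answer = make_recall_example(num_pairs=8)
assert recall_by_lookup(context, query) == answer
```

A solver that can look back over the full context answers by direct lookup. Informally, a model with a fixed-size recurrent state has to compress every pair before it sees the query, which is the kind of retrieval the claim says RNNs cannot perform precisely without added retrieval or attention.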
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)