reference
The paper 'Gated Delta Networks: Improving Mamba2 with Delta Rule' is an arXiv preprint (arXiv:2412.06464) that proposes improvements to the Mamba2 architecture using the delta rule.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- arXiv entity