Reference
The paper 'Gated Linear Attention Transformers with Hardware-Efficient Training' is an arXiv preprint (arXiv:2312.06635) that introduces gated linear attention (GLA), a linear attention variant with data-dependent forgetting gates, together with a hardware-efficient training algorithm for it.
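For orientation, a minimal sketch of the gated linear attention recurrence the paper is built around (the notation here is an assumption for illustration, not a quote from the paper): at each timestep a matrix-valued state is decayed by a data-dependent gate and updated with the current key-value outer product.

```latex
% Sketch of the GLA recurrence (assumed notation):
% S_t      : matrix-valued recurrent state
% \alpha_t : data-dependent gate with entries in (0, 1)
% q_t, k_t, v_t : query/key/value row vectors at step t
S_t = \operatorname{diag}(\alpha_t)\, S_{t-1} + k_t^{\top} v_t,
\qquad
o_t = q_t\, S_t
```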
Authors
- Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim
Sources
- 'A Survey on the Theory and Mechanism of Large Language Models' (arxiv.org, via Serper)
Referenced by nodes (1)
- arXiv entity