Reference
The paper "Gated Linear Attention Transformers with Hardware-Efficient Training" is an arXiv preprint (arXiv:2312.06635) that proposes gated linear attention (GLA) transformers and a hardware-efficient algorithm for training them.

Authors

Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim