reference
The paper 'Why are adaptive methods good for attention models?' was published in Advances in Neural Information Processing Systems 33 (NeurIPS 2020), pp. 15383–15393, and is cited in Section 4.3.2 of 'A Survey on the Theory and Mechanism of Large Language Models'.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)