Reference
The paper 'Heavy-tailed class imbalance and why Adam outperforms gradient descent on language models' analyzes why the Adam optimizer outperforms standard gradient descent when training language models, attributing the gap to the heavy-tailed class imbalance of token frequencies.
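As a quick illustration of the setting the paper studies (a hedged sketch, not taken from the paper itself): token frequencies in natural-language corpora roughly follow a Zipf law, so the classes a language model must predict are heavily imbalanced, with a few very frequent tokens and a long tail of rare ones.

```python
import numpy as np

# Illustrative sketch: sample tokens from a Zipfian distribution
# (frequency proportional to 1/rank) and inspect the class imbalance.
rng = np.random.default_rng(0)
vocab_size = 10_000
ranks = np.arange(1, vocab_size + 1)
probs = (1.0 / ranks) / (1.0 / ranks).sum()

tokens = rng.choice(vocab_size, size=1_000_000, p=probs)
counts = np.bincount(tokens, minlength=vocab_size)

# A handful of head classes cover a large share of all tokens,
# while most classes are rare -- the imbalance the paper analyzes.
top10_share = np.sort(counts)[-10:].sum() / counts.sum()
rare_classes = int((counts < 50).sum())
print(f"top-10 classes cover {top10_share:.1%} of tokens")
print(f"{rare_classes} of {vocab_size} classes appear fewer than 50 times")
```

Under such imbalance, the gradient signal for rare classes is tiny relative to frequent ones, which is the regime in which the paper compares Adam against plain gradient descent.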
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, via serper)
Referenced by nodes (2)
- Language Model concept
- gradient descent concept