Reference
The paper 'Heavy-tailed class imbalance and why Adam outperforms gradient descent on language models' analyzes why the Adam optimizer outperforms standard gradient descent when training language models, attributing the gap to the heavy-tailed class imbalance of token frequencies.
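As a quick illustration of the setting the paper studies (a hedged sketch, not taken from the paper itself): token frequencies in natural-language corpora roughly follow a Zipf law, so the classes a language model must predict are heavily imbalanced, with a few very frequent tokens and a long tail of rare ones.

```python
import numpy as np

# Illustrative sketch: sample tokens from a Zipfian distribution
# (frequency proportional to 1/rank) and inspect the class imbalance.
rng = np.random.default_rng(0)
vocab_size = 10_000
ranks = np.arange(1, vocab_size + 1)
probs = (1.0 / ranks) / (1.0 / ranks).sum()

tokens = rng.choice(vocab_size, size=1_000_000, p=probs)
counts = np.bincount(tokens, minlength=vocab_size)

# A handful of head classes cover a large share of all tokens,
# while most classes are rare -- the imbalance the paper analyzes.
top10_share = np.sort(counts)[-10:].sum() / counts.sum()
rare_classes = int((counts < 50).sum())
print(f"top-10 classes cover {top10_share:.1%} of tokens")
print(f"{rare_classes} of {vocab_size} classes appear fewer than 50 times")
```

Under such imbalance, the gradient signal for rare classes is tiny relative to frequent ones, which is the regime in which the paper compares Adam against plain gradient descent.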
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, via serper)
Referenced by nodes (2)
- Language Model concept
- gradient descent concept