reference
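The paper 'Heavy-tailed class imbalance and why Adam outperforms gradient descent on language models' argues that heavy-tailed class imbalance, such as the skewed token frequencies in language-modeling data, helps explain why Adam outperforms gradient descent on language models: under gradient descent the loss on infrequent classes decreases slowly, whereas Adam is less sensitive to the imbalance.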
