Fact — perspective — Knowledge Tree

Jiang et al. (2024b) argue that, for Large Language Models operating in high-dimensional representation spaces, linear representations emerge naturally from the interplay between the next-token prediction objective and the implicit bias of gradient descent.
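The claim rests on the "implicit bias" of gradient descent. When many parameter settings fit the training objective equally well, gradient descent tends to converge to particular ones among them. The sketch below shows this general idea in the simplest setting where it is a standard result: underdetermined least squares. There, gradient descent started from zero converges to the minimum-norm exact fit. This is only an illustration of the concept. It is not Jiang et al.'s construction, and it says nothing about LLMs directly. The function name `gd_least_squares`, the problem sizes, the step size, and the step count are all hypothetical choices made for illustration.

```python
import numpy as np

def gd_least_squares(X, y, steps=20000):
    """Plain gradient descent on 0.5 * ||Xw - y||^2, starting from w = 0.

    Illustrative sketch (hypothetical helper): the step size 1 / ||X||_2^2
    keeps the iteration stable for this quadratic loss.
    """
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y)  # gradient of the squared loss; always lies in X's row space
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))  # 3 equations, 5 unknowns: infinitely many exact fits
y = rng.standard_normal(3)

w_gd = gd_least_squares(X, y)
w_min_norm = np.linalg.pinv(X) @ y  # the minimum-norm interpolating solution

print("fits data:", np.allclose(X @ w_gd, y, atol=1e-6))
print("matches min-norm solution:", np.allclose(w_gd, w_min_norm, atol=1e-6))
```

Nothing in the loss singles out the minimum-norm solution. The preference comes from the optimizer. Starting at zero, every update is a combination of the rows of `X`, so the iterates never leave that subspace. The only exact fit inside that subspace is the minimum-norm one.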

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arXiv, arxiv.org)

Referenced by nodes (2)

Large Language Models concept
gradient descent concept