Relations (1)

related 0.10 — supporting 1 fact

Large Language Models are related to gradient descent: the implicit bias of gradient descent is identified as a key mechanism compelling the formation of linear representations during the training of Large Language Models, as stated in [1].

Facts (1)

Sources
A Survey on the Theory and Mechanism of Large Language Models (arXiv) — 1 fact
Perspective: Jiang et al. (2024b) argue that the formation of linear representations in high-dimensional settings for Large Language Models is naturally compelled by the interplay between the next-token prediction objective and the implicit bias of gradient descent.