Reference
The paper 'Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers' is an arXiv preprint (arXiv:2212.10559).
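As a rough sketch of the duality the title refers to (not drawn from this node's sources, and assuming the standard linear-attention approximation used for this kind of analysis): the attention output for a query vector $q$, given the query's own tokens $X$ and the in-context demonstration tokens $X'$, can be decomposed as

$$\tilde{\mathcal{F}}_{\mathrm{ICL}}(q) = W_V X (W_K X)^{\top} q + W_V X' (W_K X')^{\top} q = \big(W_{\mathrm{ZSL}} + \Delta W_{\mathrm{ICL}}\big)\, q,$$

where the demonstration-driven term $\Delta W_{\mathrm{ICL}} = W_V X' (W_K X')^{\top}$ behaves like an implicit gradient-descent update applied on top of the zero-shot weights $W_{\mathrm{ZSL}} = W_V X (W_K X)^{\top}$.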
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, via Serper)
Referenced by nodes (3)
- Language Model concept
- In-Context Learning concept
- gradient descent concept