claim
Fast weight programmers and online learners form a family of linear models, each obtained by applying a different gradient-descent update rule in an online (streaming) setting (Schmidhuber, 1992; Yang et al., 2024b; Liu et al., 2024a; Yang et al., 2024c).
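As an illustration of the claim, the sketch below treats a weight matrix W as the "fast weights" and updates it with one gradient step per incoming (key, value) pair. Two per-token losses are assumed here for illustration: a linear loss -<W k, v>, whose gradient step gives the additive outer-product update W + lr * v k^T, and a squared loss 0.5 * ||W k - v||^2, whose gradient step gives the delta rule W - lr * (W k - v) k^T. The function names, the learning rate `lr`, and the toy stream are hypothetical and not taken from the cited papers.

```python
# Minimal pure-Python sketch: two fast-weight update rules written as
# single online gradient-descent steps. All names are illustrative.

def matvec(W, k):
    """Return the matrix-vector product W k."""
    return [sum(w * kj for w, kj in zip(row, k)) for row in W]

def hebbian_step(W, k, v, lr=1.0):
    """One gradient step on the loss -<W k, v>: W <- W + lr * v k^T."""
    return [[w + lr * vi * kj for w, kj in zip(row, k)]
            for row, vi in zip(W, v)]

def delta_step(W, k, v, lr=1.0):
    """One gradient step on 0.5 * ||W k - v||^2: W <- W - lr * (W k - v) k^T."""
    err = [p - vi for p, vi in zip(matvec(W, k), v)]
    return [[w - lr * ei * kj for w, kj in zip(row, k)]
            for row, ei in zip(W, err)]

def run(step, stream, d_out, d_in, lr=1.0):
    """Start from zero weights and apply one update per (key, value) pair."""
    W = [[0.0] * d_in for _ in range(d_out)]
    for k, v in stream:
        W = step(W, k, v, lr)
    return W

# Toy stream in which the same (key, value) pair arrives twice.
repeated = [([1.0, 0.0], [2.0, 3.0]), ([1.0, 0.0], [2.0, 3.0])]
W_hebb = run(hebbian_step, repeated, d_out=2, d_in=2)
W_delta = run(delta_step, repeated, d_out=2, d_in=2)
```

With this toy stream, the additive rule accumulates the repeated association, while the delta rule sees zero prediction error on the second pass and leaves W unchanged. The two rules differ only in the per-token loss being descended, which is the sense in which such models form one family.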
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (2)
- gradient descent concept
- linear models concept