reference
The paper 'GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection' was published in the International Conference on Machine Learning, pp. 61121–61143. It is cited in Sections 1 and 7.2.2 of 'A Survey on the Theory and Mechanism of Large Language Models'.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)