Formula
Hoffmann et al. (2022b) established that, for compute-optimal training, model size (N) and the amount of training data (D) should be scaled in equal proportion with the compute budget (C): N ∝ C^0.5 and D ∝ C^0.5.
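A minimal sketch of this scaling rule, using the common approximation C ≈ 6·N·D FLOPs and the equal exponents above; the fitted proportionality constants from the original paper are omitted here, so the absolute sizes are illustrative only:

```python
import math

def compute_optimal_split(C: float) -> tuple[float, float]:
    """Split a compute budget C (FLOPs) into model size N (params)
    and training tokens D, assuming C ≈ 6*N*D and N, D ∝ C^0.5.

    With equal exponents and no fitted constants, N = D = sqrt(C/6).
    """
    N = math.sqrt(C / 6)  # N ∝ C^0.5 (constant factor dropped)
    D = C / (6 * N)       # remaining budget goes to data: D ∝ C^0.5
    return N, D

# Doubling compute by 4x scales both N and D by 2x each.
N1, D1 = compute_optimal_split(1e21)
N4, D4 = compute_optimal_split(4e21)
```

Note that the real Chinchilla fit yields roughly 20 training tokens per parameter, so the constants in front of C^0.5 differ for N and D even though the exponents match.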
Sources
- A Survey on the Theory and Mechanism of Large Language Models arxiv.org via serper
Referenced by nodes (1)
- training data concept