formula
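Hoffmann et al. (2022b) found that, for compute-optimal training, model size (N, parameters) and the amount of training data (D, tokens) should be scaled in equal proportion as the compute budget (C) grows: approximately N ∝ C^0.5 and D ∝ C^0.5. Each doubling of model size should therefore be matched by a doubling of training tokens.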
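The exponent of 0.5 can be made concrete with a small sketch. It rests on two assumptions that this node does not state. The first is the common approximation that training a dense transformer costs C ≈ 6ND FLOPs. The second is a fixed tokens-per-parameter ratio k, set by default to 20, which roughly matches Chinchilla's 70B parameters trained on 1.4T tokens. The helper `compute_optimal_allocation` is hypothetical. Solving 6 · N · (kN) = C gives N = sqrt(C / 6k), so both N and D grow as C^0.5.

```python
import math

# Assumption: training FLOPs are approximated as C ≈ 6 * N * D.
FLOPS_PER_PARAM_TOKEN = 6.0


def compute_optimal_allocation(compute_flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Hypothetical helper: split a FLOP budget into (N params, D tokens).

    Assumes D = k * N (k = tokens_per_param) and C = 6 * N * D, which gives
    N = sqrt(C / (6k)) and D = k * N, so both scale as C ** 0.5.
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("compute_flops and tokens_per_param must be positive")
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Hypothetical budget chosen so that N = 70e9 and D = 1.4e12 when k = 20.
    budget = 6 * 70e9 * 1.4e12
    for scale in (1, 4, 16):
        n, d = compute_optimal_allocation(scale * budget)
        print(f"C = {scale * budget:.2e} FLOPs -> N ≈ {n:.2e} params, D ≈ {d:.2e} tokens")
```

Multiplying the budget by 4 doubles both N and D, and multiplying it by 16 quadruples them. This is exactly what N ∝ C^0.5 and D ∝ C^0.5 mean. Changing k moves the constant of proportionality but not the exponent.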