Formula
Hoffmann et al. (2022b) established that, for compute-optimal training, model size (N) and the amount of training data (D) should be scaled in equal proportion with the compute budget (C): N ∝ C^0.5 and D ∝ C^0.5.
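A minimal sketch of this scaling rule, using the common approximation C ≈ 6·N·D FLOPs and the equal exponents above; the fitted proportionality constants from the original paper are omitted here, so the absolute sizes are illustrative only:

```python
import math

def compute_optimal_split(C: float) -> tuple[float, float]:
    """Split a compute budget C (FLOPs) into model size N (params)
    and training tokens D, assuming C ≈ 6*N*D and N, D ∝ C^0.5.

    With equal exponents and no fitted constants, N = D = sqrt(C/6).
    """
    N = math.sqrt(C / 6)  # N ∝ C^0.5 (constant factor dropped)
    D = C / (6 * N)       # remaining budget goes to data: D ∝ C^0.5
    return N, D

# Doubling compute by 4x scales both N and D by 2x each.
N1, D1 = compute_optimal_split(1e21)
N4, D4 = compute_optimal_split(4e21)
```

Note that the real Chinchilla fit yields roughly 20 training tokens per parameter, so the constants in front of C^0.5 differ for N and D even though the exponents match.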
Sources
- A Survey on the Theory and Mechanism of Large Language Models arxiv.org via serper
Referenced by nodes (1)
- training data concept