formula
Saunshi et al. (2020) proved that if a language model's pre-training cross-entropy loss is within epsilon of optimal, then a simple linear classifier on its features achieves O(sqrt(epsilon)) error on natural downstream classification tasks.
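A hedged paraphrase of the result in symbols (notation is mine, not the paper's exact statement):

```latex
% If the pre-trained model f is epsilon-optimal in cross-entropy,
\[
  \ell_{\mathrm{xent}}(f) \;\le\; \ell_{\mathrm{xent}}^{*} + \epsilon
  \quad\Longrightarrow\quad
  \min_{w}\ \mathrm{err}_{\mathcal{T}}\!\big(w^{\top} f(\cdot)\big)
  \;\le\; O\!\left(\sqrt{\epsilon}\right),
\]
% where f is the model's feature map, \ell_{\mathrm{xent}}^{*} is the
% optimal cross-entropy loss, and \mathcal{T} is a "natural"
% classification task in the paper's sense (labels expressible via
% the language model's next-word distribution).
```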
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models arxiv.org via serper
Referenced by nodes (1)
- Pre-training concept