formula
Saunshi et al. (2020) proved that if a language model's pre-training cross-entropy loss is within epsilon of optimal, then a simple linear classifier on its features achieves O(sqrt(epsilon)) error on natural downstream classification tasks.
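A hedged paraphrase of the result in symbols (notation is mine, not the paper's exact statement):

```latex
% If the pre-trained model f is epsilon-optimal in cross-entropy,
\[
  \ell_{\mathrm{xent}}(f) \;\le\; \ell_{\mathrm{xent}}^{*} + \epsilon
  \quad\Longrightarrow\quad
  \min_{w}\ \mathrm{err}_{\mathcal{T}}\!\big(w^{\top} f(\cdot)\big)
  \;\le\; O\!\left(\sqrt{\epsilon}\right),
\]
% where f is the model's feature map, \ell_{\mathrm{xent}}^{*} is the
% optimal cross-entropy loss, and \mathcal{T} is a "natural"
% classification task in the paper's sense (labels expressible via
% the language model's next-word distribution).
```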
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models arxiv.org via serper
Referenced by nodes (1)
- Pre-training concept