formula
Saunshi et al. (2020) proved that a language model that is epsilon-optimal in cross-entropy loss during pre-training learns features on which a simple linear classifier achieves an error of O(sqrt(epsilon)), not epsilon itself, on natural downstream classification tasks.
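
The shape of the guarantee can be written schematically as below. This is a sketch under assumed notation, not the paper's exact theorem. The symbols f (the language model), p_f (its predicted next-word distribution), v (the linear classifier), tau (the error of the best linear classifier on the true next-word distribution) and the loss names are introduced here for illustration. The paper's abstract states the downstream error as O(sqrt(epsilon)); the constants and precise conditions are left to the paper.

```latex
% Schematic only: notation is assumed here, not taken from the source.
% Left: the pre-training cross-entropy of f is within epsilon of optimal.
% Right: some linear classifier v on the predicted next-word distribution p_f
%        has downstream loss at most tau plus a term of order sqrt(epsilon).
\ell_{\mathrm{xent}}(f) - \ell_{\mathrm{xent}}^{*} \le \epsilon
\quad\Longrightarrow\quad
\min_{v}\; \ell_{\mathcal{T}}\bigl(v^{\top} p_f\bigr) \;\le\; \tau + O\bigl(\sqrt{\epsilon}\bigr)
```

Read this way, halving the downstream error term requires roughly a fourfold reduction in the pre-training sub-optimality epsilon.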
