procedure
During training, large language models use a technique called teacher forcing: at each step, the model conditions its next-token prediction on the ground-truth previous tokens from the training sequence rather than on its own earlier predictions, so every position can be trained in parallel against a known prefix.
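A minimal sketch of the idea, using a hypothetical helper (not any specific library's API): for a training sequence, teacher forcing produces one (ground-truth prefix, target token) pair per position, and the model never sees its own sampled outputs.

```python
def teacher_forcing_pairs(tokens):
    """Yield (input_prefix, target) pairs for next-token training.

    The prefix is always the ground-truth sequence up to position t,
    never tokens the model itself generated.
    """
    return [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]

sequence = ["<bos>", "the", "cat", "sat"]
for prefix, target in teacher_forcing_pairs(sequence):
    print(prefix, "->", target)
```

At inference time, by contrast, the model must condition on its own previous outputs; the mismatch between these two regimes is often called exposure bias.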
