claim
Large language models face a training-inference mismatch, often called exposure bias: during training with teacher forcing, the model always conditions on perfect ground-truth prefixes, whereas at inference it conditions on its own previously generated tokens. Any error it makes therefore enters the conditioning context and can compound over subsequent steps, a situation the model never encountered during training.
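A minimal toy sketch of this mismatch (not any specific model or training setup): a hypothetical bigram lookup table stands in for a learned next-token predictor and contains one deliberate error. Under teacher-forced conditioning the error stays isolated to a single prediction, while under free-running generation the error enters the context and everything after it diverges.

```python
# Toy next-token "model" with one learned error: after "cat" it
# predicts "ran" instead of the ground-truth "sat". All names here
# are illustrative assumptions, not from the source.
GROUND_TRUTH = ["the", "cat", "sat", "on", "the", "mat"]

NEXT = {
    "the": "cat",
    "cat": "ran",   # the single wrong prediction
    "sat": "on",
    "on": "the",
    "ran": "away",
    "away": "<eos>",
    "mat": "<eos>",
}

def teacher_forced_predictions(tokens):
    # Training-style conditioning: every context token comes from the
    # ground-truth sequence, so the wrong prediction after "cat" never
    # contaminates later contexts.
    return [(tokens[i], NEXT.get(tokens[i])) for i in range(len(tokens) - 1)]

def free_running(start, max_len=6):
    # Inference-style conditioning: each prediction is fed back in as
    # the next context, so the error after "cat" derails the rest.
    out = [start]
    while len(out) < max_len and NEXT.get(out[-1], "<eos>") != "<eos>":
        out.append(NEXT[out[-1]])
    return out

print(teacher_forced_predictions(GROUND_TRUTH))
# contexts stay correct; only the prediction after "cat" is wrong
print(free_running("the"))
# ['the', 'cat', 'ran', 'away'] -- the sequence drifts off ground truth
```

The contrast is the point of the claim: teacher forcing yields exactly one wrong prediction, while free-running generation produces a suffix the model was never trained to recover from.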

Authors

Sources

Referenced by nodes (1)