claim
Chu et al. (2025) provided empirical evidence that Supervised Fine-Tuning (SFT) tends to memorize training data, leading to poor performance on out-of-distribution (OOD) tasks, whereas Reinforcement Learning (RL) generalizes substantially better to such tasks.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (5)
- reinforcement learning concept
- training data concept
- generalization concept
- supervised fine-tuning concept
- out-of-distribution concept