claim
Chu et al. (2025) provided empirical evidence that Supervised Fine-Tuning (SFT) tends to memorize training data, leading to poor performance on out-of-distribution (OOD) tasks, whereas Reinforcement Learning (RL) demonstrates superior generalization capabilities.

Authors

Sources

Referenced by nodes (5)