claim
Liu et al. (2025d) demonstrated that, given sufficiently long training and periodic policy resets, reinforcement learning (RL) can drive large language models (LLMs) to explore novel strategies absent from the base model, thereby expanding their reasoning boundary.