claim
Liu et al. (2025d) demonstrated that with sufficient training duration and periodic policy resets, Reinforcement Learning can drive Large Language Models to explore novel strategies absent in the base model, thereby expanding the reasoning boundary.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models arxiv.org via serper
Referenced by nodes (2)
- Large Language Models concept
- reinforcement learning concept