Fact — claim — Knowledge Tree

Liu et al. (2025d) demonstrated that, with sufficient training duration and periodic policy resets, reinforcement learning can drive large language models to explore novel strategies absent from the base model, thereby expanding the model's reasoning boundary.

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arXiv, arxiv.org)

Referenced by nodes (2)

Large Language Models (concept)
reinforcement learning (concept)