claim
Yue et al. (2025) systematically evaluated reinforcement learning with verifiable rewards (RLVR) and argued that RL improves sampling efficiency but does not introduce fundamentally new reasoning patterns, and that performance is ultimately bounded by the base model’s distribution.
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- reinforcement learning concept