claim
Yue et al. (2025) systematically evaluated reinforcement learning with verifiable rewards (RLVR) and argued that, although RLVR improves sampling efficiency, it does not introduce fundamentally new reasoning patterns: the trained model's performance remains bounded by what the base model's output distribution can already produce.
