Fact — claim — Knowledge Tree

Yue et al. (2025) systematically evaluated reinforcement learning with verifiable rewards (RLVR) and argued that, although RL improves sampling efficiency, it does not introduce fundamentally new reasoning patterns; performance remains bounded by the base model’s distribution.

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arXiv, arxiv.org)

Referenced by nodes (1)

reinforcement learning (concept)