claim
Shao et al. (2025) found that even weak or random reward signals can significantly improve mathematical reasoning in large language models, because reinforcement learning activates valid reasoning modes, such as code-based reasoning, that are already present in the pre-trained model.
