claim
Shao et al. (2025) found that even weak or random reward signals can significantly improve mathematical reasoning in Large Language Models because Reinforcement Learning activates valid reasoning modes, such as code-based reasoning, already present in the pre-trained model.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models arxiv.org via serper
Referenced by nodes (2)
- Large Language Models concept
- reinforcement learning concept