claim
Fan et al. (2025) attribute reasoning models' tendency to fall into redundant loops of self-doubt and hallucination to current reinforcement learning (RL) mechanisms that over-reward detailed chain-of-thought reasoning.
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (2)
- chain-of-thought concept
- reinforcement learning concept