claim
Fan et al. (2025) attribute the tendency of reasoning models to fall into redundant loops of self-doubt and hallucination to current Reinforcement Learning (RL) mechanisms that over-reward detailed Chain-of-Thought.

Authors

Sources

Referenced by nodes (2)