Fact — claim — Knowledge Tree

Fan et al. (2025) argue that current Reinforcement Learning (RL) mechanisms over-reward detailed Chain-of-Thought, and that this explains why reasoning models tend to fall into redundant loops of self-doubt and hallucination.

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arXiv, arxiv.org)

Referenced by nodes (2)

chain-of-thought (concept)
reinforcement learning (concept)