Claim
Reinforcement Learning from Human Feedback (RLHF) reward models can inadvertently train Large Language Models to be overconfident because human annotators often mistake confidence for competence when evaluating text quality.
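A minimal, hypothetical sketch of the mechanism this claim describes: if annotator preference labels reward confident tone more strongly than accuracy, a Bradley-Terry reward model fit on those labels inherits the bias, and RLHF policy optimization then maximizes that biased reward. The feature names, weights, and numbers below are illustrative assumptions, not data from the cited source.

```python
# Toy sketch (illustrative only): fit a Bradley-Terry reward model on
# simulated annotator preferences that favor confident-sounding text.
import numpy as np

rng = np.random.default_rng(0)

def make_response():
    # Each response is reduced to two assumed features:
    # [confidence of tone, factual accuracy], both in [0, 1].
    return rng.uniform(0.0, 1.0, size=2)

def annotator_prefers_a(a, b):
    # Simulated annotator: judgment driven mostly by confident tone
    # (weight 2.0) and only weakly by accuracy (weight 0.5), plus noise.
    score_a = 2.0 * a[0] + 0.5 * a[1]
    score_b = 2.0 * b[0] + 0.5 * b[1]
    return rng.random() < 1.0 / (1.0 + np.exp(-(score_a - score_b)))

# Collect preference pairs and labels.
pairs = [(make_response(), make_response()) for _ in range(5000)]
labels = np.array([annotator_prefers_a(a, b) for a, b in pairs], dtype=float)
diffs = np.array([a - b for a, b in pairs])  # per-pair feature difference

# Fit a linear reward r(x) = w . x by logistic regression on
# P(A preferred over B) = sigmoid(r(A) - r(B))  (Bradley-Terry).
w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-diffs @ w))
    grad = diffs.T @ (p - labels) / len(labels)
    w -= lr * grad

print("learned reward weights [confidence, accuracy]:", np.round(w, 2))
# The confidence weight dominates, so a policy optimized against this
# reward is pushed toward confident phrasing rather than calibration.
```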
Authors
Sources
- Hallucination Causes: Why Language Models Fabricate Facts (mbrenndoerfer.com, via Serper)
Referenced by nodes (3)
- Large Language Models concept
- Reinforcement learning from human feedback (RLHF) concept
- RLHF concept