claim
Post-training methods such as Reinforcement Learning from Human Feedback (RLHF) contribute to LLM hallucinations: binary scoring schemes award credit only for a correct answer and give zero credit for abstaining ("I don't know"), so a model maximizes expected reward by guessing confidently rather than expressing uncertainty.
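The incentive in the claim can be sketched as a toy expected-reward comparison. This is an illustrative model, not taken from any specific benchmark or RLHF setup: the 1/0 reward scheme and the probability value are assumptions chosen to make the mechanism concrete.

```python
def expected_reward(p_correct: float, answers: bool) -> float:
    """Binary grading: 1 point for a correct answer, 0 otherwise.
    Abstaining ("I don't know") is never marked correct, so it scores 0."""
    return p_correct if answers else 0.0

p = 0.2  # model is only 20% sure of the answer (illustrative value)
guess = expected_reward(p, answers=True)     # expected reward 0.2
abstain = expected_reward(p, answers=False)  # expected reward 0.0

# Guessing strictly dominates abstaining whenever p > 0, so a policy
# optimized against this grader learns to answer confidently even when
# unsure -- the mechanism the claim proposes for hallucination.
assert guess > abstain
```

Under this scheme, no level of uncertainty ever makes abstention the reward-maximizing choice; that changes only if the grader penalizes wrong answers or rewards calibrated abstention.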

Authors

Sources

Referenced by nodes (2)