Claim
Instruction tuning and reinforcement learning from human feedback (RLHF) improve a large language model's ability to express uncertainty and to abstain from answering when its knowledge is insufficient, but they do not retroactively fill knowledge gaps or undo the exposure bias present in the base model.
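
A minimal sketch of what this distinction looks like in practice. Everything here is hypothetical: the `generate` helper, the model names, and the canned responses are stand-ins, not a real API. The point is that tuning can instill the abstention *behavior*, but neither model can produce a fact that was absent from pretraining.

```python
# Hypothetical sketch: a base model vs. an instruction-tuned model answering a
# question whose answer was never in the pretraining data. The tuned model has
# learned to flag uncertainty; it has not gained the missing knowledge.

ABSTENTION_MARKERS = ("i don't know", "i'm not sure", "cannot answer")

def generate(model: str, prompt: str) -> str:
    """Stand-in for a real LLM inference call; responses are illustrative."""
    canned = {
        # Base model: fluent continuation with fabricated specifics.
        "base": "The 2031 Mars Sample Return mission landed on March 4, 2031.",
        # Tuned model: learned to abstain rather than guess.
        "tuned": "I don't know; no confirmed landing date exists in my training data.",
    }
    return canned[model]

def abstains(answer: str) -> bool:
    """Crude check for an expressed-uncertainty / abstention response."""
    return any(marker in answer.lower() for marker in ABSTENTION_MARKERS)

question = "When did the 2031 Mars Sample Return mission land?"
for model in ("base", "tuned"):
    answer = generate(model, question)
    print(f"{model}: abstains={abstains(answer)} | {answer}")
```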
Authors
Sources
- Hallucination Causes: Why Language Models Fabricate Facts (mbrenndoerfer.com)
Referenced by nodes (4)
- Reinforcement learning from human feedback (RLHF) concept
- exposure bias concept
- instruction tuning concept
- knowledge gaps concept