claim
Models with extensive Reinforcement Learning from Human Feedback (RLHF), such as OpenAI's GPT-4, are more resistant to prompt adversaries compared to purely open-source models without such fine-tuning.
Authors
Sources
- Survey and analysis of hallucinations in large language models www.frontiersin.org via serper
Referenced by nodes (2)
- OpenAI entity
- Reinforcement learning from human feedback (RLHF) concept