claim
Models fine-tuned with extensive Reinforcement Learning from Human Feedback (RLHF), such as OpenAI's GPT-4, are more resistant to adversarial prompts than open-source models that lack such fine-tuning.

Authors

Sources

Referenced by nodes (2)