Claim
Current alignment methodologies for Large Language Models, such as Reinforcement Learning from Human Feedback (RLHF), are empirically effective but theoretically fragile.
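
For reference, RLHF fine-tuning is standardly cast as KL-regularized reward maximization against a learned reward model; the formulation below is a conventional sketch, and its symbols ($\pi_\theta$, $\pi_{\mathrm{ref}}$, $r_\phi$, $\beta$) are assumed notation rather than definitions taken from this node.

\[
\max_{\pi_\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
\bigl[\, r_\phi(x, y) \,\bigr]
\;-\;
\beta \, \mathbb{E}_{x \sim \mathcal{D}}
\Bigl[ \mathrm{KL}\bigl( \pi_\theta(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \bigr) \Bigr]
\]

Here $r_\phi$ is a reward model fit to human preference comparisons, $\pi_{\mathrm{ref}}$ is the pre-RLHF policy, and $\beta$ weights the KL penalty; the claim's contrast between empirical effectiveness and theoretical fragility is about alignment methods of this kind.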
