claim
Reinforcement Learning from Human Feedback (RLHF) in large language model development operationalizes operant conditioning: repeated human feedback is distilled into a reward signal, and the model's behavior is updated to favor outputs that earn higher reward.
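As a concrete illustration of the feedback loop this claim describes, the sketch below trains a toy categorical policy with a REINFORCE-style update: a stand-in reward function plays the role of the human-preference reward model, and repeated reward shifts probability mass toward the preferred output. Everything here (the four-output "vocabulary", `toy_reward`, the learning rate) is an illustrative assumption rather than a detail from any source; production RLHF pipelines typically use a learned reward model and PPO-style policy optimization.

```python
# Minimal sketch of the RLHF / operant-conditioning loop described above.
# All specifics (4-output vocabulary, toy_reward, learning rate) are
# illustrative assumptions, not part of any real RLHF implementation.
import numpy as np

rng = np.random.default_rng(0)

# Toy "policy": a softmax distribution over 4 possible outputs.
logits = np.zeros(4)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def toy_reward(action):
    # Stand-in for a human-preference reward model:
    # output 2 is "preferred", everything else is not.
    return 1.0 if action == 2 else 0.0

lr = 0.5
baseline = 0.0  # running average reward; subtracting it reduces variance

for step in range(200):
    probs = softmax(logits)
    action = rng.choice(4, p=probs)
    reward = toy_reward(action)

    # REINFORCE: raise the log-probability of an action in proportion
    # to how much better than average its reward was.
    advantage = reward - baseline
    grad = -probs
    grad[action] += 1.0          # d log pi(action) / d logits
    logits += lr * advantage * grad

    baseline += 0.1 * (reward - baseline)

print("final policy:", np.round(softmax(logits), 3))
# The preferred output's probability climbs toward 1: repeated reward
# "reinforces" it, mirroring an operantly conditioned response.
```

The baseline subtraction is the standard variance-reduction trick; without it, every sampled output would be reinforced whenever the reward is positive, rather than only those that beat the running average.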
