Claim
Reinforcement Learning from Human Feedback (RLHF), as used in Large Language Model development, operationalizes operant conditioning theory: repeated reward feedback adapts the model's behavior to favor outputs that earn higher reward signals.
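The reinforcement loop the claim describes can be illustrated with a minimal, self-contained sketch. This is a toy REINFORCE-style policy-gradient update over a handful of candidate responses, not the cited survey's method or a production RLHF pipeline; the response strings and reward values are hypothetical stand-ins for a learned reward model trained on human feedback.

```python
import math
import random

# Hypothetical candidates and reward-model scores (stand-ins for human feedback).
responses = ["helpful answer", "vague answer", "rude answer"]
reward = {"helpful answer": 1.0, "vague answer": 0.2, "rude answer": -1.0}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_step(logits, lr=0.5):
    """One REINFORCE update: sample a response from the policy, then shift
    the logits in proportion to its reward -- rewarded behavior is
    reinforced, punished behavior is suppressed (operant conditioning)."""
    probs = softmax(logits)
    i = random.choices(range(len(logits)), weights=probs)[0]
    r = reward[responses[i]]
    # Softmax policy gradient: d log p_i / d logit_j = 1[i == j] - p_j
    return [l + lr * r * ((1.0 if j == i else 0.0) - probs[j])
            for j, l in enumerate(logits)]

random.seed(0)
logits = [0.0, 0.0, 0.0]          # uniform policy to start
for _ in range(200):               # repeated feedback rounds
    logits = reinforce_step(logits)

probs = softmax(logits)
best = responses[probs.index(max(probs))]
```

After repeated rounds, probability mass concentrates on the highest-reward response, which is the behavioral adaptation the claim attributes to RLHF.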
Authors
Sources
- A Survey of Incorporating Psychological Theories in LLMs - arXiv (arxiv.org)