claim
Recent developments in reinforcement learning from human feedback (RLHF) include incorporating human cognitive biases (Siththaranjan et al., 2024) and personalizing reward functions to individual values (Poddar et al., 2024).
Authors
Sources
- A Survey of Incorporating Psychological Theories in LLMs (arXiv, arxiv.org)
Referenced by nodes (2)
- cognitive bias concept
- RLHF concept