claim
Adding reward variability in reinforcement learning may reduce premature convergence and improve alignment with human intent.
Authors
Sources
- A Survey of Incorporating Psychological Theories in LLMs (arXiv, arxiv.org)
Referenced by nodes (2)
- reinforcement learning (concept)
- AI alignment (concept)