reference
Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, and Dacheng Tao introduced InfoRM, an information-theoretic reward modeling approach designed to mitigate reward hacking in Reinforcement Learning from Human Feedback (RLHF), in a 2024 paper presented at the 38th Annual Conference on Neural Information Processing Systems.
Sources
- A Survey of Incorporating Psychological Theories in LLMs (arXiv, arxiv.org)