Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, and Dacheng Tao introduced InfoRM, an information-theoretic reward modeling approach designed to mitigate reward hacking in Reinforcement Learning from Human Feedback (RLHF), in a 2024 paper presented at the 38th Annual Conference on Neural Information Processing Systems.
