reference
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang authored 'Safe RLHF: Safe Reinforcement Learning from Human Feedback', published as an arXiv preprint in 2023.
