claim
Cao et al. (2024) introduced a method that improves reinforcement learning by using dense rewards derived from a language model critic.
