claim
Gao et al. (2023) empirically established a functional relationship between the golden (gold-standard) reward model score and the KL divergence between the optimized policy and its initial policy in large language models.
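
As a rough illustration of the claim, the sketch below evaluates the two functional forms reported by Gao et al. (2023), written in terms of d = sqrt(KL): d * (alpha - beta * d) for best-of-n sampling and d * (alpha - beta * log d) for reinforcement learning. The function names and the coefficient values used in the example call are hypothetical placeholders, not fitted values from the paper.

```python
import math


def gold_reward_bon(kl: float, alpha: float, beta: float) -> float:
    """Best-of-n form: R(d) = d * (alpha - beta * d), with d = sqrt(KL)."""
    d = math.sqrt(kl)
    return d * (alpha - beta * d)


def gold_reward_rl(kl: float, alpha: float, beta: float) -> float:
    """RL form: R(d) = d * (alpha - beta * log d), with d = sqrt(KL)."""
    d = math.sqrt(kl)
    if d == 0.0:
        # The limit of d * log(d) as d -> 0 is 0, so the reward gain is 0.
        return 0.0
    return d * (alpha - beta * math.log(d))


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to show the shape of the curves.
    for kl in (0.0, 1.0, 4.0, 16.0):
        print(kl, gold_reward_bon(kl, 1.0, 0.25), gold_reward_rl(kl, 1.0, 0.25))
```

In both forms the gold reward first rises with KL and then falls once the beta term dominates, which is the overoptimization pattern the claim refers to.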
