claim
Large-scale reinforcement learning (RL) applied to large language models (LLMs) elicits reasoning behaviors, such as hypothesis generation and self-criticism, as emergent properties.