claim
Adding reward variability in reinforcement learning may reduce premature convergence and improve alignment with human intent.

