claim
Stochastic multi-armed bandits (MABs) are a fundamental reinforcement learning model used to study sequential decision-making in uncertain environments.
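A minimal sketch of what such a model can look like, for illustration only. It assumes Bernoulli rewards and an epsilon-greedy learner; the claim itself specifies neither. The names `BernoulliBandit` and `epsilon_greedy` are hypothetical.

```python
import random


class BernoulliBandit:
    """Hypothetical K-armed bandit: arm i pays 1 with probability means[i], else 0."""

    def __init__(self, means, seed=0):
        self.means = list(means)
        self.rng = random.Random(seed)

    def pull(self, arm):
        # Rewards are drawn independently from a fixed, unknown-to-the-learner distribution.
        return 1.0 if self.rng.random() < self.means[arm] else 0.0


def epsilon_greedy(bandit, horizon, epsilon=0.1, seed=1):
    """Assumed learner: explore uniformly with probability epsilon, else exploit."""
    rng = random.Random(seed)
    k = len(bandit.means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit (ties go to lowest index)
        reward = bandit.pull(arm)
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return counts, estimates, total


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy(BernoulliBandit([0.2, 0.5, 0.8]), horizon=1000)
    print(counts, [round(e, 2) for e in estimates], total)
```

The learner only ever sees the reward of the arm it pulls, which is the source of the uncertainty: it has to trade off trying arms to learn their means against pulling the arm that currently looks best.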
