claim
Stochastic multi-armed bandits (MABs) are a fundamental reinforcement learning model used to study sequential decision-making in uncertain environments.
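As an illustration of the sequential decision-making the claim refers to, here is a minimal sketch of a stochastic bandit with Bernoulli rewards, played by an epsilon-greedy learner. The claim itself names no algorithm or reward distribution. The function name `run_epsilon_greedy`, the arm means, the horizon, and the choice of epsilon-greedy are all assumptions made for this example.

```python
import random


def run_epsilon_greedy(means, horizon, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with hypothetical arm means for `horizon` rounds."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k          # number of times each arm was pulled
    estimates = [0.0] * k    # running mean reward observed per arm
    total_reward = 0

    for t in range(horizon):
        if t < k:
            arm = t                      # pull every arm once to initialise
        elif rng.random() < epsilon:
            arm = rng.randrange(k)       # explore: uniformly random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit

        # The environment is stochastic: reward is 1 with probability means[arm].
        reward = 1 if rng.random() < means[arm] else 0

        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]  # incremental mean
        total_reward += reward

    return pulls, estimates, total_reward


# Illustrative arm means; the learner never sees these directly.
pulls, estimates, total_reward = run_epsilon_greedy([0.2, 0.5, 0.8], horizon=5000)
print(pulls, [round(e, 2) for e in estimates], total_reward)
```

The learner only observes the reward of the arm it pulls, so it has to balance trying under-sampled arms against exploiting the arm that currently looks best.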
Authors
Sources
- Track: Poster Session 3, AISTATS 2026 (virtual.aistats.org, via Serper)
Referenced by nodes (1)
- reinforcement learning concept