Fact — measurement — Knowledge Tree

The regret of the Federated Upper Confidence Bound Value Iteration algorithm (Fed-UCBVI) scales as Õ(√(H^3 |S| |A| T / M)), where H is the episode length, |S| is the number of states, |A| is the number of actions, T is the number of episodes, and M is the number of agents. The Õ notation hides logarithmic factors, and the bound carries a small additional term that accounts for heterogeneity across agents.
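To make the scaling concrete, the following minimal sketch evaluates only the leading term √(H^3 |S| |A| T / M) of the bound stated above. It is an illustration, not the algorithm itself: the function name `fed_ucbvi_leading_term` and the example values are hypothetical, and constants, logarithmic factors, and the heterogeneity term are deliberately omitted.

```python
import math


def fed_ucbvi_leading_term(H: int, S: int, A: int, T: int, M: int) -> float:
    """Leading term sqrt(H^3 * S * A * T / M) of the stated regret bound.

    Assumption: constants, logarithmic factors, and the additional
    heterogeneity term are ignored, so only ratios between values are
    meaningful, not the absolute numbers.
    """
    if min(H, S, A, T, M) <= 0:
        raise ValueError("all arguments must be positive")
    return math.sqrt(H**3 * S * A * T / M)


# Hypothetical problem sizes, chosen only for illustration.
single_agent = fed_ucbvi_leading_term(H=10, S=20, A=5, T=10_000, M=1)
four_agents = fed_ucbvi_leading_term(H=10, S=20, A=5, T=10_000, M=4)

# Going from 1 agent to 4 agents shrinks the leading term by sqrt(4) = 2.
ratio = single_agent / four_agents
print(f"leading-term ratio (M=1 vs M=4): {ratio:.3f}")
```

Under these assumptions, multiplying the number of agents by a factor k reduces the leading term by √k, while doubling the episode length H increases it by 2^(3/2), which is about 2.83.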

Authors

Person: Samuel Tesfazgi, Leonhard Sprandl, Sandra Hirche
Organization: AISTATS
Track: Poster Session 3 - AISTATS 2026

Sources

Track: Poster Session 3 - AISTATS 2026, virtual.aistats.org (Samuel Tesfazgi, Leonhard Sprandl, Sandra Hirche · AISTATS)

Referenced by nodes (1)

reinforcement learning concept