claim
Off-policy evaluation (OPE) in reinforcement learning is the problem of estimating the expected long-term payoff of a target policy using experiences generated by a different, potentially unknown, behavior policy.
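To make the definition concrete, here is a minimal sketch of one standard OPE estimator, ordinary (trajectory-wise) importance sampling. It is an illustration, not the method of the cited work. It assumes the behavior policy's action probabilities were logged with the data, which does not hold when the behavior policy is unknown. The names `importance_sampling_ope`, `target_prob`, and the tuple layout are hypothetical.

```python
def importance_sampling_ope(trajectories, target_prob, gamma=0.99):
    """Estimate the target policy's expected discounted return from
    trajectories collected under a different behavior policy.

    trajectories: list of trajectories; each trajectory is a list of
        (state, action, reward, behavior_prob) tuples, where behavior_prob
        is the (assumed logged) probability the behavior policy assigned
        to the action it took.
    target_prob: callable (state, action) -> probability of that action
        under the target policy.
    """
    estimates = []
    for traj in trajectories:
        weight = 1.0  # cumulative likelihood ratio of target vs. behavior
        ret = 0.0     # discounted return observed in this trajectory
        for t, (state, action, reward, behavior_prob) in enumerate(traj):
            weight *= target_prob(state, action) / behavior_prob
            ret += (gamma ** t) * reward
        estimates.append(weight * ret)
    # Average the reweighted returns across trajectories.
    return sum(estimates) / len(estimates)


# Toy one-step example: the behavior policy picks each of two actions with
# probability 0.5; the target policy always picks action 1 (reward 1.0).
logged = [
    [(0, 0, 0.0, 0.5)],
    [(0, 1, 1.0, 0.5)],
]
always_action_1 = lambda state, action: 1.0 if action == 1 else 0.0
print(importance_sampling_ope(logged, always_action_1))
```

In this toy case the estimate matches the target policy's true value of 1.0. In general the estimator is unbiased only when the behavior policy gives nonzero probability to every action the target policy can take, and its variance can grow quickly with trajectory length.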
Authors
Sources
- Track: Poster Session 3 - AISTATS 2026 (virtual.aistats.org)
Referenced by nodes (1)
- reinforcement learning concept