Optimal Adaptive Policies for Markov Decision Processes (Paper)

1997 · Mathematics of Operations Research · Citations: 239
Reinforcement Learning in Robotics · Age of Information Optimization · Advanced Bandit Algorithms Research

Abstract

In this paper we consider the problem of adaptive control for Markov Decision Processes. We give the explicit form for a class of adaptive policies that possess optimal increase rate properties for the total expected finite horizon reward, under sufficient assumptions of finite state-action spaces and irreducibility of the transition law. A main feature of the proposed policies is that the choice of actions, at each state and time period, is based on indices that are inflations of the right-hand side of the estimated average reward optimality equations.
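To make the last sentence concrete, the following sketch writes out the average reward optimality equations and one possible form of an "inflated" index. The abstract gives no formulas, so everything here is an assumption for illustration: the notation ($g^*$, $h$, $r$, $p$, $\hat{p}_t$, $\hat{h}_t$, $N_t$), the treatment of rewards as known, and in particular the Kullback-Leibler confidence set with a $\log t$ budget, which is a common way to build such indices and may differ from the paper's exact definition.

```latex
% Average reward optimality equations for a finite, irreducible MDP
% with state space S, action sets A(x), rewards r(x,a), and transition law p:
\[
  g^* + h(x) \;=\; \max_{a \in A(x)}
  \Big\{ r(x,a) + \sum_{y \in S} p(y \mid x, a)\, h(y) \Big\},
  \qquad x \in S .
\]

% Assumed form of an inflated index at time t. Here \hat{p}_t is the
% empirical transition law, \hat{h}_t solves the estimated optimality
% equations, and N_t(x,a) counts past uses of action a in state x.
\[
  U_t(x,a) \;=\; \sup_{q \in \Delta(S)}
  \Big\{ r(x,a) + \sum_{y \in S} q(y)\, \hat{h}_t(y)
  \;:\;
  N_t(x,a)\, \mathrm{KL}\big(\hat{p}_t(\cdot \mid x,a) \,\big\|\, q\big)
  \le \log t \Big\} .
\]
```

Under this reading, the policy would pick an action maximizing $U_t(x,a)$ in the current state $x$. The inflation over the plain estimate shrinks as $N_t(x,a)$ grows, so rarely tried actions keep receiving occasional exploration.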