Optimal Adaptive Policies for Markov Decision Processes (Paper)

Published 1997 in Mathematics of Operations Research. Citations: 239

Topics: Robotics; Reinforcement Learning in Robotics; Age of Information Optimization; Advanced Bandit Algorithms Research

Abstract

In this paper we consider the problem of adaptive control for Markov Decision Processes. We give the explicit form for a class of adaptive policies that possess optimal increase rate properties for the total expected finite horizon reward, under sufficient assumptions of finite state-action spaces and irreducibility of the transition law. A main feature of the proposed policies is that the choice of actions, at each state and time period, is based on indices that are inflations of the right-hand side of the estimated average reward optimality equations.
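
As a rough illustration of what "inflations of the right-hand side of the estimated average reward optimality equations" could look like, the sketch below writes the optimality equations in their standard form, followed by one possible inflated index. The symbols g^*, h^*, r, p, \hat p_t, \hat h_t, N_t and the confidence radius \log t / N_t(x,a) are notational assumptions that do not appear in the abstract; the paper's exact definitions may differ.

```latex
% Average reward optimality equations (standard form; notation assumed):
g^* + h^*(x) = \max_{a \in A(x)} \Big[ r(x,a) + \sum_{y} p(y \mid x,a)\, h^*(y) \Big]

% One possible inflated index, built from estimates \hat p_t and \hat h_t at time t,
% where N_t(x,a) counts how often action a has been taken in state x so far:
U_t(x,a) = \sup_{q} \Big\{ r(x,a) + \sum_{y} q(y)\, \hat h_t(y) \;:\;
    \mathrm{KL}\big(\hat p_t(\cdot \mid x,a) \,\|\, q\big) \le \frac{\log t}{N_t(x,a)} \Big\}
```

Under this reading, an action that has rarely been tried in a state has a large confidence radius, so its index is inflated more and it is explored; as N_t(x,a) grows, the index shrinks toward the plain estimated right-hand side.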

Authors (2)

Michael N. Katehakis

Apostolos Burnetas
