Simulation-based optimization of Markov reward processes (paper)

Published 2001 in IEEE Transactions on Automatic Control. Cited by 319.

Topics: Simulation Techniques and Applications; Reinforcement Learning in Robotics; Markov Chains and Monte Carlo Methods

Abstract

This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters. As a special case, the method applies to Markov decision processes where optimization takes place within a parametrized set of policies. The algorithm relies on the regenerative structure of finite-state Markov processes, involves the simulation of a single sample path, and can be implemented online. A convergence result (with probability 1) is provided.
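The abstract describes the method only at a high level. Below is a minimal Python sketch of one way to use regenerative structure. It is not the paper's algorithm as published. It simulates a single sample path and splits it into cycles between visits to a reference state. At the end of each cycle it takes a likelihood-ratio (score-function) estimate of the average-reward gradient, and it updates a running estimate of the average reward alongside the parameter. Everything specific here is a hypothetical illustration: the two-state chain, its sigmoid-parametrized transition out of state 0 (playing the role of a parametrized policy), the reward values, the step sizes, and the names `step`, `average_reward`, and `optimize`.

```python
import math
import random

# Hypothetical toy chain (not from the paper). State 0 is the reference
# state i* that delimits regenerative cycles; rewards are g(0) = 0, g(1) = 1.
REWARD = (0.0, 1.0)
P_LEAVE_1 = 0.5  # P(1 -> 0); does not depend on theta


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def step(state: int, theta: float, rng: random.Random) -> tuple[int, float]:
    """Sample the next state; return (next_state, d/dtheta log p(state -> next))."""
    if state == 0:
        p = sigmoid(theta)  # P(0 -> 1) is the only parametrized transition
        if rng.random() < p:
            return 1, 1.0 - p  # d/dtheta log sigmoid(theta)
        return 0, -p  # d/dtheta log(1 - sigmoid(theta))
    next_state = 0 if rng.random() < P_LEAVE_1 else 1
    return next_state, 0.0


def average_reward(theta: float) -> float:
    """Exact average reward of the toy chain: stationary probability of state 1."""
    p = sigmoid(theta)
    return p / (p + P_LEAVE_1)


def optimize(theta: float, cycles: int, step_size: float = 0.05,
             eta: float = 1.0, seed: int = 0) -> tuple[float, float]:
    """Stochastic gradient ascent with one update per regenerative cycle."""
    rng = random.Random(seed)
    lam = 0.0  # running estimate of the average reward
    for _ in range(cycles):
        # Simulate one cycle: start at state 0, stop on the first return to 0.
        rewards = [REWARD[0]]  # g(i_n) for each time n in the cycle
        scores = []  # scores[j]: log-likelihood derivative of the move into time j + 1
        state = 0
        while True:
            state, score = step(state, theta, rng)
            if state == 0:
                break  # the move back to i* has no remaining in-cycle reward
            rewards.append(REWARD[state])
            scores.append(score)
        # q[n] = sum over k >= n in this cycle of (g(i_k) - lam)
        q = [0.0] * len(rewards)
        tail = 0.0
        for n in range(len(rewards) - 1, -1, -1):
            tail += rewards[n] - lam
            q[n] = tail
        grad = sum(s * q[j + 1] for j, s in enumerate(scores))
        theta += step_size * grad
        lam += eta * step_size * q[0]
    return theta, lam


theta, lam = optimize(-1.0, cycles=5000)
print(f"theta={theta:.3f}  average reward={average_reward(theta):.3f}")
```

For this toy chain the exact average reward is p / (p + 0.5) with p = sigmoid(theta), which increases in theta, so the parameter should drift upward over many cycles. This sketch updates once per cycle. A fully online variant would instead spread the update over individual transitions.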

Authors (2)

John N. Tsitsiklis

Peter Marbach
