Linearly-solvable Markov decision problems (paper)

2007, The MIT Press eBooks, cited by 358

Robotics; Reinforcement Learning in Robotics; Formal Methods in Verification; Advanced Control Systems Optimization

Abstract

We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS0524761.
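The linearity claimed in the abstract can be illustrated with a small numerical sketch. It assumes a first-exit formulation, which the abstract does not spell out: passive transition matrix `P`, state costs `q`, and a desirability vector `z = exp(-v)` satisfying `z(i) = exp(-q(i)) * sum_j P[i, j] z(j)` at non-terminal states, with `z(i) = exp(-q(i))` at terminal states. The function name `solve_first_exit`, the 4-state chain, and the cost values are hypothetical and chosen only for illustration.

```python
import numpy as np

def solve_first_exit(P, q, terminal):
    """Solve the linear Bellman equation of a first-exit problem (sketch).

    P        : (n, n) passive (uncontrolled) transition matrix, rows sum to 1
    q        : (n,) state costs
    terminal : (n,) boolean mask of terminal (absorbing) states
    Returns desirability z, optimal value v = -log(z), and the optimal
    controlled transition matrix U with U[i, j] proportional to P[i, j] * z[j].
    """
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    T = np.asarray(terminal, dtype=bool)
    N = ~T

    z = np.empty(len(q))
    z[T] = np.exp(-q[T])  # boundary condition at terminal states

    # Non-terminal states: (diag(exp(q_N)) - P_NN) z_N = P_NT z_T
    A = np.diag(np.exp(q[N])) - P[np.ix_(N, N)]
    b = P[np.ix_(N, T)] @ z[T]
    z[N] = np.linalg.solve(A, b)

    v = -np.log(z)

    # Optimal controls rescale the passive dynamics by the desirability
    U = P * z[None, :]
    U /= U.sum(axis=1, keepdims=True)
    return z, v, U

# Hypothetical example: random walk on a 4-state chain, state 3 is the goal.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])
q = np.array([0.1, 0.1, 0.1, 0.0])
terminal = np.array([False, False, False, True])

z, v, U = solve_first_exit(P, q, terminal)
print("desirability z:", z)
print("optimal value v:", v)
print("optimal transitions U:\n", U)
```

Under these assumptions no iteration over actions is needed: one linear solve gives `z`, and the optimal controlled transition probabilities follow in closed form by reweighting the passive dynamics toward states with higher desirability.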

Author

Emanuel Todorov
