Relative Entropy Inverse Reinforcement Learning (paper)

2011 · Max Planck Digital Library · Cited by 256

Field: Robotics
Topics: Reinforcement Learning in Robotics; Sports Analytics and Performance; Robot Manipulation and Learning


Abstract

We consider the problem of imitation learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is optimally acting in a Markov Decision Process (MDP). Most of the past work on IRL requires that a (near)-optimal policy can be computed for different reward functions. However, this requirement can hardly be satisfied in systems with a large, or continuous, state space. In this paper, we propose a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a uniform policy and their distribution under the learned policy is minimized by stochastic gradient descent. We compare this new approach to well-known IRL algorithms using approximate MDP models. Empirical results on simulated car racing, gridworld and ball-in-a-cup problems show that our approach is able to learn good policies from a small number of demonstrations.
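The abstract describes the method only at a high level. The following is a minimal sketch of the idea under stated assumptions. It is not the authors' implementation. The reward is assumed to be linear in trajectory feature counts, θ·f(τ), and the learned trajectory distribution is assumed to be the baseline distribution reweighted by exp(θ·f(τ)). Trajectories are assumed to be sampled from a uniform policy. Under that assumption the importance ratio between the baseline and sampling distributions is constant: the transition dynamics appear in both and cancel, which is how the gradient can be estimated without a model. The dual objective is also assumed to carry an L1 slack term ε, so its gradient becomes "expert mean features minus reweighted mean features minus ε·sign(θ)". The function names (softmax_weights, reweighted_mean, dual_gradient, reirl) and the toy data are hypothetical. For determinism the sketch runs full-batch gradient ascent on the dual, which is equivalent to descent on the relative-entropy objective. A stochastic variant, as the abstract describes, would subsample trajectories at each step.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _sign(x: float) -> int:
    """Sign function with sign(0) = 0 (a valid subgradient choice for |x|)."""
    return (x > 0) - (x < 0)


def softmax_weights(theta: Sequence[float], sampled: Sequence[Sequence[float]]) -> Vector:
    """Normalized weights exp(theta . f(tau)) over the sampled trajectories.

    Assumption: samples come from a uniform policy, so the importance ratio
    between baseline and sampling distributions is constant and cancels
    after normalization.
    """
    scores = [dot(theta, f) for f in sampled]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def reweighted_mean(theta: Sequence[float], sampled: Sequence[Sequence[float]]) -> Vector:
    """Expected feature counts under the reweighted trajectory distribution."""
    w = softmax_weights(theta, sampled)
    k = len(sampled[0])
    return [sum(wi * f[i] for wi, f in zip(w, sampled)) for i in range(k)]


def dual_gradient(theta, expert_mean, sampled, eps) -> Vector:
    """(Sub)gradient of the assumed dual: expert mean - model mean - eps * sign(theta)."""
    model_mean = reweighted_mean(theta, sampled)
    return [expert_mean[i] - model_mean[i] - eps[i] * _sign(theta[i])
            for i in range(len(theta))]


def reirl(expert, sampled, eps, lr: float = 0.5, iters: int = 2000) -> Vector:
    """Hypothetical solver: full-batch gradient ascent on the assumed dual."""
    k = len(expert[0])
    expert_mean = [sum(f[i] for f in expert) / len(expert) for i in range(k)]
    theta = [0.0] * k
    for _ in range(iters):
        g = dual_gradient(theta, expert_mean, sampled, eps)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta


if __name__ == "__main__":
    # Toy data (hypothetical): feature counts of four uniformly sampled
    # trajectories and of four expert demonstrations.
    sampled = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    expert = [[1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    theta = reirl(expert, sampled, eps=[0.0, 0.0])
    print(theta)  # each component should be close to ln(3), about 1.0986
    print(reweighted_mean(theta, sampled))  # close to the expert mean [0.75, 0.75]
```

In this toy data the sampled feature counts sit at the corners of the unit square. The reweighted mean of each feature is therefore the logistic function of the corresponding weight, and matching an expert mean of 0.75 gives θ_i = ln 3. With ε_i = 0.1, the fixed point instead matches 0.65, which illustrates how the slack term shrinks the weights.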

Authors (3)

Jan Peters

Jens Kober

Abdeslam Boularias
