Policy Shaping: Integrating Human Feedback with Reinforcement Learning (paper)

2013, cited by 304

Robotics, Reinforcement Learning in Robotics, Software Engineering Research, Adversarial Robustness in Machine Learning

Abstract

A long term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
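To make the idea of treating feedback as "direct policy labels" more concrete, here is a minimal sketch, not the authors' implementation. It assumes a feedback rule of the form C^d / (C^d + (1 - C)^d), where d is the number of "right" labels minus the number of "wrong" labels an action has received in a state and C is an assumed feedback consistency, and it assumes the result is combined with the agent's own action distribution by multiplying and renormalizing. The names feedback_policy, combine_policies, and consistency are hypothetical and chosen only for illustration.

```python
def feedback_policy(deltas, consistency=0.8):
    """Estimate, per action, the probability that it is optimal.

    deltas: net feedback per action (positive labels minus negative labels).
    consistency: assumed probability that a single label is correct.
    Assumption: this closed form is used for illustration only.
    """
    if not 0.0 < consistency < 1.0:
        raise ValueError("consistency must lie strictly between 0 and 1")
    probs = []
    for d in deltas:
        right = consistency ** d
        wrong = (1.0 - consistency) ** d
        probs.append(right / (right + wrong))
    return probs


def combine_policies(agent_policy, feedback_probs):
    """Multiply the two per-action distributions and renormalize."""
    weights = [p * f for p, f in zip(agent_policy, feedback_probs)]
    total = sum(weights)
    return [w / total for w in weights]


# Three actions: one praised twice, one never labeled, one criticized once.
deltas = [2, 0, -1]
agent_policy = [1 / 3, 1 / 3, 1 / 3]  # hypothetical agent with no preference
shaped = combine_policies(agent_policy, feedback_policy(deltas))
print(shaped)
```

Under these assumptions an action with no feedback keeps a neutral weight of 0.5, so sparse labels leave the agent's own policy untouched for unlabeled actions, while a consistency below 1 keeps any single inconsistent label from ruling an action out entirely.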

Authors (5 total, 4 listed)

Andrea L. Thomaz

Charles L. Isbell

Jonathan Scholz

K.A. Subramanian
