Policy Shaping: Integrating Human Feedback with Reinforcement Learning (paper)
2013 · Cited by 304
Reinforcement Learning in Robotics · Software Engineering Research · Adversarial Robustness in Machine Learning
Abstract
A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
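The abstract does not spell out how Advise turns feedback into policy labels. The sketch below assumes the formulation as we understand it from the full paper, so treat it as an illustration rather than the authors' implementation. Each state-action pair keeps a net count Δ (number of "right" labels minus number of "wrong" labels), and the human is assumed to label correctly with probability C. The probability that an action is optimal is then C^Δ / (C^Δ + (1 − C)^Δ). This feedback distribution is combined with the learner's own action distribution by multiplying the two and renormalizing. The function names `feedback_policy` and `shape_policy` and the example numbers are hypothetical.

```python
from typing import Dict, Hashable

Policy = Dict[Hashable, float]


def feedback_policy(delta: Dict[Hashable, int], consistency: float) -> Policy:
    """Per-action probability of being optimal given net feedback.

    delta[a] is the (assumed) count of 'right' minus 'wrong' labels for action a
    in the current state; consistency is the assumed probability C that a human
    label is correct, with 0 < C < 1.
    """
    if not 0.0 < consistency < 1.0:
        raise ValueError("consistency must lie strictly between 0 and 1")
    probs: Policy = {}
    for action, d in delta.items():
        agree = consistency ** d          # C^Delta
        disagree = (1.0 - consistency) ** d  # (1 - C)^Delta
        probs[action] = agree / (agree + disagree)
    return probs


def shape_policy(agent: Policy, feedback: Policy) -> Policy:
    """Combine the learner's distribution with the feedback distribution.

    Multiplies the two per-action probabilities and renormalizes so the
    result sums to 1.
    """
    product = {a: agent[a] * feedback[a] for a in agent}
    total = sum(product.values())
    return {a: p / total for a, p in product.items()}


if __name__ == "__main__":
    # Hypothetical state with two actions: 'left' got two net positive labels.
    fb = feedback_policy({"left": 2, "right": 0}, consistency=0.8)
    shaped = shape_policy({"left": 0.5, "right": 0.5}, fb)
    print(fb)      # left = 16/17, right = 0.5 (no feedback means no information)
    print(shaped)  # left = 32/49, about 0.653
```

Under this formulation, an action with Δ = 0 gets probability 0.5 and therefore leaves the learner's preferences unchanged after renormalization. Lowering C flattens the feedback distribution, which is one way to model the inconsistent human feedback the abstract mentions.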