Stochastic policy gradient reinforcement learning on a simple 3D biped (paper)

Year: 2005 · Citations: 270

Topics: Robotics; Robotic Locomotion and Control; Reinforcement Learning in Robotics; Prosthetics and Rehabilitation Robotics

Abstract

We present a learning system that is able to quickly and reliably acquire a robust feedback control policy for 3D dynamic walking from a blank slate using only trials implemented on our physical robot. The robot begins walking within a minute and learning converges in approximately 20 minutes. This success can be attributed to the mechanics of our robot, which are modeled after a passive dynamic walker, and to a dramatic reduction in the dimensionality of the learning problem. We reduce the dimensionality by designing a robot with only 6 internal degrees of freedom and 4 actuators, by decomposing the control system in the frontal and sagittal planes, and by formulating the learning problem on the discrete return map dynamics. We apply a stochastic policy gradient algorithm to this reduced problem and decrease the variance of the update using a state-based estimate of the expected cost. This optimized learning system works quickly enough that the robot is able to continually adapt to the terrain as it walks.
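To make the abstract's last technical step more concrete, here is a minimal sketch of a stochastic policy gradient update whose variance is reduced by subtracting a state-based estimate of the expected cost. It is not the authors' implementation. The scalar linear return map, the one-step quadratic cost, the quadratic baseline, and every name and constant (`a`, `b`, `sigma`, `eta_w`, `eta_v`, `learn_return_map_policy`) are assumptions chosen for illustration only.

```python
import random


def learn_return_map_policy(trials=5000, seed=0):
    """Toy policy-gradient learner on a hypothetical scalar return map.

    Assumed dynamics (NOT the robot's): x_next = a * x + b * u.
    Assumed policy: u = w * x + z, with exploration noise z ~ N(0, sigma^2).
    Assumed cost: x_next squared, i.e. distance from the fixed point at 0.
    """
    rng = random.Random(seed)
    a, b = 1.2, 1.0              # assumed open-loop-unstable return map
    sigma = 0.2                  # standard deviation of exploration noise
    eta_w, eta_v = 0.01, 0.2     # assumed learning rates (policy, baseline)
    w = 0.0                      # policy gain, starts from a blank slate
    v = 0.0                      # baseline weight: expected cost ~ v * x^2
    costs = []
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0)       # perturbed state at the return map
        z = rng.gauss(0.0, sigma)        # stochastic part of the policy
        u = w * x + z
        x_next = a * x + b * u           # one step of the assumed return map
        cost = x_next ** 2
        baseline = v * x * x             # state-based estimate of expected cost
        advantage = cost - baseline      # subtracting it reduces variance
        # Likelihood-ratio gradient for a Gaussian policy:
        # d/dw log pi(u | x) = (u - w * x) * x / sigma^2 = z * x / sigma^2
        w -= eta_w * advantage * z * x / sigma ** 2
        # Move the baseline toward the observed cost (stochastic regression).
        v += eta_v * advantage * x * x
        costs.append(cost)
    return w, v, costs


if __name__ == "__main__":
    w, v, costs = learn_return_map_policy()
    print("learned gain:", w)
    print("mean cost, first 200 trials:", sum(costs[:200]) / 200)
    print("mean cost, last 200 trials:", sum(costs[-200:]) / 200)
```

Under these assumptions the cost-minimizing gain is `-a / b`, so the learned `w` should settle near -1.2. The baseline does not bias the expected update, because the exploration noise has zero mean. It only removes the part of the cost that the current state already predicts.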

Authors (3)

H. Sebastian Seung

Tianbao Zhang

Russ Tedrake
