Least-Squares Temporal Difference Learning (paper)

1999 · Citations: 261

Topics: Robotics; Reinforcement Learning in Robotics; Evolutionary Algorithms and Applications; Advanced Multi-Objective Optimization Algorithms

Abstract

TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it makes inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto [5] eliminates all stepsize parameters and improves data efficiency. This paper extends Bradtke and Barto’s work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting algorithm is shown to be a practical formulation of supervised linear regression. Third, it presents a novel, intuitive interpretation of LSTD as a model-based reinforcement learning technique.
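The abstract describes the algorithm only in words, so the following is a minimal sketch of how LSTD(λ) with linear features is commonly written, not the paper's own listing. It accumulates a matrix `A` and a vector `b` from observed transitions using an eligibility trace `z`, then solves `A w = b` for the weight vector, with no stepsize involved. The function name `lstd_lambda`, the episode format (a list of `(phi, reward, phi_next)` tuples with zero features for the terminal state), the discount `gamma`, and the small `ridge` term that keeps `A` invertible are all assumptions made for this illustration.

```python
import numpy as np


def lstd_lambda(episodes, n_features, lam=0.0, gamma=1.0, ridge=1e-8):
    """Sketch of LSTD(lambda) for linear value functions V(x) ~ w . phi(x).

    episodes: iterable of episodes; each episode is a list of
        (phi, reward, phi_next) tuples, where phi_next is the zero
        vector when the next state is terminal (assumed convention).
    ridge: small diagonal term added for numerical stability
        (an assumption of this sketch, not taken from the source).
    """
    A = ridge * np.eye(n_features)
    b = np.zeros(n_features)

    for episode in episodes:
        z = None  # eligibility trace, reset at the start of each episode
        for phi, reward, phi_next in episode:
            phi = np.asarray(phi, dtype=float)
            phi_next = np.asarray(phi_next, dtype=float)
            if z is None:
                z = phi.copy()
            # Accumulate sufficient statistics; no stepsize is needed.
            A += np.outer(z, phi - gamma * phi_next)
            b += z * reward
            # Decay the trace and add the features of the next state.
            z = lam * gamma * z + phi_next

    # Solve A w = b for the value-function weights.
    return np.linalg.solve(A, b)


# Toy two-state chain: state 0 -> state 1 -> terminal, rewards 1 then 2.
# One-hot features make the learned weights equal the state values.
toy_episode = [
    ([1.0, 0.0], 1.0, [0.0, 1.0]),
    ([0.0, 1.0], 2.0, [0.0, 0.0]),
]
w_td0 = lstd_lambda([toy_episode], n_features=2, lam=0.0)
w_mc = lstd_lambda([toy_episode], n_features=2, lam=1.0)
```

On this deterministic toy chain both settings should recover values close to 3 for the first state and 2 for the second. With `lam=1.0` the accumulated `b` holds the full observed returns, which matches the abstract's remark that λ = 1 reduces to supervised linear regression.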

Author

Justin A. Boyan
