Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors (Paper)

2021, IEEE Transactions on Neural Networks and Learning Systems. Citations: 280

Topics: Robotics; Reinforcement Learning in Robotics; Adversarial Robustness in Machine Learning; Adaptive Dynamic Programming Control

Abstract

In reinforcement learning (RL), function approximation errors are known to easily lead to Q-value overestimation, which greatly reduces policy performance. This article presents a distributional soft actor-critic (DSAC) algorithm, an off-policy RL method for continuous control settings, which improves policy performance by mitigating Q-value overestimation. We first show theoretically that learning a distribution function of state-action returns can effectively mitigate Q-value overestimation, because it adaptively adjusts the update step size of the Q-value function. Then, a distributional soft policy iteration (DSPI) framework is developed by embedding the return distribution function into maximum entropy RL. Finally, we present a deep off-policy actor-critic variant of DSPI, called DSAC, which directly learns a continuous return distribution while keeping the variance of the state-action returns within a reasonable range, thereby addressing exploding and vanishing gradient problems. We evaluate DSAC on the suite of MuJoCo continuous control tasks, achieving state-of-the-art performance.
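The abstract's claim about an adaptive update step size can be illustrated with a minimal sketch. It assumes the return distribution is modeled as a Gaussian and fitted by minimizing negative log-likelihood; the abstract does not state the parameterization, so that choice and all function names below are illustrative assumptions rather than the paper's implementation. Under this assumption, the gradient with respect to the mean (the Q-value estimate) is the target error divided by the variance, so a wider predicted return distribution yields a smaller update.

```python
import math


def gaussian_nll(y, mean, std):
    """Negative log-likelihood of a target return y under N(mean, std^2)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (y - mean) ** 2 / (2 * std ** 2)


def nll_grad_mean(y, mean, std):
    """Analytic gradient of the NLL with respect to the mean (the Q-value estimate)."""
    return -(y - mean) / std ** 2


# Hypothetical numbers: the same target error (10 - 8 = 2) under two predicted spreads.
target, q_estimate = 10.0, 8.0
grad_low_var = nll_grad_mean(target, q_estimate, std=1.0)   # error / 1
grad_high_var = nll_grad_mean(target, q_estimate, std=4.0)  # error / 16

print(grad_low_var, grad_high_var)
```

With identical target error, the high-variance case produces a gradient 16 times smaller in magnitude, which is one way to read "adaptively adjusting the update step size" in this simplified setting.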

Authors (4 of 6 listed)

Qi Sun

Bo Cheng

Yangang Ren

Shengbo Eben Li
