Multi-objective reinforcement learning using sets of Pareto dominating policies (paper)

Published 2014, cited 239 times

Topics: Robotics; Advanced Multi-Objective Optimization Algorithms; Reinforcement Learning in Robotics; Energy Efficiency and Management


Abstract

Many real-world problems involve the optimization of multiple, possibly conflicting objectives. Multi-objective reinforcement learning (MORL) is a generalization of standard reinforcement learning where the scalar reward signal is extended to multiple feedback signals, in essence, one for each objective. MORL is the process of learning policies that optimize multiple criteria simultaneously. In this paper, we present a novel temporal difference learning algorithm that integrates the Pareto dominance relation into a reinforcement learning approach. This algorithm is a multi-policy algorithm that learns a set of Pareto dominating policies in a single run. We name this algorithm Pareto Q-learning and it is applicable in episodic environments with deterministic as well as stochastic transition functions. A crucial aspect of Pareto Q-learning is the updating mechanism that bootstraps sets of Q-vectors. One of our main contributions in this paper is a mechanism that separates the expected immediate reward vector from the set of expected future discounted reward vectors. This decomposition allows us to update the sets and to exploit the learned policies consistently throughout the state space. To balance exploration and exploitation
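The two ideas the abstract names, the Pareto dominance relation and the separation of the immediate reward vector from the set of discounted future reward vectors, can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: all objectives are assumed to be maximized, the function names `dominates`, `non_dominated`, and `q_set` are hypothetical, and the combination rule (adding the immediate reward vector to each discounted future vector) is one plausible reading of the decomposition described above.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Return True if a Pareto-dominates b, assuming every objective is maximized.

    a dominates b when a is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(vectors: List[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector in the list dominates."""
    unique = list(dict.fromkeys(vectors))  # drop exact duplicates, keep order
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


def q_set(avg_reward: Vector, future_set: List[Vector], gamma: float) -> List[Vector]:
    """Combine the immediate reward vector with each discounted future vector.

    Assumption: an empty future set (e.g. at a terminal state) yields just
    the immediate reward vector.
    """
    if not future_set:
        return [avg_reward]
    return [
        tuple(r + gamma * f for r, f in zip(avg_reward, future))
        for future in future_set
    ]


if __name__ == "__main__":
    candidates = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0)]
    front = non_dominated(candidates)  # (2.0, 2.0) is dominated by (3.0, 3.0)
    print(front)
    print(q_set((1.0, 0.0), front, gamma=0.5))
```

Storing the immediate reward and the future set separately means the set of Q-vectors for a state-action pair is only assembled when it is needed, which is what lets the two parts be updated independently in this sketch.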

Authors

Ann Nowé

Kristof Van Moffaert
