Evolutionary Function Approximation for Reinforcement Learning (paper)

Published 2006; cited by 238
Topics: Evolutionary Algorithms and Applications; Reinforcement Learning in Robotics; Advanced Multi-Objective Optimization Algorithms

Abstract. Temporal difference (TD) methods are theoretically grounded and empirically effective approaches to sequential decision-making problems with delayed rewards. Most problems of real-world interest require coupling TD methods with a function approximator to represent the value function. However, using a function approximator requires making crucial representational decisions by hand. This paper introduces evolutionary function approximation, a novel approach to automatically selecting function approximator representations that enable efficient individual learning. Our method evolves individuals that are better able to learn. We present a fully implemented instantiation of evolutionary function approximation that combines NEAT, a neuroevolutionary optimization technique, and Q-learning, a popular temporal difference method. The resulting NEAT+Q algorithm automatically learns effective representations for neural network function approximators. Empirical results in a server job scheduling task demonstrate that NEAT+Q can significantly improve the performance of TD methods.
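The core idea in the abstract, an evolutionary outer loop that selects function approximator representations by how well each one learns with a TD method, can be illustrated with a heavily simplified Python sketch. It is not NEAT+Q. NEAT's evolution of neural network topologies and weights (with speciation and crossover) is replaced here by mutation-only search over binary feature masks for a linear Q-function. The server job scheduling task is replaced by a toy chain environment, and fitness is assumed to be the fraction of episodes that reach the goal while the individual is learning. All names (`features`, `evaluate`, `mutate`, `evolve`, the chain task and its parameters) are hypothetical.

```python
import random

# Hypothetical toy task (NOT the paper's server job scheduling domain):
# a chain of states 0..N_STATES-1, start at 0, reward 1 on reaching the last state.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
MAX_STEPS = 20
N_FEATURES = N_STATES + 1  # one-hot state indicators plus a bias feature


def features(state):
    """Candidate feature vector; an individual's mask selects a subset of it."""
    phi = [0.0] * N_FEATURES
    phi[state] = 1.0
    phi[-1] = 1.0
    return phi


def q_value(weights, mask, state, action):
    """Linear Q estimate using only the features switched on by `mask`."""
    return sum(w * f * m for w, f, m in zip(weights[action], features(state), mask))


def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def evaluate(mask, rng, episodes=30, alpha=0.2, gamma=0.95, epsilon=0.1):
    """Individual learning: train a fresh masked linear Q-function with Q-learning.
    Assumed fitness: fraction of episodes that reached the goal while learning."""
    weights = {a: [0.0] * N_FEATURES for a in ACTIONS}
    successes = 0
    for _ in range(episodes):
        state = 0
        for _ in range(MAX_STEPS):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # epsilon-greedy exploration
            else:
                order = list(ACTIONS)
                rng.shuffle(order)  # random tie-breaking for the greedy action
                action = max(order, key=lambda a: q_value(weights, mask, state, a))
            nxt, reward, done = step(state, action)
            target = reward
            if not done:
                target += gamma * max(q_value(weights, mask, nxt, a) for a in ACTIONS)
            td_error = target - q_value(weights, mask, state, action)
            # Semi-gradient Q-learning update; masked-out weights never change.
            for i, f in enumerate(features(state)):
                weights[action][i] += alpha * td_error * f * mask[i]
            state = nxt
            if done:
                successes += 1
                break
    return successes / episodes


def mutate(mask, rng, rate=0.2):
    """Flip each feature bit with probability `rate` (stand-in for NEAT's mutations)."""
    return [1 - m if rng.random() < rate else m for m in mask]


def evolve(generations=10, pop_size=8, seed=0):
    """Evolutionary outer loop over representations, selected on learned performance."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    best_fit, best_mask, history = -1.0, None, []
    for _ in range(generations):
        scored = sorted(((evaluate(m, rng), m) for m in population),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_mask = scored[0]
        history.append(best_fit)  # best-so-far fitness, one entry per generation
        parents = [m for _, m in scored[: pop_size // 2]]  # truncation selection
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
        population = parents + children
    return best_mask, best_fit, history
```

Calling `evolve()` returns the best feature mask found, its fitness, and the best-so-far fitness per generation. In this sketch each individual starts learning from zero weights and offspring inherit only the parent's mask; whether learned weights should also be passed on is a separate design choice that the sketch does not try to settle.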