Evolutionary Function Approximation for Reinforcement Learning (Paper)
Abstract
Temporal difference (TD) methods are theoretically grounded and empirically effective methods for addressing sequential decision-making problems with delayed rewards. Most problems of real-world interest require coupling TD methods with a function approximator to represent the value function. However, using function approximators requires manually making crucial representational decisions. This paper introduces evolutionary function approximation, a novel approach to automatically selecting function approximator representations that enable efficient individual learning. Our method evolves individuals that are better able to learn. We present a fully implemented instantiation of evolutionary function approximation which combines NEAT, a neuroevolutionary optimization technique, and Q-learning, a popular temporal difference method. The resulting NEAT+Q algorithm automatically learns effective representations for neural network function approximators. Empirical results in a server job scheduling task demonstrate that NEAT+Q can significantly improve the performance of TD methods.