The Epoch-Greedy algorithm for contextual multi-armed bandits 论文

2007引用 330
Advanced Bandit Algorithms ResearchMachine Learning and AlgorithmsReinforcement Learning in Robotics

摘要

We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following prop-erties: 1. No knowledge of a time horizon T is necessary. 2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. 3. The regret scales asO(T 2/3S1/3) or better (sometimes, much better). Here S is the complexity term in a sample complexity bound for standard supervised learning. 1