The Epoch-Greedy algorithm for contextual multi-armed bandits (paper)

2007 · Cited by 330

Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics

Abstract

We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following properties:

1. No knowledge of a time horizon $T$ is necessary.
2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class.
3. The regret scales as $O(T^{2/3} S^{1/3})$ or better (sometimes, much better).

Here $S$ is the complexity term in a sample complexity bound for standard supervised learning.
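To make the epoch structure concrete, here is a minimal sketch, not the paper's implementation. It assumes a finite class of policies (hypotheses mapping a context to an action). Each epoch runs one exploration step with a uniformly random action, whose reward is credited to every policy that would have picked that action, weighted by the inverse probability $K$. It then exploits the empirically best policy for several steps. The exploitation length $\lceil\sqrt{l}\rceil$ in epoch $l$ is a hypothetical illustrative choice. The paper derives its schedule from the sample complexity bound, which is not reproduced here. All names (`epoch_greedy`, `correct_action`, the toy reward) are made up for the example.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Context = Tuple[float, ...]
Policy = Callable[[Context], int]


def epoch_greedy(
    policies: Sequence[Policy],
    num_actions: int,
    draw_context: Callable[[random.Random], Context],
    reward: Callable[[Context, int, random.Random], float],
    total_steps: int,
    seed: int = 0,
) -> Tuple[float, List[int]]:
    """Simplified Epoch-Greedy loop over a finite policy class (illustrative).

    Returns the total reward collected and, for each epoch, the index of the
    policy that was exploited during that epoch.
    """
    rng = random.Random(seed)
    # Running sum of importance-weighted reward estimates, one per policy.
    scores = [0.0] * len(policies)
    total_reward = 0.0
    chosen: List[int] = []
    t = 0
    epoch = 0
    while t < total_steps:
        epoch += 1
        # Exploration step: pick an action uniformly at random.
        x = draw_context(rng)
        a = rng.randrange(num_actions)
        r = reward(x, a, rng)
        total_reward += r
        t += 1
        # Inverse-propensity credit: the action had probability 1/K, so any
        # policy agreeing with it receives r * K.
        for i, h in enumerate(policies):
            if h(x) == a:
                scores[i] += r * num_actions
        # Greedy step: the empirically best policy so far (first index on ties).
        best = max(range(len(policies)), key=lambda i: scores[i])
        chosen.append(best)
        # Exploitation steps; ceil(sqrt(epoch)) is a hypothetical schedule.
        for _ in range(math.ceil(math.sqrt(epoch))):
            if t >= total_steps:
                break
            x = draw_context(rng)
            total_reward += reward(x, policies[best](x), rng)
            t += 1
    return total_reward, chosen


# Toy problem: one feature in [0, 1]; action 1 is correct iff the feature > 0.5.
def correct_action(x: Context) -> int:
    return 1 if x[0] > 0.5 else 0


policies = [correct_action, lambda x: 0, lambda x: 1]
total, chosen = epoch_greedy(
    policies,
    num_actions=2,
    draw_context=lambda rng: (rng.random(),),
    reward=lambda x, a, rng: 1.0 if a == correct_action(x) else 0.0,
    total_steps=2000,
)
print(f"epochs={len(chosen)} total_reward={total}")
```

On this toy problem the threshold policy receives credit on every exploration step where either constant policy does, so it is always selected. Reward is lost only on the exploration steps. With exploitation growing like $\sqrt{l}$, $L$ epochs cover roughly $L^{3/2}$ steps, so about $T^{2/3}$ of the first $T$ steps are spent exploring. This mirrors the $T^{2/3}$ factor in the abstract's regret bound.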

Author

John Langford