Almost Optimal Exploration in Multi-Armed Bandits 论文

2013引用 244
Advanced Bandit Algorithms ResearchOptimization and Search ProblemsReinforcement Learning in Robotics

详细信息

发表日期
2013-06-16
发表年份
2013

关键词

Advanced Bandit Algorithms ResearchOptimization and Search ProblemsReinforcement Learning in Robotics

摘要

We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameterfree algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doublylogarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases. 1.