Almost Optimal Exploration in Multi-Armed Bandits 论文

2013引用 244

Advanced Bandit Algorithms ResearchOptimization and Search ProblemsReinforcement Learning in Robotics

机器人 Reinforcement Learning in Robotics Advanced Bandit Algorithms Research Optimization and Search Problems

作者

摘要

We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameterfree algorithms for identifying the best arm, in two different settings: given a target confidence and given a target budget of arm pulls, for which we prove upper bounds whose gap from the lower bound is only doublylogarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithm outperforms the state-of-the-art and scales better as the size of the problem increases. 1.

作者查看全部 (3)

Oren Somekh

Tomer Koren

Zohar Karnin

Almost Optimal Exploration in Multi-Armed Bandits 论文

详细信息

摘要

作者查看全部 (3)

相关技术查看全部 (3)

相关事件

相关文章