Bandits With Heavy Tail 论文

2013IEEE Transactions on Information Theory引用 249

Advanced Bandit Algorithms ResearchMachine Learning and AlgorithmsOptimization and Search Problems

人工智能 Machine Learning and Algorithms Advanced Bandit Algorithms Research Optimization and Search Problems

作者

摘要

The stochastic multiarmed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper, we examine the bandit problem under the weaker assumption that the distributions have moments of order <formula formulatype="inline" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex Notation="TeX">$1 + \varepsilon $</tex></formula> , for some <formula formulatype="inline" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex Notation="TeX">$ \varepsilon \in (0,1]$</tex></formula> . Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. In order to achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni's <formula formulatype="inline" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex Notation="TeX">$M$</tex> </formula> -estimator, and the median-of-means estimator. We also derive matching lower bounds that also show that the best achievable regret deteriorates when <formula formulatype="inline" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex Notation="TeX">$ \varepsilon < 1$</tex></formula> .

作者查看全部 (3)

Gábor Lugosi

Nicolò Cesa‐Bianchi

Sébastien Bubeck

Bandits With Heavy Tail 论文

摘要

作者查看全部 (3)

相关技术查看全部 (3)

相关事件

相关文章