Contextual bandits with linear Payoff functions 论文

2011引用 577

Advanced Bandit Algorithms ResearchOptimization and Search ProblemsMachine Learning and Algorithms

人工智能 Machine Learning and Algorithms Advanced Bandit Algorithms Research Optimization and Search Problems

作者

摘要

In this paper, we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d(√ dimensional feature vectors, we prove an O Td ln 3) (KT ln(T)/δ) regret bound that holds with probability 1 − δ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω ( √ Td) for this setting, matching the upper bound up to logarithmic factors. 1

作者查看全部 (3)

Robert E. Schapire

Lev Reyzin

Lihong Li

Contextual bandits with linear Payoff functions 论文

摘要

作者查看全部 (3)

相关技术查看全部 (3)

相关事件

相关文章