UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem (Paper)

2010 · Periodica Mathematica Hungarica · Cited by 291
Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems