PAC Bounds for Multi-armed Bandit and Markov Decision Processes (paper)

2002 · Lecture Notes in Computer Science · Cited by 304
Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics