R-MAX: a general polynomial time algorithm for near-optimal reinforcement learning (paper)

2001 · Citations: 352
Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Artificial Intelligence in Games

R-MAX: a general polynomial time algorithm for near-optimal reinforcement learning · Related technologies