R-MAX: a general polynomial time algorithm for near-optimal reinforcement learning (paper)
2001 · Cited by 352
Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Artificial Intelligence in Games