Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path · Paper
2007 · Machine Learning · Cited by 249
Markov Chains and Monte Carlo Methods · Machine Learning and Algorithms · Reinforcement Learning in Robotics