Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path (paper)

2007 · Machine Learning · Cited by 249
Markov Chains and Monte Carlo Methods · Machine Learning and Algorithms · Reinforcement Learning in Robotics
