Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path (paper)

2007 | Machine Learning | Citations: 249
Markov Chains and Monte Carlo Methods; Machine Learning and Algorithms; Reinforcement Learning in Robotics