Monte Carlo POMDPs (Paper)

1999 · Neural Information Processing Systems · Cited by 234
Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Machine Learning and Algorithms

Abstract

We present a Monte Carlo algorithm for learning to act in partially observable Markov decision processes (POMDPs) with real-valued state and action spaces. Our approach uses importance sampling for representing beliefs, and Monte Carlo approximation for belief propagation. A reinforcement learning algorithm, value iteration, is employed to learn value functions over belief states. Finally, a sample-based version of nearest neighbor is used to generalize across states. Initial empirical results suggest that our approach works well in practical applications.
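The sample-based belief representation described in the abstract can be illustrated with a generic particle-filter update. This is a minimal sketch and not the paper's implementation. The one-dimensional state, the additive motion model, the Gaussian sensor model, and the names `propagate_belief`, `belief_mean`, `motion_std`, and `obs_std` are all assumptions introduced for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def propagate_belief(particles, action, observation, rng,
                     motion_std=0.1, obs_std=0.2):
    """One Monte Carlo belief update: predict, weight, resample.

    The belief is a list of real-valued state samples. The motion and
    sensor models here are hypothetical stand-ins, not taken from the paper.
    """
    # Predict: push every sample through the assumed noisy motion model.
    predicted = [x + action + rng.gauss(0.0, motion_std) for x in particles]

    # Weight: importance weight of each sample is the likelihood of the
    # observation under the assumed Gaussian sensor model.
    weights = [gaussian_pdf(observation, x, obs_std) for x in predicted]

    if sum(weights) == 0.0:
        # Every likelihood underflowed; fall back to the predicted samples.
        return predicted

    # Resample: draw an equally weighted sample set in proportion to weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def belief_mean(particles):
    """Point estimate of the state: the mean of the belief samples."""
    return sum(particles) / len(particles)


if __name__ == "__main__":
    rng = random.Random(0)
    # Start from a broad prior belief over the state.
    belief = [rng.uniform(-1.0, 1.0) for _ in range(500)]
    belief = propagate_belief(belief, action=0.5, observation=0.5, rng=rng)
    print(len(belief), belief_mean(belief))
```

Resampling keeps the sample count fixed, so each belief stays a fixed-size set of samples that later steps (such as a nearest-neighbor comparison between beliefs) can work with directly.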