Exploiting Structure in Policy Construction (paper)

Year: 1995 · Citations: 360

Topics: Robotics; Reinforcement Learning in Robotics; Bayesian Modeling and Causal Inference; AI-based Problem Solving and Planning

Abstract

Markov decision processes (MDPs) have recently been applied to the problem of modeling decision-theoretic planning. While traditional methods for solving MDPs are often practical for small state spaces, their effectiveness for large AI planning problems is questionable. We present an algorithm, called structured policy iteration (SPI), that constructs optimal policies without explicit enumeration of the state space. The algorithm retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies reflected in a temporal Bayesian network representation of MDPs. The principles behind SPI can be applied to any structured representation of stochastic actions, policies and value functions, and the algorithm itself can be used in conjunction with recent approximation methods.

1 Introduction

Increasingly research in planning has been directed towards problems in which the initial conditions and the e...

Authors (3)

Moisés Goldszmidt

Richard Dearden

Craig Boutilier
