Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs 文章

ArXiv CS.AI2026-06-03NEWSen作者: Ali Asadi, Krishnendu Chatterjee, Ehsan Goharshady, Mehrdad Karrabi, Alipasha Montaseri, Carlo Pagano

摘要

arXiv:2601.23229v2 Announce Type: replace Abstract: Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for any arbitrary discount factor, and the seminal work of Ye established strongly--polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据