Bellman-Taylor Score Decoding for Markov Decision Processes with State-Dependent Feasible Action Sets

Details

Source site: ArXiv CS.AI
Authors: Yi Chen, Rushuai Yang, Qiang Chen, Dongyan (Lucy) Huo
Article type: NEWS
Language: en
Publication date: 2026-06-10

Abstract

arXiv:2606.10979v1 (announce type: new)

Many Markov decision processes (MDPs) in operations research have feasible action sets that are state dependent and defined implicitly by operational constraints. These features make it difficult to apply standard deep reinforcement learning (DRL) algorithms, whose action interfaces typically assume either a fixed finite action catalog or a simple Euclidean space. Motivated by a Taylor expansion of the optimal action-value function, we propose Bellman-Taylor score decoding, a framework that moves policy learning to a Euclidean score space while enforcing feasibility through an action decoder. The induced latent-score MDP can then be optimized by standard DRL algorithms without differentiating through the decoder. We provide a performance guarantee showing that the optimality gap of this approach decomposes into a structural approximation error and an algorithmic learning error.
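The abstract does not specify the form of the action decoder, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a linear decoder that picks the feasible action maximizing the inner product with the score, which is one natural reading of a first-order Taylor expansion of the action-value function in the action. The toy feasible set (nonnegative integer order quantities bounded by remaining capacity) and the names `feasible_actions` and `decode` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Action = Tuple[int, ...]


def feasible_actions(remaining_capacity: int, n_items: int) -> List[Action]:
    """Hypothetical state-dependent feasible set.

    Returns every nonnegative integer order vector whose total does not
    exceed the capacity left in the current state.
    """
    grid = product(range(remaining_capacity + 1), repeat=n_items)
    return [a for a in grid if sum(a) <= remaining_capacity]


def decode(score: Sequence[float], remaining_capacity: int) -> Action:
    """Assumed linear decoder (not taken from the paper).

    Maps an unconstrained Euclidean score vector to the feasible action
    with the largest inner product <score, action>. The policy only has
    to output `score`; feasibility is enforced here, by enumeration.
    """
    candidates = feasible_actions(remaining_capacity, len(score))
    return max(candidates, key=lambda a: sum(s * x for s, x in zip(score, a)))


if __name__ == "__main__":
    score = [0.5, 1.0]
    print(decode(score, remaining_capacity=3))  # (0, 3)
    print(decode(score, remaining_capacity=0))  # (0, 0)
```

The same score decodes to different actions as the state changes, and every decoded action is feasible by construction. Because the decoder sits inside the environment step, a standard DRL algorithm can treat the score as an ordinary continuous action without needing gradients through `decode`. Enumeration is used only to keep the toy example short; a realistic feasible set would more likely call for an optimization solver.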
