Completion vs Optimality: Policy Gradient in Long-Horizon Cumulative-Damage Problems · Event

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.26657v1 · Announce Type: new · Abstract: Long-horizon decision problems with cumulative damage couple locally attractive actions to globally adverse outcomes. We identify two orthogonal failure modes for policy-gradient methods on this class and propose a decomposition that separates them: \emph{completion} (reaching the terminal horizon rather than exiting via an implicit terminal constraint) and \emph{
