Generative OOD-regularized Model-based Policy Optimization

ArXiv CS.AI | 2026-05-26 | News | en | Authors: Aysin Tumay, Jiahe Huang, Elise Jortberg, Rose Yu

Abstract

arXiv:2605.24405v1 (announce type: cross). We study sequential decision-making with offline reinforcement learning (RL). Traditional offline RL policies may produce out-of-distribution (OOD) actions when training relies only on sparse offline representations. To ensure safe offline policies in a sparse state-action space, we explore how density estimation models can be integrated into model-based RL methods to avoid OOD regions. Generative models can explicitly model the density in sparse state-action spaces. Building on this, we introduce Generative OOD-regularized Model-based Policy Optimization (GORMPO), a density-regularized offline RL algorithm that uses generative density modeling to restrict policy updates to high-density areas of the dataset. Furthermore, we examine whether better OOD detection corresponds to better model-based offline policies.
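The idea of restricting policy updates to high-density regions can be pictured with a small sketch. The abstract does not specify the density model, the penalty form, or any hyperparameters, so everything below is an illustrative assumption rather than GORMPO itself: a hand-written Gaussian kernel density estimate stands in for the generative density model, and the names `fit_gaussian_kde`, `penalized_reward`, `threshold`, and `penalty_weight` are hypothetical.

```python
import numpy as np


def fit_gaussian_kde(data, bandwidth):
    """Return a function giving log-density under an isotropic Gaussian KDE.

    Assumption: a KDE stands in for the generative density model.
    """
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    log_norm = -np.log(n) - 0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)

    def log_density(query):
        query = np.atleast_2d(np.asarray(query, dtype=float))
        # Squared distance from every query point to every dataset point.
        sq_dist = ((query[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
        exponents = -0.5 * sq_dist / bandwidth ** 2
        # Log-sum-exp keeps far-away queries from underflowing to log(0).
        peak = exponents.max(axis=1, keepdims=True)
        return log_norm + peak[:, 0] + np.log(np.exp(exponents - peak).sum(axis=1))

    return log_density


def penalized_reward(reward, log_p, threshold, penalty_weight):
    """Lower the reward by how far log_p falls below the threshold.

    Assumption: a hinge penalty on log-density; the paper may use another form.
    """
    shortfall = np.maximum(0.0, threshold - log_p)
    return reward - penalty_weight * shortfall


rng = np.random.default_rng(0)
dataset = rng.normal(0.0, 1.0, size=(500, 3))  # toy (state, action) vectors
log_density = fit_gaussian_kde(dataset, bandwidth=0.5)

# Treat the lowest 5% of dataset log-densities as the OOD boundary (assumed).
threshold = np.quantile(log_density(dataset), 0.05)

rewards = np.array([1.0])
in_dist = np.zeros((1, 3))       # near the centre of the data
out_dist = np.full((1, 3), 8.0)  # far from any dataset point

r_in = penalized_reward(rewards, log_density(in_dist), threshold, 1.0)
r_out = penalized_reward(rewards, log_density(out_dist), threshold, 1.0)
print(r_in, r_out)
```

In a model-based setting, a penalty of this kind would be applied to rewards of model-generated transitions, so that rollouts drifting into low-density regions look unattractive to the policy. Swapping the KDE for a learned generative model would leave `penalized_reward` unchanged.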
