Offline Reinforcement Learning with Generative Trajectory Policies (article)

Source: arXiv cs.AI | 2026-05-29 | News | Language: en | Authors: Xinsong Feng, Leshu Tang, Chenan Wang, Haipeng Chen

Abstract

arXiv:2510.11499v2 (Announce Type: replace-cross)

Generative models have emerged as a powerful class of policies for offline reinforcement learning (RL) due to their ability to capture complex, multi-modal behaviors. However, existing methods face a stark trade-off: slow, iterative models like diffusion policies are computationally expensive, while fast, single-step models like consistency policies often suffer from degraded performance. In this paper, we demonstrate that it is possible to bridge this gap. The key to moving beyond the limitations of individual methods, we argue, lies in a unifying perspective that views modern generative models, including diffusion, flow matching, and consistency models, as specific instances of learning a continuous-time generative trajectory governed by an Ordinary Differential Equation (ODE).
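
To make the "generative trajectory governed by an ODE" view and the speed/quality trade-off concrete, here is a minimal toy sketch. It is not the authors' method. It assumes a hypothetical 1-D problem: noise x0 ~ N(0, 1), a bimodal "action" distribution at +1 or -1 with equal probability, and the linear flow-matching path x_t = (1 - t) x0 + t x1. For this setup the marginal velocity field has the closed form v(x, t) = (tanh(x t / (1 - t)^2) - x) / (1 - t). The helper names `velocity` and `sample_euler` are illustrative only.

```python
import math


def velocity(x: float, t: float) -> float:
    """Closed-form marginal flow-matching velocity for a toy 1-D problem.

    Illustrative assumptions (not from the paper): noise x0 ~ N(0, 1), data
    x1 in {+1, -1} with equal probability, and the linear path
    x_t = (1 - t) * x0 + t * x1. Then E[x1 | x_t = x] = tanh(x * t / (1 - t)**2)
    and v(x, t) = (E[x1 | x_t = x] - x) / (1 - t).
    """
    sigma = 1.0 - t
    posterior_mean = math.tanh(x * t / sigma**2)
    return (posterior_mean - x) / sigma


def sample_euler(x0: float, num_steps: int) -> float:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    h = 1.0 / num_steps
    x = x0
    for k in range(num_steps):
        t = k * h  # t < 1 on every step, so 1 - t never reaches zero
        x = x + h * velocity(x, t)
    return x


# Compare a single step (cheap) with many steps (expensive) from the same noise.
for steps in (1, 100):
    ends = [round(sample_euler(x0, steps), 3) for x0 in (-0.5, 0.3, 1.2)]
    print(f"{steps:>3} step(s): {ends}")
```

With one Euler step every sample lands at 0, the average of the two modes and a value the data never takes; with 100 steps the samples reach +1 or -1 depending on the sign of x0. In this toy case, a fast one-step sampler has to learn the trajectory's endpoint map (here x0 -> sign(x0)) rather than just the instantaneous velocity, which is roughly the role consistency-style models play.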
