PROWL: Prioritized Regret-Driven Optimization for World Model Learning 文章

ArXiv CS.AI2026-06-01NEWSen作者: Ahmet H. G\"uzel, Jenny Seidenschwarz, Benjamin Graham, Jonathan Sadeghi, Jeffrey Hawke, Ilija Bogunovic

查看原文 →

关系图谱

摘要

arXiv:2605.18803v2 Announce Type: replace-cross Abstract: Modern action-conditioned video world models achieve strong short-horizon visual realism, yet remain unreliable on rare, interaction-critical transitions that dominate downstream planning and policy performance. Because passive demonstration data systematically under-samples these high-impact regimes, improving robustness requires actively eliciting model failures rather than relying on their natural occurrence. We introduce a KL-constrained adversarial curriculum in which a policy is trained to expose high-error trajectories of a diffusion-based world model while remaining close to the behavior distribution. The world model is continuously fine-tuned on these adversarially discovered trajectories, yielding an adversarial training loop that converts rare failures into a stable, near-distribution training signal without drifting into out-of-distribution exploitation.

PROWL: Prioritized Regret-Driven Optimization for World Model Learning 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品

相关技术查看全部 (3)