Video-Mirai: Autoregressive Video Diffusion Models Need Foresight

arXiv cs.CV | 2026-06-03 | Authors: Yonghao Yu, Lang Huang, Runyi Li, Zerun Wang, Toshihiko Yamasaki

Abstract

arXiv:2606.03971v1

Causal video generators must predict from the past, but they need not learn only from it. In streaming autoregressive video diffusion, each emitted segment becomes a commitment that future segments must preserve. Standard training, however, only asks each causal state to explain the present. This creates what we call a representation-level planning gap: states that fit the current segment may discard identity, layout, and motion information needed for a consistent future. We introduce Video-Mirai, a training-only method that closes this gap without changing causal inference: the generator rolls out causally, a frozen foresight encoder reads the completed rollout non-causally, and a lightweight predictor distills the resulting stopped-gradient targets into causal states. Future frames supervise representations, never generator inputs.
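The training data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scalar "segment features", the running-mean causal state, the full-rollout mean standing in for the frozen foresight encoder, and the function names `causal_states`, `foresight_targets`, and `distill_step` are all assumptions made for illustration. It only shows the shape of the idea: causal states see the past, targets see the whole rollout, and the targets are treated as constants when the predictor is trained.

```python
# Toy sketch only: every name and the scalar "features" are hypothetical.

def causal_states(segments):
    """State t summarizes segments 0..t only (running mean), like a causal rollout."""
    states, total = [], 0.0
    for t, x in enumerate(segments, start=1):
        total += x
        states.append(total / t)
    return states

def foresight_targets(segments):
    """Stand-in for a frozen, non-causal encoder: each position sees the full rollout."""
    full_mean = sum(segments) / len(segments)
    return [full_mean for _ in segments]

def distill_step(w, b, states, targets, lr=0.1):
    """One gradient step on MSE(w*h + b, z) for a linear predictor.

    The targets z are plain constants here, which plays the role of a
    stop-gradient: nothing is ever updated on the target side.
    Returns the updated (w, b) and the loss measured before the update.
    """
    n = len(states)
    grad_w = grad_b = loss = 0.0
    for h, z in zip(states, targets):
        err = (w * h + b) - z
        loss += err * err / n
        grad_w += 2.0 * err * h / n
        grad_b += 2.0 * err / n
    return w - lr * grad_w, b - lr * grad_b, loss

segments = [1.0, 2.0, 3.0, 6.0]           # toy features of a completed rollout
states = causal_states(segments)          # [1.0, 1.5, 2.0, 3.0]
targets = foresight_targets(segments)     # [3.0, 3.0, 3.0, 3.0]

w, b = 1.0, 0.0                           # lightweight predictor parameters
losses = []
for _ in range(200):
    w, b, loss = distill_step(w, b, states, targets)
    losses.append(loss)
```

In this toy version only the predictor is trained. In the method as the abstract describes it, the distillation signal would also shape the generator's causal states, while inference remains purely causal because the foresight encoder and predictor are used during training only.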
