Internalizing Temporal Consistency in Video Object-Centric Learning without Explicit Regularization

ArXiv CS.CV | 2026-06-01 | Authors: Rongzhen Zhao, Zhiyuan Li, Juho Kannala, Joni Pajarinen

Abstract

arXiv:2605.31508v1 (announce type: new). Video Object-Centric Learning (OCL) aims to represent objects as slot vectors and maintain their consistency across frames. Slot-Slot Contrastive (SSC) loss has become the cornerstone for state-of-the-art (SOTA) video OCL methods. While highly effective, SSC relies on one-to-one object correspondence across frames and introduces an extra loss. Following Occam's Razor, we propose a paradigm shift: temporal consistency is better enforced as an implicit model design rather than an explicit loss. To elegantly exclude SSC (xSSC), we introduce two quasi-zero-overhead synergistic mechanisms: (i) Chrono-Channel Decomposition (CCD) structurally disentangles slot representations along the channel dimension into static and dynamic sub-spaces, serving as an empirically unified information bottleneck;

