Tempered Self-Similarity Alignment for Physically Plausible Video Generation

arXiv cs.CV | 2026-05-26 | Authors: Manjin Kim, Suha Kwak, Minsu Cho

Abstract

arXiv:2605.24962v1. Despite remarkable advances, video generative models still struggle to generate physically realistic videos, frequently exhibiting appearance drift, implausible motion, and temporal inconsistencies. In this work, we address this limitation by transferring relational knowledge encoded in spatio-temporal self-similarity (STSS) from visual foundation models into video generative models. STSS represents pairwise similarities among features across space and time. It reveals the relational structure of how objects interact with other entities throughout a video and thereby captures real-world dynamics, including object motion and semantic transformations.
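
To make the notion of pairwise similarities across space and time concrete, here is a minimal sketch in Python. It assumes cosine similarity over a feature map of shape (T, H, W, C); the abstract does not specify the similarity measure, the feature extractor, or how the tempering and alignment are done, so the function name `spatio_temporal_self_similarity` and every parameter here are illustrative assumptions rather than the authors' formulation.

```python
import numpy as np


def spatio_temporal_self_similarity(features, eps=1e-8):
    """Pairwise cosine similarity among all space-time feature vectors.

    features: array of shape (T, H, W, C), e.g. per-frame feature maps
              (assumed layout; not taken from the paper).
    returns:  array of shape (T*H*W, T*H*W); entry (i, j) is the cosine
              similarity between space-time positions i and j.
    """
    t, h, w, c = features.shape
    # Flatten every (frame, row, column) position into one token.
    tokens = features.reshape(t * h * w, c)
    # L2-normalise so the dot product becomes cosine similarity.
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    tokens = tokens / np.maximum(norms, eps)
    return tokens @ tokens.T


# Random stand-in features: 4 frames, a 3x3 spatial grid, 16 channels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 3, 3, 16))
stss = spatio_temporal_self_similarity(feats)
```

Each row of `stss` describes how one space-time position relates to every other position in the clip, which is the kind of relational structure the abstract refers to. The full matrix grows quadratically with T*H*W, so a practical version would likely restrict or subsample the positions being compared.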
