Tempered Self-Similarity Alignment for Physically Plausible Video Generation

arXiv cs.CV | 2026-05-26 | Authors: Manjin Kim, Suha Kwak, Minsu Cho

Abstract

arXiv:2605.24962v1. Despite remarkable advances, video generative models still struggle to generate physically realistic videos, frequently exhibiting appearance drift, implausible motion, and temporal inconsistencies. In this work, we address this limitation by transferring relational knowledge encoded in spatio-temporal self-similarity (STSS) from visual foundation models into video generative models. STSS represents pairwise similarities among features across space and time. It reveals the relational structure of how objects interact with other entities throughout a video, and thereby captures real-world dynamics, including object motion and semantic transformations.
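To make "pairwise similarities among features across space and time" concrete, the following is a minimal sketch of one way an STSS matrix could be computed from a feature volume. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not specify the similarity measure, the feature extractor, or what "tempered" refers to, so the choice of cosine similarity, the `(T, H, W, C)` layout, and the function name `spatio_temporal_self_similarity` are all hypothetical.

```python
import numpy as np


def spatio_temporal_self_similarity(features, eps=1e-8):
    """Pairwise cosine similarity between all space-time feature vectors.

    features: array of shape (T, H, W, C), e.g. per-frame feature maps
              from some visual backbone (assumed layout).
    returns:  array of shape (T*H*W, T*H*W); entry (i, j) compares the
              feature at space-time location i with the one at location j.
    """
    t, h, w, c = features.shape
    # Flatten time and space into a single axis of N = T*H*W locations.
    flat = features.reshape(t * h * w, c)
    # L2-normalise each feature vector; eps guards against zero vectors.
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, eps)
    # Dot products of unit vectors are cosine similarities (assumed measure).
    return unit @ unit.T


# Random features stand in for real backbone outputs.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 3, 3, 16))  # T=4, H=3, W=3, C=16
stss = spatio_temporal_self_similarity(feats)
```

In this sketch, row `i` of `stss` describes how one space-time location relates to every other location in the clip. Blocks on the diagonal compare locations within the same frame, and off-diagonal blocks compare locations across different frames.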
