Native Audio-Visual Alignment for Generation (Event)

OPEN_SOURCE · 2026-05-29 · Impact: MEDIUM

arXiv:2605.30073v1 (Announce Type: new)

Abstract: Joint audio-video generation aims to synthesize visual and acoustic content that is temporally synchronized and semantically coherent. However, existing open-source methods mainly rely on one of two designs: dual-tower architectures with posterior alignment, or fully unified tri-modal architectures that mix textual context, audio, and video in a single shared space. The former weakens fine-grained audio-video co-evolution, while the latter couple…
