Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation · Event

OPEN_SOURCE · 2026-05-26 · Impact: MEDIUM

arXiv:2605.25195v1 · Announce Type: new

Abstract: Current open-source diffusion models struggle to generate stable and synchronized audio-visual content, particularly in scenarios demanding complex semantic reasoning. The root cause is that existing methods rely on coarse text embeddings from off-the-shelf encoders to guide audio-video denoising, which discards fine-grained semantics and, critically, lacks a shared long-horizon

Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation · Related Technologies