Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation
Event | OPEN_SOURCE | 2026-05-26 | Impact: MEDIUM
arXiv:2605.25195v1 (announce type: new)
Abstract: Current open-source diffusion models struggle to generate stable and synchronized audio-visual content, particularly in scenarios demanding complex semantic reasoning. The root cause is that existing methods rely on coarse text embeddings from off-the-shelf encoders to guide audio-video denoising, which discards fine-grained semantics and, critically, lacks a shared long-horizon
Source: arXiv cs.CV, 2026-05-26