MiVE: Multiscale Vision-language features for reference-guided video Editing
Event: PRODUCT_LAUNCH, 2026-05-27. Impact: MEDIUM
arXiv:2605.14664v2 Announce Type: replace
Abstract: Reference-guided video editing takes a source video, a text instruction, and a reference image as inputs, requiring the model to faithfully apply the instructed edits while preserving original motion and unedited content. Existing methods fall into two paradigms, each with inherent limitations: decoupled encoders suffer from modality gaps when processing instructions
Source: ArXiv CS.CV, 2026-05-27