MiVE: Multiscale Vision-language features for reference-guided video Editing · Event

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.14664v2 · Announce Type: replace

Abstract: Reference-guided video editing takes a source video, a text instruction, and a reference image as inputs, requiring the model to faithfully apply the instructed edits while preserving original motion and unedited content. Existing methods fall into two paradigms, each with inherent limitations: decoupled encoders suffer from modality gaps when processing instructions

MiVE: Multiscale Vision-language features for reference-guided video Editing · Related people