JAVEDIT: Joint Audio-Visual Instruction-Guided Video Editing with Agentic Data Curation

ArXiv CS.CV | 2026-06-03

Authors: Yinan Chen, Chuming Lin, Zhennan Chen, Yuxiang Zeng, Junwei Zhu, Yali Bi, Xijie Huang, Chengming Xu, Donghao Luo, Zhucun Xue, Xiaobin Hu, Chengjie Wang, Yong Liu, Jiangning Zhang, Shuicheng Yan

Abstract

arXiv:2606.03168v1 (announce type: new)

While instruction-based video editing has seen significant progress, joint audio-visual editing remains constrained by the absence of dedicated datasets and benchmarks. To bridge this gap, we present JAVEdit-100k, the first large-scale, high-quality dataset tailored for instruction-guided joint audio-visual editing. Focusing on human-centric videos, JAVEdit-100k comprises approximately 100K editing triplets spanning five distinct categories, including subject editing and speech editing. This dataset is rigorously constructed via four meticulously designed generation pipelines, seamlessly paired with an agent-in-the-loop quality control mechanism. Furthermore, to address the lack of standardized evaluation within the field, we introduce JAVEditBench, a comprehensive benchmark featuring curated source videos and human-aligned instructions across all editing categories.
