EditCaption: Human-Refined SFT and HAE-DPO for Image Editing Instruction Synthesis
Event: PRODUCT_LAUNCH | Date: 2026-05-26 | Impact: MEDIUM
arXiv:2604.08213v2 | Announce Type: replace
Abstract: High-quality source-target image pairs with precise editing instructions are essential for instruction-guided image editing, yet constructing such training triplets at scale remains costly. Recent pipelines often rely on vision-language models to synthesize editing instructions automatically, but we find that strong VLMs still struggle to describe visual transfo