PaintBench: Deterministic Evaluation of Precise Visual Editing

arXiv cs.CV | 2026-06-03 | Authors: Kai Xu, Ellis Brown, Shrikar Madhu, Rob Fergus, He He, Saining Xie

Abstract

arXiv:2606.00188v1 (announce type: cross)

While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introduce PaintBench, a dynamically scalable benchmark targeting 20 fundamental precise visual editing operations across four categories: geometric transformation, structural manipulation, color change, and symbolic reasoning. Procedural generation with configurable complexity enables an effectively infinite, contamination-resistant evaluation suite, and deterministic pixel-level evaluation eliminates reliance on bias-prone judge models. Across 11 image editing models, we find overall low performance, with the current highest-performing industry leader scoring only 17.1% mIoU. Task decomposition reveals especially challenging operation types (geometric transformation, most structural manipulation, formula-based color change) and model-specific specializations.
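To make the idea of deterministic pixel-level scoring concrete, here is a minimal sketch of a mean intersection-over-union (mIoU) computation between a model's edited image and the single correct answer. The abstract does not specify how PaintBench defines its metric, so everything here is an assumption for illustration: images are represented as small grids of integer labels, the function names `label_iou` and `mean_iou` are hypothetical, and the average is taken over every label that appears in either image.

```python
from typing import List

# Assumption: an image is a grid of integer labels (e.g. palette indices).
Grid = List[List[int]]


def label_iou(pred: Grid, target: Grid, label: int) -> float:
    """IoU of the pixel sets carrying `label` in the two grids."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == label
            in_target = t == label
            inter += int(in_pred and in_target)
            union += int(in_pred or in_target)
    # Callers only pass labels present in at least one grid, so union > 0.
    return inter / union


def mean_iou(pred: Grid, target: Grid) -> float:
    """Average per-label IoU over all labels seen in either grid."""
    same_shape = len(pred) == len(target) and all(
        len(a) == len(b) for a, b in zip(pred, target)
    )
    if not same_shape:
        raise ValueError("pred and target must have the same shape")
    labels = sorted({v for row in pred for v in row} | {v for row in target for v in row})
    if not labels:
        raise ValueError("grids must not be empty")
    return sum(label_iou(pred, target, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    target = [[0, 0, 1],
              [0, 1, 1]]
    pred = [[0, 0, 1],
            [0, 0, 1]]  # one pixel of label 1 was missed
    print(round(mean_iou(pred, target), 3))  # prints 0.708
```

Because the score depends only on the two pixel grids, repeated runs on the same output always give the same number, which is the property the abstract contrasts with judge-model evaluation.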
