When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness? (Event)

PRODUCT_LAUNCH · 2026-05-28 · Impact: MEDIUM

arXiv:2605.27932v1 · Announce Type: new

Abstract: Think-with-image reasoning is emerging as a new inference paradigm for large vision-language models, but its safety implications remain poorly understood. Existing systems already span multiple process designs, including direct response generation, text-only prior turn, visual-state manipulation, and explicit external image-tool invocation. In this paper, we ask w
