PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI

BREAKTHROUGH · 2026-05-28 · Impact: HIGH

arXiv:2605.27545v1 · Announce Type: new

Abstract: Jailbreak attacks on multimodal AI systems remain underexplored, even though unsafe image generation can have more severe consequences than unsafe text and current defenses are relatively immature. We introduce PAST2HARM, a simple yet effective adaptive jailbreak framework that bypasses refusal training in state-of-the-art multimodal text-to-image models. Building on pri
