VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models 文章

ArXiv CS.CV2026-05-27NEWSen作者: Qilin Liao, Anamika Lochab, Ruqi Zhang

摘要

arXiv:2510.17759v2 Announce Type: replace-cross Abstract: Vision-Language Models (VLMs) extend large language models with visual reasoning, but their multimodal design also introduces new, underexplored vulnerabilities. Existing multimodal red-teaming methods largely rely on brittle templates, focus on single-attack settings, and expose only a narrow subset of vulnerabilities. To address these limitations, we introduce VERA-V, a variational inference framework that recasts multimodal jailbreak discovery as learning a joint posterior distribution over paired text-image prompts. This probabilistic view enables the generation of stealthy, coupled adversarial inputs that bypass model guardrails. We train a lightweight attacker to approximate the posterior, allowing efficient sampling of diverse jailbreaks and providing distributional insights into vulnerabilities.