Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2605.26409v1 Announce Type: cross

Abstract: Evaluating and mitigating a generative system's susceptibility to jailbreak attacks is critical to its safe deployment. Given the number of deployable systems, full per-configuration evaluation and optimization is impractical. In this paper, we formalize the behavioral geometry of a population of models that, by leveraging previously evaluated and defended m
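The abstract is cut off before the method is described, so what follows is only a minimal sketch of the general idea of reusing previously evaluated models, under stated assumptions, and not the authors' procedure. It assumes each model is summarized by a hypothetical behavioral vector (for example, per-probe refusal scores on a shared prompt set) and estimates a new configuration's jailbreak attack success rate as an inverse-distance-weighted average over its k nearest previously evaluated models. The function name `predict_susceptibility`, the Euclidean metric, the k-NN estimator, and the example numbers are all illustrative choices.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical population: name -> (behavioral vector on a shared probe set,
# measured attack success rate). The values are made up for illustration.
EVALUATED: Dict[str, Tuple[Sequence[float], float]] = {
    "model_a": ([0.90, 0.10, 0.80], 0.40),
    "model_b": ([0.20, 0.70, 0.10], 0.05),
    "model_c": ([0.85, 0.20, 0.75], 0.35),
}


def predict_susceptibility(
    new_model: Sequence[float],
    evaluated: Dict[str, Tuple[Sequence[float], float]],
    k: int = 3,
    eps: float = 1e-9,
) -> float:
    """Inverse-distance-weighted k-NN estimate of attack success rate.

    Assumption: proximity in behavioral space implies similar susceptibility.
    """
    if not evaluated:
        raise ValueError("need at least one previously evaluated model")
    if k < 1:
        raise ValueError("k must be at least 1")
    # math.dist computes Euclidean distance and raises ValueError
    # if the vectors have different dimensions.
    ranked = sorted(
        ((math.dist(new_model, vec), asr) for vec, asr in evaluated.values()),
        key=lambda pair: pair[0],
    )[:k]
    # A behaviorally identical model inherits that model's measured rate.
    for dist, asr in ranked:
        if dist < eps:
            return asr
    weights = [1.0 / dist for dist, _ in ranked]
    return sum(w * asr for w, (_, asr) in zip(weights, ranked)) / sum(weights)


if __name__ == "__main__":
    # The two nearest neighbours are model_a (0.40) and model_c (0.35),
    # so the estimate lies between those two rates.
    print(predict_susceptibility([0.88, 0.15, 0.78], EVALUATED, k=2))
```

Inverse-distance weighting lets close behavioral neighbours dominate the estimate; the choice of k, the distance metric, and how behavioral vectors are built are free parameters in this sketch and would need to be validated against actual per-configuration evaluations.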