BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning

ArXiv CS.CL | 2026-05-27 | NEWS | en | Authors: Xuan Luo, Yue Wang, Geng Tu, Jing Li, Ruifeng Xu

Details

Source site: ArXiv CS.CL
Authors: Xuan Luo, Yue Wang, Geng Tu, Jing Li, Ruifeng Xu
Article type: NEWS
Language: en
Publication date: 2026-05-27

Abstract

arXiv:2605.27110v1. Announce Type: cross. In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal disclosure. BAIT first asks the model to identify the protection boundary, then requires it to refine that boundary, and finally requests a detailed example. By building each step on the model's previous responses, BAIT turns the model's own reasoning and consistency tendency into a disclosure pathway. Experiments on AdvBench, JailbreakBench, AIR-Bench, and SORRY-Bench demonstrate that BAIT consistently achieves strong attack success rates across top-tier large language models, significantly outperforming conventional jailbreak baselines. Further analysis reveals that: 1) prevention-oriented framing significantly outperforms direct knowledge requests; 2) the refinement step plays a critical role in disclosure escalation;
