REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak
Event: PRODUCT_LAUNCH | Date: 2026-06-04 | Impact: MEDIUM
arXiv:2605.20654v2. Abstract: While Large Language Models (LLMs) demonstrate remarkable capabilities, they remain susceptible to sophisticated, multi-step jailbreak attacks that circumvent conventional surface-level safety alignment by exploiting the internal generation process. To address these vulnerabilities, we propose Reflector, a principled two-stage framework that internalizes self-reflec
Source: ArXiv CS.AI, 2026-06-04