Event: REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak

Name: REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak
Start: 2026-06-04

PRODUCT_LAUNCH | 2026-06-04 | Impact: MEDIUM

arXiv:2605.20654v2 (Announce Type: replace-cross)

Abstract: While Large Language Models (LLMs) demonstrate remarkable capabilities, they remain susceptible to sophisticated, multi-step jailbreak attacks that circumvent conventional surface-level safety alignment by exploiting the internal generation process. To address these vulnerabilities, we propose Reflector, a principled two-stage framework that internalizes self-reflec

Artificial Intelligence
