Furina: Fragmented Uncertainty-Driven Refusal Instability Attack (Event)

Name: Furina: Fragmented Uncertainty-Driven Refusal Instability Attack
Start: 2026-05-27

Type: PRODUCT_LAUNCH
Date: 2026-05-27
Impact: MEDIUM

arXiv:2605.26158v1
Announce Type: cross
Abstract: Safety alignment in large language models (LLMs) and multimodal large language models (MLLMs) is commonly assumed to operate as a near-binary threshold mechanism. We challenge this assumption by revealing that safety behavior is governed by an instability region where small perturbations induce stochastic refusal decisions rather than deterministic outcomes. We develop a multi-metr

Category: Artificial Intelligence
