Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal (Event)
REGULATION | 2026-05-27 | Impact: MEDIUM
arXiv:2605.26772v1 | Announce Type: new
Abstract: Large reasoning models (LRMs) generate chain-of-thought (CoT) traces before producing final outputs, introducing a dynamic internal state that may complicate control mechanisms such as refusal. Unlike instruction-tuned LLMs, where refusal is mediated by a single directional subspace, refusal in LRMs additionally depends on the CoT. In DeepSeek-