Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal (Event)

Name: Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal
Start: 2026-05-27

REGULATION | 2026-05-27 | Impact: MEDIUM

arXiv:2605.26772v1
Announce Type: new
Abstract: Large reasoning models (LRMs) generate chain-of-thought (CoT) traces before producing final outputs, introducing a dynamic internal state that may complicate control mechanisms such as refusal. Unlike in instruction-tuned LLMs, where refusal is mediated by a single directional subspace, refusal in LRMs additionally depends on the CoT. In DeepSeek-

Artificial Intelligence
