Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers
Event: PRODUCT_LAUNCH
Date: 2026-06-09
Impact: MEDIUM
arXiv:2606.08292v1
Announce Type: new
Abstract: In mechanistic interpretability, attention heads are commonly elevated to role claims (e.g., "this head represents addition") when they are necessary for a behavior, encode it linearly, and recover that behavior when restored after ablation. We show this evidence is insufficient: across three 7-8B instruction-tuned models and five computation famili