Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink

arXiv cs.CL, 2026-06-02. Author: Yuhang Jiang

Abstract

arXiv:2606.00930v1. Mechanistic interpretability often assumes that a probe identifying a representational signature also identifies the circuit that executes the corresponding computation. We show that this assumption can fail systematically in Mamba-2. Studying the state sink (disproportionate Delta-gate activation on boundary tokens, analogous to the attention sink in Transformers), we find that single-bucket probes recover only a small execution layer while missing a much larger detection layer that carries the same representational signature. In Mamba-2, the state sink decomposes into two functional head sets. Single-bucket BOS-specialist heads (about 5% of heads in the 2.7B model) causally support both BOS-context and newline-target predictions across model scales and corpora. Dual heads (27-35% of heads, recovered by multi-class aggregation of the same probe) show stronger BOS-newline representational similarity but substantially weaker causal effects under ablation.
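The abstract does not define the probe, so the following is only a hypothetical toy showing one way a single-bucket test can miss heads that a multi-class aggregation of the same test recovers. It assumes each head is scored by the share of its top-k Delta-gate positions falling in each token-class bucket. The names `TOP_K`, `TAU`, `bucket_shares`, `classify_head` and the bucket labels are illustrative assumptions, not the paper's method, and the Delta values are made up rather than taken from Mamba-2.

```python
from collections import Counter

# Hypothetical parameters: NOT taken from the paper.
TOP_K = 10            # how many highest-Delta positions to inspect per head
TAU = 0.6             # share of top-k positions needed to flag a head
BOUNDARY = ("bos", "newline")


def bucket_shares(deltas, labels, k=TOP_K):
    """Share of a head's top-k Delta positions that fall in each token class."""
    # sorted(..., reverse=True) is stable, so ties keep their original order.
    ranked = sorted(range(len(deltas)), key=lambda i: deltas[i], reverse=True)[:k]
    counts = Counter(labels[i] for i in ranked)
    return {label: n / len(ranked) for label, n in counts.items()}


def single_bucket_flag(shares, bucket="bos", tau=TAU):
    """Single-bucket test: one bucket alone must hold at least tau of the top-k."""
    return shares.get(bucket, 0.0) >= tau


def aggregated_flag(shares, buckets=BOUNDARY, tau=TAU):
    """Multi-class aggregation: pool the boundary buckets before thresholding."""
    return sum(shares.get(b, 0.0) for b in buckets) >= tau


def classify_head(deltas, labels):
    shares = bucket_shares(deltas, labels)
    if single_bucket_flag(shares):
        return "bos_specialist"
    # Flagged only after pooling, with mass on both BOS and newline.
    if aggregated_flag(shares) and all(shares.get(b, 0.0) > 0 for b in BOUNDARY):
        return "dual"
    return "none"


# Toy token stream (hypothetical): 8 BOS, 6 newline, 6 ordinary tokens.
labels = ["bos"] * 8 + ["newline"] * 6 + ["other"] * 6


def make_head(bos, newline, other):
    """Build a fake per-token Delta vector from one value per token class."""
    value = {"bos": bos, "newline": newline, "other": other}
    return [value[lab] for lab in labels]


print(classify_head(make_head(5.0, 0.2, 0.1), labels))  # bos_specialist
print(classify_head(make_head(2.0, 3.0, 0.1), labels))  # dual
print(classify_head(make_head(1.0, 1.0, 4.0), labels))  # none
```

In this toy, the second head splits its top-k positions 40/60 between BOS and newline, so no single bucket reaches `TAU` and the single-bucket test ignores it, while pooling the two boundary buckets flags it.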
