Where does Absolute Position come from in decoder-only Transformers? (Event)

PRODUCT_LAUNCH | 2026-06-05 | Impact: MEDIUM

arXiv:2606.06160v1 Announce Type: cross
Abstract: RoPE-trained transformers distinguish absolute position in their attention patterns, even though RoPE encodes only relative offsets in the inner product. We trace this leakage to two architectural components. The causal mask is responsible for the first: its per-query softmax denominator depends on the absolute query position by construction. The residual stream supplies the se
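
The causal-mask claim can be checked on a toy case. The sketch below is an illustration, not the paper's code. It assumes attention logits that depend only on the relative offset i - j (a stand-in for RoPE scores), and the helper names `relative_scores` and `causal_softmax_denominators` are hypothetical. Even with purely relative logits, the softmax denominator for query i sums over keys 0..i, so it changes with the absolute query position.

```python
import math


def relative_scores(n, f):
    """Build an n x n logit matrix whose entry (i, j) depends only on i - j."""
    return [[f(i - j) for j in range(n)] for i in range(n)]


def causal_softmax_denominators(scores):
    """For each query i, sum exp(score) over the visible keys j <= i."""
    n = len(scores)
    return [
        sum(math.exp(scores[i][j]) for j in range(i + 1))
        for i in range(n)
    ]


# Case 1: constant logits. Query i sees i + 1 keys, so the denominator is i + 1.
flat = relative_scores(6, lambda d: 0.0)
flat_denoms = causal_softmax_denominators(flat)
print(flat_denoms)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Case 2: logits decay with distance. The denominator still grows with i,
# because each later query adds one more positive term to the sum.
decay = relative_scores(6, lambda d: -0.5 * d)
decay_denoms = causal_softmax_denominators(decay)
print(decay_denoms)
```

In both cases the normalizer differs from one query position to the next even though no logit ever sees an absolute index, which is the dependence the abstract attributes to the causal mask.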