Task Structure Reverses Layerwise State Encoding in Sequence Models · Event

PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM

arXiv:2606.00926v1. Announce Type: cross. Abstract: Mechanistic studies of sequence models often treat layerwise state encodings as architectural traits: recurrent models concentrate readable state, attention-based models distribute it. We find that the same architecture reverses this profile when the task changes. Across Transformers, Mamba, Mamba-2, LSTMs, and GRUs, Parity is concentrated late in Mamba and the recurrent baselin

Task Structure Reverses Layerwise State Encoding in Sequence Models · Related companies

Chang (COMPANY)
arXiv (NONPROFIT)
Anis (NONPROFIT)
EAT (NONPROFIT)
Iter (RESEARCH_INSTITUTE)
ACT (NONPROFIT)
FIND (NONPROFIT)