ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions (Event)

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2605.27819v1 | Announce Type: cross

Abstract: Sparse autoencoders are usually trained one layer at a time, even though transformer residual stream activations are strongly coupled across depth. This creates a practical problem for multi-layer interventions: different layerwise dictionaries can spend capacity representing the same carried-forward information, and replacing several layers at once can produce int