PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding

Event type: PRODUCT_LAUNCH | Date: 2026-05-26 | Impact: MEDIUM

arXiv:2602.01322v2 | Announce type: replace-cross

Abstract: Sparse autoencoders (SAEs) interpret neural network representations by decomposing activations into sparse combinations of dictionary atoms. However, SAEs assume features combine additively through linear reconstruction, an assumption that cannot capture compositional structure: linear models cannot distinguish whether "Starbucks" arises from the comp