PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding
Event: PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM
arXiv:2602.01322v2
Announce Type: replace-cross
Abstract: Sparse autoencoders (SAEs) interpret neural network representations by decomposing activations into sparse combinations of dictionary atoms. However, SAEs assume features combine additively through linear reconstruction, an assumption that cannot capture compositional structure: linear models cannot distinguish whether "Starbucks" arises from the comp