Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m 文章

ArXiv CS.CL2026-05-26NEWSen作者: Jordan F. McCann

详细信息

来源站点: ArXiv CS.CL
作者: Jordan F. McCann
文章类型: NEWS
语言: en
发布日期: 2026-05-26

摘要

arXiv:2605.24577v1 Announce Type: cross Abstract: Independently trained transformers compute the same function in residual-stream bases that differ by a uniform random rotation on $\mathrm{SO}(d_{\mathrm{model}})$. We call this phenomenon polymorphism: same function, mutually unintelligible interior coordinates. One matrix multiplication per model pair removes it: an orthogonal Procrustes fit on a single batch of activations transfers sparse-autoencoder feature dictionaries and steering vectors between independently trained models, with no retraining. The phenomenon is invisible to the standard SAE universality metric. Decoder-column cosine similarity matches across seeds at 98%, the SAE-universality headline number, while an SAE trained on one seed reconstructs another seed's activations at negative explained variance, worse than predicting the constant mean. The decoder columns align; the encoder reads from a rotated frame.

Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m 文章

详细信息

摘要

相关事件

相关公司

相关人物

相关产品

相关技术查看全部 (5)