Interpretability Without Tradeoffs: Disentangling Polysemanticity At Equal Predictive Performance

Details

Source: ArXiv CS.CV
Authors: Doğukan Bağcı, Bernt Schiele, Simone Schaub-Meyer, Jonas Fischer, Robin Hesse
Article type: NEWS
Language: en
Published: 2026-06-01

Abstract

arXiv:2605.31304v1 (announce type: cross)

Deep neural networks (DNNs) are widely used, but interpreting what they actually learn remains difficult. A major obstacle is that individual neurons often encode multiple unrelated concepts, obscuring the decision process of the network. While prior work, such as sparse autoencoders, can separate these mixed signals into more meaningful, "monosemantic" features, this typically requires altering the model in ways that can degrade downstream performance. To overcome this, we introduce ELUDe (explicit, lossless, unsupervised disentanglement), a method for improving the interpretability of DNNs while preserving their functional equivalence. ELUDe breaks latent representations into clear, inspectable sub-units that behave like interpretable features, while guaranteeing that the model's outputs remain exactly the same. It requires no explicit training and no labels, and it can be applied to pretrained models.
