Interpretability Without Tradeoffs: Disentangling Polysemanticity At Equal Predictive Performance

ArXiv CS.CV | 2026-06-01 | NEWS | en | Authors: Doğukan Bağcı, Bernt Schiele, Simone Schaub-Meyer, Jonas Fischer, Robin Hesse

Details

Source site
ArXiv CS.CV
Authors
Doğukan Bağcı, Bernt Schiele, Simone Schaub-Meyer, Jonas Fischer, Robin Hesse
Article type
NEWS
Language
en
Publication date
2026-06-01

Abstract

arXiv:2605.31304v1, announce type: cross

Deep neural networks (DNNs) are widely used, but interpreting what they actually learn remains difficult. A major obstacle is that individual neurons often encode multiple unrelated concepts, obscuring the decision process of the network. While prior work, such as sparse autoencoders, can separate these mixed signals into more meaningful, "monosemantic" features, this typically requires altering the model in ways that can degrade downstream performance. To overcome this, we introduce ELUDe (explicit, lossless, unsupervised disentanglement), a method for improving the interpretability of DNNs while preserving their functional equivalence. ELUDe breaks latent representations into clear, inspectable sub-units that behave like interpretable features, while guaranteeing that the model's outputs remain exactly the same. It requires no explicit training and no labels, and it can be applied to pretrained models.
