Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression

Source: ArXiv CS.AI | Published: 2026-05-28 | Type: NEWS | Language: en | Authors: Tue M. Cao, Nguyen Do, My T. Thai

Abstract

arXiv:2605.28567v1 (Announce Type: cross)

Sparse autoencoders (SAEs) have become a central tool for interpreting language models. However, two key SAE analyses remain difficult to scale: (1) matching semantically similar features across multiple layers and (2) compressing large feature circuits into interpretable supernodes. Although these have been treated as separate problems, we show that both are instances of a more fundamental challenge, which we frame as estimating semantic distances between SAE features that lie on different activation manifolds. We introduce a distributional framework for this problem. In it, each feature is represented by an activation-weighted distribution over the hidden states that express it, rather than by a single decoder vector as in prior work.
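The abstract does not specify the ground cost, the optimal transport solver, or how features from different layers are made comparable. Under those gaps, the following is a minimal sketch of the distributional idea. It represents each feature as an activation-weighted empirical distribution over hidden states, as the abstract describes. It then compares two features with entropy-regularized optimal transport (Sinkhorn iterations), using cosine distance between hidden states as the cost. Several pieces are illustrative assumptions and not the authors' method: the helper names (`feature_distribution`, `cosine_cost`, `sinkhorn`, `semantic_distance`), the cosine cost, and the `eps` value.

```python
import numpy as np


def feature_distribution(hidden_states, activations):
    """Activation-weighted empirical distribution for one SAE feature (hypothetical helper).

    hidden_states: (n, d) hidden states, one per token position.
    activations:   (n,) non-negative activations of the feature at those positions.
    Returns (support, weights): the hidden states where the feature fires and
    weights proportional to its activation, normalized to sum to 1.
    """
    acts = np.asarray(activations, dtype=float)
    mask = acts > 0  # drop positions where the feature is inactive
    support = np.asarray(hidden_states, dtype=float)[mask]
    weights = acts[mask] / acts[mask].sum()
    return support, weights


def cosine_cost(x, y):
    """Pairwise cosine distance (1 - cosine similarity) between rows of x and rows of y."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T


def sinkhorn(a, b, cost, eps=0.05, n_iters=1000):
    """Entropy-regularized OT between histograms a and b via Sinkhorn scaling.

    Returns the transport cost <P, C> of the regularized plan P (the entropy
    term itself is not included) together with P.
    """
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns to match marginal b
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost)), plan


def semantic_distance(feat_a, feat_b, eps=0.05):
    """Assumed OT-based distance between two features given as (support, weights) pairs."""
    (xs, wa), (ys, wb) = feat_a, feat_b
    dist, _ = sinkhorn(wa, wb, cosine_cost(xs, ys), eps=eps)
    return dist
```

The cost compares hidden states directly, so this sketch assumes that both features' hidden states live in the same d-dimensional space (for example, a shared residual stream). Features whose hidden states live in incomparable spaces would need a different construction, such as a Gromov-Wasserstein-style comparison. A matrix of such pairwise distances could in principle feed a standard clustering routine to propose candidate supernodes. The abstract does not say how the authors perform that step.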
