Singular Vectors of Attention Heads Align with Features

ArXiv CS.AI, 2026-05-28. Authors: Gabriel Franco, Carson Loughridge, Mark Crovella

Abstract

arXiv:2602.13524v2. Identifying feature representations in language models is a central task in mechanistic interpretability. Several recent studies have observed that feature representations can, in some cases, be inferred from the singular vectors of attention matrices. However, a sound justification for this phenomenon is lacking. In this paper we address that gap, asking: why and when do singular vectors align with features? First, we demonstrate that singular vectors robustly align with features in a model where features can be directly observed. We then show theoretically that such alignment is expected under a range of conditions. We close by asking how, operationally, alignment may be recognized in real models where feature representations are not directly observable.
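
To make the idea of "singular vectors aligning with features" concrete, here is a minimal toy sketch in Python with NumPy. It is not the paper's model or method. It assumes a single head whose query and key weights are built by hand from known orthonormal "feature" directions, and it assumes the attention matrix of interest is the product W_Q W_K^T. All names (`head_singular_vectors`, `alignment`, `F_q`, `F_k`, `strengths`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def head_singular_vectors(W_Q, W_K):
    """Return the SVD factors of the d_model x d_model matrix W_Q @ W_K.T."""
    U, S, Vt = np.linalg.svd(W_Q @ W_K.T)
    return U, S, Vt.T


def alignment(vectors, features):
    """Absolute cosine similarity between columns of `vectors` and `features`."""
    v = vectors / np.linalg.norm(vectors, axis=0, keepdims=True)
    f = features / np.linalg.norm(features, axis=0, keepdims=True)
    return np.abs(v.T @ f)


rng = np.random.default_rng(0)
d_model, d_head = 16, 4

# Assumption: features are orthonormal directions that we choose ourselves,
# so they are directly observable in this toy setting.
F_q, _ = np.linalg.qr(rng.standard_normal((d_model, d_head)))
F_k, _ = np.linalg.qr(rng.standard_normal((d_model, d_head)))

# Assumption: each query feature interacts with one key feature,
# with a distinct interaction strength.
strengths = np.array([4.0, 3.0, 2.0, 1.0])
W_Q = F_q * np.sqrt(strengths)  # scales column k by sqrt(strengths[k])
W_K = F_k * np.sqrt(strengths)

U, S, V = head_singular_vectors(W_Q, W_K)

# Compare the leading singular vectors with the planted feature directions.
A_query = alignment(U[:, :d_head], F_q)
A_key = alignment(V[:, :d_head], F_k)
print(np.round(np.diag(A_query), 3))
print(np.round(np.diag(A_key), 3))
```

In this construction W_Q W_K^T equals F_q diag(strengths) F_k^T, which is already a singular value decomposition, so the leading left and right singular vectors match the planted features up to sign. Real models offer no such guarantee, since their features need not be orthogonal or paired one-to-one, and that is the situation the abstract's "why and when" question addresses.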
