Spectral Principal Paths: A Spectral Perspective on Linear Representation Formation in LLMs

arXiv cs.CV, 2026-05-27. Authors: Bowei Tian, Xuntao Lyu, Meng Liu, Hongyi Wang, Ang Li

Abstract

arXiv:2506.08543v3

High-level representations have become a central focus in efforts to improve AI transparency and control, shifting attention from individual neurons or circuits to structured semantic directions that align with human-interpretable concepts. While the Linear Representation Hypothesis (LRH) suggests that such directions emerge in model representations, it remains unclear how these representations originate and why they become increasingly stable across layers. To address this question, we introduce the Input-Space Linearity Hypothesis, which posits that concept-aligned directions originate in the input space and are steadily maintained as depth increases. We then propose the Spectral Principal Path (SPP) framework, which formalizes how deep networks progressively distill linear representations along spectral principal directions.
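The intuition that a concept direction can start in the input space and become concentrated along a network's spectral principal directions can be illustrated with a toy numerical sketch. This is not the authors' SPP construction. It assumes a purely linear stack that reuses one hypothetical symmetric positive semi-definite layer with a decaying spectrum, synthetic two-class inputs, and a concept direction defined as a difference of class means. The helper names `make_layer`, `principal_energy`, and `trace_concept` are illustrative only. At each layer the sketch measures what fraction of the concept direction's energy lies in the top-k singular subspace of the layer.

```python
import numpy as np


def make_layer(dim, spectrum, seed):
    """Hypothetical toy layer: a symmetric PSD matrix Q diag(spectrum) Q^T.

    For a symmetric PSD matrix the singular vectors coincide with the
    eigenvectors, so the columns of Q are its spectral principal directions.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))  # random orthonormal basis
    return q @ np.diag(spectrum) @ q.T, q


def principal_energy(v, basis, k):
    """Fraction of ||v||^2 that lies in the span of the first k columns of `basis`."""
    coords = basis.T @ v
    return float(np.sum(coords[:k] ** 2) / np.sum(coords ** 2))


def trace_concept(dim=16, depth=6, k=2, seed=0):
    """Track a difference-of-means concept direction through a linear toy stack."""
    rng = np.random.default_rng(seed)
    # Assumed strictly decreasing spectrum, so the first k columns of q
    # are the top-k singular directions of the layer.
    spectrum = np.linspace(2.0, 0.2, dim)
    w, q = make_layer(dim, spectrum, seed)

    # Assumed synthetic two-class inputs; the input-space concept direction
    # is the difference of their class means.
    mu = rng.standard_normal(dim)
    x_pos = mu + 0.5 * rng.standard_normal((200, dim))
    x_neg = -mu + 0.5 * rng.standard_normal((200, dim))

    h_pos, h_neg = x_pos, x_neg
    energies = []
    for _ in range(depth):
        # Rows are samples, so the layer is applied as h @ W^T.
        h_pos, h_neg = h_pos @ w.T, h_neg @ w.T
        concept = h_pos.mean(axis=0) - h_neg.mean(axis=0)
        energies.append(principal_energy(concept, q, k))
    return energies
```

Calling `trace_concept()` returns one energy fraction per layer. Because every layer in this toy is linear, the hidden-state mean difference at depth l is exactly the layer matrix applied l times to the input-space mean difference, so the direction is inherited from the input. The fractions are nondecreasing with depth because a fixed layer with a separated spectrum acts like power iteration. That is a property of this simplified setup, not evidence about nonlinear transformer layers.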