Geometric Evolution Maps: Extracting Stable Concept Probes from Transformer Residual Streams

arXiv cs.AI | 2026-05-26 | Author: James Henry

Abstract

arXiv:2605.25848v1 (announce type: cross)

Concept probes extracted from transformer residual streams are only as reliable as the layer from which they are extracted. The common practice of probing at a fixed late layer or at the peak of a separation score function ignores a fundamental structural feature: concept representations undergo substantial directional rotation during their assembly phase, and do not settle into a stable direction until a characteristic handoff layer after the primary Concept Allocation Zone (CAZ). We introduce Geometric Evolution Maps (GEMs), which track the full directional trajectory of a concept through residual stream activations, identify the handoff layer where rotation ceases, and extract the settled probe direction from that layer. Across 23 architectures spanning 70M to 14B parameters and 17 concept types, the entry-to-exit cosine similarity within CAZs has a mean of 0.
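
The abstract describes the procedure only at a high level, so the following is a minimal sketch of the general idea rather than the paper's method. It assumes a difference-of-means probe direction per layer, and it treats rotation as having "ceased" once a run of consecutive layer-to-layer cosine similarities stays above a threshold. The function names, the `threshold` and `patience` parameters, and the difference-of-means choice are all illustrative assumptions that do not come from the source.

```python
import numpy as np


def concept_directions(pos_acts, neg_acts):
    """Unit difference-of-means direction at each layer (an assumed probe type).

    pos_acts, neg_acts: arrays of shape (n_layers, n_examples, d_model) holding
    residual stream activations for examples with and without the concept.
    """
    diff = pos_acts.mean(axis=1) - neg_acts.mean(axis=1)
    return diff / np.linalg.norm(diff, axis=1, keepdims=True)


def consecutive_cosines(directions):
    """Cosine similarity between the unit directions at layers l and l + 1."""
    return np.sum(directions[:-1] * directions[1:], axis=1)


def find_handoff_layer(directions, threshold=0.99, patience=2):
    """First layer after which `patience` consecutive cosines stay >= threshold.

    Both parameters are hypothetical knobs for this sketch. Returns None if the
    direction never settles under this criterion.
    """
    cos = consecutive_cosines(directions)
    for layer in range(len(cos) - patience + 1):
        if np.all(cos[layer:layer + patience] >= threshold):
            return layer
    return None


def entry_to_exit_cosine(directions, entry, exit_):
    """Cosine similarity between the directions at two layers, e.g. the
    first and last layer of a zone of interest."""
    return float(np.dot(directions[entry], directions[exit_]))
```

Under these assumptions, the settled probe would be read off as `directions[find_handoff_layer(directions)]`. A low `entry_to_exit_cosine` across a span of layers would indicate that the direction rotated substantially within that span, which is the effect the abstract says a fixed-layer probe ignores.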
