Is Inference Mediated by Distinct Semantic Structures in LLMs? A Mechanistic Interpretation

arXiv cs.CL | 2026-05-26 | Authors: Nura Aljaafari, Marco Valentino, André Freitas

Abstract

arXiv:2605.25520v1. Predicting a label correctly does not necessarily require representing the operation that produces it. Transformer representations are known to carry label-level information, but whether they encode the semantic operations producing those labels is unclear. We investigate this in Natural Language Inference using controlled premise-hypothesis pairs that differ by a single semantic transformation. Using layer-wise activations, we estimate operation-level subspaces via SVD and test their causal relevance through activation steering in four open-weight decoder models. Transformation effects are decodable with $84.8$-$99\%$ accuracy and occupy partially distinct but overlapping subspaces, exceeding random-subspace baselines. Steering experiments show that these directions causally influence predictions, though steerability varies across models;
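As a rough illustration of the pipeline the abstract outlines (SVD-based subspace estimation, overlap against a random-subspace baseline, and activation steering), the following is a minimal sketch on synthetic data. It is not the authors' implementation: the function names, the use of activation differences between paired inputs, the overlap metric (mean squared cosine of principal angles), and all dimensions are assumptions chosen for illustration.

```python
import numpy as np

def operation_subspace(base_acts, transformed_acts, k):
    """Assumed estimator: top-k right singular vectors of paired activation differences."""
    diffs = transformed_acts - base_acts               # shape (n_pairs, d_model)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k].T                                    # shape (d_model, k), orthonormal columns

def subspace_overlap(basis_a, basis_b):
    """Mean squared cosine of the principal angles (1 = identical, 0 = orthogonal)."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(cosines ** 2))

def random_subspace(d_model, k, rng):
    """Orthonormal basis of a random k-dimensional subspace (baseline)."""
    q, _ = np.linalg.qr(rng.standard_normal((d_model, k)))
    return q

def steer(hidden, direction, alpha):
    """Add alpha times the unit-normalised direction to a hidden state."""
    return hidden + alpha * direction / np.linalg.norm(direction)

# Synthetic stand-in for layer activations (hypothetical sizes).
rng = np.random.default_rng(0)
d_model, n_pairs, k = 64, 200, 3
planted = random_subspace(d_model, 5, rng)
true_a, true_b = planted[:, :3], planted[:, 2:]        # the two operations share one direction

def fake_pairs(true_basis):
    base = rng.standard_normal((n_pairs, d_model))
    shift = rng.standard_normal((n_pairs, k)) @ true_basis.T
    noise = 0.01 * rng.standard_normal((n_pairs, d_model))
    return base, base + shift + noise

basis_a = operation_subspace(*fake_pairs(true_a), k)
basis_b = operation_subspace(*fake_pairs(true_b), k)

overlap_ab = subspace_overlap(basis_a, basis_b)
overlap_rand = subspace_overlap(basis_a, random_subspace(d_model, k, rng))

# Steering: push a hidden state along the leading direction of operation A.
hidden = rng.standard_normal(d_model)
steered = steer(hidden, basis_a[:, 0], alpha=2.0)
```

In this toy setup the two planted subspaces share one of three directions, so their overlap should sit well above the random baseline while staying below 1, which mirrors the "partially distinct but overlapping" pattern in spirit only. In a real model, `steer` would be applied to a layer's residual stream during the forward pass and its effect measured on the predicted label.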
