Causal Interventions on Continuous Variables: A Case Study on Verb Bias in Steering Vectors for In-Context Learning

ArXiv CS.CL | 2026-05-29 | Authors: Zhenghao Herbert Zhou, R. Thomas McCoy, Robert Frank

Abstract

arXiv:2605.29971v1. Causal interventions in language model representations have largely targeted discrete features, such as grammatical number. However, language models must also make use of graded features. We introduce a method for causal intervention on continuous variables: given activation vectors paired with a graded target variable, we localize a low-dimensional direction for that variable and use this direction to edit vectors toward counterfactual target values. We apply this method to a continuous feature that is well studied in psycholinguistics, namely verb bias (which reflects which syntactic structures tend to follow a given verb). We show that verb bias is causally represented in steering vectors extracted from large language models: counterfactual edits to verb bias systematically shift downstream structural preferences. Verb bias has also previously been linked to in-context learning;
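The abstract describes the intervention only at a high level, so the following is a minimal sketch of the general idea rather than the authors' procedure. It assumes the simplest possible case: a single linear direction fitted by ordinary least squares, and an edit that moves a vector along that direction until a linear readout equals the counterfactual target. The function names `fit_direction` and `edit_toward`, the synthetic data, and the choice of a one-dimensional linear readout are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_direction(X, y):
    """Least-squares fit y ~ X @ w + b. Returns (w, b).

    Assumption: the graded variable is linearly decodable along one direction.
    """
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return coef[:-1], coef[-1]


def edit_toward(x, w, b, target):
    """Shift x along w by the smallest amount that makes the readout equal target."""
    current = x @ w + b
    return x + (target - current) * w / (w @ w)


# Synthetic stand-in for activation vectors paired with a graded variable.
rng = np.random.default_rng(0)
true_dir = rng.normal(size=16)
X = rng.normal(size=(200, 16))
y = X @ true_dir + 0.5 + 0.01 * rng.normal(size=200)

w, b = fit_direction(X, y)
x_edit = edit_toward(X[0], w, b, target=2.0)

# The linear readout of the edited vector now matches the counterfactual target.
readout = float(x_edit @ w + b)
```

In this sketch the edit changes only the component of the vector along `w`, leaving everything orthogonal to it untouched. Whether such an edit changes a model's downstream behavior is an empirical question, which is the kind of test the abstract reports for verb bias.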
