Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

ArXiv CS.AI, 2026-06-01. Authors: Felipe Urrutia, Juan José Alegría, Cinthia Sanchez Macias, Jorge Salas, Cristian B. Calderon, Cristobal Rojas

Abstract

arXiv:2605.31558v1 (announce type: cross)

Transformer-based language models are widespread in today's society. As such, understanding the mechanisms by which they solve structured tasks and predicting how they may behave in novel scenarios is of great importance for safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks: a number task requiring positional reasoning and a letter task requiring symbolic reasoning. Using a recently introduced metric that classifies attention-head behavior as positional or symbolic for a given prompt, we show that successful learning is associated with the emergence of pure heads, i.e., heads that express themselves as either positional or symbolic.
