Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

arXiv cs.AI, 2026-06-01. Authors: Felipe Urrutia, Juan José Alegría, Cinthia Sanchez Macias, Jorge Salas, Cristian B. Calderon, Cristobal Rojas



Abstract

arXiv:2605.31558v1 (announce type: cross). Transformer-based language models are now in widespread use. Understanding the mechanisms by which they solve structured tasks, and predicting how they may behave in novel scenarios, is therefore important for safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks: a number task that requires positional reasoning and a letter task that requires symbolic reasoning. Using a recently introduced metric that classifies an attention head's behavior on a given prompt as positional or symbolic, we show that successful learning is associated with the emergence of pure heads, i.e., heads that express themselves as either positional or symbolic.
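The abstract does not define the positional/symbolic metric, so this sketch treats it as a black box. Assume each (head, prompt) pair has already been labeled "positional" or "symbolic" by that metric. One possible way to flag "pure" heads is then to check whether a head keeps the same label on most prompts. The function names, the head identifiers such as `L0H1`, and the 0.9 threshold are hypothetical choices for illustration. They are not the authors' procedure.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical input: for each head, the per-prompt labels ("positional" or
# "symbolic") produced by the paper's metric, which is treated as a black box.


def head_purity(labels: List[str]) -> float:
    """Fraction of prompts on which the head shows its most common behavior."""
    if not labels:
        raise ValueError("need at least one prompt label")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def classify_head(labels: List[str], threshold: float = 0.9) -> str:
    """Return 'pure-positional', 'pure-symbolic', or 'mixed'.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    purity = head_purity(labels)
    majority, _ = Counter(labels).most_common(1)[0]
    return f"pure-{majority}" if purity >= threshold else "mixed"


def summarize(per_head: Dict[str, List[str]], threshold: float = 0.9) -> Dict[str, str]:
    """Classify every head from its per-prompt labels."""
    return {head: classify_head(lbls, threshold) for head, lbls in per_head.items()}


# Toy usage with made-up labels for three hypothetical heads.
example = {
    "L0H0": ["positional"] * 10,
    "L0H1": ["symbolic"] * 9 + ["positional"],
    "L1H0": ["positional"] * 5 + ["symbolic"] * 5,
}
print(summarize(example))
# {'L0H0': 'pure-positional', 'L0H1': 'pure-symbolic', 'L1H0': 'mixed'}
```

With this framing, a head that switches between behaviors across prompts counts as "mixed". Raising the threshold makes the "pure" label stricter.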
