Event: Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2605.31558v1 Announce Type: cross

Abstract: Transformer-based language models are widespread in today's society. Understanding the mechanisms by which they solve structured tasks, and predicting how they may behave in novel scenarios, is therefore of great importance for safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Tr