Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization
Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM
arXiv:2605.31558v1 | Announce Type: cross
Abstract: Transformer-based language models are widespread in today's society. As such, understanding the mechanisms by which they solve structured tasks and predicting how they may behave in novel scenarios is of great importance for safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Tr