Low-Rank Decay for Grokking in Scale-Invariant Transformers: A Spectral-Geometric View

PRODUCT_LAUNCH · 2026-06-04 · Impact: MEDIUM

arXiv:2606.04405v1 · Announce Type: cross

Abstract: Modern Transformer architectures frequently employ normalization mechanisms such as RMSNorm and Query-Key Normalization, making parts of the model approximately scale-invariant with respect to weight magnitudes. In this regime, standard Frobenius-norm weight decay acts purely along the radial direction of the weight space and cannot directly simplify the funct
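The abstract's point about the radial direction can be checked numerically on a toy case. The sketch below is not taken from the paper: it assumes a single linear map followed by a gain-free RMSNorm with eps = 0 (so the scale invariance is exact rather than approximate), and every name in it (`rms_normalize`, `loss`, `numerical_grad`, `frob_inner`) is hypothetical. It checks three things: rescaling the weights leaves the loss unchanged, the loss gradient has no component along the weights, and the Frobenius weight-decay gradient `lam * W` is therefore orthogonal to the loss gradient.

```python
import math
import random


def rms_normalize(v, eps=0.0):
    """Gain-free RMSNorm: v / sqrt(mean(v^2) + eps). eps=0 is an assumption
    made here so that the scale invariance is exact."""
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]


def matvec(W, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def loss(W, x, target):
    """Squared error on the normalized pre-activations RMSNorm(W @ x)."""
    y = rms_normalize(matvec(W, x))
    return sum((a - b) ** 2 for a, b in zip(y, target))


def numerical_grad(W, x, target, h=1e-6):
    """Central finite differences of the loss with respect to each entry of W."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            orig = W[i][j]
            W[i][j] = orig + h
            up = loss(W, x, target)
            W[i][j] = orig - h
            down = loss(W, x, target)
            W[i][j] = orig  # restore the entry before moving on
            grad[i][j] = (up - down) / (2 * h)
    return grad


def frob_inner(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
x = [rng.gauss(0, 1) for _ in range(3)]
target = [rng.gauss(0, 1) for _ in range(4)]

# 1) Scale invariance: multiplying W by a positive constant leaves the loss unchanged.
W_scaled = [[5.0 * w for w in row] for row in W]
scale_gap = abs(loss(W, x, target) - loss(W_scaled, x, target))

# 2) The loss gradient has (numerically) no radial component: <grad, W> / ||W|| is ~0.
g = numerical_grad(W, x, target)
radial_component = frob_inner(g, W) / math.sqrt(frob_inner(W, W))

# 3) The weight-decay gradient lam * W points along W, so it is orthogonal to g.
lam = 0.1
decay = [[lam * w for w in row] for row in W]
cosine = frob_inner(g, decay) / math.sqrt(frob_inner(g, g) * frob_inner(decay, decay))

print(f"loss change under rescaling: {scale_gap:.2e}")
print(f"radial component of loss gradient: {radial_component:.2e}")
print(f"cosine(loss gradient, decay gradient): {cosine:.2e}")
```

In this toy setting the decay term only shrinks the norm of `W` and leaves the direction of `W`, and hence the computed function, untouched. With a nonzero `eps` or a learnable gain, the invariance holds only approximately, which matches the abstract's "approximately scale-invariant" wording.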
