Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization

Event: PRODUCT_LAUNCH | Date: 2026-06-12 | Impact: MEDIUM

arXiv:2606.13276v1 | Announce Type: cross

Abstract: Weight-space geometry plays a central role in neural network optimization, yet manifold constraints are often applied uniformly across all weight matrices. In this work, we ask whether different transformer modules prefer different manifold geometries. We study Manifold Muon for GPT-2 pretraining and compare layer-wise assignments of Stiefel and