Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization
Event: PRODUCT_LAUNCH | 2026-06-12 | Impact: MEDIUM
arXiv:2606.13276v1
Announce Type: cross
Abstract: Weight-space geometry plays a central role in neural network optimization, yet manifold constraints are often applied uniformly across all weight matrices. In this work, we ask whether different transformer modules prefer different manifold geometries. We study Manifold Muon for GPT-2 pretraining and compare layer-wise assignments of Stiefel and