Why Muon Outperforms Adam: A Curvature Perspective · Event
PRODUCT_LAUNCH · 2026-06-04 · Impact: MEDIUM
arXiv:2606.04662v1 · Announce Type: cross

Abstract: Muon improves training efficiency over Adam in large language model training by roughly a factor of two, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority over Adam from a curvature perspective. First, we apply a second-order Taylor approximation to the training landscape and show that Muon achieves a larger one-step loss …
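The abstract's first step, comparing the one-step loss change of two optimizers under a second-order Taylor model, can be illustrated with a small numerical sketch. This is not the paper's method or setup; every detail below is an assumption for illustration. The Muon direction is taken as the orthogonalized gradient U V^T from an exact SVD (Muon itself approximates this with Newton-Schulz iterations), the Adam direction is replaced by sign(G), which is what Adam's m / sqrt(v) reduces to with beta1 = beta2 = 0 and eps = 0, and the Hessian H is a hypothetical random positive definite matrix acting on the row-major flattening of the weight matrix. The model predicts L(W - lr*D) - L(W) ≈ -lr*<G, D> + (lr^2 / 2) * vec(D)^T H vec(D); minimizing over lr gives the best predicted one-step decrease <G, D>^2 / (2 * vec(D)^T H vec(D)).

```python
import numpy as np

def orthogonalize(G):
    """Muon-style direction: map G = U S V^T to U V^T.
    Exact SVD is used here; Muon approximates this with Newton-Schulz iterations."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def sign_direction(G):
    """Adam surrogate (assumption): with beta1 = beta2 = 0 and eps = 0,
    Adam's update m / sqrt(v) reduces to elementwise sign(G)."""
    return np.sign(G)

def taylor_change(G, H, D, lr):
    """Second-order prediction of L(W - lr * D) - L(W):
    -lr * <G, D> + (lr**2 / 2) * vec(D)^T H vec(D)."""
    g, d = G.reshape(-1), D.reshape(-1)
    return -lr * (g @ d) + 0.5 * lr**2 * (d @ H @ d)

def best_one_step_decrease(G, H, D):
    """Minimize the quadratic model over lr (H assumed positive definite):
    lr* = <G, D> / (d^T H d), predicted decrease = <G, D>^2 / (2 d^T H d)."""
    g, d = G.reshape(-1), D.reshape(-1)
    curvature = d @ H @ d
    lr_star = (g @ d) / curvature
    return lr_star, (g @ d) ** 2 / (2.0 * curvature)

# Hypothetical toy problem: an 8x4 weight matrix with a random SPD Hessian.
rng = np.random.default_rng(0)
m, n = 8, 4
G = rng.standard_normal((m, n))
A = rng.standard_normal((m * n, m * n))
H = A @ A.T / (m * n) + 1e-3 * np.eye(m * n)

for name, D in [("muon", orthogonalize(G)), ("sign (Adam surrogate)", sign_direction(G))]:
    lr, dec = best_one_step_decrease(G, H, D)
    print(f"{name}: lr* = {lr:.4f}, predicted one-step decrease = {dec:.4f}")
```

Which direction wins in this toy depends entirely on the assumed H; a random dense Hessian carries none of the structure of a real language-model loss landscape, so the sketch only shows how such a comparison is computed, not what the paper finds.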
Related coverage
Why Muon Outperforms Adam: A Curvature Perspective
arXiv cs.AI · 2026-06-04