Why Muon Outperforms Adam: A Curvature Perspective

PRODUCT_LAUNCH | 2026-06-04 | Impact: MEDIUM

arXiv:2606.04662v1 | Announce Type: cross

Abstract: Muon improves training efficiency over Adam in large language model training by roughly a factor of two, but the local geometric source of this advantage remains unclear. Our work takes a first step toward explaining Muon's advantage over Adam from a curvature perspective. First, we apply a second-order Taylor approximation to the training loss landscape and show that Muon achieves a larger one-step loss
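
To make the second-order Taylor idea concrete, here is a minimal sketch, not the paper's method or experiments. It assumes a toy quadratic loss 0.5 * ||A W - B||_F^2 on 2x2 matrices, uses plain sign descent as a stand-in for Adam and an orthogonalized gradient (via a cubic Newton-Schulz iteration) as a stand-in for Muon, and ignores momentum in both. The matrices `A`, `B`, the step size `lr`, and all helper names are hypothetical. For an update W - lr * D, the Taylor model predicts a loss change of -lr * <G, D> + 0.5 * lr^2 * <D, H D>, where G is the gradient and H the Hessian; for this quadratic, <D, H D> = ||A D||_F^2 and the prediction is exact.

```python
# Toy second-order Taylor model of the one-step loss change.
# Assumptions (not from the paper): quadratic loss, no momentum,
# sign descent ~ Adam-like direction, orthogonalized gradient ~ Muon-like.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def axpy(X, Y, scale):
    """Return X + scale * Y elementwise."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inner(X, Y):
    """Frobenius inner product <X, Y>."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def loss(W):
    R = axpy(matmul(A, W), B, -1.0)      # residual A W - B
    return 0.5 * inner(R, R)

def grad(W):
    R = axpy(matmul(A, W), B, -1.0)
    return matmul(transpose(A), R)       # A^T (A W - B)

def sign_direction(G):
    """Elementwise sign of G (Adam-like stand-in)."""
    return [[float((g > 0) - (g < 0)) for g in row] for row in G]

def orthogonalize(G, steps=30):
    """Approximate the orthogonal polar factor of G.

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X after
    scaling G to unit Frobenius norm. Assumes G is not too ill-conditioned.
    """
    norm = inner(G, G) ** 0.5
    X = [[g / norm for g in row] for row in G]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = axpy([[1.5 * x for x in row] for row in X], XXtX, -0.5)
    return X

def taylor_change(G, D, lr):
    """Second-order Taylor prediction of loss(W - lr * D) - loss(W)."""
    AD = matmul(A, D)
    return -lr * inner(G, D) + 0.5 * lr ** 2 * inner(AD, AD)

# Hypothetical toy problem.
A = [[2.0, 0.5], [0.0, 1.0]]
B = [[1.0, -2.0], [3.0, 0.5]]
W = [[0.0, 0.0], [0.0, 0.0]]
lr = 0.1

G = grad(W)
directions = {
    "sign (Adam-like)": sign_direction(G),
    "orthogonalized (Muon-like)": orthogonalize(G),
}

results = {}
for name, D in directions.items():
    predicted = taylor_change(G, D, lr)
    actual = loss(axpy(W, D, -lr)) - loss(W)
    results[name] = (predicted, actual)
    print(f"{name}: predicted {predicted:+.4f}, actual {actual:+.4f}")
```

The printed numbers only illustrate how the first-order and curvature terms combine; they say nothing about which optimizer is better. The two directions have different norms at the same `lr`, and a 2x2 quadratic is far from a language-model loss. Practical Muon implementations use momentum and a tuned Newton-Schulz polynomial rather than the plain cubic iteration swapped in here.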