Why Muon Outperforms Adam: A Curvature Perspective
Event: PRODUCT_LAUNCH, 2026-06-04, Impact: MEDIUM
arXiv:2606.04662v1 (Announce Type: cross)
Abstract: Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority over Adam from a curvature perspective. First, we apply a second-order Taylor approximation to the training landscape and show that Muon achieves a larger one-step loss
ArXiv CS.AI, 2026-06-04