DynMuon: A Dynamic Spectral Shaping View of Muon

arXiv cs.AI | 2026-06-02 | Authors: Fangzhou Wu, Rikhav Shah, Sandeep Silwal, Qiuyi Zhang

Abstract

arXiv:2605.17109v3. In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. Its essential difference from standard gradient descent methods is that it replaces the usual update matrix, with singular value decomposition $M=U\Sigma V^\top$, by its polar factor $UV^\top$. In this work, we consider a class of Muon-like updates in which the update $M$ is replaced with $U\Sigma^p V^\top$ for some parameter $p$. We call this a "spectral-shaping" operation, and develop a theory of how to pick $p$, which depends on (a) the local curvature of the loss function, (b) noise stemming from stochastic gradients and label noise, and (c) the training stage.
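The spectral-shaping map described in the abstract can be illustrated with a minimal NumPy sketch. The function name `spectral_shape`, the use of an explicit thin SVD, and the random test matrix are assumptions made for illustration only; the abstract does not say how the authors compute the update, and practical Muon implementations typically approximate the polar factor iteratively rather than calling an SVD. The sketch also assumes $M$ has no zero singular values, which matters when $p \le 0$.

```python
import numpy as np


def spectral_shape(M: np.ndarray, p: float) -> np.ndarray:
    """Return U @ diag(s**p) @ Vt, where M = U @ diag(s) @ Vt is the thin SVD.

    Hypothetical helper for illustration; not taken from the paper.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scaling the columns of U by s**p is the same as U @ diag(s**p).
    return (U * s**p) @ Vt


# Illustrative data: a random 4x3 "update" matrix (full rank almost surely).
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))

polar = spectral_shape(M, 0.0)    # p = 0: polar factor U V^T (Muon-style update)
same = spectral_shape(M, 1.0)     # p = 1: recovers M itself
halfway = spectral_shape(M, 0.5)  # 0 < p < 1: singular values are compressed
```

With $p = 1$ the map returns the original update and with $p = 0$ it returns the polar factor, so intermediate values of $p$ interpolate between the two by reshaping the singular values while leaving the singular vectors unchanged.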
