Spectral Scaling Laws of Muon

ArXiv CS.AI | 2026-06-04 | NEWS | en | Authors: Gagik Magakyan, Pablo Parrilo, Asuman Ozdaglar

Abstract

arXiv:2606.04058v1 (announce type: cross)

Orthonormalized update rules have rapidly become a leading choice of optimizer for training large language models, with recent open-source state-of-the-art models adopting Muon. To keep these updates tractable, Muon performs the orthonormalization with the Newton-Schulz (NS) iteration. Since NS is only approximate, directions with small singular values fail to be orthonormalized. In Muon, NS is applied to the momentum matrix at every step, yet little is known about how the singular value spectrum of these momentum matrices behaves during training, or how that behavior changes with model size. We present the first systematic study of this question. Tracking singular value quantiles of the momentum buffer across layers in models ranging from 77M to 2.8B parameters, we observe a consistent picture: after a short burn-in, the quantiles stabilize at a value determined by the layer type and model size.
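To see why a fixed number of NS steps can leave small singular values far from 1, it helps to look at what the iteration does to a single singular value. The sketch below is an illustration only, not the paper's or Muon's implementation: it assumes the classical cubic Newton-Schulz update X ← 1.5·X − 0.5·X·Xᵀ·X, which acts on each singular value s as s ← 1.5·s − 0.5·s³ (Muon itself may use a differently tuned polynomial). The function name `ns_scalar_map` and the step count of 5 are hypothetical choices made for this example.

```python
def ns_scalar_map(s, steps):
    """Apply the classical cubic Newton-Schulz map to one singular value.

    Assumption: the matrix iteration X <- 1.5*X - 0.5*X @ X.T @ X is used,
    so each singular value evolves independently as s <- 1.5*s - 0.5*s**3.
    """
    for _ in range(steps):
        s = 1.5 * s - 0.5 * s ** 3
    return s


if __name__ == "__main__":
    # Compare how far different starting singular values get in 5 steps.
    for s0 in (1e-4, 1e-2, 0.5, 1.0):
        print(f"s0={s0:g} -> {ns_scalar_map(s0, steps=5):.6f}")
```

Under this assumption, a singular value near 1 is driven to 1 within a few steps, while a very small one only grows by roughly a factor of 1.5 per step and so remains far from 1 after a handful of iterations. This is the sense in which an approximate orthonormalization leaves small-singular-value directions effectively un-normalized.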
