On the Optimizer Dependence of Neural Scaling Laws

ArXiv CS.AI | 2026-05-29 | Authors: Vansh Ramani, Shourya Vir Jain

Abstract

arXiv:2605.29387v1 (announce type: cross). The scaling exponent $\alpha$ in neural scaling laws $L(N) \propto N^{-\alpha}$ is commonly treated as a fixed constant set by architecture and data. We present evidence that $\alpha$ depends systematically on the optimizer. In controlled random-feature regression experiments, the canonical theoretical framework for neural scaling, we measure $\alpha$ across five optimizer variants and six spectral conditions. Preconditioned optimizers consistently yield steeper scaling (larger $\alpha$): the $\alpha$-shift increases across most of the tested spectral range, peaks near $s = 1.5$, and remains large at $s = 2.0$. At $s \approx 1.0$ (characteristic of natural language), the full natural gradient achieves $\alpha \approx 0.31$ versus $\alpha \approx 0.12$ for gradient descent, a $2.6\times$ larger fitted exponent that, within the random-feature model, compounds with each model-size doubling.
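To make the "compounds with each model-size doubling" remark concrete, the sketch below fits $\alpha$ as the negative slope of $\log L$ against $\log N$ and computes the factor $2^{-\alpha}$ by which the loss shrinks per doubling. It is only an illustration under stated assumptions: the helper names `fit_alpha` and `loss_ratio_per_doubling` are hypothetical, the loss curves are synthetic pure power laws built from the two exponents quoted in the abstract, and nothing here reproduces the authors' random-feature experiments or their fitting procedure.

```python
import math


def fit_alpha(sizes, losses):
    """Estimate alpha in L(N) ~ N^(-alpha) by ordinary least squares
    on (log N, log L); alpha is the negative of the fitted slope."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def loss_ratio_per_doubling(alpha):
    """Factor by which L(N) is multiplied when N doubles: 2^(-alpha)."""
    return 2.0 ** (-alpha)


# Assumption: synthetic, noise-free power-law curves using the exponents
# quoted in the abstract (0.12 for gradient descent, 0.31 for the full
# natural gradient). These are NOT the paper's measured losses.
sizes = [2 ** k for k in range(6, 12)]
alpha_gd, alpha_ng = 0.12, 0.31
losses_gd = [n ** (-alpha_gd) for n in sizes]
losses_ng = [n ** (-alpha_ng) for n in sizes]

fitted_gd = fit_alpha(sizes, losses_gd)
fitted_ng = fit_alpha(sizes, losses_ng)
ratio_gd = loss_ratio_per_doubling(fitted_gd)
ratio_ng = loss_ratio_per_doubling(fitted_ng)

print(f"GD: alpha={fitted_gd:.2f}, loss factor per doubling={ratio_gd:.3f}")
print(f"NG: alpha={fitted_ng:.2f}, loss factor per doubling={ratio_ng:.3f}")
```

Under these assumptions, each doubling of $N$ cuts the loss by roughly 8% at $\alpha = 0.12$ and by roughly 19% at $\alpha = 0.31$. Because the factor is applied once per doubling, the gap widens geometrically with model size, but only for as long as the power law itself holds.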
