One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs

Event: PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.22297v2 · Announce Type: replace-cross

Abstract: Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting their effectiveness as the backbone of Large Language Models (LLMs). In this paper, we introduce Layerwise Learning Rate (LLR), an ad
