One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs
Event: PRODUCT_LAUNCH, 2026-05-27, Impact: MEDIUM
arXiv:2605.22297v2
Announce Type: replace-cross
Abstract: Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting their effectiveness as the backbone of Large Language Models (LLMs). In this paper, we introduce Layerwise Learning Rate (LLR), an ad