Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models

ArXiv CS.CL | 2026-06-04 | NEWS | en | Authors: Hoyoon Byun, Youngjun Choi, Taero Kim, Sungrae Park, Kyungwoo Song

Details

Source site: ArXiv CS.CL
Authors: Hoyoon Byun, Youngjun Choi, Taero Kim, Sungrae Park, Kyungwoo Song
Article type: NEWS
Language: en
Publication date: 2026-06-04

Abstract

arXiv:2601.09719v3 (announce type: replace)

Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN incurs repeated statistical-computation overhead and remains vulnerable to the curse of depth, where hidden-state magnitudes and variances grow as the number of layers increases, destabilizing training. Efficiency-oriented normalization-free methods such as Dynamic Tanh (DyT) improve throughput but remain fragile at depth. To jointly address stability and efficiency, we propose Bounded Hyperbolic Tanh (BHyT), a drop-in replacement for Pre-LN. BHyT combines a tanh nonlinearity with explicit, data-driven input bounding to keep activations within a non-saturating range. It prevents depth-wise growth in activation magnitude and variance and provides a theoretical stability guarantee.
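
To make the contrast in the abstract concrete, here is a minimal pure-Python sketch of the three ideas it names: a statistics-based normalization, a plain tanh squashing in the style of DyT, and a tanh with its input bounded first. The abstract does not give the BHyT formula, so `bounded_tanh` and its `bound` parameter are hypothetical stand-ins invented for illustration (rescaling by the largest absolute value); they should not be read as the authors' method. The DyT function keeps only the `tanh(alpha * x)` core and omits the learnable scale and shift.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without learnable affine terms: needs a mean and a variance per vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dyt(x, alpha=0.5):
    """Dynamic Tanh core: elementwise tanh(alpha * x), no statistics computed.

    The published DyT also has a learnable per-channel scale and shift, omitted here.
    """
    return [math.tanh(alpha * v) for v in x]


def bounded_tanh(x, bound=2.0):
    """ASSUMPTION, not the paper's formula: shrink x so max|x| <= bound, then apply tanh."""
    peak = max(abs(v) for v in x)
    scale = max(1.0, peak / bound)  # only ever shrinks the input, never amplifies it
    return [math.tanh(v / scale) for v in x]


# A small hidden vector, and the same vector after a 100x growth in magnitude,
# standing in for the depth-wise growth the abstract describes.
small = [0.5, -1.0, 2.0]
large = [100.0 * v for v in small]

for name, fn in [("layer_norm", layer_norm), ("dyt", dyt), ("bounded_tanh", bounded_tanh)]:
    print(name, [round(v, 4) for v in fn(small)], [round(v, 4) for v in fn(large)])
```

On these toy inputs, the plain tanh saturates near ±1 once the magnitudes grow, whereas the bounded variant keeps its tanh inputs inside `[-bound, bound]` and so returns the same outputs for both vectors. How BHyT actually derives its bound from data is specified in the paper, not here.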
