Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models 文章

ArXiv CS.CL2026-06-04NEWSen作者: Hoyoon Byun, Youngjun Choi, Taero Kim, Sungrae Park, Kyungwoo Song

详细信息

来源站点
ArXiv CS.CL
作者
Hoyoon Byun, Youngjun Choi, Taero Kim, Sungrae Park, Kyungwoo Song
文章类型
NEWS
语言
en
发布日期
2026-06-04

摘要

arXiv:2601.09719v3 Announce Type: replace Abstract: Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN incurs repeated statistical-computation overhead and remains vulnerable to the curse of depth, where hidden-state magnitudes and variances grow as the number of layers increases, destabilizing training. Efficiency-oriented normalization-free methods such as Dynamic Tanh (DyT) improve throughput but remain fragile at depth. To jointly address stability and efficiency, we propose Bounded Hyperbolic Tanh (BHyT), a drop-in replacement for Pre-LN. BHyT combines a tanh nonlinearity with explicit, data-driven input bounding to keep activations within a non-saturating range. It prevents depth-wise growth in activation magnitude and variance and provides a theoretical stability guarantee.

相关事件

暂无数据

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据