PowLU: An Activation Function for Stable Pre-Training of LLMs (Event)

REGULATION | 2026-05-26 | Impact: MEDIUM

arXiv:2605.25704v1 Announce Type: new

Abstract: In contemporary large language models (LLMs), the Swish-gated linear unit (SwiGLU) activation function is widely adopted to regulate information flow and introduce non-linearity. For large positive inputs, SwiGLU approximates the quadratic function $x^2$, providing strong non-linearity and expressive capacity. However, this property also causes numerical instability as the input or m
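The quadratic growth the abstract describes can be checked numerically. The sketch below is not taken from the paper: it assumes a scalar simplification in which the gate and value branches of SwiGLU both receive the same input x (real SwiGLU layers use two separate learned projections), and the function names `swish` and `swiglu_scalar` are hypothetical. It does not implement PowLU, whose definition is not given in the text above.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function for positive and negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float) -> float:
    # Swish (also called SiLU): x * sigmoid(x).
    return x * sigmoid(x)


def swiglu_scalar(gate: float, value: float) -> float:
    # Scalar stand-in for SwiGLU: Swish(gate) * value.
    # Assumption: the learned projections are replaced by plain scalars.
    return swish(gate) * value


if __name__ == "__main__":
    # With gate == value == x, the output is x^2 * sigmoid(x),
    # so the ratio to x^2 approaches 1 as x grows.
    for x in (1.0, 10.0, 100.0, 1000.0):
        y = swiglu_scalar(x, x)
        print(f"x={x:>7.1f}  swiglu={y:.6g}  ratio_to_x^2={y / (x * x):.6f}")
```

Under this simplification the output grows without bound roughly as x squared, which is consistent with the abstract's remark that large inputs can lead to numerical instability, particularly in reduced-precision formats with a limited representable range.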