More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

ArXiv CS.AI | 2026-05-27 | NEWS | en | Authors: Mingze Wang, Jinbo Wang, Yikuan Xia, Kai Shen, Shu Zhong

Details

Source site
ArXiv CS.AI
Authors
Mingze Wang, Jinbo Wang, Yikuan Xia, Kai Shen, Shu Zhong
Article type
NEWS
Language
en
Publication date
2026-05-27

Abstract

arXiv:2605.26647v1 (announce type: cross)

Feedforward network (FFN) layers account for a large fraction of parameters and nonlinear expressivity in Transformer-based large language models (LLMs). Despite the evolution from ReLU and GELU to gated variants such as SwiGLU, most FFN designs still use a single fixed activation function, applying the same nonlinear transformation to all tokens. In this work, we propose Mixture of Activations (MoA), a token-adaptive FFN design that mixes a dictionary of activation functions using lightweight input-dependent gates while sharing the same linear projections. As an input-independent counterpart, we also introduce learnable activations (LA), which form linear combinations of activation functions for both ReLU-type and SwiGLU-type FFNs.
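To make the contrast between MoA and LA concrete, here is a minimal pure-Python sketch of a ReLU-type FFN for a single token. It is an illustration under assumptions, not the paper's formulation: the abstract does not specify the activation dictionary, the gate parameterization, or the normalization. The dictionary {ReLU, GELU, SiLU}, the softmax gate, and the names `moa_ffn`, `la_ffn`, `W_in`, `W_out`, `W_gate`, and `coeffs` are all hypothetical.

```python
import math

# Assumed activation dictionary; the abstract does not list the actual one.
def relu(z):
    return max(0.0, z)

def gelu(z):
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

def silu(z):
    return z / (1.0 + math.exp(-z))

ACTIVATIONS = (relu, gelu, silu)

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(logits):
    """Numerically stable softmax over a short list."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moa_ffn(x, W_in, W_out, W_gate):
    """Token-adaptive mixing: gate weights are computed from the token x."""
    h = matvec(W_in, x)                  # shared up-projection
    gates = softmax(matvec(W_gate, x))   # one weight per activation (assumed softmax)
    mixed = [sum(g * act(z) for g, act in zip(gates, ACTIVATIONS)) for z in h]
    return matvec(W_out, mixed)          # shared down-projection

def la_ffn(x, W_in, W_out, coeffs):
    """Input-independent counterpart: fixed learnable mixing coefficients."""
    h = matvec(W_in, x)
    mixed = [sum(c * act(z) for c, act in zip(coeffs, ACTIVATIONS)) for z in h]
    return matvec(W_out, mixed)
```

In this sketch both variants reuse the same `W_in` and `W_out`; the only difference is whether the mixing weights depend on the token. With an all-zero `W_gate` the softmax gate is uniform, so `moa_ffn` reduces to `la_ffn` with equal coefficients, and `coeffs = [1.0, 0.0, 0.0]` recovers a plain ReLU FFN.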