More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

ArXiv CS.AI | 2026-05-27 | NEWS | en | Authors: Mingze Wang, Jinbo Wang, Yikuan Xia, Kai Shen, Shu Zhong

Details

Source site: ArXiv CS.AI
Authors: Mingze Wang, Jinbo Wang, Yikuan Xia, Kai Shen, Shu Zhong
Article type: NEWS
Language: en
Publication date: 2026-05-27

Abstract

arXiv:2605.26647v1 (announce type: cross)

Feedforward network (FFN) layers account for a large fraction of parameters and nonlinear expressivity in Transformer-based large language models (LLMs). Despite the evolution from ReLU and GELU to gated variants such as SwiGLU, most FFN designs still use a single fixed activation function, applying the same nonlinear transformation to all tokens. In this work, we propose Mixture of Activations (MoA), a token-adaptive FFN design that mixes a dictionary of activation functions using lightweight input-dependent gates while sharing the same linear projections. As an input-independent counterpart, we also introduce learnable activations (LA), which form linear combinations of activation functions for both ReLU-type and SwiGLU-type FFNs.
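To make the difference between MoA and LA concrete, here is a minimal pure-Python sketch of a ReLU-type FFN for a single token. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the gate form, so the softmax gate, the three-entry activation dictionary (ReLU, GELU, SiLU), and every name below (`moa_ffn`, `la_ffn`, `W_gate`, `coeffs`) are hypothetical. What the sketch does take from the abstract is that both variants share one up-projection and one down-projection, and that only MoA's mixing weights depend on the input.

```python
import math
import random

# Dictionary of candidate activations. Which functions belong in the
# dictionary is an assumption made for illustration.
def relu(z):
    return max(0.0, z)

def gelu(z):
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

def silu(z):
    return z / (1.0 + math.exp(-z))

ACTIVATIONS = [relu, gelu, silu]

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def moa_ffn(x, W_up, W_down, W_gate):
    """Token-adaptive mixing: gates are computed from the token itself.

    Assumed gate form: softmax over one logit per activation.
    """
    h = matvec(W_up, x)                    # shared up-projection
    gates = softmax(matvec(W_gate, x))     # lightweight input-dependent gates
    mixed = [sum(g * phi(z) for g, phi in zip(gates, ACTIVATIONS)) for z in h]
    return matvec(W_down, mixed), gates    # shared down-projection

def la_ffn(x, W_up, W_down, coeffs):
    """Input-independent counterpart: fixed (learnable) mixing coefficients."""
    h = matvec(W_up, x)
    mixed = [sum(c * phi(z) for c, phi in zip(coeffs, ACTIVATIONS)) for z in h]
    return matvec(W_down, mixed)

rng = random.Random(0)
d_model, d_hidden = 4, 8
W_up = rand_matrix(d_hidden, d_model, rng)
W_down = rand_matrix(d_model, d_hidden, rng)
W_gate = rand_matrix(len(ACTIVATIONS), d_model, rng)

token_a = [1.0, -0.5, 0.25, 2.0]
token_b = [-1.0, 0.5, 1.5, -0.25]
y_a, gates_a = moa_ffn(token_a, W_up, W_down, W_gate)
y_b, gates_b = moa_ffn(token_b, W_up, W_down, W_gate)
```

In this sketch the two tokens receive different gate vectors, so each is passed through a different effective nonlinearity even though the projections are shared. Setting `coeffs` to `[1.0, 0.0, 0.0]` in `la_ffn` recovers a plain ReLU FFN, and zeroing `W_gate` makes MoA collapse to LA with uniform coefficients.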
