Details
- Source site
- ArXiv CS.AI
- Authors
- Mingze Wang, Jinbo Wang, Yikuan Xia, Kai Shen, Shu Zhong
- Article type
- NEWS
- Language
- en
- Publication date
- 2026-05-27
Abstract
arXiv:2605.26647v1 (announce type: cross). Feedforward network (FFN) layers account for a large fraction of parameters and nonlinear expressivity in Transformer-based large language models (LLMs). Despite the evolution from ReLU and GELU to gated variants such as SwiGLU, most FFN designs still use a single fixed activation function, applying the same nonlinear transformation to all tokens. In this work, we propose Mixture of Activations (MoA), a token-adaptive FFN design that mixes a dictionary of activation functions using lightweight input-dependent gates while sharing the same linear projections. As an input-independent counterpart, we also introduce learnable activations (LA), which form linear combinations of activation functions for both ReLU-type and SwiGLU-type FFNs.
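The difference between the two designs can be illustrated with a minimal NumPy sketch of a non-gated (ReLU-type) FFN. This is not the authors' implementation. The abstract does not specify the activation dictionary, the gate parameterization, or any normalization, so the dictionary (ReLU, GELU, SiLU), the softmax gate over a linear projection, and all names (`ACTIVATIONS`, `la_ffn`, `moa_ffn`, `w_gate`, `coeffs`) are assumptions made for illustration. The SwiGLU-type variant is not covered.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gelu_tanh(x):
    # Common tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def silu(x):
    return x / (1.0 + np.exp(-x))

# Hypothetical activation dictionary; the abstract does not list its contents.
ACTIVATIONS = (relu, gelu_tanh, silu)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def la_ffn(x, w_in, w_out, coeffs):
    """Learnable activation (LA): one coefficient vector shared by all tokens."""
    h = x @ w_in  # (tokens, d_ff)
    mixed = sum(c * f(h) for c, f in zip(coeffs, ACTIVATIONS))
    return mixed @ w_out

def moa_ffn(x, w_in, w_out, w_gate):
    """Mixture of Activations (MoA): per-token mixing weights from a small gate.

    The softmax-over-linear gate is an assumed parameterization.
    """
    h = x @ w_in                                           # shared projection, (tokens, d_ff)
    gates = softmax(x @ w_gate)                            # (tokens, K), input-dependent
    acts = np.stack([f(h) for f in ACTIVATIONS], axis=-1)  # (tokens, d_ff, K)
    mixed = (acts * gates[:, None, :]).sum(axis=-1)        # weight each activation per token
    return mixed @ w_out

# Usage: 4 tokens, model width 8, hidden width 16, K = 3 activations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_in = rng.normal(size=(8, 16))
w_out = rng.normal(size=(16, 8))
w_gate = rng.normal(size=(8, len(ACTIVATIONS)))
y = moa_ffn(x, w_in, w_out, w_gate)  # shape (4, 8)
```

In this sketch both variants reuse the same `w_in` and `w_out`, so the only extra parameters in `moa_ffn` are the `d_model x K` gate weights. With an all-zero gate the softmax is uniform and `moa_ffn` reduces to `la_ffn` with equal coefficients, which makes the input-independent counterpart a special case under these assumptions.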