Safety-Oriented Routing Analysis of Mixtral MoE Under Benign and Harmful Prompts

Event: PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM

arXiv:2605.24270v1 Announce Type: new

Abstract: Sparse mixture-of-experts (MoE) language models activate only a small subset of parameters for each token, making router behavior a central part of model computation. This paper studies routing behavior of Mixtral 8x7B-Instruct under benign and harmful prompts using two complementary signals: activation-based routing scores derived from expert selection frequencies an
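
The abstract is cut off before it defines its routing scores, so the following is only a minimal sketch of what an expert selection frequency could look like, not the paper's method. It assumes top-2 routing over a single layer's router logits (Mixtral 8x7B routes each token to 2 of 8 experts). The function names `expert_selection_frequencies` and `frequency_gap` are hypothetical, and so is the comparison of harmful minus benign frequencies.

```python
from collections import Counter

def expert_selection_frequencies(router_logits, top_k=2):
    """Share of top-k routing slots that went to each expert.

    router_logits: list of per-token lists, one logit per expert.
    Assumption: top-k selection on raw logits, as in Mixtral-style routers.
    """
    num_experts = len(router_logits[0])
    counts = Counter()
    for token_logits in router_logits:
        # Rank experts by logit, highest first, and keep the top_k.
        ranked = sorted(range(num_experts),
                        key=lambda e: token_logits[e], reverse=True)
        counts.update(ranked[:top_k])
    total_slots = len(router_logits) * top_k
    return [counts[e] / total_slots for e in range(num_experts)]

def frequency_gap(benign_logits, harmful_logits, top_k=2):
    """Per-expert frequency difference, harmful minus benign (illustrative only)."""
    benign = expert_selection_frequencies(benign_logits, top_k)
    harmful = expert_selection_frequencies(harmful_logits, top_k)
    return [h - b for h, b in zip(harmful, benign)]
```

In practice the logits would be collected per layer from the model's router outputs. A positive entry in `frequency_gap` would then mark an expert chosen more often on harmful prompts than on benign ones under these assumptions.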