Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse

ArXiv CS.CL · 2026-05-28 · Authors: Zizhuo Fu, Wenxuan Zeng, Runsheng Wang, Meng Li
