HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Event: PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM
arXiv:2509.22299v3 (announce type: replace-cross)
Abstract: Mixture-of-Experts (MoE) architectures in large language models (LLMs) deliver exceptional performance and reduced inference costs compared to dense LLMs. However, their large parameter counts result in prohibitive memory requirements, limiting practical deployment. While existing pruning methods primarily focus on expert-level pruning, this coarse granularity often lea
Related coverage: HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space (ArXiv CS.AI, 2026-05-26)