BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.00079v1 | Announce Type: cross

Abstract: Mixture-of-Experts (MoE) large language models reduce per-token computation through sparse expert activation, but their deployment remains memory-intensive because all expert weights must be kept resident in memory. Existing MoE compression methods struggle in the ultra-low-bit regime: pruning irreversibly removes model capacity, while coarse-grained quantization f