Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling
Event: PRODUCT_LAUNCH, 2026-06-02, Impact: MEDIUM
arXiv:2606.00888v1 Announce Type: cross Abstract: Dynamic Sparse Training (DST) offers a promising paradigm for improving the training and inference efficiency of deep neural networks; however, we find that in large language model training, DST can suffer from optimization instability, manifested as loss spikes after topology updates. In this work, we show that the naive use of standard Adam-based optimizer