Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

ArXiv CS.CL | 2026-06-16 | NEWS | en | Authors: Jinrui Zhang, Chaodong Xiao, Aoqi Wu, Xindong Zhang, Lei Zhang

Details

Source site
ArXiv CS.CL
Authors
Jinrui Zhang, Chaodong Xiao, Aoqi Wu, Xindong Zhang, Lei Zhang
Article type
NEWS
Language
en
Publication date
2026-06-16

Abstract

arXiv:2602.11543v3 (announce type: replace)

Pretraining large language models (LLMs) typically requires centralized clusters with thousands of high-memory GPUs (e.g., H100/A100). Recent decentralized training methods reduce communication overhead by employing federated optimization; however, they still need to train the entire model on each node, remaining constrained by GPU memory limitations. In this work, we propose SParse Expert Synchronization (SPES), a memory-efficient decentralized framework for pretraining mixture-of-experts (MoE) LLMs. SPES trains only a subset of experts per node, substantially lowering the memory footprint. Each node updates its local experts and periodically synchronizes with other nodes, eliminating full-parameter transmission while ensuring efficient knowledge sharing.
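
The abstract describes the idea only at a high level, so the following is a toy sketch of the general pattern rather than the authors' implementation. It assumes a round-robin assignment of experts to nodes, a random stand-in for the gradient, and a synchronization rule in which each node sends only the experts it trains and receives the rest. It also assumes every node keeps a read-only copy of the experts it does not train. None of these details come from the source, and all function names (`assign_experts`, `local_step`, `synchronize`, `simulate`) are hypothetical.

```python
import random


def assign_experts(num_experts, num_nodes):
    """Assumed scheme: expert e is trained on node e % num_nodes."""
    owned = [[] for _ in range(num_nodes)]
    for e in range(num_experts):
        owned[e % num_nodes].append(e)
    return owned


def local_step(params, owned, lr, rng):
    """Toy update that touches only this node's own experts.

    The 'gradient' is random noise; a real system would backpropagate.
    """
    for e in owned:
        params[e] = [w - lr * rng.uniform(-1.0, 1.0) for w in params[e]]


def synchronize(node_params, owned):
    """Each node sends only its owned experts; every node receives the merged set.

    Returns how many expert tensors each node sent in this round.
    """
    merged = {}
    for node, experts in enumerate(owned):
        for e in experts:
            merged[e] = list(node_params[node][e])
    for params in node_params:
        for e, weights in merged.items():
            params[e] = list(weights)
    return [len(experts) for experts in owned]


def simulate(num_experts=8, num_nodes=4, dim=3, steps=6, sync_every=3, seed=0):
    rng = random.Random(seed)
    owned = assign_experts(num_experts, num_nodes)
    # Every node starts from the same initial expert weights.
    node_params = [
        {e: [0.0] * dim for e in range(num_experts)} for _ in range(num_nodes)
    ]
    sent = []
    for step in range(1, steps + 1):
        for node in range(num_nodes):
            local_step(node_params[node], owned[node], lr=0.1, rng=rng)
        if step % sync_every == 0:
            sent = synchronize(node_params, owned)
    return owned, node_params, sent


if __name__ == "__main__":
    owned, node_params, sent = simulate()
    print("experts trained per node:", owned)
    print("expert tensors sent per node at last sync:", sent)
```

With the default settings (8 experts, 4 nodes), each node trains and transmits 2 of the 8 expert tensors per synchronization round, which is the sense in which this sketch avoids full-parameter transmission. In a real MoE model the shared layers (attention, router, embeddings) would need their own synchronization rule, which this sketch leaves out.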
