Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

ArXiv CS.CL, 2026-06-16
Authors: Jinrui Zhang, Chaodong Xiao, Aoqi Wu, Xindong Zhang, Lei Zhang
