Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads

arXiv cs.AI | 2026-06-09 | Authors: Minyu Cui, Miquel Pericas

Abstract

arXiv:2606.09200v1 (announce type: cross)

The rapid growth of large-scale machine learning (ML) has made distributed training across multiple GPUs a fundamental component of modern ML systems. As model sizes and computational throughput continue to increase, communication overhead has become a dominant bottleneck in multi-GPU training, particularly when computation and communication are executed sequentially. This work explores concurrent execution of computation and collective communication using two portable runtime controls: shared-memory-driven occupancy shaping for computation kernels and elevated scheduling priority for communication kernels. Our approach regulates computation-kernel residency through per-block shared-memory allocation, leaving sufficient on-chip resources for communication kernels to make progress. In addition, assigning higher priority to communication streams ensures steady communication progress once resources become available.
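The occupancy-shaping idea in the abstract can be illustrated with a little arithmetic. The sketch below is not the authors' implementation. It is a simplified model that assumes shared memory is the only factor limiting how many blocks of a computation kernel reside on one streaming multiprocessor (SM), and it ignores per-block allocation limits and allocation granularity. The function names and all hardware figures are hypothetical, chosen only for illustration.

```python
def resident_blocks(smem_per_sm: int, smem_per_block: int, max_blocks_per_sm: int) -> int:
    """Blocks of one kernel that fit on an SM when shared memory is the only limiter."""
    if smem_per_block <= 0:
        # No shared-memory request: only the hardware block limit applies.
        return max_blocks_per_sm
    return min(max_blocks_per_sm, smem_per_sm // smem_per_block)


def smem_to_cap_residency(smem_per_sm: int, target_blocks: int) -> int:
    """Smallest per-block shared-memory request (bytes) that caps residency at target_blocks.

    A request s admits smem_per_sm // s blocks, so s must exceed
    smem_per_sm / (target_blocks + 1) for one extra block not to fit.
    """
    if target_blocks < 1:
        raise ValueError("target_blocks must be at least 1")
    return smem_per_sm // (target_blocks + 1) + 1


def free_thread_slots(max_threads_per_sm: int, threads_per_block: int, blocks: int) -> int:
    """Thread slots left on the SM for other kernels, e.g. communication kernels."""
    return max_threads_per_sm - threads_per_block * blocks
```

For example, with an assumed 100 KiB (102400 bytes) of shared memory per SM, `smem_to_cap_residency(102400, 2)` gives 34134 bytes per block, and `resident_blocks(102400, 34134, 16)` confirms that only 2 blocks then fit. With an assumed 2048 thread slots per SM and 256-thread blocks, that leaves `free_thread_slots(2048, 256, 2)`, or 1536 slots, available to a communication kernel. In a real runtime the padding would be passed as the kernel's dynamic shared-memory size at launch, and the communication stream would be created with a higher priority.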
