Speeding Up Distributed Machine Learning Using Codes 论文

2017IEEE Transactions on Information Theory引用 866

Stochastic Gradient Optimization TechniquesAdvanced Data Storage TechnologiesCaching and Content Delivery

Stochastic Gradient Optimization Techniques Advanced Data Storage Technologies Caching and Content Delivery

作者

摘要

Codes are widely used in many engineering applications to offer <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">robustness against <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">noise . In large-scale systems, there are several types of noise that can affect the performance of distributed machine learning algorithms—straggler nodes, system failures, or communication bottlenecks—but there has been little interaction cutting across codes, machine learning, and distributed systems. In this paper, we provide theoretical insights on how <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">coded solutions can achieve significant gains compared with uncoded ones. We focus on two of the most basic building blocks of distributed learning algorithms: <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">matrix multiplication and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">data shuffling . For matrix multiplication, we use codes to alleviate the effect of stragglers and show that if the number of homogeneous workers is <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula> , and the runtime of each subtask has an exponential tail, coded computation can speed up distributed matrix multiplication by a factor of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\log n$ </tex-math></inline-formula> . For data shuffling, we use codes to reduce communication bottlenecks, exploiting the excess in storage. We show that when a constant fraction <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\alpha $ </tex-math></inline-formula> of the data matrix can be cached at each worker, and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula> is the number of workers, <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">coded shuffling reduces the communication cost by a factor of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\left({\alpha + \frac {1}{n}}\right)\gamma (n)$ </tex-math></inline-formula> compared with uncoded shuffling, where <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\gamma (n)$ </tex-math></inline-formula> is the ratio of the cost of unicasting <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula> messages to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula> users to multicasting a common message (of the same size) to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula> users. For instance, <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\gamma (n) \simeq n$ </tex-math></inline-formula> if multicasting a message to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula> users is as cheap as unicasting a message to one user. We also provide experimental results, corroborating our theoretical gains of the coded algorithms.

作者查看全部 (5)

Kannan Ramchandran

Dimitris Papailiopoulos

Ramtin Pedarsani

Maximilian Lam

Speeding Up Distributed Machine Learning Using Codes 论文

摘要

作者查看全部 (5)

相关技术查看全部 (1)

相关事件

相关文章