---
_id: '8723'
abstract:
- lang: eng
  text: Deep learning at scale is dominated by communication time. Distributing samples
    across nodes usually yields the best performance, but poses scaling challenges
    due to global information dissemination and load imbalance across uneven sample
    lengths. State-of-the-art decentralized optimizers mitigate the problem, but require
    more iterations to achieve the same accuracy as their globally-communicating counterparts.
    We present Wait-Avoiding Group Model Averaging (WAGMA) SGD, a wait-avoiding stochastic
    optimizer that reduces global communication via subgroup weight exchange. The
    key insight is a combination of algorithmic changes to the averaging scheme and
    the use of a group allreduce operation. We prove the convergence of WAGMA-SGD,
    and empirically show that it retains convergence rates similar to Allreduce-SGD.
    For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation;
    and deep reinforcement learning for navigation at scale. Compared with state-of-the-art
    decentralized SGD variants, WAGMA-SGD significantly improves training throughput
    (e.g., 2.1× on 1,024 GPUs for reinforcement learning), and achieves the fastest
    time-to-solution (e.g., the highest score using the shortest training time for
    Transformer).
acknowledgement: "This project has received funding from the European Research Council
  (ERC) under the European Union’s Horizon 2020 programme under Grant DAPP, Grant
  678880; EPiGRAM-HS, Grant 801039; and ERC Starting Grant ScaleML, Grant 805223.
  The work of Tal Ben-Nun is supported by the Swiss National Science Foundation (Ambizione
  Project No. 185778). The work of Nikoli Dryden is supported by the ETH Postdoctoral
  Fellowship. The authors would like to thank the Swiss National Supercomputing Center
  for providing the computing resources and technical support."
article_number: '9271898'
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Shigang
  full_name: Li, Shigang
  last_name: Li
- first_name: Tal
  full_name: Ben-Nun, Tal
  last_name: Ben-Nun
- first_name: Giorgi
  full_name: Nadiradze, Giorgi
  id: 3279A00C-F248-11E8-B48F-1D18A9856A87
  last_name: Nadiradze
- first_name: Salvatore
  full_name: Di Girolamo, Salvatore
  last_name: Di Girolamo
- first_name: Nikoli
  full_name: Dryden, Nikoli
  last_name: Dryden
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
- first_name: Torsten
  full_name: Hoefler, Torsten
  last_name: Hoefler
citation:
  ama: Li S, Ben-Nun T, Nadiradze G, et al. Breaking (global) barriers in parallel
    stochastic optimization with wait-avoiding group averaging. <i>IEEE Transactions
    on Parallel and Distributed Systems</i>. 2021;32(7). doi:<a href="https://doi.org/10.1109/TPDS.2020.3040606">10.1109/TPDS.2020.3040606</a>
  apa: Li, S., Ben-Nun, T., Nadiradze, G., Di Girolamo, S., Dryden, N.,
    Alistarh, D.-A., &#38; Hoefler, T. (2021). Breaking (global) barriers in parallel
    stochastic optimization with wait-avoiding group averaging. <i>IEEE Transactions
    on Parallel and Distributed Systems</i>. IEEE. <a href="https://doi.org/10.1109/TPDS.2020.3040606">https://doi.org/10.1109/TPDS.2020.3040606</a>
  chicago: Li, Shigang, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo,
    Nikoli Dryden, Dan-Adrian Alistarh, and Torsten Hoefler. “Breaking (Global) Barriers
    in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.” <i>IEEE
    Transactions on Parallel and Distributed Systems</i>. IEEE, 2021. <a href="https://doi.org/10.1109/TPDS.2020.3040606">https://doi.org/10.1109/TPDS.2020.3040606</a>.
  ieee: S. Li <i>et al.</i>, “Breaking (global) barriers in parallel stochastic optimization
    with wait-avoiding group averaging,” <i>IEEE Transactions on Parallel and Distributed
    Systems</i>, vol. 32, no. 7. IEEE, 2021.
  ista: Li S, Ben-Nun T, Nadiradze G, Di Girolamo S, Dryden N, Alistarh D-A,
    Hoefler T. 2021. Breaking (global) barriers in parallel stochastic optimization
    with wait-avoiding group averaging. IEEE Transactions on Parallel and Distributed
    Systems. 32(7), 9271898.
  mla: Li, Shigang, et al. “Breaking (Global) Barriers in Parallel Stochastic Optimization
    with Wait-Avoiding Group Averaging.” <i>IEEE Transactions on Parallel and Distributed
    Systems</i>, vol. 32, no. 7, 9271898, IEEE, 2021, doi:<a href="https://doi.org/10.1109/TPDS.2020.3040606">10.1109/TPDS.2020.3040606</a>.
  short: S. Li, T. Ben-Nun, G. Nadiradze, S. Di Girolamo, N. Dryden, D.-A.
    Alistarh, T. Hoefler, IEEE Transactions on Parallel and Distributed Systems 32
    (2021).
date_created: 2020-11-05T15:25:43Z
date_published: 2021-07-01T00:00:00Z
date_updated: 2023-08-04T11:08:52Z
day: '01'
department:
- _id: DaAl
doi: 10.1109/TPDS.2020.3040606
ec_funded: 1
external_id:
  arxiv:
  - '2005.00124'
  isi:
  - '000621405200019'
intvolume: '32'
isi: 1
issue: '7'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2005.00124
month: '07'
oa: 1
oa_version: Preprint
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: IEEE Transactions on Parallel and Distributed Systems
publication_identifier:
  issn:
  - '10459219'
publication_status: published
publisher: IEEE
quality_controlled: '1'
scopus_import: '1'
status: public
title: Breaking (global) barriers in parallel stochastic optimization with wait-avoiding
  group averaging
type: journal_article
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
volume: 32
year: '2021'
...
