On biased compression for distributed learning

Beznosikov, Aleksandr; Horvath, Samuel; Richtarik, Peter; Safaryan, Mher

Beznosikov A, Horvath S, Richtarik P, Safaryan M. 2023. On biased compression for distributed learning. Journal of Machine Learning Research. 24, 1–50.

Download

2023_JMLR_Beznosikov.pdf 1.51 MB [Published Version]

Journal Article | Published | English

Author

Beznosikov, Aleksandr; Horvath, Samuel; Richtarik, Peter; Safaryan, Mher (ISTA)

Department

Alistarh Group

Abstract

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. We prove that distributed compressed SGD method, employed with error feedback mechanism, enjoys the ergodic rate $\mathcal{O}\left(\delta L \exp\left[-\frac{\mu K}{\delta L}\right] + \frac{C + \delta D}{K\mu}\right)$, where $\delta \ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C = 0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D = 0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.
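To make the abstract's setting concrete, the following is a minimal single-node sketch of gradient descent with a biased compressor and error feedback. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a compressor, so Top-k (a widely used biased compressor) is assumed here, and the names `top_k`, `ef_gradient_descent`, `f`, `grad`, the quadratic test problem, and the step size are all hypothetical choices.

```python
def top_k(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest.

    The output is not an unbiased estimate of x, which is what makes
    this compressor "biased".
    """
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out


def ef_gradient_descent(grad, x0, step, k, iters):
    """Gradient descent where only a Top-k compressed update is applied.

    Error feedback: whatever the compressor drops is stored in `e`
    and added back before compressing at the next iteration.
    """
    x = list(x0)
    e = [0.0] * len(x)  # accumulated compression error
    for _ in range(iters):
        g = grad(x)
        p = [e_i + step * g_i for e_i, g_i in zip(e, g)]  # error-corrected step
        c = top_k(p, k)                                   # what would be communicated
        e = [p_i - c_i for p_i, c_i in zip(p, c)]         # remember what was dropped
        x = [x_i - c_i for x_i, c_i in zip(x, c)]
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(a_i * x_i^2), minimized at x = 0.
a = [1.0, 2.0, 4.0, 8.0]


def f(x):
    return 0.5 * sum(a_i * x_i * x_i for a_i, x_i in zip(a, x))


def grad(x):
    return [a_i * x_i for a_i, x_i in zip(a, x)]


x0 = [1.0, 1.0, 1.0, 1.0]
x_final = ef_gradient_descent(grad, x0, step=0.005, k=1, iters=5000)
print("f(x0) =", f(x0), " f(x_final) =", f(x_final))
```

With `k=1` only one of the four coordinates is applied per iteration, yet the stored error ensures that no part of the gradient is discarded permanently. The step size here is a conservative guess for this toy problem and is not taken from the paper's theory.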

Publishing Year

2023

Date Published

2023-10-01

Journal Title

Journal of Machine Learning Research

Publisher

Journal of Machine Learning Research

Acknowledgement

The work in Sections 1-5 was conducted while A. Beznosikov was a research intern in the Optimization and Machine Learning Lab of Peter Richtárik at KAUST; this visit was funded by the KAUST Baseline Research Funding Scheme. The work of A. Beznosikov in Section 6 was conducted at Skoltech and was supported by Ministry of Science and Higher Education grant No. 075-10-2021-068.

Volume

24

Page

1-50

eISSN

1533-7928

IST-REx-ID

14815

Cite this

Beznosikov A, Horvath S, Richtarik P, Safaryan M. On biased compression for distributed learning. Journal of Machine Learning Research. 2023;24:1-50.

Beznosikov, A., Horvath, S., Richtarik, P., & Safaryan, M. (2023). On biased compression for distributed learning. Journal of Machine Learning Research. Journal of Machine Learning Research.

Beznosikov, Aleksandr, Samuel Horvath, Peter Richtarik, and Mher Safaryan. “On Biased Compression for Distributed Learning.” Journal of Machine Learning Research. Journal of Machine Learning Research, 2023.

A. Beznosikov, S. Horvath, P. Richtarik, and M. Safaryan, “On biased compression for distributed learning,” Journal of Machine Learning Research, vol. 24. Journal of Machine Learning Research, pp. 1–50, 2023.

Beznosikov A, Horvath S, Richtarik P, Safaryan M. 2023. On biased compression for distributed learning. Journal of Machine Learning Research. 24, 1–50.

Beznosikov, Aleksandr, et al. “On Biased Compression for Distributed Learning.” Journal of Machine Learning Research, vol. 24, Journal of Machine Learning Research, 2023, pp. 1–50.

All files available under the following license(s):

Creative Commons Attribution 4.0 International Public License (CC-BY 4.0):

https://creativecommons.org/licenses/by/4.0/legalcode

Main File(s)

File Name

2023_JMLR_Beznosikov.pdf 1.51 MB

Access Level

Open Access

Date Uploaded

2024-01-16

MD5 Checksum

c50f2b9db53938b755e30a085f464059
