Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study
Grubic D, Tam L, Alistarh D-A, Zhang C. 2018. Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study. Proceedings of the 21st International Conference on Extending Database Technology. EDBT: Conference on Extending Database Technology, 145–156.
Conference Paper | Published | English
Scopus indexed
Author
Grubic, Demjan;
Tam, Leo;
Alistarh, Dan-Adrian (ISTA);
Zhang, Ce
Department
Abstract
Training deep learning models has received tremendous research interest recently. In particular, there has been intensive research on reducing the communication cost of training when using multiple computational devices, through reducing the precision of the underlying data representation. Naturally, such methods induce system trade-offs: lowering communication precision could decrease communication overheads and improve scalability; but, on the other hand, it can also reduce the accuracy of training. In this paper, we study this trade-off space, and ask: Can low-precision communication consistently improve the end-to-end performance of training modern neural networks, with no accuracy loss? From the performance point of view, the answer to this question may appear deceptively easy: compressing communication through low precision should help when the ratio between communication and computation is high. However, this answer is less straightforward when we try to generalize this principle across various neural network architectures (e.g., AlexNet vs. ResNet), number of GPUs (e.g., 2 vs. 8 GPUs), machine configurations (e.g., EC2 instances vs. NVIDIA DGX-1), communication primitives (e.g., MPI vs. NCCL), and even different GPU architectures (e.g., Kepler vs. Pascal). Currently, it is not clear how a realistic realization of all these factors maps to the speedup provided by low-precision communication. In this paper, we conduct an empirical study to answer this question and report the insights.
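The intuition that compression helps when the communication-to-computation ratio is high can be illustrated with a toy cost model. The sketch below is not the paper's methodology: the function names, the additive (non-overlapping) cost model, and all numbers are assumptions chosen for illustration, and the model ignores quantization overhead, latency, and any effect on accuracy.

```python
# Toy per-iteration cost model (illustrative assumption, not from the paper).
# One step costs: compute time + time to exchange gradients at a given precision.

def step_time(compute_s, grad_bytes_fp32, bandwidth_bytes_per_s, bits=32):
    """Seconds per training step when gradients are sent with `bits` per value.

    grad_bytes_fp32 is the gradient size at full 32-bit precision; sending
    `bits`-bit values scales the transferred volume by bits / 32.
    """
    comm_s = grad_bytes_fp32 * (bits / 32) / bandwidth_bytes_per_s
    return compute_s + comm_s


def speedup(compute_s, grad_bytes_fp32, bandwidth_bytes_per_s, bits):
    """Ratio of full-precision step time to low-precision step time."""
    full = step_time(compute_s, grad_bytes_fp32, bandwidth_bytes_per_s, 32)
    low = step_time(compute_s, grad_bytes_fp32, bandwidth_bytes_per_s, bits)
    return full / low


if __name__ == "__main__":
    # Hypothetical communication-bound setting: large gradients, slow link.
    print(speedup(compute_s=0.1, grad_bytes_fp32=200e6,
                  bandwidth_bytes_per_s=1e9, bits=8))
    # Hypothetical compute-bound setting: smaller gradients, fast link.
    print(speedup(compute_s=0.4, grad_bytes_fp32=100e6,
                  bandwidth_bytes_per_s=10e9, bits=8))
```

Under these assumed numbers the first setting roughly doubles in speed while the second barely changes, which is the simple principle the abstract describes before noting that real architectures, interconnects, and communication primitives complicate it.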
Publishing Year
2018
Date Published
2018-03-26
Proceedings Title
Proceedings of the 21st International Conference on Extending Database Technology
Publisher
OpenProceedings
Page
145-156
Conference
EDBT: Conference on Extending Database Technology
Conference Location
Vienna, Austria
Conference Date
2018-03-26 – 2018-03-29
ISBN
ISSN
IST-REx-ID
Cite this
Grubic D, Tam L, Alistarh D-A, Zhang C. Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study. In: Proceedings of the 21st International Conference on Extending Database Technology. OpenProceedings; 2018:145-156. doi:10.5441/002/EDBT.2018.14
Grubic, D., Tam, L., Alistarh, D.-A., & Zhang, C. (2018). Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study. In Proceedings of the 21st International Conference on Extending Database Technology (pp. 145–156). Vienna, Austria: OpenProceedings. https://doi.org/10.5441/002/EDBT.2018.14
Grubic, Demjan, Leo Tam, Dan-Adrian Alistarh, and Ce Zhang. “Synchronous Multi-GPU Training for Deep Learning with Low-Precision Communications: An Empirical Study.” In Proceedings of the 21st International Conference on Extending Database Technology, 145–56. OpenProceedings, 2018. https://doi.org/10.5441/002/EDBT.2018.14.
D. Grubic, L. Tam, D.-A. Alistarh, and C. Zhang, “Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study,” in Proceedings of the 21st International Conference on Extending Database Technology, Vienna, Austria, 2018, pp. 145–156.
Grubic, Demjan, et al. “Synchronous Multi-GPU Training for Deep Learning with Low-Precision Communications: An Empirical Study.” Proceedings of the 21st International Conference on Extending Database Technology, OpenProceedings, 2018, pp. 145–56, doi:10.5441/002/EDBT.2018.14.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Main File(s)
File Name
2018_OpenProceedings_Grubic.pdf
1.60 MB
Access Level
Open Access
Date Uploaded
2019-11-26
MD5 Checksum
ec979b56abc71016d6e6adfdadbb4afe