Computer and Control Engineering
Gradient sparsification compression approach to reducing communication in distributed training |
Shi-da CHEN 1,2, Qiang LIU 1,2,*, Liang HAN 3
1. School of Microelectronics, Tianjin University, Tianjin 300072, China
2. Tianjin Key Laboratory of Imaging and Sensing Microelectronic Technology, Tianjin 300072, China
3. Alibaba Group, Sunnyvale 94085, USA