Gradient sparsification compression approach to reducing communication in distributed training
Shi-da CHEN, Qiang LIU, Liang HAN

Tab.2 Comparison of training accuracy of models under different strategies
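Training accuracy/% (values in parentheses are differences from Baseline)

| Dataset | Network model | V/MB | Baseline | radixSelect DGC | hierarchical top-k RGC | trimmed top-k RGC | LDTE-BS |
|---|---|---|---|---|---|---|---|
| CIFAR-10 | ResNet101 | 162.17 | 93.70 | 93.21 (−0.49) | 93.28 (−0.42) | 93.23 (−0.47) | 93.42 (−0.28) |
| CIFAR-10 | DenseNet169 | 47.66 | 94.04 | 93.32 (−0.72) | 93.29 (−0.75) | 93.32 (−0.72) | 93.56 (−0.48) |
| CIFAR-100 | ResNet50 | 89.72 | 74.78 | 72.69 (−2.09) | 72.49 (−2.29) | 72.71 (−2.07) | 73.11 (−1.67) |
| CIFAR-100 | DenseNet121 | 26.54 | 75.41 | 73.25 (−2.16) | 73.14 (−2.27) | 73.17 (−2.24) | 73.85 (−1.56) |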