Gradient sparsification compression approach to reducing communication in distributed training

Shi-da CHEN, Qiang LIU, Liang HAN

Tab.2 Comparison of training accuracy of models under different strategies

| Dataset | Network model | V/MB | Baseline | radixSelect | DGC hierarchical top-k | RGC trimmed top-k | RGC LDTE-BS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CIFAR-10 | ResNet101 | 162.17 | 93.70 | 93.21 (−0.49) | 93.28 (−0.42) | 93.23 (−0.47) | 93.42 (−0.28) |
| CIFAR-10 | DenseNet169 | 47.66 | 94.04 | 93.32 (−0.72) | 93.29 (−0.75) | 93.32 (−0.72) | 93.56 (−0.48) |
| CIFAR-100 | ResNet50 | 89.72 | 74.78 | 72.69 (−2.09) | 72.49 (−2.29) | 72.71 (−2.07) | 73.11 (−1.67) |
| CIFAR-100 | DenseNet121 | 26.54 | 75.41 | 73.25 (−2.16) | 73.14 (−2.27) | 73.17 (−2.24) | 73.85 (−1.56) |

The columns Baseline through RGC LDTE-BS report training accuracy/%; values in parentheses are the difference from Baseline.