Gradient sparsification compression approach to reducing communication in distributed training
Shi-da CHEN (陈世达), Qiang LIU (刘强), Liang HAN (韩亮)
Tab.1 Wasserstein distance (EMD) of the two distributions, Gaussian and Laplacian, in different networks
Network model    Gaussian EMD    Laplacian EMD
AlexNet          19.273          13.2476
VGG19            28.810          13.628
ResNet50         8.343           4.694
DenseNet121      35.846          19.005
SENet18          46.224          27.316
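
To make the distances in Tab.1 concrete, the following is a minimal sketch of how a one-dimensional Wasserstein (earth mover's) distance between a set of values and a fitted Gaussian or Laplacian could be computed. It is not the authors' procedure: the fitting rules (sample mean and standard deviation for the Gaussian, median and mean absolute deviation for the Laplacian), the quantile-based approximation, and all function names are assumptions made for illustration. The scaling behind the values in Tab.1 is not stated in this excerpt, so the sketch is not expected to reproduce them.

```python
import math
import statistics


def wasserstein_1d(xs, ys):
    """Empirical 1-D Wasserstein-1 distance between two equal-size samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size samples")
    # In one dimension the optimal transport plan pairs the sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def gaussian_quantiles(values):
    """Quantiles of a Gaussian fitted by sample mean and standard deviation."""
    fit = statistics.NormalDist(statistics.fmean(values), statistics.pstdev(values))
    n = len(values)
    return [fit.inv_cdf((i + 0.5) / n) for i in range(n)]


def fit_laplace(values):
    """Maximum-likelihood Laplace fit: location = median, scale = mean |x - median|."""
    loc = statistics.median(values)
    scale = sum(abs(v - loc) for v in values) / len(values)
    return loc, scale


def laplace_quantiles(values):
    """Quantiles of the fitted Laplacian via its inverse CDF."""
    loc, scale = fit_laplace(values)
    n = len(values)
    out = []
    for i in range(n):
        d = (i + 0.5) / n - 0.5  # offset of the probability level from 0.5
        out.append(loc - scale * math.copysign(1.0, d) * math.log(1.0 - 2.0 * abs(d)))
    return out


def compare_fits(values):
    """Return (Gaussian EMD, Laplacian EMD) for a flat list of values.

    `values` is a hypothetical stand-in for the flattened gradients of one network.
    """
    return (
        wasserstein_1d(values, gaussian_quantiles(values)),
        wasserstein_1d(values, laplace_quantiles(values)),
    )
```

Calling `compare_fits(flat_gradients)` on a flat list of gradient values returns the two distances in the column order of Tab.1. The smaller distance indicates the closer fit.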