A new federated learning method was proposed for the medical dialogue scenario to handle heterogeneous data, heterogeneous models, and different data types. The cloud model and the end models transferred knowledge through mutual bootstrapping distillation. The end-to-cloud bootstrapping distillation was a multi-teacher, single-student process in which knowledge was distilled from multiple local models into a global model. The cloud-to-end bootstrapping distillation was a single-teacher, multi-student process in which knowledge was distilled from the global model back to the multiple local models. On the ReMeDi and MedDG medical dialogue datasets, the proposed method significantly outperformed the classical baselines on text generation evaluation metrics, and the training speed was also improved.
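A minimal PyTorch sketch of the two distillation directions described above is given below. It is not the authors' implementation; names such as kd_loss, end_to_cloud, cloud_to_end, proxy_batch, and the use of temperatures T1/T2 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T):
    """Temperature-softened KL divergence between teacher and student logits."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)


def end_to_cloud(global_model, local_models, proxy_batch, T1):
    """Multi-teacher, single-student: distill local (end) knowledge into the cloud model.

    Assumes all models emit logits over a shared vocabulary for the same batch.
    """
    student_logits = global_model(proxy_batch)
    with torch.no_grad():
        # Ensemble the local teachers by averaging their logits.
        teacher_logits = torch.stack([m(proxy_batch) for m in local_models]).mean(dim=0)
    return kd_loss(student_logits, teacher_logits, T1)


def cloud_to_end(global_model, local_models, proxy_batch, T2):
    """Single-teacher, multi-student: distill global knowledge back to each end model."""
    with torch.no_grad():
        teacher_logits = global_model(proxy_batch)
    # One distillation loss per local student model.
    return [kd_loss(m(proxy_batch), teacher_logits, T2) for m in local_models]
```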
Yupeng LIU, Minghao LIN, Jiang ZHANG, Dengju YAO. Heterogeneous cloud-end medical dialogue federation based on bi-directional bootstrapping distillation [J]. Journal of Zhejiang University (Engineering Science), 2024, 58(10): 2062−2068.
Fig. 1 Federated learning method based on bi-directional bootstrapping distillation
ReMeDi:
Method | BLEU-1 | BLEU-4 | ROUGE-1 | ROUGE-2 | Distinct-1 | Distinct-2
Centralized training | 27.86 | 6.59 | 50.36 | 32.25 | 0.72 | 8.59
FedAvg | 18.37 | 4.83 | 38.64 | 22.45 | 0.50 | 5.32
FedMD | 21.41 | 5.79 | 41.92 | 26.93 | 0.63 | 7.54
FedDF | 21.68 | 5.46 | 40.45 | 26.64 | 0.62 | 8.06
FedGen | 24.08 | 6.38 | 42.64 | 27.68 | 0.65 | 7.92
FedBiD | 25.01 | 6.32 | 45.76 | 28.19 | 0.68 | 8.32

MedDG:
Method | BLEU-1 | BLEU-4 | ROUGE-1 | ROUGE-2 | Distinct-1 | Distinct-2
Centralized training | 30.47 | 14.21 | 53.97 | 35.73 | 0.87 | 10.92
FedAvg | 19.89 | 9.62 | 39.71 | 25.87 | 0.58 | 7.06
FedMD | 23.74 | 11.84 | 43.76 | 30.21 | 0.63 | 9.14
FedDF | 24.26 | 11.03 | 43.89 | 29.51 | 0.77 | 9.25
FedGen | 26.10 | 13.05 | 46.17 | 32.04 | 0.69 | 9.87
FedBiD | 26.75 | 13.16 | 46.83 | 31.63 | 0.78 | 9.98

Tab. 1 Performance comparison of different federated learning methods on two datasets
Fig. 2 Performance of model under homogeneous data
Fig. 3 Performance of model under heterogeneous data
Client | Model | Layers | Hidden dimension | np/10^6
1 | GPT-2-small | 12 | 768 | 117
2 | GPT-2 | 24 | 1024 | 345
3 | BART-base | 12 | 768 | 130
4 | BART | 24 | 1024 | 374

Tab. 2 Model parameters on each client (np: number of parameters)
Fig. 4 Model performance curve of each client
Dataset | T1=1 | T1=10 | T1=20 | T1=30 | T2=0.1 | T2=1.0 | T2=2.0 | T2=5.0
ReMeDi | 11.85 | 14.37 | 15.71 | 13.92 | 12.17 | 15.71 | 14.85 | 12.46
MedDG | 13.79 | 18.02 | 19.91 | 17.68 | 15.88 | 19.91 | 18.38 | 16.25

Tab. 3 Effect of temperature on model performance (all values are BLEU)
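Tab. 3 sweeps two distillation temperatures, T1 and T2. Assuming they follow the standard temperature-scaled softmax of knowledge distillation [10] (with T1 and T2 applied in the two distillation directions, an interpretation not stated in this excerpt), the softened probability assigned to token i given logits z would be:

```latex
% Temperature-scaled softmax used in standard knowledge distillation [10].
% Larger T yields a softer distribution; T = 1 recovers the usual softmax.
p_i(T) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
```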
Model | Layers | Hidden dimension | np/10^6 | BLEU
GPT-2-small | 12 | 768 | 117 | 16.75
GPT-2 | 24 | 1024 | 345 | 19.91
GPT-2-large | 36 | 1280 | 762 | 20.62
GPT-2-max | 48 | 1600 | 1542 | 22.03

Tab. 4 Effect of different model parameters on model performance
[1] YAO A C. Protocols for secure computations [C]// Proceedings of the 23rd Annual Symposium on Foundations of Computer Science. Chicago: IEEE, 1982: 160−164.
[2] GOLDREICH O, MICALI S, WIGDERSON A. How to play any mental game [C]// Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing. New York: [s.n.], 1987: 218−229.
[3] SHAMIR A. How to share a secret [J]. Communications of the ACM, 1979, 22(11): 612−613. doi: 10.1145/359168.359176.
[4] KONEČNÝ J, MCMAHAN H B, RAMAGE D, et al. Federated optimization: distributed machine learning for on-device intelligence [EB/OL]. (2016−10−08) [2022−12−01]. https://arxiv.org/pdf/1610.02527.
[5] TAN Y, LONG G, LIU L, et al. FedProto: federated prototype learning across heterogeneous clients [C]// Proceedings of the AAAI Conference on Artificial Intelligence. [S.l.]: AAAI Press, 2022: 8432−8440.
[6] LI D, WANG J. FedMD: heterogenous federated learning via model distillation [EB/OL]. (2019−10−08) [2022−12−01]. https://arxiv.org/pdf/1910.03581.
[7] MCMAHAN H B, MOORE E, RAMAGE D, et al. Communication-efficient learning of deep networks from decentralized data [EB/OL]. (2023−01−26) [2023−12−01]. https://arxiv.org/pdf/1602.05629.
[8] HANZELY F, RICHTÁRIK P. Federated learning of a mixture of global and local models [EB/OL]. (2021−02−12) [2022−12−01]. https://arxiv.org/pdf/2002.05516.
[9] HUANG L, YIN Y, FU Z, et al. LoAdaBoost: loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data [J]. PLoS ONE, 2020, 15(4): e0230706.
[10] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network [EB/OL]. (2015−03−09) [2022−12−01]. https://arxiv.org/pdf/1503.02531.
[11] FURLANELLO T, LIPTON Z C, TSCHANNEN M, et al. Born-again neural networks [EB/OL]. (2018−06−29) [2022−12−01]. https://arxiv.org/pdf/1805.04770.
[12] KIMURA A, GHAHRAMANI Z, TAKEUCHI K, et al. Few-shot learning of neural networks from scratch by pseudo example optimization [EB/OL]. (2018−07−05) [2022−12−01]. https://arxiv.org/pdf/1802.03039.
[13] LOPES R G, FENU S, STARNER T. Data-free knowledge distillation for deep neural networks [EB/OL]. (2017−11−23) [2022−12−01]. https://arxiv.org/pdf/1710.07535.
[14] NAYAK G K, MOPURI K R, SHAJ V, et al. Zero-shot knowledge distillation in deep networks [EB/OL]. (2019−05−20) [2022−12−01]. https://arxiv.org/pdf/1905.08114.
[15] CHEN H, WANG Y, XU C, et al. Data-free learning of student networks [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 3514−3522.
[16] FANG G, SONG J, SHEN C, et al. Data-free adversarial distillation [EB/OL]. (2020−03−02) [2022−12−01]. https://arxiv.org/pdf/1912.11006.
[17] JEONG E, OH S, KIM H, et al. Communication-efficient on-device machine learning: federated distillation and augmentation under non-IID private data [EB/OL]. (2023−10−19) [2023−12−01]. https://arxiv.org/pdf/1811.11479.
[18] ITAHARA S, NISHIO T, KODA Y, et al. Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-IID private data [J]. IEEE Transactions on Mobile Computing, 2021, 22(1): 191−205.
[19] LIN T, KONG L, STICH S U, et al. Ensemble distillation for robust model fusion in federated learning [C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. [S.l.]: CAI, 2020: 2351−2363.
[20] CHANDRAKALA S, JAYALAKSHMI S L. Generative model driven representation learning in a hybrid framework for environmental audio scene and sound event recognition [J]. IEEE Transactions on Multimedia, 2019, 22(1): 3−14.
[21] ARIVAZHAGAN M G, AGGARWAL V, SINGH A K, et al. Federated learning with personalization layers [EB/OL]. (2019−12−02) [2022−12−01]. https://arxiv.org/pdf/1912.00818.
[22] ZHU Z, HONG J, ZHOU J. Data-free knowledge distillation for heterogeneous federated learning [EB/OL]. (2021−06−09) [2022−12−01]. https://arxiv.org/pdf/2105.10056.
[23] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners [EB/OL]. [2022−12−01]. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[24] YAN G, PEI J, REN P, et al. ReMeDi: resources for multi-domain, multi-service, medical dialogues [EB/OL]. (2022−03−01) [2022−12−01]. https://arxiv.org/pdf/2109.00430.
[25] LIU W, TANG J, CHENG Y, et al. MedDG: an entity-centric medical consultation dataset for entity-aware medical dialogue generation [C]// Natural Language Processing and Chinese Computing. [S.l.]: Springer, 2022: 447−459.