Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (10): 2062-2068    DOI: 10.3785/j.issn.1008-973X.2024.10.009
Heterogeneous cloud-end medical dialogue federation based on bi-directional bootstrapping distillation
Yupeng LIU(),Minghao LIN,Jiang ZHANG,Dengju YAO
School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China

Abstract  

A new federated learning method was proposed for the medical dialogue scenario to handle heterogeneous data, heterogeneous models, and differing data types. The cloud model and the end models transferred knowledge to each other by mutual bootstrapping distillation. The end-to-cloud bootstrapping distillation process followed a multi-teacher-single-student pattern, distilling knowledge from multiple local models into a global model; the cloud-to-end bootstrapping distillation process followed a single-teacher-multi-student pattern, distilling knowledge from the global model back into the multiple local models. On the medical dialogue datasets ReMeDi and MedDG, the proposed method significantly outperformed classical baselines on text-generation evaluation metrics, and training speed was also improved.
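The two distillation directions described in the abstract can be sketched in a few lines. This is a toy NumPy illustration under stated assumptions, not the paper's implementation: the logits are stand-in arrays, and the function names (`end_to_cloud_target`, `cloud_to_end_losses`, `kl_div`) are chosen here for illustration only.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened distribution over vocabulary logits."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q), the usual soft-target distillation objective."""
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def end_to_cloud_target(local_logits, T=2.0):
    """Multi-teacher, single-student: average the softened distributions
    of the local (end) models to form the target that the global (cloud)
    model is distilled toward."""
    return np.mean([softmax(l, T) for l in local_logits], axis=0)

def cloud_to_end_losses(global_logits, local_logits, T=2.0):
    """Single-teacher, multi-student: every local model is distilled
    toward the softened distribution of the global model."""
    teacher = softmax(global_logits, T)
    return [kl_div(teacher, softmax(l, T)) for l in local_logits]

# Toy round: three heterogeneous clients emit logits over a 5-token vocabulary.
locals_ = [np.array([2.0, 1.0, 0.5, 0.1, -1.0]),
           np.array([1.5, 1.2, 0.3, 0.0, -0.5]),
           np.array([2.2, 0.8, 0.6, 0.2, -0.8])]
target = end_to_cloud_target(locals_)           # cloud-side distillation target
global_logits = np.array([1.9, 1.0, 0.5, 0.1, -0.8])
losses = cloud_to_end_losses(global_logits, locals_)
```

Because only softened output distributions cross the cloud-end boundary, the clients' architectures may differ freely (as in Tab.2), which is what makes the scheme compatible with heterogeneous models.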



Key words: bootstrapping distillation; heterogeneous data; heterogeneous model; structure regularization; medical dialogue
Received: 29 July 2023      Published: 27 September 2024
CLC:  TP 393  
Fund: National Natural Science Foundation of China (61300115, 62172128).
Cite this article:

Yupeng LIU,Minghao LIN,Jiang ZHANG,Dengju YAO. Heterogeneous cloud-end medical dialogue federation based on bi-directional bootstrapping distillation. Journal of ZheJiang University (Engineering Science), 2024, 58(10): 2062-2068.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.10.009     OR     https://www.zjujournals.com/eng/Y2024/V58/I10/2062


Fig.1 Federated learning method based on bi-directional bootstrapping distillation
Method                 ReMeDi                                                    MedDG
                       BLEU-1  BLEU-4  ROUGE-1  ROUGE-2  Distinct-1  Distinct-2  BLEU-1  BLEU-4  ROUGE-1  ROUGE-2  Distinct-1  Distinct-2
Centralized training   27.86   6.59    50.36    32.25    0.72        8.59        30.47   14.21   53.97    35.73    0.87        10.92
FedAvg                 18.37   4.83    38.64    22.45    0.50        5.32        19.89   9.62    39.71    25.87    0.58        7.06
FedMD                  21.41   5.79    41.92    26.93    0.63        7.54        23.74   11.84   43.76    30.21    0.63        9.14
FedDF                  21.68   5.46    40.45    26.64    0.62        8.06        24.26   11.03   43.89    29.51    0.77        9.25
FedGen                 24.08   6.38    42.64    27.68    0.65        7.92        26.10   13.05   46.17    32.04    0.69        9.87
FedBiD                 25.01   6.32    45.76    28.19    0.68        8.32        26.75   13.16   46.83    31.63    0.78        9.98
Tab.1 Performance comparison of different federated learning methods on two datasets
Fig.2 Performance of model under homogeneous data
Fig.3 Performance of model under heterogeneous data
Client  Model        Layers  Hidden dim  np/10^6
1       GPT-2-small  12      768         117
2       GPT-2        24      1024        345
3       BART-base    12      768         130
4       BART         24      1024        374
Tab.2 Model parameters on each client
Fig.4 Model performance curve of each client
Dataset   BLEU
          T1=1   T1=10  T1=20  T1=30   T2=0.1  T2=1.0  T2=2.0  T2=5.0
ReMeDi    11.85  14.37  15.71  13.92   12.17   15.71   14.85   12.46
MedDG     13.79  18.02  19.91  17.68   15.88   19.91   18.38   16.25
Tab.3 Effect of temperature on model performance
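The peak at intermediate temperatures in Tab.3 reflects the standard distillation trade-off: raising the temperature flattens the teacher's distribution and exposes more inter-class information, but too high a temperature washes the signal out toward uniform. A minimal NumPy sketch (illustrative logits chosen here, not taken from the paper) makes the effect concrete by measuring the entropy of the softened targets:

```python
import numpy as np

def soften(logits, T):
    """Softmax with temperature T: larger T yields a flatter distribution."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return float(-(p * np.log(p + 1e-12)).sum())

logits = np.array([4.0, 2.0, 1.0, 0.5])
ent = {T: entropy(soften(logits, T)) for T in (0.5, 1.0, 2.0, 5.0)}
# Entropy grows monotonically with T: soft targets carry more information
# about the teacher's secondary preferences at moderate T, but become
# near-uniform and uninformative when T is too large.
```

This is consistent with the table: BLEU improves as the temperature rises from its smallest value, peaks at an intermediate setting, and degrades again at the largest values.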
Model        Layers  Hidden dim  np/10^6  BLEU
GPT-2-small  12      768         117      16.75
GPT-2        24      1024        345      19.91
GPT-2-large  36      1280        762      20.62
GPT-2-max    48      1600        1542     22.03
Tab.4 Effects of different model parameters on model performance
[1]   YAO A C. Protocols for secure computations [C]// Proceedings of 23rd Annual Symposium on Foundations of Computer Science . Chicago: IEEE, 1982: 160–164.
[2]   GOLDREICH O, MICALI S, WIGDERSON A. How to play any mental game [C]// Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing . New York: [s.n.], 1987: 218–229.
[3]   SHAMIR A. How to share a secret [J]. Communications of the ACM, 1979, 22(11): 612−613. doi: 10.1145/359168.359176
[4]   KONEČNÝ J, MCMAHAN H B, RAMAGE D, et al. Federated optimization: distributed machine learning for on-device intelligence [EB/OL]. (2016−10−08) [2022−12−01]. https://arxiv.org/pdf/1610.02527.
[5]   TAN Y, LONG G, LIU L, et al. FedProto: federated prototype learning across heterogeneous clients [C]// Proceedings of the AAAI Conference on Artificial Intelligence . [S.l.]: AAAI Press, 2022: 8432−8440.
[6]   LI D, WANG J. FedMD: heterogenous federated learning via model distillation [EB/OL]. (2019−10−08)[2022−12−01]. https://arxiv.org/pdf/1910.03581.
[7]   MCMAHAN H B, MOORE E, RAMAGE D, et al. Communication-efficient learning of deep networks from decentralized data [EB/OL]. (2023−01−26) [2023−12−01]. https://arxiv.org/pdf/1602.05629.
[8]   HANZELY F, RICHTÁRIK P. Federated learning of a mixture of global and local models [EB/OL]. (2021−02−12)[2022−12−01]. https://arxiv.org/pdf/2002.05516.
[9]   HUANG L, YIN Y, FU Z, et al. LoAdaBoost: loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data [J]. PLoS ONE , 2020, 15(4): e0230706.
[10]   HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network [EB/OL]. (2015−03−09)[2022−12−01]. https://arxiv.org/pdf/1503.02531.
[11]   FURLANELLO T, LIPTON Z C, TSCHANNEN M, et al. Born-again neural networks [EB/OL]. (2018−06−29)[2022−12−01]. https://arxiv.org/pdf/1805.04770.
[12]   KIMURA A, GHAHRAMANI Z, TAKEUCHI K, et al. Few-shot learning of neural networks from scratch by pseudo example optimization [EB/OL]. (2018−07−05)[2022−12−01]. https://arxiv.org/pdf/1802.03039.
[13]   LOPES R G, FENU S, STARNER T. Data-free knowledge distillation for deep neural networks [EB/OL]. (2017−11−23)[2022−12−01]. https://arxiv.org/pdf/1710.07535.
[14]   NAYAK G K, MOPURI K R, SHAJ V, et al. Zero-shot knowledge distillation in deep networks [EB/OL]. (2019−05−20) [2022−12−01]. https://arxiv.org/pdf/1905.08114.
[15]   CHEN H, WANG Y, XU C, et al. Data-free learning of student networks [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 3514–3522.
[16]   FANG G, SONG J, SHEN C, et al. Data-free adversarial distillation [EB/OL]. (2020−03−02) [2022−12−01]. https://arxiv.org/pdf/1912.11006.
[17]   JEONG E, OH S, KIM H, et al. Communication-efficient on-device machine learning: federated distillation and augmentation under non-IID private data [EB/OL]. (2023−10−19)[2023−12−01]. https://arxiv.org/pdf/1811.11479.
[18]   ITAHARA S, NISHIO T, KODA Y, et al. Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-IID private data [J]. IEEE Transactions on Mobile Computing, 2021, 22(1): 191−205.
[19]   LIN T, KONG L, STICH S U, et al. Ensemble distillation for robust model fusion in federated learning [C]// Proceedings of the 34th International Conference on Neural Information Processing Systems . [S.l.]: CAI, 2020: 2351−2363.
[20]   CHANDRAKALA S, JAYALAKSHMI S L. Generative model driven representation learning in a hybrid framework for environmental audio scene and sound event recognition [J]. IEEE Transactions on Multimedia, 2019, 22(1): 3−14.
[21]   ARIVAZHAGAN M G, AGGARWAL V, SINGH A K, et al. Federated learning with personalization layers [EB/OL]. (2019−12−02) [2022−12−01]. https://arxiv.org/pdf/1912.00818.
[22]   ZHU Z, HONG J, ZHOU J. Data-free knowledge distillation for heterogeneous federated learning [EB/OL]. (2021−06−09) [2022−12−01]. https://arxiv.org/pdf/2105.10056.
[23]   RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners [EB/OL]. [2022−12−01]. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[24]   YAN G, PEI J, REN P, et al. ReMeDi: resources for multi-domain, multi-service, medical dialogues [EB/OL]. (2022−03−01) [2022−12−01]. https://arxiv.org/pdf/2109.00430.
[25]   LIU W, TANG J, CHENG Y, et al. MedDG: an entity-centric medical consultation dataset for entity-aware medical dialogue generation [C]// Natural Language Processing and Chinese Computing . [S.l.]: Springer, 2022: 447−459.