Journal of Zhejiang University (Engineering Science)  2026, Vol. 60 Issue (6): 1231-1239    DOI: 10.3785/j.issn.1008-973X.2026.06.010
Computer Technology
Multi-task learning based on deep mutual learning
Honghu XIAO (1, 3), Chengquan HUANG (1, 2, 3, *), Xunhui ZHOU (1, 3), Honglai DONG (1, 3), Lihua ZHOU (1)
1. School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, China
2. Engineering Training Center, Guizhou Minzu University, Guiyang 550025, China
3. Key Laboratory of Pattern Recognition and Intelligent Systems of Guizhou Province, Guizhou Minzu University, Guiyang 550025, China
Abstract:

A multi-depth mutual learning (MDML) algorithm was proposed to address overfitting in multi-task learning (MTL) caused by non-robust generalization supervision signals. A mimicry loss was introduced into the update of two multi-task networks, so that multi-task learning was formulated as a mutual learning problem. The mimicry loss was determined by the task outputs: it was obtained by aligning the outputs that the two networks produced for the same task. The MDML algorithm fused the conventional supervised learning loss with the mimicry loss according to a weighting scheme and used the fused loss to update both networks. Experimental results on the NYUv2 and Cityscapes datasets showed that the MDML algorithm effectively addressed the non-robust generalization supervision signals in multi-task networks and thereby reduced their overfitting.

Key words: multi-depth mutual learning; multi-task learning; mutual learning; mimicry loss; generalization supervision signal
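The loss fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance used for the mimicry loss or how the weighting scheme combines the terms, so the mean absolute difference, the fixed per-task weights (standing in for a scheme such as DWA or FAMO), and the names `mean_abs_diff`, `mutual_learning_losses`, and `mimicry_weight` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Outputs = Dict[str, List[float]]  # task name -> flattened predictions


def mean_abs_diff(a: List[float], b: List[float]) -> float:
    """Mean absolute difference between two equally long vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mutual_learning_losses(
    out_a: Outputs,
    out_b: Outputs,
    targets: Outputs,
    task_weights: Dict[str, float],
    mimicry_weight: float = 1.0,
) -> Tuple[float, float]:
    """Return the fused losses for network A and network B.

    For every task, each network gets its own supervised loss plus a
    mimicry term that aligns the two networks' outputs for that task.
    Assumption: both terms use mean absolute difference and the task
    weights are fixed; the paper's weighting scheme may differ.
    """
    loss_a = 0.0
    loss_b = 0.0
    for task, weight in task_weights.items():
        # Mimicry term: disagreement between the two networks on one task.
        # In a real training loop the peer's output would be treated as a
        # constant target when updating each network.
        mimic = mean_abs_diff(out_a[task], out_b[task])
        sup_a = mean_abs_diff(out_a[task], targets[task])
        sup_b = mean_abs_diff(out_b[task], targets[task])
        loss_a += weight * (sup_a + mimicry_weight * mimic)
        loss_b += weight * (sup_b + mimicry_weight * mimic)
    return loss_a, loss_b


# Toy example with two tasks and two predictions per task (made-up numbers).
out_a = {"seg": [0.2, 0.8], "depth": [1.0, 2.0]}
out_b = {"seg": [0.4, 0.6], "depth": [1.5, 2.5]}
targets = {"seg": [0.0, 1.0], "depth": [1.0, 3.0]}
weights = {"seg": 1.0, "depth": 0.5}
loss_a, loss_b = mutual_learning_losses(out_a, out_b, targets, weights)
print(f"loss A = {loss_a:.3f}, loss B = {loss_b:.3f}")
```

Each network keeps its own supervised term, while the mimicry term is shared, which is what couples the two updates.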
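Received: 2025-07-17    Published: 2026-05-06
CLC:  TP 391
Funding: National Natural Science Foundation of China (62062024); Guizhou Provincial Science and Technology Program (黔科合基础-ZK[2021]一般342); Guizhou Provincial Key Project of Graduate Education and Teaching Reform (黔教合YJSJGKT [2021]018); Natural Science Research Project of the Guizhou Provincial Department of Education (黔教技[2022]015); 2022 Open Project of the Key Laboratory of Pattern Recognition and Intelligent Systems of Guizhou Province (GZMUKL[2022]KF03).
Corresponding author: Chengquan HUANG     E-mail: 2143821719@qq.com; hcq@gzmu.edu.cn
About the author: Honghu XIAO (born 1998), male, master's student, working on deep learning and multi-task learning. orcid.org/0009-0007-0832-0091. E-mail: 2143821719@qq.com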

Cite this article:

Honghu XIAO, Chengquan HUANG, Xunhui ZHOU, Honglai DONG, Lihua ZHOU. Multi-task learning based on deep mutual learning[J]. Journal of Zhejiang University (Engineering Science), 2026, 60(6): 1231-1239.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.06.010
https://www.zjujournals.com/eng/CN/Y2026/V60/I6/1231

Fig. 1  Structure of the multi-depth mutual learning algorithm
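
| Weighting scheme | Method | Network | Segmentation mIoU/% | Segmentation pAcc/% | Depth abs | Depth rel/% |
| --- | --- | --- | --- | --- | --- | --- |
| DWA | Independent | Net 1 | 72.70±0.11 | 92.75±0.04 | 0.0158±0.0031 | 9.8475±0.0017 |
| | | Net 2 | 73.33±0.12 | 92.98±0.04 | 0.0156±0.0012 | 10.9579±0.0012 |
| | MDML | Net 1 | **75.16±0.16** | **93.65±0.05** | 0.0157±0.0016 | 9.3845±0.0017 |
| | | Net 1 | 74.96±0.12 | 93.62±0.03 | 0.0158±0.0018 | 9.6220±0.013 |
| | MDML | Net 2 | 74.93±0.33 | 93.64±0.08 | 0.0156±0.0022 | 9.9440±0.0013 |
| | | Net 2 | 74.82±0.35 | 93.64±0.08 | 0.0157±0.0024 | 9.9031±0.0018 |
| FAMO | Independent | Net 1 | 74.48±0.19 | 93.25±0.06 | 0.0149±0.0012 | 8.5942±0.0017 |
| | | Net 2 | 74.46±0.21 | 93.25±0.06 | 0.0149±0.0014 | 10.8209±0.0018 |
| | MDML | Net 1 | 75.10±0.17 | 93.65±0.18 | 0.0152±0.0013 | 8.6379±0.0013 |
| | | Net 1 | 75.25±0.14 | 93.73±0.06 | 0.0150±0.0011 | 8.4862±0.0017 |
| | MDML | Net 2 | 75.11±0.14 | 93.73±0.08 | 0.0148±0.0015 | 8.8535±0.0018 |
| | | Net 2 | **75.31±0.29** | **93.74±0.13** | 0.0150±0.0013 | 8.5065±0.0015 |
| OTW&MLW | Independent (OKD) | Net 1 | 72.62±0.35 | 92.63±0.14 | 0.0154±0.0011 | 8.7886±0.0016 |
| | | Net 2 | 73.17±0.25 | 92.91±0.21 | 0.0152±0.0002 | 10.0187±0.0017 |
| | MDML | Net 1 | **75.14±0.24** | 93.67±0.04 | 0.0157±0.0012 | 8.8345±0.0012 |
| | | Net 1 | 75.07±0.21 | 93.65±0.06 | 0.0157±0.0011 | 8.8424±0.0014 |
| | MDML | Net 2 | 75.00±0.15 | 93.66±0.04 | 0.0157±0.0011 | 9.7004±0.0016 |
| | | Net 2 | 75.02±0.13 | **93.68±0.02** | 0.0155±0.0012 | 9.8319±0.0023 |

Tab. 1  Results of each method on the Cityscapes dataset with networks Net 1 and Net 2 trained from scratch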

| Weighting scheme | Method | Network | Segmentation mIoU/% | Segmentation pAcc/% | Depth abs | Depth rel/% | Surface normal Mean | Surface normal Median |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DWA | Independent | Net 1 | 31.23±0.14 | 57.55±0.03 | 0.6479±0.0018 | 0.2472±0.0016 | 31.90±0.07 | 25.91±0.05 |
| | | Net 2 | 31.35±0.12 | 57.59±0.05 | 0.6580±0.0014 | 0.2522±0.0021 | 31.95±0.06 | 25.87±0.09 |
| | MDML | Net 1 | 36.50±0.26 | 62.41±0.13 | 0.5780±0.0016 | **0.2184±0.0018** | 28.96±0.05 | 24.25±0.07 |
| | | Net 1 | **36.51±0.25** | **62.50±0.01** | **0.5757±0.0014** | 0.2201±0.0017 | **28.91±0.09** | **24.22±0.12** |
| | MDML | Net 2 | 35.48±0.35 | 61.58±0.08 | 0.5822±0.0024 | 0.2208±0.0012 | 29.49±0.12 | 24.88±0.07 |
| | | Net 2 | 35.53±0.28 | 61.61±0.09 | 0.5804±0.0024 | 0.2226±0.0015 | 29.41±0.07 | 24.78±0.07 |
| FAMO | Independent | Net 1 | 32.41±0.21 | 58.82±0.02 | 0.6364±0.0032 | 0.2473±0.0014 | 30.06±0.04 | 23.41±0.09 |
| | | Net 2 | 33.03±0.28 | 59.42±0.01 | 0.6481±0.0034 | 0.2493±0.0016 | 29.86±0.08 | 22.90±0.06 |
| | MDML | Net 1 | 36.46±0.25 | 62.61±0.06 | 0.5733±0.0016 | 0.2149±0.0019 | 27.57±0.07 | 22.25±0.07 |
| | | Net 1 | **36.88±0.25** | **62.96±0.02** | **0.5693±0.0018** | **0.2148±0.0021** | **27.44±0.06** | **22.04±0.07** |
| | MDML | Net 2 | 34.17±0.31 | 60.87±0.18 | 0.5849±0.0017 | 0.2216±0.0018 | 28.52±0.04 | 23.44±0.06 |
| | | Net 2 | 34.32±0.24 | 61.03±0.19 | 0.5903±0.0018 | 0.2236±0.0019 | 28.53±0.08 | 23.47±0.09 |
| OTW&MLW | Independent (OKD) | Net 1 | 30.94±0.14 | 57.04±0.17 | 0.6336±0.0019 | 0.2458±0.0022 | 31.24±0.05 | 24.92±0.05 |
| | | Net 2 | 31.23±0.27 | 57.50±0.07 | 0.6439±0.0013 | 0.2494±0.0019 | 30.63±0.08 | **23.96±0.06** |
| | MDML | Net 1 | 36.38±0.38 | 62.41±0.16 | **0.5740±0.0027** | 0.2197±0.0015 | 28.85±0.06 | 24.11±0.05 |
| | | Net 1 | **36.40±0.43** | **62.42±0.12** | 0.5795±0.0028 | **0.2194±0.0015** | **28.84±0.09** | 24.10±0.07 |
| | MDML | Net 2 | 35.27±0.25 | 61.63±0.19 | 0.5851±0.0026 | 0.2216±0.0012 | 29.46±0.06 | 24.83±0.08 |
| | | Net 2 | 35.61±0.29 | 61.79±0.14 | 0.5819±0.0045 | 0.2229±0.0013 | 29.51±0.03 | 24.86±0.09 |

Tab. 2  Results of each method on the NYUv2 dataset with networks Net 1 and Net 2 trained from scratch

| Weighting scheme | Method | Network | Segmentation mIoU/% | Segmentation pAcc/% | Depth abs | Depth rel/% | Surface normal Mean | Surface normal Median |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DWA | Independent | Net 1 | 47.54±0.25 | 70.84±0.09 | 0.5198±0.0023 | 0.1987±0.0014 | 26.02±0.02 | 19.58±0.04 |
| | | Net 2 | 50.83±0.23 | 73.65±0.09 | 0.5113±0.0025 | 0.1933±0.0019 | 24.52±0.03 | **18.06±0.05** |
| | MDML | Net 1 | 51.40±0.15 | 74.04±0.15 | 0.4726±0.0021 | 0.1720±0.0011 | 25.02±0.06 | 20.16±0.02 |
| | | Net 1 | 51.09±0.24 | 73.77±0.14 | 0.4739±0.0028 | 0.1722±0.002 | 25.03±0.07 | 20.12±0.06 |
| | MDML | Net 2 | 53.19±0.27 | 75.03±0.15 | 0.4658±0.0027 | 0.1682±0.0016 | 24.30±0.08 | 19.50±0.03 |
| | | Net 2 | **53.24±0.25** | **75.23±0.15** | **0.4611±0.0027** | **0.1666±0.0022** | **24.26±0.02** | 19.49±0.08 |
| FAMO | Independent | Net 1 | 47.35±0.36 | 70.62±0.04 | 0.5269±0.0015 | 0.1988±0.0025 | 25.09±0.09 | 18.36±0.07 |
| | | Net 2 | 51.12±0.31 | 73.73±0.16 | 0.5186±0.0012 | 0.1925±0.0015 | 23.46±0.03 | **16.88±0.04** |
| | MDML | Net 1 | 51.30±0.22 | 73.82±0.12 | 0.4765±0.0016 | 0.1719±0.0017 | 23.64±0.08 | 18.21±0.02 |
| | | Net 1 | 51.18±0.31 | 73.91±0.09 | 0.4783±0.0016 | 0.1731±0.0012 | 23.59±0.08 | 18.12±0.09 |
| | MDML | Net 2 | 53.37±0.29 | **75.45±0.16** | **0.4690±0.0012** | 0.1678±0.0022 | 22.64±0.04 | 17.25±0.06 |
| | | Net 2 | **53.67±0.25** | 75.42±0.18 | 0.4738±0.0017 | **0.1672±0.0018** | **22.62±0.02** | 17.20±0.02 |
| OTW&MLW | Independent (OKD) | Net 1 | 47.87±0.32 | 71.17±0.12 | 0.5115±0.0015 | 0.1961±0.0012 | 24.52±0.08 | 17.64±0.09 |
| | | Net 2 | 52.01±0.19 | 74.35±0.12 | 0.5044±0.0019 | 0.1894±0.0016 | **22.84±0.05** | **16.18±0.06** |
| | MDML | Net 1 | 51.32±0.27 | 73.98±0.14 | 0.4838±0.0012 | 0.1751±0.0015 | 24.96±0.07 | 20.11±0.03 |
| | | Net 1 | 51.31±0.19 | 74.02±0.05 | 0.4773±0.0017 | 0.1736±0.0011 | 24.93±0.05 | 20.03±0.08 |
| | MDML | Net 2 | **53.59±0.27** | **75.49±0.15** | **0.4563±0.0019** | **0.1651±0.0013** | 24.12±0.07 | 19.09±0.05 |
| | | Net 2 | 53.29±0.31 | 75.44±0.14 | 0.4589±0.0017 | 0.1682±0.0015 | 24.15±0.07 | 19.32±0.02 |

Tab. 3  Results of each method on the NYUv2 dataset with pretrained networks Net 1 and Net 2
Fig. 2  Training loss and test loss curves of the MDML and OKD algorithms on each task
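
| Weighting scheme | Method | Network | From scratch t_tr/s | From scratch N_p/10^6 | Pretrained t_tr/s | Pretrained N_p/10^6 |
| --- | --- | --- | --- | --- | --- | --- |
| DWA | Independent | Net 1 | 13.83 | 96.69 | 16.34 | 96.69 |
| | | Net 2 | 17.18 | 135.25 | 19.05 | 135.25 |
| | MDML | Net 1&Net 1 | 24.47 | 193.38 | 22.99 | 193.38 |
| | MDML | Net 2&Net 2 | 28.74 | 270.50 | 22.76 | 270.50 |
| FAMO | Independent | Net 1 | 17.51 | 96.69 | 16.05 | 96.69 |
| | | Net 2 | 20.81 | 135.25 | 21.45 | 135.25 |
| | MDML | Net 1&Net 1 | 26.87 | 193.38 | 26.03 | 193.38 |
| | MDML | Net 2&Net 2 | 32.98 | 270.50 | 35.70 | 270.50 |
| OTW&MLW | Independent (OKD) | Net 1 | 38.02 | 200.99 | 36.89 | 200.99 |
| | | Net 2 | 51.48 | 316.67 | 51.24 | 316.67 |
| | MDML | Net 1&Net 1 | 21.33 | 193.38 | 24.94 | 193.38 |
| | MDML | Net 2&Net 2 | 16.10 | 270.50 | 22.44 | 270.50 |

Tab. 4  Comparison of model complexity of each method on the NYUv2 dataset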
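
The parameter counts in the complexity table above can be cross-checked with a few lines of arithmetic. The sketch below uses only values copied from that table; the variable and function names are illustrative. It checks whether each MDML pair has twice the parameters of the corresponding single network, as would be expected if MDML trains two copies of the same architecture, and reports how the pair compares with the OKD model.

```python
# Parameter counts (in millions) copied from the complexity table above.
single_net_params = {"Net 1": 96.69, "Net 2": 135.25}
mdml_pair_params = {"Net 1": 193.38, "Net 2": 270.50}
okd_params = {"Net 1": 200.99, "Net 2": 316.67}


def is_doubled(single: float, pair: float, tol: float = 0.005) -> bool:
    """True if the pair count equals twice the single count within rounding."""
    return abs(2.0 * single - pair) <= tol


for name, single in single_net_params.items():
    pair = mdml_pair_params[name]
    ratio_to_okd = pair / okd_params[name]
    print(f"{name}: doubled={is_doubled(single, pair)}, MDML/OKD={ratio_to_okd:.3f}")
```

For both architectures the MDML pair is listed with fewer parameters than the OKD configuration (193.38 versus 200.99 and 270.50 versus 316.67, in millions).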