Multi-task learning based on deep mutual learning

doi:10.3785/j.issn.1008-973X.2026.06.010

Journal of Zhejiang University (Engineering Science)

2026, Vol. 60, Issue (6): 1231-1239

Computer Technology

Multi-task learning based on deep mutual learning

Honghu XIAO1,3, Chengquan HUANG1,2,3,*, Xunhui ZHOU1,3, Honglai DONG1,3, Lihua ZHOU1

1. School of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, China
2. Engineering Training Center, Guizhou Minzu University, Guiyang 550025, China
3. Key Laboratory of Pattern Recognition and Intelligent Systems of Guizhou Province, Guizhou Minzu University, Guiyang 550025, China

Abstract:

A multi-depth mutual learning (MDML) algorithm was proposed to address overfitting in multi-task learning (MTL) caused by unstable generalization supervision signals. A mimicry loss was introduced into the updates of two multi-task networks, so that multi-task learning was formulated as a mutual learning problem. The mimicry loss was determined by the task outputs: it was obtained by aligning the outputs that the two networks produced for the same task. The conventional supervised learning loss and the mimicry loss were fused according to a weighting scheme, and the fused loss was used to update the two multi-task networks. Experimental results on the NYUv2 and Cityscapes datasets showed that the MDML algorithm effectively addressed the problem of unstable generalization supervision signals in multi-task networks, thereby reducing their overfitting.
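The loss fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state how outputs are aligned or how the weights are chosen, so the sketch assumes a KL divergence between softmax outputs for a classification-type task (as in the original deep mutual learning work), a mean absolute difference for a regression-type task, and a single hypothetical weight `lam`. The task names `seg` and `depth` and all function names are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Supervised loss for a classification-type task (one sample)."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_abs_diff(a, b):
    """Mean absolute difference, used here for regression-type outputs."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def fused_losses(out1, out2, target, lam=1.0):
    """Return (loss_net1, loss_net2) for one sample.

    out1, out2: outputs of the two multi-task networks, each a dict with
                'seg' (class logits) and 'depth' (list of floats).
    target:     dict with 'seg' (class index) and 'depth' (list of floats).
    lam:        hypothetical weight on the mimicry term (assumption).
    """
    # Conventional supervised loss of each network against the ground truth.
    sup1 = cross_entropy(out1["seg"], target["seg"]) + mean_abs_diff(out1["depth"], target["depth"])
    sup2 = cross_entropy(out2["seg"], target["seg"]) + mean_abs_diff(out2["depth"], target["depth"])

    # Mimicry loss: align the two networks' outputs for the same task.
    # Each network treats its peer's output as a fixed target.
    p1, p2 = softmax(out1["seg"]), softmax(out2["seg"])
    depth_gap = mean_abs_diff(out1["depth"], out2["depth"])
    mim1 = kl_divergence(p2, p1) + depth_gap
    mim2 = kl_divergence(p1, p2) + depth_gap

    # Weighted fusion of supervised and mimicry losses.
    return sup1 + lam * mim1, sup2 + lam * mim2


# Usage: two networks that disagree on both tasks.
net1_out = {"seg": [2.0, 0.5, -1.0], "depth": [1.0, 2.0]}
net2_out = {"seg": [0.2, 1.5, 0.0], "depth": [1.5, 1.5]}
truth = {"seg": 0, "depth": [1.2, 1.8]}
loss1, loss2 = fused_losses(net1_out, net2_out, truth, lam=1.0)
print(loss1, loss2)
```

In this sketch the mimicry term vanishes when the two networks agree exactly, so the fused loss then reduces to the supervised loss; when they disagree, each network is additionally pulled toward its peer's prediction.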

Key words: multi-depth mutual learning; multi-task learning; mutual learning; mimicry loss; generalization supervision signal

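Received: 2025-07-17 Published: 2026-05-06

CLC: TP 391

Funding: National Natural Science Foundation of China (62062024); Guizhou Provincial Science and Technology Program (黔科合基础-ZK[2021]一般342); Key Project of Guizhou Provincial Graduate Education and Teaching Reform (黔教合YJSJGKT [2021]018); Natural Science Research Project of the Guizhou Provincial Department of Education (黔教技[2022]015); 2022 Open Project of the Key Laboratory of Pattern Recognition and Intelligent Systems of Guizhou Province (GZMUKL[2022]KF03).

Corresponding author: Chengquan HUANG E-mail: 2143821719@qq.com;hcq@gzmu.edu.cn

About the author: Honghu XIAO (born 1998), male, master's student, working on deep learning and multi-task learning. orcid.org/0009-0007-0832-0091. E-mail: 2143821719@qq.com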

Cite this article:

Honghu XIAO, Chengquan HUANG, Xunhui ZHOU, Honglai DONG, Lihua ZHOU. Multi-task learning based on deep mutual learning. Journal of Zhejiang University (Engineering Science), 2026, 60(6): 1231-1239.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.06.010 or https://www.zjujournals.com/eng/CN/Y2026/V60/I6/1231

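Fig. 1 Structure of the multi-depth mutual learning algorithm

Table 1 Results of each method on the Cityscapes dataset with networks Net 1 and Net 2 trained from scratch

Table 2 Results of each method on the NYUv2 dataset with networks Net 1 and Net 2 trained from scratch

Table 3 Results of each method on the NYUv2 dataset with pretrained networks Net 1 and Net 2

Fig. 2 Training and test losses of the MDML and OKD algorithms on each task

Table 4 Comparison of model complexity of each method on the NYUv2 dataset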

