Journal of Zhejiang University (Engineering Science)  2022, Vol. 56 Issue (3): 503-509    DOI: 10.3785/j.issn.1008-973X.2022.03.009
Computer and Control Engineering
Context-aware knowledge distillation network for object detection
Jing-hui CHU, Li-dong SHI, Pei-guang JING, Wei LV*
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Abstract:

A context-aware knowledge distillation network (CAKD Net) for object detection was proposed to address the difficulty that existing knowledge distillation methods for object detection have in exploiting feature information from the context region surrounding the detected object. The context information of the detected object was fully used, and the gap between the teacher network and the student network was eliminated by performing information perception along the spatial and channel domains simultaneously. CAKD Net consists of a context-aware region modified module (CARM) and an adaptive channel attention module (ACAM). CARM uses the context information to adaptively generate a fine-grained mask of the salient region and precisely eliminates the difference between the feature responses of the teacher network and the student network in that region. ACAM introduces a spatial-channel attention mechanism to further optimize the objective function, improving the performance of the student network. Experimental results show that the proposed method improves the mean average precision by more than 2.9%.

Key words: knowledge distillation    channel attention    model compression    object detection    deep learning
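The CARM idea of matching teacher and student feature responses only inside a saliency mask covering the object and its context can be illustrated with a minimal numpy sketch; the function name, tensor shapes, and the hand-made binary mask below are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

def masked_imitation_loss(f_teacher, f_student, mask):
    """L2 gap between teacher and student feature maps, restricted
    to the salient (masked) region.
    f_teacher, f_student: (C, H, W) feature maps; mask: (H, W) binary."""
    diff = (f_teacher - f_student) ** 2          # per-element gap
    masked = diff * mask[None, :, :]             # keep salient positions only
    n = max(float(mask.sum()), 1.0)              # normalise by mask area
    return float(masked.sum() / (n * f_teacher.shape[0]))

rng = np.random.default_rng(0)
ft = rng.standard_normal((8, 4, 4))              # teacher features
fs = rng.standard_normal((8, 4, 4))              # student features
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                             # toy 2x2 salient region
loss = masked_imitation_loss(ft, fs, mask)
```

In the full method the mask is generated adaptively from context information; here it is fixed only to keep the sketch self-contained.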
Received: 2021-09-07  Published: 2022-03-29
CLC: TP 37
Funding: Tianjin Science and Technology Program (18ZXJMTG00020); Tianjin Natural Science Foundation (20JCQNJC01210)
Corresponding author: Wei LV  E-mail: cjh@tju.edu.cn; luwei@tju.edu.cn
About the author: Jing-hui CHU (1969-), female, associate professor, engaged in research on computer vision. orcid.org/0000-0001-7926-8824. E-mail: cjh@tju.edu.cn
Cite this article:

Jing-hui CHU, Li-dong SHI, Pei-guang JING, Wei LV. Context-aware knowledge distillation network for object detection. Journal of Zhejiang University (Engineering Science), 2022, 56(3): 503-509.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.03.009        https://www.zjujournals.com/eng/CN/Y2022/V56/I3/503

Fig. 1  Overview of the context-aware knowledge distillation network for object detection
Fig. 2  Structure of the context-aware region modified module
Fig. 3  Structure of the adaptive channel attention module
Model             mAP    Aeroplane  Bike   Bird   Boat   Bus    Chair  Table  Mbike  Person  Train
VGG16 (teacher)   70.4   70.9       78.0   67.8   55.1   79.6   48.7   63.5   74.5   77.0    76.0
VGG11 (student)   59.6   67.3       71.4   56.6   44.3   68.8   37.7   51.6   70.0   71.9    62.9
VGG11 (proposed)  68.5   74.4       77.6   65.3   55.6   77.4   46.2   63.4   76.8   76.3    75.0
Table 1  Results (mAP and per-class AP) on the VOC07 test set with VGG16-VGG11 as the teacher-student networks
Model                mAP    Aeroplane  Bike   Bird   Boat   Bus    Chair  Table  Mbike  Person  Train
ResNet101 (teacher)  74.4   77.8       78.9   77.5   63.2   79.2   54.5   68.7   77.8   78.6    78.8
ResNet50 (student)   69.1   68.9       79.0   67.0   54.1   78.6   49.7   62.6   72.5   77.2    75.0
ResNet50 (proposed)  72.4   75.8       79.0   71.7   58.1   80.8   51.5   69.1   77.8   78.3    81.5
Table 2  Results (mAP and per-class AP) on the VOC07 test set with ResNet101-ResNet50 as the teacher-student networks
Model                mAP    Car    Cyclist  Pedestrian
ResNet101 (teacher)  63.4   78.5   54.6     57.1
ResNet50 (student)   52.5   77.7   35.4     44.2
ResNet50 (proposed)  56.4   79.3   38.2     51.7
VGG16 (teacher)      62.6   79.3   52.1     56.4
VGG11 (student)      58.7   77.7   45.4     53.1
VGG11 (proposed)     62.3   79.8   50.1     57.0
Table 3  Results (in %) on the KITTI test set with the Faster R-CNN detector
Group  CARM  ACAM  mAP/%
1      ×     ×     58.7
2      √     ×     62.0
3      ×     √     60.7
4      √     √     62.3
Table 4  Ablation results on the KITTI test set with the Faster R-CNN detector
Group  ρ     mAP/%
1      0     61.6
2      0.25  62.3
3      0.50  61.9
4      0.75  61.7
Table 5  Results with different filter thresholds
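Table 5 sweeps a filter threshold ρ used when forming the mask. One plausible reading, shown here as a hypothetical sketch (the comparison rule against the maximum response is an assumption, not the paper's exact definition), is that positions whose saliency exceeds ρ times the peak response are kept:

```python
import numpy as np

def saliency_to_mask(saliency, rho=0.25):
    """Binarise a saliency map: keep positions whose response exceeds
    rho * max response. rho = 0 keeps every positive position;
    larger rho keeps fewer, coarser positions."""
    thresh = rho * float(saliency.max())
    return (saliency > thresh).astype(np.float32)

s = np.array([[0.1, 0.5],
              [0.9, 1.0]])   # toy saliency map
```

A larger ρ yields a sparser mask, which matches the trade-off visible in Table 5, where a moderate ρ performs best.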
Model (mAP/%)       Hinton  CD     FitNets  DOD    Task   LD     CAKD Net
Teacher             74.4    74.4   74.4     74.4   74.4   74.4   74.4
Student             69.1    69.1   69.1     69.1   70.0   69.1   69.1
After distillation  69.7    70.1   69.3     72.0   72.4   70.3   72.4
Table 6  Comparison results on the VOC07 dataset with the Faster R-CNN detector
Fig. 4  Visualization of the optimization targets of different knowledge distillation methods
Group  Model    ACAM  mAP
1      FitNets  ×     59.0%
2      FitNets  √     59.7%
3      DOD      ×     60.7%
4      DOD      √     61.4%
Table 7  Results verifying the generality of the adaptive channel attention module
1 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
2 ZHANG Yan-nan, HUANG Xiao-hong, MA Yan, et al. Method with recording text classification based on deep learning [J]. Journal of Zhejiang University: Engineering Science, 2020, 54(7): 1264-1271
3 HONG Yan-jia, MENG Tie-bao, LI Hao-jiang, et al. Deep segmentation method of tumor boundaries from MR images of patients with nasopharyngeal carcinoma using multi-modality and multi-dimension fusion [J]. Journal of Zhejiang University: Engineering Science, 2020, 54(3): 566-573
4 TIAN Y, KRISHNAN D, ISOLA P. Contrastive representation distillation [EB/OL]. [2021-09-07]. https://arxiv.org/pdf/1910.10699v2.pdf.
5 HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network [J]. Computer Science, 2015, 14(7): 38-39
6 CHEN G, CHOI W, YU X, et al. Learning efficient object detection models with knowledge distillation [C]// Proceedings of the Annual Conference on Neural Information Processing Systems. Long Beach: [s. n.], 2017: 742–751.
7 TAN X, REN Y, HE D, et al. Multilingual neural machine translation with knowledge distillation [EB/OL]. [2021-09-07]. https://arxiv.org/pdf/1902.10461v3.pdf.
8 SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2021-09-07]. https://arxiv.org/pdf/1409.1556.pdf.
9 ROMERO A, BALLAS N, KAHOU S E, et al. FitNets: hints for thin deep nets [C]// Proceedings of the International Conference on Learning Representations. San Diego: [s.n.], 2015: 1–13.
10 WANG T, YUAN L, ZHANG X, et al. Distilling object detectors with fine-grained feature imitation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4928–4937.
11 REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [EB/OL]. [2021-09-07]. https://arxiv.org/pdf/1506.01497.pdf.
12 REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 779-788.
13 HU J, SHEN L, SUN G, et al. Squeeze-and-excitation networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023
doi: 10.1109/TPAMI.2019.2913372
14 WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Angeles: IEEE, 2020: 11531–11539.
15 HOU Y, MA Z, LIU C, et al. Learning lightweight lane detection CNNs by self attention distillation [C]// Proceedings of the IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 1013-1021.
16 EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al The pascal visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2015, 111 (1): 98- 136
doi: 10.1007/s11263-014-0733-5
17 GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 3354-3361.
18 MAO J, XIAO T, JIANG Y, et al. What can help pedestrian detection? [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3127-3136.
19 SUN R, TANG F, ZHANG X, et al. Distilling object detectors with task adaptive regularization [EB/OL]. [2021-09-07]. https://arxiv.org/pdf/2006.13108.pdf.
20 ZHENG Z, YE R, WANG P, et al. Localization distillation for object detection [EB/OL]. [2021-09-07]. https://arxiv.org/pdf/2102.12252v3.pdf.