Journal of Zhejiang University (Engineering Science)  2022, Vol. 56 Issue (12): 2403-2415    DOI: 10.3785/j.issn.1008-973X.2022.12.009
Computer Technology
Object detection algorithm based on feature enhancement and deep fusion
Yu XIE1, Zi-qun BAO1, Na ZHANG1,*, Biao WU2, Xiao-mei TU1,3, Xiao-an BAO1
1. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
3. School of Civil Engineering and Architecture, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang 322100, China
Abstract:

An object detection algorithm based on feature optimization and deep fusion was proposed to address the large detection error of the single shot multibox detector (SSD) on small targets. SSD was improved with a spatial and channel feature enhancement (SCFE) module and a deep feature pyramid network (DFPN). The SCFE module optimizes the feature layers through local spatial feature enhancement and global channel feature enhancement mechanisms, focusing on the detail information of the feature layers. DFPN improves the feature pyramid network with a residual spatial and channel enhancement module, so that feature layers of different scales are deeply fused and detection accuracy improves. A sample weighted training strategy was added in the training stage, which makes the network focus on well-localized samples and high-confidence samples. Experimental results show that on the PASCAL VOC dataset, the proposed algorithm raises the detection accuracy from 77.2% (SSD) to 79.7% while maintaining speed. On the COCO dataset, the detection accuracy rises from 25.6% (SSD) to 30.1%, and the detection accuracy for small targets rises from 6.8% (SSD) to 13.3%.

Key words: object detection    deep feature pyramid network (DFPN)    spatial and channel feature enhancement (SCFE)    sample weighted training    single-stage multi-box detector algorithm (SSD)
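Received: 2022-01-05    Published: 2023-01-03
CLC:  TP 391
Funding: Zhejiang Provincial Key Research and Development Program (2020C03094); General Scientific Research Project of the Department of Education of Zhejiang Province (Y202147659); Projects of the Department of Education of Zhejiang Province (Y202250706, Y202250677); National Natural Science Foundation of China (6207050141); Zhejiang Provincial Basic Public Welfare Research Program (QY19E050003)
Corresponding author: Na ZHANG     E-mail: 1419352830@qq.com; zhangna@zstu.edu.cn
About the author: Yu XIE (born 1997), male, master's student, engaged in research on artificial intelligence and computer vision information processing. orcid.org/0000-0003-1067-3674. E-mail: 1419352830@qq.com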
Cite this article:

Yu XIE, Zi-qun BAO, Na ZHANG, Biao WU, Xiao-mei TU, Xiao-an BAO. Object detection algorithm based on feature enhancement and deep fusion. Journal of Zhejiang University (Engineering Science), 2022, 56(12): 2403-2415.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.12.009        https://www.zjujournals.com/eng/CN/Y2022/V56/I12/2403

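Fig. 1  Structure diagrams of three improved feature pyramid networks
Fig. 2  Network structure of the object detection algorithm based on feature optimization and deep fusion
Fig. 3  1 000 feature points randomly sampled from each of three feature layers and their values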
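| Feature layer | Feature map size | m_min | m_max | A_box | α | n_box |
|---|---|---|---|---|---|---|
| Conv3_3 | 75×75 | 10 | 60 | 100, 600 | 1, 2, 3 | 33 750 |
| Conv4_3 | 38×38 | 30 | 60 | 900, 18 000 | 1, 2 | 5 776 |
| Conv7 | 19×19 | 60 | 111 | 3 600, 6 660 | 1, 2, 3 | 2 166 |
| Conv8_2 | 10×10 | 111 | 162 | 12 321, 17 982 | 1, 2, 3 | 600 |
| Conv9_2 | 5×5 | 162 | 213 | 26 244, 34 506 | 1, 2, 3 | 150 |
| Conv10_2 | 3×3 | 213 | 264 | 45 369, 56 232 | 1, 2 | 36 |
| Conv11_2 | 1×1 | 264 | 315 | 69 696, 83 160 | 1, 2 | 4 |
Table 1  Parameter settings of default boxes in the feature pyramid network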
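
The n_box and A_box columns of Table 1 appear to follow the usual SSD default-box convention, although the page does not state it, so treat this as an assumption: ratio 1 contributes one box, each ratio a > 1 contributes boxes with ratios a and 1/a, and one extra square box with side sqrt(m_min·m_max) is added per location. The two areas are then m_min² and m_min·m_max. The sketch below checks this reading against the table; the names `LAYERS`, `boxes_per_location`, and `default_box_stats` are hypothetical.

```python
# Rows of Table 1: (layer, feature map side, m_min, m_max, aspect ratios alpha)
LAYERS = [
    ("Conv3_3", 75, 10, 60, (1, 2, 3)),
    ("Conv4_3", 38, 30, 60, (1, 2)),
    ("Conv7", 19, 60, 111, (1, 2, 3)),
    ("Conv8_2", 10, 111, 162, (1, 2, 3)),
    ("Conv9_2", 5, 162, 213, (1, 2, 3)),
    ("Conv10_2", 3, 213, 264, (1, 2)),
    ("Conv11_2", 1, 264, 315, (1, 2)),
]


def boxes_per_location(ratios):
    """Boxes per feature map cell under the assumed SSD convention."""
    # Ratio 1 gives one box; each ratio a > 1 gives two boxes (a and 1/a);
    # one extra square box with side sqrt(m_min * m_max) is always added.
    return sum(1 if a == 1 else 2 for a in ratios) + 1


def default_box_stats(side, m_min, m_max, ratios):
    """Return (n_box, (small square area, large square area)) for one layer."""
    n_box = side * side * boxes_per_location(ratios)
    areas = (m_min * m_min, m_min * m_max)
    return n_box, areas


if __name__ == "__main__":
    total = 0
    for name, side, m_min, m_max, ratios in LAYERS:
        n_box, areas = default_box_stats(side, m_min, m_max, ratios)
        total += n_box
        print(f"{name:9s} n_box={n_box:6d} areas={areas}")
    print("total default boxes:", total)
```

Under this reading the computed n_box matches every row of Table 1. The areas also match, except for the upper area of Conv4_3, where the rule gives 1 800 and the table lists 18 000.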
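Fig. 4  Structure of the spatial and channel feature enhancement module
Fig. 5  Comparison of heat maps of the original features and the enhanced features
Fig. 6  Feature pyramid structure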
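| Feature map | T_C | T_M |
|---|---|---|
| Conv4_3 | 512 | 256 |
| Conv7 | 1024 | 256 |
| Conv8_2 | 512 | 256 |
| Conv9_2 | 256 | 256 |
| Conv10_2 | 256 | 256 |
| Conv11_2 | 256 | 256 |
Table 2  Input and output of the feature pyramid network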
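
Table 2 maps feature maps of different widths onto a common width of 256, reading T_C as the input channel count and T_M as the output channel count. In a standard feature pyramid network this reduction is done with 1×1 lateral convolutions; whether DFPN does exactly this is an assumption, since the page does not describe it. Under that assumption, the rough sketch below counts the weights such a projection would need (the names `CHANNELS` and `lateral_conv_params` are hypothetical).

```python
# Table 2: feature map -> (T_C, T_M), read here as (input channels, output channels)
CHANNELS = {
    "Conv4_3": (512, 256),
    "Conv7": (1024, 256),
    "Conv8_2": (512, 256),
    "Conv9_2": (256, 256),
    "Conv10_2": (256, 256),
    "Conv11_2": (256, 256),
}


def lateral_conv_params(c_in, c_out, kernel=1, bias=True):
    """Parameter count of one kernel x kernel convolution from c_in to c_out channels."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


if __name__ == "__main__":
    total = 0
    for name, (c_in, c_out) in CHANNELS.items():
        n = lateral_conv_params(c_in, c_out)
        total += n
        print(f"{name:9s} {c_in:4d} -> {c_out}: {n} parameters")
    print("total:", total)
```

With 1×1 kernels the cost depends only on the channel counts, so Conv7 (1024 channels) would dominate the projection cost regardless of feature map size.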
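Fig. 7  Structure diagrams of two residual spatial and channel feature enhancement modules
Fig. 8  Schematic diagram of the DIoU effect
Fig. 9  DIoU hierarchical local ranking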
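
Figs. 8 and 9 concern DIoU and a DIoU-based hierarchical local ranking, but this page gives no formulas. As a minimal sketch: DIoU as defined in the Distance-IoU work [17] is IoU − ρ²/c², where ρ is the distance between box centers and c is the diagonal of the smallest enclosing box. The ranking follows the hierarchical local rank of prime sample attention [29], with DIoU substituted for IoU. That substitution is an assumption about how the two are combined here. Boxes are assumed to be (x1, y1, x2, y2) tuples, and all function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def diou(box_a, box_b):
    """DIoU = IoU - (squared center distance) / (squared enclosing-box diagonal)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    cw = max(ax2, bx2) - min(ax1, bx1)  # enclosing box width
    ch = max(ay2, by2) - min(ay1, by1)  # enclosing box height
    c2 = cw * cw + ch * ch
    base = iou(box_a, box_b)
    return base - rho2 / c2 if c2 > 0 else base


def hierarchical_local_rank(scores, gt_ids):
    """Order sample indices by local rank within their ground-truth group,
    then by score (e.g. DIoU) in descending order within each rank level."""
    groups = {}
    for idx, gt in enumerate(gt_ids):
        groups.setdefault(gt, []).append(idx)
    local_rank = {}
    for members in groups.values():
        members.sort(key=lambda i: scores[i], reverse=True)
        for rank, idx in enumerate(members):
            local_rank[idx] = rank
    return sorted(range(len(scores)), key=lambda i: (local_rank[i], -scores[i]))


if __name__ == "__main__":
    print(diou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(hierarchical_local_rank([0.9, 0.6, 0.8, 0.7], [0, 0, 1, 1]))
```

Unlike IoU, DIoU still separates non-overlapping boxes by their center distance, so it can rank samples that IoU would tie at zero.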
| Algorithm | Backbone | mAP/% | S/(frame·s⁻¹) |
|---|---|---|---|
| Faster R-CNN[5] | VGGNet | 73.2 | 7.0 |
| Faster R-CNN[5] | ResNet-101 | 78.8 | 2.3 |
| R-FCN[34] | ResNet-101 | 80.5 | 9.0 |
| Cascade R-CNN[6] | VGGNet | 79.6 | 4.2 |
| YOLOv2[9] | DarkNet-19 | 73.7 | 81.0 |
| SSD300[7] | VGGNet | 77.2 | 46.0 |
| DSSD321[19] | ResNet-101 | 78.6 | 9.5 |
| STDN300[30] | DenseNet-169 | 78.1 | 41.5 |
| FSSD300[12] | VGGNet | 78.8 | 65.0 |
| YOLOv3[35] | DarkNet-53 | 79.4 | 37.0 |
| RetinaNet[13] | ResNet-101 | 79.4 | 12.4 |
| FAENet300[31] | VGGNet | 80.1 | 65.0 |
| FCOS[32] | ResNet-50 | 77.8 | 17.6 |
| ATSS[33] | ResNet-50 | 78.2 | 14.9 |
| FEDet | VGGNet | 79.7 | 39.0 |
Table 3  Comparison of mean average precision on the VOC2007 test set
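| Algorithm | aero | bird | boat | bottle | car | person | plant | sheep | train | tv |
|---|---|---|---|---|---|---|---|---|---|---|
| SSD300[7] | 77.1 | 75.3 | 68.0 | 50.4 | 85.2 | 80.2 | 47.5 | 76.1 | 86.3 | 77.0 |
| DSSD321[19] | 81.9 | 80.5 | 68.4 | 53.9 | 86.2 | 79.7 | 51.7 | 78.0 | 87.2 | 79.4 |
| STDN300[30] | 81.1 | 76.4 | 69.2 | 52.4 | 84.2 | 76.8 | 51.8 | 78.4 | 87.5 | 77.8 |
| FAENet300[31] | 82.8 | 76.5 | 74.7 | 58.7 | 87.5 | 81.4 | 57.7 | 80.4 | 86.8 | 79.6 |
| FEDet | 84.0 | 79.3 | 75.6 | 59.1 | 86.7 | 80.0 | 59.2 | 79.5 | 87.9 | 79.9 |
Table 4  Detection accuracy (mAP/%) for different object classes on the VOC2007 test set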
Fig. 10  Comparison of detection results of two algorithms on the VOC2007 dataset
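| Algorithm | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|
| SSD[7] | 25.6 | 43.8 | 26.3 | 6.8 | 27.8 | 42.2 |
| YOLOv3[35] | 28.2 | 51.5 | 29.7 | 11.9 | 30.6 | 43.4 |
| RefineDet[36] | 29.4 | 49.2 | 31.3 | 10.0 | 32.0 | 44.4 |
| FAENet[31] | 28.3 | 47.9 | 29.7 | 10.5 | 30.9 | 41.9 |
| FEDet | 30.1 | 50.0 | 31.2 | 13.3 | 33.2 | 44.0 |
Table 5  Experimental results of different algorithms on the COCO dataset (all values in %)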
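Fig. 11  Comparison of detection results of two algorithms on the COCO dataset
Fig. 12  Ablation experiment of the spatial and channel attention feature enhancement module
Fig. 13  Ablation experiment of the deep feature pyramid network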
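| Algorithm | mAP/% |
|---|---|
| DFPN | 78.7 |
| SCFE | 77.9 |
| DIoU-Pisa | 78.0 |
| DFPN+SCFE | 79.2 |
| DFPN+DIoU-Pisa | 79.1 |
| SCFE+DIoU-Pisa | 78.9 |
| DFPN+SCFE+DIoU-Pisa | 79.7 |
Table 6  Mean average precision of different modules of the proposed algorithm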
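| Attention module | APS/% | mAP/% |
|---|---|---|
| Spatial | 23.8 | 77.7 |
| Channel | 23.1 | 77.7 |
| Spatial + channel | 24.9 | 77.9 |
Table 7  Evaluation results for different combinations of channel attention and spatial attention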
| Algorithm | mAP/% |
|---|---|
| SSD | 77.2 |
| SSD+FPN | 77.5 |
| RSCFE-a | 78.1 |
| RSCFE-b | 78.1 |
Table 8  Mean average precision of different module connection structures
References:
1 LI Ya-qian, GAI Cheng-yuan, XIAO Cun-jun, et al. Object detection network based on refined multi-scale depth features[J]. Acta Electronica Sinica, 2020, 48(12): 2360-2366.
doi: 10.3969/j.issn.0372-2112.2020.12.011
2 ZHENG Pu, BAI Hong-yang, LI Wei, et al. Small target detection algorithm in complex background[J]. Journal of Zhejiang University: Engineering Science, 2020, 54(9): 1777-1784.
3 GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587.
4 GIRSHICK R. Fast R-CNN [C]// 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1440-1448.
5 REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
doi: 10.1109/TPAMI.2016.2577031
6 CAI Z W, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2603-2611.
7 LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// European Conference on Computer Vision. [S. l.]: Springer, 2016: 21-37.
8 REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 779-788.
9 REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6517-6525.
10 REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/1804.02767.pdf.
11 LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
12 LI Z X, ZHOU F Q. FSSD: feature fusion single shot multibox detector [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/1712.00960.pdf.
13 LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2999-3007.
14 PEI Wei, XU Yan-ming, ZHU Yong-ying, et al. The target detection method of aerial photography images with improved SSD[J]. Journal of Software, 2019, 30(3): 738-758.
doi: 10.13328/j.cnki.jos.005695
15 TAN M, PANG R, LE Q V. EfficientDet: scalable and efficient object detection [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10778-10787.
16 GUO C, FAN B, ZHANG Q, et al. AugFPN: improving multi-scale feature learning for object detection [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 12592-12601.
17 ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression [C]// AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 12993-13000.
18 SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/1409.1556.pdf.
19 CHEN Ke-qi, ZHU Zhi-liang, DENG Xiao-ming, et al. Deep learning for multi-scale object detection: a survey[J]. Journal of Software, 2021, 32(4): 1201-1227.
doi: 10.13328/j.cnki.jos.006166
20 WANG K, LIEW J H, ZOU Y, et al. PANet: few-shot image semantic segmentation with prototype alignment [C]// 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9197-9206.
21 GHIASI G, LIN T Y, LE Q V. NAS-FPN: learning scalable feature pyramid architecture for object detection [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7036-7045.
22 ZHANG Q, BAO X, WU B, et al. Water meter pointer reading recognition method based on target-key point detection[J]. Flow Measurement and Instrumentation, 2021, 81: 102012.
doi: 10.1016/j.flowmeasinst.2021.102012
23 HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023.
doi: 10.1109/TPAMI.2019.2913372
24 WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// European Conference on Computer Vision. [S. l.]: Springer, 2018: 3-19.
25 ZHANG H, ZU K, LU J, et al. EPSANet: an efficient pyramid split attention block on convolutional neural network [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/2105.14447.pdf.
26 LIU W, RABINOVICH A, BERG A C. ParseNet: looking wider to see better [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/1506.04579.pdf.
27 LIU Ying, LIU Hong-yan, FAN Jiu-lun, et al. A survey of research and application of small object detection based on deep learning[J]. Acta Electronica Sinica, 2020, 48(3): 590-601.
doi: 10.3969/j.issn.0372-2112.2020.03.024
28 QIN Z, LI Z, ZHANG Z, et al. ThunderNet: towards real-time generic object detection on mobile devices [C]// 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6718-6727.
29 CAO Y, CHEN K, LOY C C, et al. Prime sample attention in object detection [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11583-11591.
30 ZHOU P, NI B, GENG C, et al. Scale-transferrable object detection [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 528-537.
31 LI W, LIU G. A single-shot object detector with feature aggregation and enhancement [C]// 2019 IEEE International Conference on Image Processing. [S. l.]: IEEE, 2019: 3910-3914.
32 TIAN Z, SHEN C, CHEN H, et al. FCOS: fully convolutional one-stage object detection [C]// 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9627-9636.
33 ZHANG S, CHI C, YAO Y, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9759-9768.
34 TIAN Xiu-xia, LI Hua-qiang, ZHANG Qin, et al. Dual-channel R-FCN model for image forgery detection[J]. Chinese Journal of Computers, 2021, 44(2): 370-383.
doi: 10.11897/SP.J.1016.2021.00370
35 BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/2004.10934.pdf.