Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (12): 2403-2415    DOI: 10.3785/j.issn.1008-973X.2022.12.009
    
Object detection algorithm based on feature enhancement and deep fusion
Yu XIE1, Zi-qun BAO1, Na ZHANG1,*, Biao WU2, Xiao-mei TU1,3, Xiao-an BAO1
1. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
3. School of Civil Engineering and Architecture, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang 322100, China

Abstract  

An object detection algorithm based on feature optimization and deep fusion was proposed to address the large detection errors of the single-stage multi-box detector (SSD) algorithm on small targets. SSD was improved with a spatial and channel feature enhancement (SCFE) module and a deep feature pyramid network (DFPN). The SCFE module optimized the feature layers through local spatial feature enhancement and global channel feature enhancement mechanisms, focusing on the detail information of the feature layers. DFPN improved the feature pyramid network on the basis of a residual spatial and channel enhancement module, fusing feature layers of different scales in depth to improve object detection accuracy. In addition, a sample weighted training strategy was added in the training stage so that the network focused on well-localized and high-confidence training samples. Experimental results show that on the PASCAL VOC dataset, the proposed algorithm raised the detection accuracy from 77.2% for SSD to 79.7% while still ensuring detection speed. On the COCO dataset, the proposed algorithm raised the detection accuracy from 25.6% for SSD to 30.1%, and the detection accuracy for small targets from 6.8% to 13.3%.



Key words: object detection; deep feature pyramid network (DFPN); spatial and channel feature enhancement (SCFE); sample weighted training; single-stage multi-box detector algorithm (SSD)
Received: 05 January 2022      Published: 03 January 2023
CLC:  TP 391  
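Fund: Zhejiang Provincial Key Research and Development Program (2020C03094); General Scientific Research Project of the Education Department of Zhejiang Province (Y202147659); Projects of the Education Department of Zhejiang Province (Y202250706, Y202250677); National Natural Science Foundation of China (6207050141); Zhejiang Provincial Basic Public Welfare Research Program (QY19E050003)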
Corresponding Authors: Na ZHANG     E-mail: 1419352830@qq.com;zhangna@zstu.edu.cn
Cite this article:

Yu XIE,Zi-qun BAO,Na ZHANG,Biao WU,Xiao-mei TU,Xiao-an BAO. Object detection algorithm based on feature enhancement and deep fusion. Journal of ZheJiang University (Engineering Science), 2022, 56(12): 2403-2415.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.12.009     OR     https://www.zjujournals.com/eng/Y2022/V56/I12/2403


Fig.1 Structure diagram of three improved feature pyramid networks
Fig.2 Structure diagram of single-stage object detection algorithm based on feature enhancement and deep fusion
Fig.3 One thousand randomly extracted feature points and their values for three feature layers each
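| Feature layer | Feature map size | m_min | m_max | A_box | α | n_box |
| --- | --- | --- | --- | --- | --- | --- |
| Conv3_3 | 75×75 | 10 | 60 | 100, 600 | 1, 2, 3 | 33 750 |
| Conv4_3 | 38×38 | 30 | 60 | 900, 1 800 | 1, 2 | 5 776 |
| Conv7 | 19×19 | 60 | 111 | 3 600, 6 660 | 1, 2, 3 | 2 166 |
| Conv8_2 | 10×10 | 111 | 162 | 12 321, 17 982 | 1, 2, 3 | 600 |
| Conv9_2 | 5×5 | 162 | 213 | 26 244, 34 506 | 1, 2, 3 | 150 |
| Conv10_2 | 3×3 | 213 | 264 | 45 369, 56 232 | 1, 2 | 36 |
| Conv11_2 | 1×1 | 264 | 315 | 69 696, 83 160 | 1, 2 | 4 |

Tab.1 Parameter settings of default boxes in feature pyramid network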
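
The box counts in Tab.1 can be reproduced with a short calculation. The sketch below assumes the usual SSD convention, which this page does not spell out: aspect ratio 1 contributes two square boxes (side m_min and side sqrt(m_min·m_max)), and every other ratio r contributes the pair r and 1/r. Under that assumption, A_box lists the two square-box areas and n_box is the feature map area times the number of boxes per location. The names `LAYERS`, `boxes_per_location`, `square_box_areas` and `n_box` are illustrative only.

```python
# Rows of Tab.1: (layer, feature-map side, m_min, m_max, aspect ratios alpha)
LAYERS = [
    ("Conv3_3", 75, 10, 60, (1, 2, 3)),
    ("Conv4_3", 38, 30, 60, (1, 2)),
    ("Conv7", 19, 60, 111, (1, 2, 3)),
    ("Conv8_2", 10, 111, 162, (1, 2, 3)),
    ("Conv9_2", 5, 162, 213, (1, 2, 3)),
    ("Conv10_2", 3, 213, 264, (1, 2)),
    ("Conv11_2", 1, 264, 315, (1, 2)),
]


def boxes_per_location(ratios):
    """Assumed SSD rule: 2 square boxes for ratio 1, plus r and 1/r for each other ratio."""
    return 2 + 2 * (len(ratios) - 1)


def square_box_areas(m_min, m_max):
    """Areas of the two square boxes: side m_min and side sqrt(m_min * m_max)."""
    return (m_min ** 2, m_min * m_max)


def n_box(side, ratios):
    """Number of default boxes on a side x side feature map."""
    return side * side * boxes_per_location(ratios)


counts = {name: n_box(side, ratios) for name, side, _, _, ratios in LAYERS}
total_boxes = sum(counts.values())

if __name__ == "__main__":
    for name, side, m_min, m_max, ratios in LAYERS:
        print(name, square_box_areas(m_min, m_max), counts[name])
    print("total default boxes:", total_boxes)
```

Most of the total comes from the added Conv3_3 layer, whose 75×75 grid alone accounts for 33 750 boxes.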
Fig.4 Structure diagram of spatial channel feature enhancement module
Fig.5 Comparison between heat map of original feature and that of enhanced one
Fig.6 Feature pyramid structure
| Feature map | TC | TM |
| --- | --- | --- |
| Conv4_3 | 512 | 256 |
| Conv7 | 1024 | 256 |
| Conv8_2 | 512 | 256 |
| Conv9_2 | 256 | 256 |
| Conv10_2 | 256 | 256 |
| Conv11_2 | 256 | 256 |

Tab.2 Input and output of feature pyramid network
Fig.7 Structural diagram of two residual spatial and channel attention feature enhancement modules
Fig.8 Schematic diagram of effect of DIoU
Fig.9 DIoU hierarchical local sorting
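
Figs. 8 and 9 are not reproduced on this page. As a reminder of the quantity they refer to, the following is a minimal sketch of the standard distance-IoU definition from the reference list entry on Distance-IoU loss [17]: IoU minus the squared center distance normalized by the squared diagonal of the smallest enclosing box. The function name `diou` and the `(x1, y1, x2, y2)` box layout are assumptions for illustration, and the sketch does not attempt to reproduce how the paper combines DIoU with sample ranking.

```python
def diou(box_a, box_b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection and union areas give the plain IoU.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou - d2 / c2


if __name__ == "__main__":
    print(diou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(diou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Unlike plain IoU, the value keeps decreasing as two non-overlapping boxes move further apart, so it can still rank candidates whose IoU is zero.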
| Algorithm | Backbone | mAP/% | S/(frame·s⁻¹) |
| --- | --- | --- | --- |
| Faster R-CNN[5] | VGGNet | 73.2 | 7.0 |
| Faster R-CNN[5] | ResNet-101 | 78.8 | 2.3 |
| R-FCN[34] | ResNet-101 | 80.5 | 9.0 |
| Cascade R-CNN[6] | VGGNet | 79.6 | 4.2 |
| YOLOv2[9] | DarkNet-19 | 73.7 | 81.0 |
| SSD300[7] | VGGNet | 77.2 | 46.0 |
| DSSD321[19] | ResNet-101 | 78.6 | 9.5 |
| STDN300[30] | DenseNet-169 | 78.1 | 41.5 |
| FSSD300[12] | VGGNet | 78.8 | 65.0 |
| YOLOv3[35] | DarkNet-53 | 79.4 | 37.0 |
| RetinaNet[13] | ResNet-101 | 79.4 | 12.4 |
| FAENet300[31] | VGGNet | 80.1 | 65.0 |
| FCOS[32] | ResNet-50 | 77.8 | 17.6 |
| ATSS[33] | ResNet-50 | 78.2 | 14.9 |
| FEDet | VGGNet | 79.7 | 39.0 |

Tab.3 Comparison of mean average precision on VOC2007 test set
| Algorithm | aero | bird | boat | bottle | car | person | plant | sheep | train | tv |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SSD300[7] | 77.1 | 75.3 | 68.0 | 50.4 | 85.2 | 80.2 | 47.5 | 76.1 | 86.3 | 77.0 |
| DSSD321[19] | 81.9 | 80.5 | 68.4 | 53.9 | 86.2 | 79.7 | 51.7 | 78.0 | 87.2 | 79.4 |
| STDN300[30] | 81.1 | 76.4 | 69.2 | 52.4 | 84.2 | 76.8 | 51.8 | 78.4 | 87.5 | 77.8 |
| FAENet300[31] | 82.8 | 76.5 | 74.7 | 58.7 | 87.5 | 81.4 | 57.7 | 80.4 | 86.8 | 79.6 |
| FEDet | 84.0 | 79.3 | 75.6 | 59.1 | 86.7 | 80.0 | 59.2 | 79.5 | 87.9 | 79.9 |

Tab.4 Detection accuracy (%) for different object classes on VOC2007 test set
Fig.10 Comparison of detection results of two algorithms on VOC2007 dataset
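| Algorithm | AP/% | AP50/% | AP75/% | APS/% | APM/% | APL/% |
| --- | --- | --- | --- | --- | --- | --- |
| SSD[7] | 25.6 | 43.8 | 26.3 | 6.8 | 27.8 | 42.2 |
| YOLOv3[35] | 28.2 | 51.5 | 29.7 | 11.9 | 30.6 | 43.4 |
| RefineDet[36] | 29.4 | 49.2 | 31.3 | 10.0 | 32.0 | 44.4 |
| FAENet[31] | 28.3 | 47.9 | 29.7 | 10.5 | 30.9 | 41.9 |
| FEDet | 30.1 | 50.0 | 31.2 | 13.3 | 33.2 | 44.0 |

Tab.5 Experiment results of different algorithms on COCO dataset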
Fig.11 Comparison of detection results of two algorithms on COCO dataset
Fig.12 Ablation experiment of spatial and channel attention feature enhancement module
Fig.13 Ablation experiment of deep feature pyramid network
| Algorithm | mAP/% |
| --- | --- |
| DFPN | 78.7 |
| SCFE | 77.9 |
| DIoU-Pisa | 78.0 |
| DFPN+SCFE | 79.2 |
| DFPN+DIoU-Pisa | 79.1 |
| SCFE+DIoU-Pisa | 78.9 |
| DFPN+SCFE+DIoU-Pisa | 79.7 |

Tab.6 Mean average precision of different module combinations of proposed algorithm
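
To read the ablation in Tab.6 against the SSD300 baseline of 77.2% mAP reported in Tab.3, the gain of each configuration can be tabulated as below. This is only arithmetic on the published numbers; the names `BASELINE_SSD`, `ABLATION` and `gains` are illustrative, and it is an assumption that the Tab.6 configurations are measured against exactly that baseline.

```python
# SSD300 mAP on the VOC2007 test set, taken from Tab.3 (assumed baseline for Tab.6).
BASELINE_SSD = 77.2

# mAP (%) of each module combination, copied from Tab.6.
ABLATION = {
    "DFPN": 78.7,
    "SCFE": 77.9,
    "DIoU-Pisa": 78.0,
    "DFPN+SCFE": 79.2,
    "DFPN+DIoU-Pisa": 79.1,
    "SCFE+DIoU-Pisa": 78.9,
    "DFPN+SCFE+DIoU-Pisa": 79.7,
}

# Gain over the baseline in percentage points, rounded to the table's precision.
gains = {name: round(m_ap - BASELINE_SSD, 1) for name, m_ap in ABLATION.items()}

# Sum of the three single-module gains, for comparison with the full model.
single_sum = round(gains["DFPN"] + gains["SCFE"] + gains["DIoU-Pisa"], 1)

if __name__ == "__main__":
    for name, g in gains.items():
        print(f"{name:22s} +{g:.1f}")
    print("sum of single-module gains:", single_sum)
```

Under this reading, the three modules contribute 1.5, 0.7 and 0.8 points on their own, while the full combination gains 2.5 points, so the individual gains are not strictly additive.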
| Attention module | APS/% | mAP/% |
| --- | --- | --- |
| Spatial | 23.8 | 77.7 |
| Channel | 23.1 | 77.7 |
| Spatial + channel | 24.9 | 77.9 |

Tab.7 Experimental results for different combinations of channel attention and spatial attention
| Algorithm | mAP/% |
| --- | --- |
| SSD | 77.2 |
| SSD+FPN | 77.5 |
| RSCFE-a | 78.1 |
| RSCFE-b | 78.1 |

Tab.8 Mean average precision of different module connection structures
[1]   LI Ya-qian, GAI Cheng-yuan, XIAO Cun-jun, et al. Object detection network based on refined multi-scale depth features[J]. Acta Electronica Sinica, 2020, 48 (12): 2360-2366
doi: 10.3969/j.issn.0372-2112.2020.12.011
[2]   ZHENG Pu, BAI Hong-yang, LI Wei, et al. Small target detection algorithm in complex background[J]. Journal of Zhejiang University: Engineering Science, 2020, 54 (9): 1777-1784
[3]   GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587.
[4]   GIRSHICK R. Fast R-CNN [C]// 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1440-1448.
[5]   REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39 (6): 1137-1149
doi: 10.1109/TPAMI.2016.2577031
[6]   CAI Z W, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2603-2611.
[7]   LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// European Conference on Computer Vision. [S. l. ]: Springer, 2016: 21-37.
[8]   REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 779-788.
[9]   REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6517-6525.
[10]   REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/1804.02767.pdf.
[11]   LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
[12]   LI Z X, ZHOU F Q. FSSD: feature fusion single shot multibox detector [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/1712.00960.pdf.
[13]   LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2999-3007.
[14]   PEI Wei, XU Yan-ming, ZHU Yong-ying, et al. The target detection method of aerial photography images with improved SSD[J]. Journal of Software, 2019, 30 (3): 738-758
doi: 10.13328/j.cnki.jos.005695
[15]   TAN M, PANG R, LE Q V. EfficientDet: scalable and efficient object detection [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10778-10787.
[16]   GUO C, FAN B, ZHANG Q, et al. AugFPN: improving multi-scale feature learning for object detection [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 12592-12601.
[17]   ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression [C]// AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 12993-13000.
[18]   SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/1409.1556.pdf.
[19]   CHEN Ke-qi, ZHU Zhi-liang, DENG Xiao-ming, et al. Deep learning for multi-scale object detection: a survey[J]. Journal of Software, 2021, 32 (4): 1201-1227
doi: 10.13328/j.cnki.jos.006166
[20]   WANG K, LIEW J H, ZOU Y, et al. PANet: few-shot image semantic segmentation with prototype alignment [C]// 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9197-9206.
[21]   GHIASI G, LIN T Y, LE Q V. NAS-FPN: learning scalable feature pyramid architecture for object detection [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7036-7045.
[22]   ZHANG Q, BAO X, WU B, et al Water meter pointer reading recognition method based on target-key point detection[J]. Flow Measurement and Instrumentation, 2021, 81: 102012
doi: 10.1016/j.flowmeasinst.2021.102012
[23]   HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42 (8): 2011-2023
doi: 10.1109/TPAMI.2019.2913372
[24]   WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// European Conference on Computer Vision. [S. l.]: Springer, 2018: 3-19.
[25]   ZHANG H, ZU K, LU J, et al. EPSANet: an efficient pyramid split attention block on convolutional neural network [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/2105.14447.pdf.
[26]   LIU W, RABINOVICH A, BERG A C. ParseNet: looking wider to see better [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/1506.04579.pdf.
[27]   LIU Ying, LIU Hong-yan, FAN Jiu-lun, et al. A survey of research and application of small object detection based on deep learning[J]. Acta Electronica Sinica, 2020, 48 (3): 590-601
doi: 10.3969/j.issn.0372-2112.2020.03.024
[28]   QIN Z, LI Z, ZHANG Z, et al. ThunderNet: towards real-time generic object detection on mobile devices [C]// 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6718-6727.
[29]   CAO Y, CHEN K, LOY C C, et al. Prime sample attention in object detection [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11583-11591.
[30]   ZHOU P, NI B, GENG C, et al. Scale-transferrable object detection [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 528-537.
[31]   LI W, LIU G. A single-shot object detector with feature aggregation and enhancement [C]// 2019 IEEE International Conference on Image Processing. [S.l.]: IEEE, 2019: 3910-3914.
[32]   TIAN Z, SHEN C, CHEN H, et al. FCOS: fully convolutional one-stage object detection [C]// 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9627-9636.
[33]   ZHANG S, CHI C, YAO Y, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9759-9768.
[34]   TIAN Xiu-xia, LI Hua-qiang, ZHANG Qin, et al. Dual-channel R-FCN model for image forgery detection[J]. Chinese Journal of Computers, 2021, 44 (2): 370-383
doi: 10.11897/SP.J.1016.2021.00370
[35]   BOCHKOVSKIY A, WANG C Y, LIAO H Y M, et al. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. [2021-12-30]. https://arxiv.org/pdf/2004.10934.pdf.