Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (4): 783-794    DOI: 10.3785/j.issn.1008-973X.2022.04.018
    
Single-stage object detection algorithm based on optimizing position prediction
Na ZHANG1, Xu-lei QI1, Xiao-an BAO1,*, Biao WU1, Xiao-mei TU2, Yu-ting JIN2
1. School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. School of Information Science and Technology, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang 322100, China

Abstract  

To address inaccurate target localization and low small-object detection accuracy in the single shot multi-box detector (SSD) algorithm, a single-stage object detection algorithm named EL-SSD, based on optimizing position prediction, was proposed. The prediction feature maps of the original SSD were fused by a bi-directional weighted feature pyramid network; the fused feature maps were then decoded according to feature location information, and the feature channel weights were redistributed. This enriched the semantic information of the features and captured cross-channel location information. At the detection stage, non-maximum suppression was applied to the predicted bounding boxes by cascade clustering over the classification confidence and an additional localization confidence, which improved the localization accuracy of the selected targets. Experimental results showed that the mean average precision (mAP) of EL-SSD on the PASCAL VOC2007 dataset was 79.8%, 2.6 percentage points higher than that of the original SSD algorithm, and that the mAP on the COCO dataset was 29.4%, 3.5 percentage points higher than that of the original SSD algorithm. The improved algorithm localizes targets and detects small targets better than SSD, and is suitable for application scenarios that require high localization performance.
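The weighted feature fusion mentioned in the abstract can be illustrated with a minimal sketch. The fusion rule used in EL-SSD is not given on this page, so the code below assumes the fast normalized fusion of EfficientDet [26]; the function name `fast_normalized_fusion`, the flat-list feature layout, and the `eps` default are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-sized feature maps with non-negative, normalized weights.

    Assumption: EfficientDet-style rule, out = sum(w_i * f_i) / (eps + sum(w_j)),
    with each learnable weight clamped to w_i >= 0 (ReLU).
    Feature maps are flattened to 1-D lists to keep the sketch dependency-free.
    """
    if len(features) != len(weights):
        raise ValueError("one weight is required per feature map")
    if len({len(f) for f in features}) > 1:
        raise ValueError("feature maps must have the same size")

    # Clamp weights to be non-negative, as a ReLU would.
    clamped = [max(0.0, w) for w in weights]
    total = sum(clamped) + eps

    size = len(features[0]) if features else 0
    return [
        sum(w * f[j] for w, f in zip(clamped, features)) / total
        for j in range(size)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0]   # hypothetical high-resolution features
    deep = [3.0, 4.0]      # hypothetical upsampled low-resolution features
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0], eps=0.0))
```

With equal weights the result is the element-wise mean of the inputs; a weight that training drives negative is clamped to zero, so that input is ignored.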



Key words: object detection, single shot multi-box detector algorithm, feature fusion, non-maximum suppression, localization confidence
Received: 24 May 2021      Published: 24 April 2022
CLC:  TP 391  
Fund:  National Natural Science Foundation of China (6207050141); Key Research and Development Program of Zhejiang Province (2020C03094)
Corresponding Authors: Xiao-an BAO     E-mail: zhangna@zstu.edu.cn;baoxiaoan@zstu.edu.cn
Cite this article:

Na ZHANG,Xu-lei QI,Xiao-an BAO,Biao WU,Xiao-mei TU,Yu-ting JIN. Single-stage object detection algorithm based on optimizing position prediction. Journal of ZheJiang University (Engineering Science), 2022, 56(4): 783-794.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.04.018     OR     https://www.zjujournals.com/eng/Y2022/V56/I4/783


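Single-stage object detection algorithm based on optimizing position prediction (translated from the Chinese abstract)

To address inaccurate target localization and low small-object detection accuracy in the single shot multi-box detector (SSD) algorithm, a single-stage object detection algorithm, EL-SSD, based on optimizing position prediction is proposed. The prediction feature maps of the original SSD are fused by a bi-directional weighted feature pyramid; the output feature maps are decoded for feature location information and the feature channel weights are then redistributed, which enriches the semantic information of the features and captures cross-channel location information. Non-maximum suppression is applied to the predicted boxes through cascade clustering built on the classification confidence and an additional localization confidence, which improves the localization accuracy of the selected targets at the detection stage. Experimental results show that the mean average precision of EL-SSD on PASCAL VOC2007 reaches 79.8%, 2.6 percentage points higher than the original SSD algorithm. On the COCO dataset the precision reaches 29.4%, 3.5 percentage points higher than the original SSD algorithm. Target localization and small-object detection on test images are clearly better than with SSD, and the algorithm is suitable for real-time application scenarios that require high localization performance.


Key words: object detection, single shot multi-box detector algorithm, feature fusion, non-maximum suppression, localization confidence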
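The idea of ranking boxes by both classification and localization confidence during non-maximum suppression can be sketched as follows. This is not the paper's cascade clustering procedure, whose details are not given on this page; it is a simplified greedy NMS that assumes the two confidences are combined by multiplication. The function names and the 0.45 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def confidence_weighted_nms(
    boxes: Sequence[Box],
    cls_scores: Sequence[float],
    loc_scores: Sequence[float],
    iou_threshold: float = 0.45,  # assumed value, not taken from the paper
) -> List[int]:
    """Greedy NMS ranked by cls_score * loc_score (an assumed combination).

    Returns the indices of the kept boxes, best combined score first.
    """
    combined = [c * l for c, l in zip(cls_scores, loc_scores)]
    order = sorted(range(len(boxes)), key=lambda i: combined[i], reverse=True)

    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a better-ranked kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    cls_scores = [0.9, 0.8, 0.7]
    loc_scores = [0.5, 0.9, 0.8]
    print(confidence_weighted_nms(boxes, cls_scores, loc_scores))
```

In this example box 0 has the highest classification score but a low localization confidence, so the overlapping box 1 is kept instead. Ranking by classification score alone would have kept box 0.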
Fig.1 Comparison of popular improved SSD network structure
Fig.2 Predicted classification scores of bounding boxes and IoU with ground-truth
Fig.3 Network structure of EL-SSD
Fig.4 Schematic diagram of CBFN network structure
Fig.5 Schematic diagram of CEA module network structure
Fig.6 Schematic diagram of CEA module weight redistribution process
Fig.7 Heat map of same level of SSD and EL-SSD detection feature map
Fig.8 Detection head with localization confidence prediction branch
Fig.9 Schematic diagram of IoU confidence prediction error
Fig.10 Effect diagram of DIoU
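For readers unfamiliar with DIoU, the measure from reference [35] subtracts a normalized center-distance penalty from IoU: DIoU = IoU - d^2 / c^2, where d is the distance between the two box centers and c is the diagonal of the smallest box enclosing both. The sketch below is a plain-Python illustration of that formula; the function name and box layout are assumptions, and it does not show how EL-SSD applies DIoU.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def diou(a: Box, b: Box) -> float:
    """Distance-IoU: IoU minus squared center distance over squared enclosing diagonal."""
    # Plain IoU.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou_value = inter / union if union > 0 else 0.0

    # Squared distance between box centers.
    center_dist_sq = (
        ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    )

    # Squared diagonal of the smallest box enclosing both inputs.
    enclose_w = max(a[2], b[2]) - min(a[0], b[0])
    enclose_h = max(a[3], b[3]) - min(a[1], b[1])
    diag_sq = enclose_w ** 2 + enclose_h ** 2

    if diag_sq == 0:
        return iou_value
    return iou_value - center_dist_sq / diag_sq


if __name__ == "__main__":
    print(diou((0, 0, 1, 1), (0, 0, 1, 1)))  # identical boxes
    print(diou((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

Unlike IoU, which is 0 for every pair of non-overlapping boxes, DIoU still distinguishes near misses from distant ones, because the penalty term grows with the center distance.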
| Method | Backbone | Input size | GPU | mAP/% | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| Faster R-CNN[3] | VGGNet | 1000×600 | Titan X | 73.2 | 7.0 |
| Faster R-CNN[3] | ResNet-101 | 1000×600 | 1080Ti | 78.8 | 2.3 |
| Mask R-CNN[36] | ResNet-50 | 1000×600 | 1080Ti | 77.4 | 4.2 |
| Cascade R-CNN[37] | VGGNet | 1000×600 | 1080Ti | 79.6 | 5.3 |
| YOLOV2[6] | Darknet-19 | 352×352 | Titan X | 73.7 | 81.0 |
| RefineDet320[18] | VGGNet | 320×320 | 1080Ti | 80.0 | 22.1 |
| FCOS[32] | ResNet-50 | 1333×800 | 1080Ti | 73.5 | 17.6 |
| ATSS[11] | ResNet-50 | 1333×800 | 1080Ti | 75.2 | 14.9 |
| RetinaNet400[10] | ResNet-101 | ~640×400 | 1080Ti | 79.4 | 12.4 |
| FSSD300[16] | VGGNet | 300×300 | 1080Ti | 78.8 | 65.0 |
| SSD300[13] | VGGNet | 300×300 | 1080Ti | 77.2 | 42.1 |
| ASSD321[38] | ResNet-101 | 321×321 | K40 | 79.5 | 11.4 |
| DSSD321[15] | ResNet-101 | 321×321 | Titan X | 78.6 | 9.5 |
| EL-SSD300 | VGGNet | 300×300 | 1080Ti | 79.8 | 27.0 |

Tab.1 Comparison of mean average precision on VOC2007 test dataset
Fig.11 Histogram of 20 types of target detection precision results in VOC2007 test dataset
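| Method | Plant (IoU_t = 0.50) | Bottle (IoU_t = 0.50) | Plant (IoU_t = 0.75) | Bottle (IoU_t = 0.75) | Plant (IoU_t = 0.95) | Bottle (IoU_t = 0.95) |
| --- | --- | --- | --- | --- | --- | --- |
| SSD | 47.5 | 50.4 | 17.4 | 21.1 | 0.8 | 0.1 |
| EL-SSD | 52.0 | 56.6 | 19.6 | 26.7 | 2.1 | 0.9 |

All values in %.
Tab.2 Comparison of mean average precision of location prediction effects for small target categories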
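| Method | mAP/% (IoU_t = 0.5~0.95) | mAP/% (IoU_t = 0.5) | mAP/% (IoU_t = 0.75) | mAP/% (S) | mAP/% (M) | mAP/% (L) |
| --- | --- | --- | --- | --- | --- | --- |
| SSD | 25.9 | 44.3 | 25.4 | 6.2 | 26.0 | 41.5 |
| EL-SSD | 29.4 | 47.2 | 30.6 | 10.3 | 31.6 | 45.7 |

Tab.3 Experiment results of EL-SSD on COCO dataset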
Fig.12 Comparison of detection results on COCO dataset
Ablation settings (columns): +FPN | +CBFN (without CEA) | +CEA | +OPS-NMS (without DIoU) | +DIoU
mAP/% values, in source order: 77.6, 77.8, 78.0, 78.6, 78.6, 77.7, 78.9, 79.5, 79.8
Tab.4 Ablation experiment results of EL-SSD on VOC2007 test dataset
Fig.13 Line graph of positioning accuracy of three NMS algorithms in range of [0.43, 0.48]
Fig.14 Comparison of test results of SSD using different NMS
[1]   GAO Wen, TANG Yang, ZHU Ming. Study on the cascade classifier in target detection under complex background [J]. Acta Physica Sinica, 2014, 63(9): 156-164.
[2]   WANG Hao, SHAN Wen-jing, FANG Bao-fu. Multi-layers context convolutional neural network for object detection [J]. Pattern Recognition and Artificial Intelligence, 2020, 33(2): 113-120.
[3]   REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[4]   WEI Wan-qing, YU Jing, BAI Man-yan, et al. Video object detection using fusion of SSD and spatiotemporal features [J]. Journal of Image and Graphics, 2021, 26(3): 542-555.
doi: 10.11834/jig.200020
[5]   XU Li-feng, HUANG Hai-fan, DING Wei-long, et al. Detection of small fruit target based on improved DenseNet [J]. Journal of Zhejiang University: Engineering Science, 2021, 55(2): 377-385.
[6]   REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE, 2016: 779-788.
[7]   REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6517-6525.
[8]   REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. (2018-04-08). https://arxiv.org/pdf/1804.02767.pdf.
[9]   SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
doi: 10.1109/TPAMI.2016.2572683
[10]   LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2999-3007.
[11]   ZHANG S F, CHI C, YAO Y Q, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9756-9765.
[12]   LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
[13]   LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// European Conference on Computer Vision. Amsterdam: Springer, 2016: 21-37.
[14]   ZHENG Pu, BAI Hong-yang, LI Wei, et al. Small target detection algorithm in complex background [J]. Journal of Zhejiang University: Engineering Science, 2020, 54(9): 1777-1784.
[15]   FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector [EB/OL]. (2017-01-23). https://arxiv.org/pdf/1701.06659.pdf.
[16]   LI Z X, ZHOU F Q. FSSD: feature fusion single shot multibox detector [EB/OL]. (2017-12-04). https://arxiv.org/pdf/1712.00960.pdf.
[17]   SHEN Z Q, LIU Z, LI J G, et al. DSOD: learning deeply supervised object detectors from Scratch [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 1937-1945.
[18]   ZHANG S F, WEN L Y, BIAN X, et al. Single-shot refinement neural network for object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4203-4212.
[19]   HE Y X, ZHU C C, WANG J R, et al. Bounding box regression with uncertainty for accurate object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 2883-2892.
[20]   BODLA N, SINGH B, CHELLAPPA R, et al. Soft-NMS: improving object detection with one line of code [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 5562-5570.
[21]   LIU S T, HUANG D, WANG Y H. Adaptive NMS: refining pedestrian detection in a crowd [C]// IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 6452-6461.
[22]   LUO Y H, CAO X, ZHANG J T, et al. CE-FPN: enhancing channel information for object detection [EB/OL]. (2021-03-09). https://arxiv.org/pdf/2103.10643.pdf.
[23]   PANG J M, CHEN K, SHI J P, et al. Libra R-CNN: towards balanced learning for object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 821-830.
[24]   GUO C X, FAN B, ZHANG Q, et al. AugFPN: improving multi-scale feature learning for object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 12592-12601.
[25]   WANG K X, LIEW J H, ZHOU D Q, et al. PANet: few-shot image semantic segmentation with prototype alignment [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 9196-9205.
[26]   TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10778-10787.
[27]   CHOLLET F. Xception: deep learning with depthwise separable convolutions [C]// IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1800-1807.
[28]   HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023.
doi: 10.1109/TPAMI.2019.2913372
[29]   HOU Q B, ZHOU D Q, FENG J S, et al. Coordinate attention for efficient mobile network design [C]//IEEE International Conference on Computer Vision. [S. l.]: IEEE, 2021: 13708-13717.
[30]   HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3 [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 1314-1324.
[31]   JIANG B R, LUO R X, MAO J Y, et al. Acquisition of localization confidence for accurate object detection [C]// European Conference on Computer Vision. Munich: Springer, 2018: 816-832.
[32]   TIAN Z, SHEN C H, CHEN H, et al. FCOS: fully convolutional one-stage object detection [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 9626-9635.
[33]   LI X, WANG W H, WU L J, et al. Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection [J]. Advances in Neural Information Processing Systems, 2020, 33: 21002-21012.
[34]   WU S K, LI X P, WANG X G. IoU-aware single-stage object detector for accurate localization [J]. Image and Vision Computing, 2020, 97: 103911.
[35]   ZHENG Z, WANG P, LIU W, et al. Distance-IoU Loss: faster and better learning for bounding box regression[C]// AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 12993-13000.
[36]   HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980-2988.
[37]   CAI Z W, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 6154-6162.