Single-stage object detection algorithm based on optimizing position prediction

doi:10.3785/j.issn.1008-973X.2022.04.018

Journal of ZheJiang University (Engineering Science), 2022, Vol. 56, Issue 4: 783-794

Na ZHANG1, Xu-lei QI1, Xiao-an BAO1,*, Biao WU1, Xiao-mei TU2, Yu-ting JIN2

1. School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. School of Information Science and Technology, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang 322100, China


Abstract

A single-stage object detection algorithm named EL-SSD, based on optimizing position prediction, was proposed to address the inaccurate target localization and low small-object detection accuracy of the single shot multi-box detector (SSD) algorithm. The prediction feature maps of the original SSD were fused by a bi-directional weighted feature pyramid network. The fused feature maps were decoded according to feature location information, and the weights of the feature channels were then redistributed, which enriched the semantic information of the features and captured cross-channel location information. At the detection stage, non-maximum suppression was applied to the predicted bounding boxes through cascade clustering of the classification confidence and an additional localization confidence, which improved the localization accuracy of the selected targets. The experimental results showed that the mean average precision of EL-SSD on the PASCAL VOC2007 dataset was 79.8%, which was 2.6 percentage points higher than that of the original SSD algorithm. The mean average precision of EL-SSD on the COCO dataset was 29.4%, which was 3.5 percentage points higher than that of the original SSD algorithm. EL-SSD achieves better localization and small-object detection performance than SSD, and it is suitable for application scenarios that require high localization performance.
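The abstract does not spell out the cascade clustering procedure, so the following is only a minimal sketch of the underlying idea: letting a localization confidence influence which box survives non-maximum suppression. It assumes a simple stand-in rule (greedy NMS ranked by the product of classification and localization confidence), which is not necessarily the authors' method. The names `iou`, `confidence_guided_nms`, and `iou_threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence_guided_nms(
    boxes: Sequence[Box],
    cls_conf: Sequence[float],
    loc_conf: Sequence[float],
    iou_threshold: float = 0.5,
) -> List[int]:
    """Greedy NMS ranked by cls_conf * loc_conf (an assumed combination rule).

    Returns the indices of the kept boxes, best-ranked first.
    """
    scores = [c * l for c, l in zip(cls_conf, loc_conf)]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
cls_conf = [0.9, 0.8, 0.7]  # box 0 has the highest classification score
loc_conf = [0.5, 0.9, 0.8]  # but box 1 is judged to be better localized
print(confidence_guided_nms(boxes, cls_conf, loc_conf))  # → [1, 2]
```

In this toy input, NMS ranked by classification confidence alone would keep box 0 and suppress box 1. Ranking by the combined score keeps box 1 instead, which is the kind of behavior a localization confidence is meant to encourage.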

Key words: object detection, single shot multi-box detector algorithm, feature fusion, non-maximum suppression, localization confidence

Received: 24 May 2021 Published: 24 April 2022

CLC: TP 391

Fund: National Natural Science Foundation of China (6207050141); Key Research and Development Program of Zhejiang Province (2020C03094)

Corresponding Authors: Xiao-an BAO E-mail: zhangna@zstu.edu.cn;baoxiaoan@zstu.edu.cn


Cite this article:

Na ZHANG,Xu-lei QI,Xiao-an BAO,Biao WU,Xiao-mei TU,Yu-ting JIN. Single-stage object detection algorithm based on optimizing position prediction. Journal of ZheJiang University (Engineering Science), 2022, 56(4): 783-794.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.04.018 OR https://www.zjujournals.com/eng/Y2022/V56/I4/783

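Single-stage object detection algorithm based on optimizing position prediction

To address the inaccurate target localization and low small-object detection accuracy of the single shot multi-box detector (SSD) algorithm, EL-SSD, a single-stage object detection algorithm based on optimizing position prediction, is proposed. The prediction feature maps of the original SSD are fused by a bidirectional weighted feature pyramid. The output feature maps are decoded for feature location information, and the feature channel weights are then redistributed, which improves the semantic information of the features and captures cross-channel location information. Cascade clustering of the classification confidence and an additional localization confidence is constructed to perform non-maximum suppression on the predicted boxes, which improves the localization accuracy of the selected targets at the detection stage. Experimental results show that the mean average precision of EL-SSD on PASCAL VOC2007 reaches 79.8%, 2.6 percentage points higher than the original SSD algorithm. On the COCO dataset the precision reaches 29.4%, 3.5 percentage points higher than the original SSD algorithm. Target localization and small-object detection on the test images are clearly better than with SSD, and the algorithm is suitable for real-time application scenarios that require high localization performance.

Key words: object detection, single shot multi-box detector algorithm, feature fusion, non-maximum suppression, localization confidence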
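The weighted fusion step of a bidirectional feature pyramid can be illustrated with a small sketch. It assumes the "fast normalized fusion" rule commonly used in BiFPN-style networks, where each input feature map gets a learnable non-negative weight and the weights are normalized by their sum. Whether EL-SSD uses exactly this rule is not stated in the abstract. The function name `fuse_weighted`, the nested-list feature maps, and the `eps` default are hypothetical choices for illustration.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]  # a single-channel map as rows of values


def fuse_weighted(
    features: Sequence[FeatureMap],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> FeatureMap:
    """Fuse same-sized feature maps as sum(w_i * f_i) / (eps + sum(w_i)).

    Weights are clamped at zero first (a ReLU), so each map contributes
    a non-negative share and the shares sum to roughly one.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    w = [max(0.0, x) for x in weights]
    norm = sum(w) + eps
    rows, cols = len(features[0]), len(features[0][0])
    return [
        [sum(wi * f[r][c] for wi, f in zip(w, features)) / norm for c in range(cols)]
        for r in range(rows)
    ]


shallow = [[1.0, 2.0]]  # e.g. a resized lower-level map
deep = [[3.0, 4.0]]     # e.g. a resized higher-level map
print(fuse_weighted([shallow, deep], [1.0, 3.0], eps=0.0))  # → [[2.5, 3.5]]
```

In a real network the weights would be trained parameters and the maps would be multi-channel tensors resized to a common resolution before fusion; the sketch only shows the arithmetic of the weighting.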

Fig.1 Comparison of popular improved SSD network structure

Fig.2 Predicted classification scores of bounding boxes and IoU with ground-truth

Fig.3 Network structure of EL-SSD

Fig.4 Schematic diagram of CBFN network structure

Fig.5 Schematic diagram of CEA module network structure

Fig.6 Schematic diagram of CEA module weight redistribution process

Fig.7 Heat map of same level of SSD and EL-SSD detection feature map

Fig.8 Detection head with localization confidence prediction branch

Fig.9 Schematic diagram of IoU confidence prediction error

Fig.10 Effect diagram of DIoU

Tab.1 Comparison of mean average precision on VOC2007 test dataset

Fig.11 Histogram of 20 types of target detection precision results in VOC2007 test dataset

Tab.2 Comparison of mean average precision of location prediction effects for small target categories

Tab.3 Experiment results of EL-SSD on COCO dataset

Fig.12 Comparison of detection results on COCO dataset

Tab.4 Ablation experiment results of EL-SSD on VOC2007 test dataset

Fig.13 Line graph of positioning accuracy of three NMS algorithms in range of [0.43, 0.48]

Fig.14 Comparison of test results of SSD using different NMS



