Single-stage object detection algorithm based on optimizing position prediction
Na ZHANG1, Xu-lei QI1, Xiao-an BAO1,*, Biao WU1, Xiao-mei TU2, Yu-ting JIN2
1. School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. School of Information Science and Technology, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang 322100, China
Abstract To address the inaccurate target localization and low small-object detection accuracy of the single shot multibox detector (SSD), a single-stage object detection algorithm named EL-SSD, based on optimizing position prediction, was proposed. The prediction feature maps of the original SSD were fused by a weighted bi-directional feature pyramid network. The feature location information of the fused maps was then decoded and the feature channel weights were redistributed, which enriched the semantic information of the features and captured cross-channel location information. At the detection stage, non-maximum suppression of the predicted bounding boxes was performed by cascade clustering on the classification confidence and an additional localization confidence, which improved the localization accuracy of the selected targets. Experimental results showed that EL-SSD achieved a mean average precision (mAP) of 79.8% on the PASCAL VOC2007 dataset, 2.6 percentage points higher than the original SSD, and 29.4% on the COCO dataset, 3.5 percentage points higher than the original SSD. EL-SSD localizes targets and detects small objects better than the original SSD, which makes it suitable for application scenarios that require high localization performance.
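The abstract does not spell out how the classification confidence and the localization confidence are combined during non-maximum suppression. The following is a minimal Python sketch of one plausible reading, in the spirit of localization-guided NMS: boxes are ranked by localization confidence, and each kept box inherits the highest classification confidence found in its cluster of overlapping boxes. The function names `iou` and `loc_guided_nms`, the threshold value, and the sample data are all hypothetical; this is an illustration under those assumptions, not the EL-SSD implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def loc_guided_nms(
    boxes: Sequence[Box],
    cls_conf: Sequence[float],
    loc_conf: Sequence[float],
    iou_thresh: float = 0.5,  # assumed value, not taken from the paper
) -> List[Tuple[int, float]]:
    """Hypothetical localization-guided NMS.

    Boxes are visited in order of decreasing localization confidence.
    Each kept box suppresses the boxes that overlap it by more than
    iou_thresh and takes the highest classification confidence of
    that cluster. Returns (box_index, class_score) pairs.
    """
    order = sorted(range(len(boxes)), key=lambda i: loc_conf[i], reverse=True)
    kept: List[Tuple[int, float]] = []
    while order:
        best = order.pop(0)
        score = cls_conf[best]
        remaining = []
        for j in order:
            if iou(boxes[best], boxes[j]) > iou_thresh:
                # Suppressed box: pass its class score to the kept box.
                score = max(score, cls_conf[j])
            else:
                remaining.append(j)
        order = remaining
        kept.append((best, score))
    return kept


# Illustrative data: two overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
cls_conf = [0.9, 0.6, 0.8]
loc_conf = [0.5, 0.8, 0.7]
print(loc_guided_nms(boxes, cls_conf, loc_conf))  # [(1, 0.9), (2, 0.8)]
```

In this example, plain NMS ranked by classification confidence would keep box 0. The sketch instead keeps box 1, which has the higher localization confidence, while retaining the cluster's best class score of 0.9.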
Received: 24 May 2021
Published: 24 April 2022
Fund: National Natural Science Foundation of China (6207050141); Key Research and Development Program of Zhejiang Province (2020C03094)
Corresponding author: Xiao-an BAO
E-mail: zhangna@zstu.edu.cn; baoxiaoan@zstu.edu.cn
Keywords: object detection, single shot multibox detector (SSD), feature fusion, non-maximum suppression, localization confidence