Journal of Zhejiang University (Engineering Science)  2022, Vol. 56 Issue (4): 783-794    DOI: 10.3785/j.issn.1008-973X.2022.04.018
Computer Technology, Information Engineering
Single-stage object detection algorithm based on optimizing position prediction
Na ZHANG1, Xu-lei QI1, Xiao-an BAO1,*, Biao WU1, Xiao-mei TU2, Yu-ting JIN2
1. School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. School of Information Science and Technology, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang 322100, China
Abstract:

A single-stage object detection algorithm named EL-SSD, based on optimizing position prediction, was proposed to address inaccurate target localization and low small-object detection accuracy in the single shot multi-box detector (SSD) algorithm. The prediction feature maps of the original SSD were fused by a bi-directional weighted feature pyramid network. The output feature maps were then decoded according to feature location information and the weights of the feature channels were redistributed, which enriched the feature semantic information and captured cross-channel location information. Classification confidence and an additional localization confidence were combined through cascade clustering to perform non-maximum suppression on the predicted bounding boxes, which improved the localization accuracy of the selected targets at the detection stage. The experimental results showed that the mean average precision of EL-SSD on the PASCAL VOC2007 dataset was 79.8%, 2.6 percentage points higher than that of the original SSD algorithm. The mean average precision of EL-SSD on the COCO dataset was 29.4%, 3.5 percentage points higher than that of the original SSD algorithm. On the test images, EL-SSD localized targets and detected small objects clearly better than SSD, which makes it suitable for real-time application scenarios that require high localization performance.

Key words: object detection; single shot multi-box detector algorithm; feature fusion; non-maximum suppression; localization confidence
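The abstract describes non-maximum suppression that uses a localization confidence in addition to the classification confidence. The details of the authors' procedure are not given on this page, so the following is only a minimal sketch of the general idea, in the spirit of IoU-guided NMS: boxes are ranked by localization confidence, and the kept box of each overlapping cluster inherits the highest classification score in that cluster. The function names, the threshold value and the sample boxes are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def loc_guided_nms(boxes: List[Box], cls_scores: List[float],
                   loc_scores: List[float],
                   iou_thresh: float = 0.5) -> List[Tuple[int, float]]:
    """Illustrative localization-guided NMS (an assumption, not the paper's method).

    Returns (box_index, cluster_classification_score) pairs.
    """
    # Rank by localization confidence instead of classification score.
    order = sorted(range(len(boxes)), key=lambda i: loc_scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        score = cls_scores[best]
        remaining = []
        for i in order:
            if iou(boxes[best], boxes[i]) > iou_thresh:
                # Same cluster: suppress the box, keep its class score if higher.
                score = max(score, cls_scores[i])
            else:
                remaining.append(i)
        order = remaining
        kept.append((best, score))
    return kept


# Hypothetical detections: boxes 0 and 1 overlap, box 2 is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
cls_scores = [0.9, 0.6, 0.8]
loc_scores = [0.5, 0.8, 0.7]
print(loc_guided_nms(boxes, cls_scores, loc_scores))  # [(1, 0.9), (2, 0.8)]
```

In this toy input, box 1 is kept over box 0 because it has the higher localization confidence, even though box 0 has the higher classification score. Plain score-ranked NMS would have kept box 0.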
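Received: 2021-05-24    Published: 2022-04-24
CLC:  TP 391
Fund: National Natural Science Foundation of China (6207050141); Key Research and Development Program of Zhejiang Province (2020C03094)
Corresponding author: Xiao-an BAO     E-mail: zhangna@zstu.edu.cn;baoxiaoan@zstu.edu.cn
About the author: Na ZHANG (born 1977), female, associate professor, engaged in research on computer vision and intelligent information processing. orcid.org/0000-0001-5131-6417. E-mail: zhangna@zstu.edu.cn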
Cite this article:

Na ZHANG, Xu-lei QI, Xiao-an BAO, Biao WU, Xiao-mei TU, Yu-ting JIN. Single-stage object detection algorithm based on optimizing position prediction. Journal of Zhejiang University (Engineering Science), 2022, 56(4): 783-794.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.04.018
https://www.zjujournals.com/eng/CN/Y2022/V56/I4/783

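Fig. 1  Comparison of mainstream improved SSD network structures
Fig. 2  Predicted classification scores of detection boxes and their IoU with the ground-truth box
Fig. 3  EL-SSD network structure
Fig. 4  Schematic of the CBFN network structure
Fig. 5  Schematic of the CEA module network structure
Fig. 6  Schematic of the weight redistribution process in the CEA module
Fig. 7  Heat maps of SSD and EL-SSD detection feature layers at the same level
Fig. 8  Detection module with an added localization-confidence prediction branch
Fig. 9  Schematic of IoU confidence prediction error
Fig. 10  Schematic of the DIoU effect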
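For readers unfamiliar with the DIoU measure named in Fig. 10, the sketch below computes it following the usual definition from the Distance-IoU loss paper (reference 35): IoU minus the squared distance between box centers divided by the squared diagonal of the smallest enclosing box. How the authors apply DIoU inside their NMS step is not shown on this page, so this covers only the measure itself. The function name and the sample boxes are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def diou(a: Box, b: Box) -> float:
    """Distance-IoU of two axis-aligned boxes: IoU - d^2 / c^2."""
    # Plain IoU.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
        + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / c2 if c2 > 0 else iou


# Two disjoint boxes: IoU is 0 for both pairs, but DIoU still ranks the
# nearer pair higher, which is the property the measure adds.
near = diou((0, 0, 10, 10), (12, 0, 22, 10))
far = diou((0, 0, 10, 10), (20, 0, 30, 10))
print(near > far)  # True
```

Unlike plain IoU, the measure stays informative when boxes do not overlap, and it penalizes boxes whose centers are far apart even when their overlap is the same.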
| Method | Backbone | Input size | GPU | mAP/% | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| Faster R-CNN[3] | VGGNet | 1000×600 | Titan X | 73.2 | 7.0 |
| Faster R-CNN[3] | ResNet-101 | 1000×600 | 1080Ti | 78.8 | 2.3 |
| Mask R-CNN[36] | ResNet-50 | 1000×600 | 1080Ti | 77.4 | 4.2 |
| Cascade R-CNN[37] | VGGNet | 1000×600 | 1080Ti | 79.6 | 5.3 |
| YOLOv2[6] | Darknet-19 | 352×352 | Titan X | 73.7 | 81.0 |
| RefineDet320[18] | VGGNet | 320×320 | 1080Ti | 80.0 | 22.1 |
| FCOS[32] | ResNet-50 | 1333×800 | 1080Ti | 73.5 | 17.6 |
| ATSS[11] | ResNet-50 | 1333×800 | 1080Ti | 75.2 | 14.9 |
| RetinaNet400[10] | ResNet-101 | ~640×400 | 1080Ti | 79.4 | 12.4 |
| FSSD300[16] | VGGNet | 300×300 | 1080Ti | 78.8 | 65.0 |
| SSD300[13] | VGGNet | 300×300 | 1080Ti | 77.2 | 42.1 |
| ASSD321[38] | ResNet-101 | 321×321 | K40 | 79.5 | 11.4 |
| DSSD321[15] | ResNet-101 | 321×321 | Titan X | 78.6 | 9.5 |
| EL-SSD300 | VGGNet | 300×300 | 1080Ti | 79.8 | 27.0 |

Tab. 1  Comparison of mean average precision on the VOC2007 test set
Fig. 11  Bar chart of detection accuracy for the 20 object classes on the VOC2007 test set
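| Method | IoUt = 0.50, Plant | IoUt = 0.50, Bottle | IoUt = 0.75, Plant | IoUt = 0.75, Bottle | IoUt = 0.95, Plant | IoUt = 0.95, Bottle |
| --- | --- | --- | --- | --- | --- | --- |
| SSD | 47.5 | 50.4 | 17.4 | 21.1 | 0.8 | 0.1 |
| EL-SSD | 52.0 | 56.6 | 19.6 | 26.7 | 2.1 | 0.9 |

Tab. 2  Comparison of average precision of position prediction for small-object classes (unit: %)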
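| Method | mAP/% (IoUt = 0.5~0.95) | mAP/% (IoUt = 0.5) | mAP/% (IoUt = 0.75) | mAP/% (S) | mAP/% (M) | mAP/% (L) |
| --- | --- | --- | --- | --- | --- | --- |
| SSD | 25.9 | 44.3 | 25.4 | 6.2 | 26.0 | 41.5 |
| EL-SSD | 29.4 | 47.2 | 30.6 | 10.3 | 31.6 | 45.7 |

Tab. 3  Experimental results of EL-SSD on the COCO dataset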
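The improvements quoted in the abstract are absolute differences in mAP, that is, percentage points, not relative gains. The short check below recomputes both the absolute and the relative improvement from the SSD and EL-SSD values reported in Tab. 1 and Tab. 3. The helper name `gain` is hypothetical.

```python
from typing import Tuple


def gain(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in %)."""
    points = round(improved - baseline, 1)
    relative = round(100.0 * (improved - baseline) / baseline, 1)
    return points, relative


# (SSD, EL-SSD) mAP values copied from Tab. 1 and Tab. 3.
reported = {
    "VOC2007 mAP": (77.2, 79.8),
    "COCO mAP, IoUt = 0.5~0.95": (25.9, 29.4),
    "COCO mAP, S": (6.2, 10.3),
}

for name, (ssd, el_ssd) in reported.items():
    points, relative = gain(ssd, el_ssd)
    print(f"{name}: +{points} points ({relative}% relative)")
```

The first two rows reproduce the 2.6 and 3.5 point improvements stated in the abstract. The third row shows that the gain on the S column of Tab. 3 is the largest in relative terms.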
Fig. 12  Comparison of detection results on the COCO dataset
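Components: +FPN; +CBFN (without CEA); +CEA; +OPS-NMS (without DIoU); +DIoU
mAP/% (one value per configuration, in the order listed): 77.6, 77.8, 78.0, 78.6, 78.6, 77.7, 78.9, 79.5, 79.8
Tab. 4  Ablation experiment results of EL-SSD on the VOC2007 test set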
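Fig. 13  Line chart of the localization accuracy of three NMS methods in the interval [0.43, 0.48]
Fig. 14  Comparison of SSD detection results using different NMS methods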
1 GAO Wen, TANG Yang, ZHU Ming. Study on the cascade classifier in target detection under complex background [J]. Acta Physica Sinica, 2014, 63(9): 156-164.
2 WANG Hao, SHAN Wen-jing, FANG Bao-fu. Multi-layers context convolutional neural network for object detection [J]. Pattern Recognition and Artificial Intelligence, 2020, 33(2): 113-120.
3 REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
4 WEI Wan-qing, YU Jing, BAI Man-yan, et al. Video object detection using fusion of SSD and spatiotemporal features [J]. Journal of Image and Graphics, 2021, 26(3): 542-555. doi: 10.11834/jig.200020
5 XU Li-feng, HUANG Hai-fan, DING Wei-long, et al. Detection of small fruit target based on improved DenseNet [J]. Journal of Zhejiang University: Engineering Science, 2021, 55(2): 377-385.
6 REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Washington: IEEE, 2016: 779-788.
7 REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6517-6525.
8 REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. (2018-08-08). https://arxiv.org/pdf/1804.02767.pdf.
9 SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651. doi: 10.1109/TPAMI.2016.2572683
10 LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2999-3007.
11 ZHANG S F, CHI C, YAO Y Q, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9756-9765.
12 LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
13 LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// European Conference on Computer Vision. Amsterdam: Springer, 2016: 21-37.
14 ZHENG Pu, BAI Hong-yang, LI Wei, et al. Small target detection algorithm in complex background [J]. Journal of Zhejiang University: Engineering Science, 2020, 54(9): 1777-1784.
15 FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector [EB/OL]. (2017-01-23). https://arxiv.org/pdf/1701.06659.pdf.
16 LI Z X, ZHOU F Q. FSSD: feature fusion single shot multibox detector [EB/OL]. (2017-12-04). https://arxiv.org/pdf/1712.00960.pdf.
17 SHEN Z Q, LIU Z, LI J G, et al. DSOD: learning deeply supervised object detectors from scratch [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 1937-1945.
18 ZHANG S F, WEN L Y, BIAN X, et al. Single-shot refinement neural network for object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4203-4212.
19 HE Y X, ZHU C C, WANG J R, et al. Bounding box regression with uncertainty for accurate object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 2883-2892.
20 BODLA N, SINGH B, CHELLAPPA R, et al. Soft-NMS: improving object detection with one line of code [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 5562-5570.
21 LIU S T, HUANG D, WANG Y H. Adaptive NMS: refining pedestrian detection in a crowd [C]// IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 6452-6461.
22 LUO Y H, CAO X, ZHANG J T, et al. CE-FPN: enhancing channel information for object detection [EB/OL]. (2021-03-09). https://arxiv.org/pdf/2103.10643.pdf.
23 PANG J M, CHEN K, SHI J P, et al. Libra R-CNN: towards balanced learning for object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 821-830.
24 GUO C X, FAN B, ZHANG Q, et al. AugFPN: improving multi-scale feature learning for object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 12592-12601.
25 WANG K X, LIEW J H, ZHOU D Q, et al. PANet: few-shot image semantic segmentation with prototype alignment [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 9196-9205.
26 TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10778-10787.
27 CHOLLET F. Xception: deep learning with depthwise separable convolutions [C]// IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1800-1807.
28 HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023. doi: 10.1109/TPAMI.2019.2913372
29 HOU Q B, ZHOU D Q, FENG J S, et al. Coordinate attention for efficient mobile network design [C]// IEEE International Conference on Computer Vision. [S. l.]: IEEE, 2021: 13708-13717.
30 HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3 [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 1314-1324.
31 JIANG B R, LUO R X, MAO J Y, et al. Acquisition of localization confidence for accurate object detection [C]// European Conference on Computer Vision. Munich: Springer, 2018: 816-832.
32 TIAN Z, SHEN C H, CHEN H, et al. FCOS: fully convolutional one-stage object detection [C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 9626-9635.
33 LI X, WANG W H, WU L J, et al. Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection [J]. Advances in Neural Information Processing Systems, 2020, 33: 21002-21012.
34 WU S K, LI X P, WANG X G. IoU-aware single-stage object detector for accurate localization [J]. Image and Vision Computing, 2020, 97: 103911.
35 ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression [C]// AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 12993-13000.
36 HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN [C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980-2988.
37 CAI Z W, VASCONCELOS N. Cascade R-CNN: delving into high quality object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 6154-6162.