Journal of Zhejiang University (Engineering Science)  2020, Vol. 54, Issue 3: 529-539    DOI: 10.3785/j.issn.1008-973X.2020.03.013
Computer Technology and Image Processing
Object detection enhanced context model
Chen-bin ZHENG1, Yong ZHANG1,*, Hang HU2, Ying-rui WU1, Guang-jing HUANG3
1. School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100191, China
2. Unit 66133 of PLA, Beijing 100144, China
3. School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
Full text: PDF (1492 KB)   HTML
Abstract:

The enhanced context module (ECM) of the enhanced context model uses a double-atrous convolution structure to enlarge the effective receptive field and thereby strengthen the context information of shallow layers while saving parameters. The ECM acts flexibly on the shallow and middle prediction layers with little disruption to the original SSD network, forming the enhanced context model network (ECMNet). With a 300×300 input image, ECMNet achieves a mean average precision (mAP) of 80.52% on the PASCAL VOC2007 test set at 73.5 frames per second on a 1080Ti. The experimental results show that ECMNet effectively enhances context information and achieves a favorable trade-off among parameter count, speed, and accuracy, outperforming many state-of-the-art object detectors.

Key words: object detection; context information; effective receptive field; enhanced context module (ECM); one-stage object detector
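The double-atrous convolution structure at the core of the ECM can be illustrated with a short PyTorch sketch. This is a hypothetical reconstruction for illustration only, not the authors' released code; the module name DoubleAtrousBlock, the channel sizes, and the dilation pair (2, 4) are all assumptions:

```python
import torch
import torch.nn as nn

class DoubleAtrousBlock(nn.Module):
    """Minimal sketch of a double-atrous branch (illustrative, not the
    paper's exact ECM): a 1x1 channel reduction followed by two stacked
    3x3 dilated convolutions, which enlarges the theoretical receptive
    field while keeping the parameter count low."""

    def __init__(self, in_ch: int, mid_ch: int, dilations=(2, 4)):
        super().__init__()
        d1, d2 = dilations
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),  # cheap channel reduction
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=d1, dilation=d1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, in_ch, kernel_size=3, padding=d2, dilation=d2),
        )

    def forward(self, x):
        # A residual connection leaves the original shallow features intact,
        # so the block can be attached to an SSD prediction layer with
        # little disruption to the base network.
        return torch.relu(x + self.branch(x))

feat = torch.randn(1, 512, 38, 38)       # e.g. the 38x38 SSD prediction layer
out = DoubleAtrousBlock(512, 128)(feat)  # same spatial size, larger receptive field
```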
Received: 2019-03-01   Published: 2020-03-05
CLC:  TP 391  
Corresponding author: Yong ZHANG   E-mail: 13171087@buaa.edu.cn; 06952@buaa.edu.cn
About the author: ZHENG Chen-bin (1995—), male, master's student, research interests: deep learning and image processing. orcid.org/0000-0002-6413-613X. E-mail: 13171087@buaa.edu.cn

Cite this article:


Chen-bin ZHENG, Yong ZHANG, Hang HU, Ying-rui WU, Guang-jing HUANG. Object detection enhanced context model. Journal of Zhejiang University (Engineering Science), 2020, 54(3): 529-539.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2020.03.013        http://www.zjujournals.com/eng/CN/Y2020/V54/I3/529

Fig. 1  Detailed network structures of the three independent modules
Fig. 2  Double-atrous convolution structure and its theoretical receptive field
Fig. 3  Architecture of the enhanced context model network (ECMNet)
Fig. 4  Example and schematic of the small-angle rotation transformation
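Fig. 2 relates the double-atrous structure to its theoretical receptive field. For a stack of convolutions, the standard recurrence is r_out = r_in + (k_eff − 1)·j, with effective kernel k_eff = k + (k − 1)(d − 1) for dilation d and output jump j. A small sketch of this calculation (my own illustration; the dilation pair is assumed, not taken from the paper):

```python
def receptive_field(layers, r=1, jump=1):
    """Theoretical receptive field of a convolution stack.
    layers: list of (kernel, stride, dilation) tuples."""
    for k, s, d in layers:
        k_eff = k + (k - 1) * (d - 1)  # dilation inflates the effective kernel
        r += (k_eff - 1) * jump        # grow the receptive field
        jump *= s                      # spacing between adjacent outputs
    return r

# Two stacked 3x3 atrous convolutions (stride 1) with dilations 2 and 4:
print(receptive_field([(3, 1, 2), (3, 1, 4)]))  # -> 13, vs. 5 for two plain 3x3s
```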
Network model | Has 5×5 independent module | Ψ/10⁶ | Φ/%
PPMNet | no | ~30.00 | 79.96
PPMNet | yes | ~30.42 | 79.96
ASPPMNet | no | ~30.15 | 80.12
ASPPMNet | yes | ~30.61 | 80.20
ECMNet | no | ~29.96 | 80.30
ECMNet | yes | ~30.33 | 80.52
Table 1  Test results of the three independent modules on the VOC2007 test set (Ψ: parameter count; Φ: mean average precision)
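The Ψ/10⁶ column here and in Tables 4-5 reports parameter counts in millions. For a PyTorch model this is a one-line count; a sketch for reproducing such a column (the model object passed in is hypothetical):

```python
import torch.nn as nn

def params_in_millions(model: nn.Module) -> float:
    """Trainable parameter count in units of 10^6 (the psi column above)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```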
Fig. 5  Effective receptive fields of the five network models on two prediction layers
38×38 | 19×19 | 10×10 | 5×5 | Φ/%
✓ | – | – | – | 79.88
✓ | ✓ | – | – | 80.33
✓ | ✓ | ✓ | – | 80.30
✓ | ✓ | ✓ | ✓ | 80.52
Note: "✓" indicates that the ECM is applied to the feature map of the corresponding resolution; "–" indicates that it is not.
Table 2  Test results for the range of feature maps acted on by the ECM
Method | Ψ/10⁶ | Φ/%
SSD300* | ~26.29 | 77.51
RFB Net300 | ~34.19 | 80.50 1)
ECMNet (no rotation) 300 | ~30.33 | 80.29
ECMNet300 | ~30.33 | 80.52
Note: 1) the mAP of 80.50% is the value reported in [1]; our own test gives only 80.42% (see the second-to-last row of Table 6). "ECMNet (no rotation) 300" denotes ECMNet300 trained without the small-angle rotation transformation.
Table 4  Parameter count of each network and mAP on the VOC2007 test set
Has 5×5 independent module | Small-angle rotation transformation | Φ/%
– | – | 80.17
– | ✓ | 80.30
✓ | – | 80.29
✓ | ✓ | 80.52
Table 3  Test results of the small-angle rotation transformation on the VOC2007 test set
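Table 3 isolates the contribution of the small-angle rotation transformation used for data augmentation. A hedged sketch of such an augmentation (my own reconstruction; the angle range and the axis-aligned box handling are assumptions, not the paper's exact recipe): rotate the image by a small random angle and take the axis-aligned bounds of each rotated box.

```python
import numpy as np
import cv2

def small_angle_rotate(image, boxes, max_deg=5.0):
    """Rotate an image and its boxes by a small random angle.
    boxes: (N, 4) array of [x1, y1, x2, y2] in pixels."""
    h, w = image.shape[:2]
    angle = np.random.uniform(-max_deg, max_deg)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)  # 2x3 affine matrix
    rotated = cv2.warpAffine(image, M, (w, h))

    new_boxes = []
    for x1, y1, x2, y2 in boxes:
        corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]],
                           dtype=np.float32)
        ones = np.ones((4, 1), dtype=np.float32)
        rot = np.hstack([corners, ones]) @ M.T        # rotate the four corners
        x_min, y_min = rot.min(axis=0)
        x_max, y_max = rot.max(axis=0)
        new_boxes.append([max(0.0, x_min), max(0.0, y_min),
                          min(float(w), x_max), min(float(h), y_max)])
    return rotated, np.asarray(new_boxes, dtype=np.float32)
```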
Method | Backbone | Framework | GPU | Number of anchor boxes | Input size | v/(frame·s⁻¹) | Φ/%
Faster R-CNN [19] | VGG16 | Caffe | K40 | 300 | ~1 000×600 | 5.0 | 73.17
ION [20] | VGG16 | Caffe | Titan X | 3 000 | ~1 000×600 | 1.3 | 75.55
R-FCN [21] | ResNet-101 | Caffe | K40 | 300 | ~1 000×600 | 5.9 | 79.51
CoupleNet [22] | ResNet-101 | Caffe | Titan X | 300 | ~1 000×600 | 9.8 | 81.70
YOLOv2 [14] | Darknet-19 | darknet | Titan X | – | 352×352 | 81.0 | 73.70
YOLOv2 [14] | Darknet-19 | darknet | Titan X | – | 544×544 | 40.0 | 78.60
SSD300* [12] | VGG16 | Caffe | Titan X | 8 732 | 300×300 | 46.0 | 77.51
SSD300 1) | VGG16 | PyTorch | 1080Ti | 8 732 | 300×300 | 95.3 | 77.51
DSOD300 [23] | DS/64-192-48-1 | Caffe | Titan X | 8 732 | 300×300 | 17.4 | 77.66
DSSD321 [12] | ResNet-101 | Caffe | Titan X | 17 080 | 321×321 | 9.5 | 78.63
R-SSD300 [5] | VGG16 | Caffe | Titan X | 8 732 | 300×300 | 35.0 | 78.50
FSSD300 [6] | VGG16 | Caffe | 1080Ti | 11 570 | 300×300 | 65.8 | 78.77
FSSD300 1) | VGG16 | PyTorch | 1080Ti | 11 570 | 300×300 | 85.7 | 78.77
RefineDet320 [24] | VGG16 | Caffe | Titan X | 6 375 | 320×320 | 40.3 | 79.97
RFB Net300 [1] | VGG16 | PyTorch | Titan X | 11 620 | 300×300 | 83.0 | 80.50
RFB Net300 2) | VGG16 | PyTorch | 1080Ti | 11 620 | 300×300 | 70.0 | 80.42
ECMNet300 | VGG16 | PyTorch | 1080Ti | 11 620 | 300×300 | 73.5 | 80.52
Note: 1) the official versions of these models are implemented in Caffe, with hardware and software environments different from ours; for a fair speed comparison, SSD and FSSD were re-implemented in PyTorch and tested in the same environment as ECMNet. 2) the hardware and environment of this model also differ from ours, so it was likewise re-tested in the same environment.
Table 5  Detection results of the object detectors on the VOC2007 test set
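The speeds (v) in Table 5 depend heavily on hardware and framework, which is why the notes re-test SSD, FSSD, and RFB Net in one environment. A minimal timing sketch for a PyTorch detector, with warm-up and torch.cuda.synchronize() so queued GPU kernels do not distort the measurement (the model object is hypothetical):

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, size=300, iters=200, warmup=20):
    """Frames per second for batch-1 inference on the current GPU."""
    model.eval().cuda()
    x = torch.randn(1, 3, size, size, device="cuda")
    for _ in range(warmup):       # warm-up: let cuDNN autotuning settle
        model(x)
    torch.cuda.synchronize()      # flush queued kernels before timing
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - t0)
```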
Method Φ/% aero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv
Faster R-CNN [19] 73.17 76.5 79.0 70.9 65.5 52.1 83.1 84.7 86.4 52.0 81.9 65.7 84.8 84.6 77.5 76.7 38.8 73.6 73.9 83.0 72.6
ION [20] 75.55 79.2 83.1 77.6 65.6 54.9 85.4 85.1 87.0 54.4 80.6 73.8 85.3 82.2 82.2 74.4 47.1 75.8 72.7 84.2 80.4
R-FCN [21] 79.51 82.5 83.7 80.3 69.0 69.2 87.5 88.4 88.4 65.4 87.3 72.1 87.9 88.3 81.3 79.8 54.1 79.6 78.8 87.1 79.5
SSD300* [12] 77.51 79.5 83.9 76.0 69.6 50.5 87.0 85.7 88.1 60.3 81.5 77.0 86.1 87.5 84.0 79.4 52.3 77.9 79.5 87.6 76.8
DSOD300 1) 77.66 80.5 85.5 76.7 70.9 51.5 87.4 87.9 87.1 61.7 79.3 77.1 83.2 87.1 85.6 80.9 48.5 78.7 80.2 86.7 76.7
DSSD321 [12] 78.63 81.9 84.9 80.5 68.4 53.9 85.6 86.2 88.9 61.1 83.5 78.7 86.7 88.7 86.7 79.7 51.7 78.0 80.9 87.2 79.4
FSSD300 1) 78.77 82.3 85.8 78.2 73.6 56.8 86.3 86.4 88.1 60.3 85.8 77.7 85.3 87.7 85.4 79.9 54.1 77.9 78.7 88.4 76.7
RefineDet320 [24] 79.97 83.9 85.4 81.4 75.5 60.2 86.4 88.1 89.1 62.7 83.9 77.0 85.4 87.1 86.7 82.6 55.3 82.7 78.5 88.1 79.4
RFB Net300 1) 80.42 83.7 87.6 78.9 74.8 59.8 88.8 87.5 87.9 65.0 85.0 77.1 86.1 88.4 86.6 81.7 58.1 81.5 81.2 88.4 80.2
ECMNet300 80.52 83.9 88.3 79.9 73.1 61.8 88.7 87.9 87.8 64.1 85.7 78.9 86.2 88.5 86.9 82.4 56.8 79.6 81.3 88.4 80.2
Note: some papers do not report complete per-class results on the VOC2007 test set; 1) results obtained in this work using the weight files publicly released for the corresponding paper.
Table 6  Complete detection results of the object detectors on the VOC2007 test set
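The per-class figures in Table 6 are PASCAL VOC2007 average precisions, conventionally computed with 11-point interpolation. A sketch of that metric given per-class precision/recall arrays (the IoU matching that produces those arrays is omitted):

```python
import numpy as np

def voc07_ap(recall, precision):
    """11-point interpolated AP as used for PASCAL VOC2007.
    recall, precision: arrays ordered by descending detection score."""
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):   # recall thresholds 0, 0.1, ..., 1.0
        mask = recall >= t
        p = precision[mask].max() if mask.any() else 0.0  # interpolated precision
        ap += p / 11.0
    return ap
```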
Fig. 6  Sample detection results of ECMNet (even rows) and SSD (odd rows) on the VOC2007 test set
Fig. 7  Further visualization results of ECMNet on the VOC2007 test set
1 LIU S T, HUANG D, WANG Y H. Receptive field block net for accurate and fast object detection [C] // European Conference on Computer Vision. Munich: Springer, 2018: 404-418.
2 LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C] // European Conference on Computer Vision. Amsterdam: Springer, 2016: 21-37.
3 LUO W J, LI Y J, URTASUN R, et al. Understanding the effective receptive field in deep convolutional neural networks [C] // Neural Information Processing Systems. Barcelona: [s. n.], 2016: 4898-4906.
4 LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C] // Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936-944.
5 JEONG J, PARK H, KWAK N. Enhancement of SSD by concatenating feature maps for object detection [EB/OL]. (2017-05-26)[2019-02-26]. https://arxiv.xilesou.top/abs/1705.09587.
6 LI Z X, ZHOU F Q. FSSD: feature fusion single shot multibox detector [EB/OL]. (2018-05-17)[2019-02-26]. https://arxiv.org/abs/1712.00960.
7 SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651
8 BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
9 ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network [C] // Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6230-6239.
10 CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848
11 CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [EB/OL]. (2017-12-25)[2019-02-26]. https://arxiv.org/abs/1706.05587.
12 FU C, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector [EB/OL]. (2017-01-23)[2019-02-26]. https://arxiv.org/abs/1701.06659.
13 WANDELL B A, WINAWER J. Computational neuroimaging and population receptive fields [J]. Trends in Cognitive Sciences, 2015, 19(6): 349-357. doi: 10.1016/j.tics.2015.03.009
14 REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C] // Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 7263-7271.
15 REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. (2018-04-08)[2019-02-26]. https://arxiv.org/abs/1804.02767.
16 HE K M, ZHANG X Y, REN S Q, et al. Delving deep into rectifiers: surpassing human-level performance on imagenet classification [C] // International Conference on Computer Vision. Santiago: IEEE, 2015: 1026-1034.
17 EVERINGHAM M, GOOL L V, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge [J]. International Journal of Computer Vision, 2010, 88(2): 303-338. doi: 10.1007/s11263-009-0275-4
18 HUANG J, RATHOD V, SUN C, et al. Speed/accuracy trade-offs for modern convolutional object detectors [C] // Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3296-3297.
19 REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. doi: 10.1109/TPAMI.2016.2577031
20 BELL S, ZITNICK C L, BALA K, et al. Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks [C] // Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2874-2883.
21 DAI J F, LI Y, HE K M, et al. R-FCN: object detection via region-based fully convolutional networks [C] // Neural Information Processing Systems. Barcelona: [s. n.], 2016: 379-387.
22 ZHU Y S, ZHAO C Y, WANG J Q, et al. CoupleNet: coupling global structure with local parts for object detection [C] // International Conference on Computer Vision. Venice: IEEE, 2017: 4146-4154.
23 SHEN Z Q, LIU Z, LI J G, et al. DSOD: learning deeply supervised object detectors from scratch [C] // International Conference on Computer Vision. Venice: IEEE, 2017: 1937-1945.