基于多尺度融合与注意力机制的小目标车辆检测

doi:10.3785/j.issn.1008-973X.2022.11.015

浙江大学学报(工学版)

2022, Vol. 56

Issue (11): 2241-2250 DOI: 10.3785/j.issn.1008-973X.2022.11.015

计算机技术

基于多尺度融合与注意力机制的小目标车辆检测

李凯(

),林宇舜*(

),吴晓琳,廖飞宇

福建农林大学交通与土木工程学院，福建福州 350108

Small target vehicle detection based on multi-scale fusion technology and attention mechanism

Kai LI(

),Yu-shun LIN*(

),Xiao-lin WU,Fei-yu LIAO

School of Transportation and Civil Engineering, Fujian Agriculture and Forestry University, Fuzhou 350108, China

全文: PDF(2082 KB) HTML

摘要：

针对传统目标检测算法（SSD）检测小目标精度低的问题，提出基于注意力机制与多尺度信息融合方法并将其运用于车辆检测任务. 结合浅层特征图与深层特征图的优势，小目标检测分支和大中型目标检测分支的特征图采用5支路和2支路融合. 在基础网络层之间加入注意力机制模块，模型会关注包含更多信息量的通道. 实验结果表明，在自建车辆数据集上的均值平均精度（mAP）达到90.2%，比传统SSD算法提高了10.0%，其中小目标检测精度提高了17.9%；在PASCAL VOC 2012数据集上的类别平均精度mAP为83.1%，比主流的YOLOv5算法提高了6.4%. 此外，提出算法在GTX1 660 Ti PC端的检测速度可以达到25 帧/s，能够满足实时性的需求.

关键词： SSD; 特征金字塔; 多尺度融合; 注意力机制; 车辆检测

Abstract:

A method based on attention mechanism and multi-scale information fusion was proposed to resovle the problem of low accuracy of the traditional single shot multibox detector (SSD) algorithm in detecting small targets. The algorithm was applied to the vehicle detection task. The feature maps of the target detection branch were fused with 5 branches and 2 branches respectively, combining the advantages of the shallow feature map and the deep feature map. The attention mechanism module was added between the basic network layers to make the model pay attention to the channels containing more information. Experimental results showed that the mean average precision of the self-built vehicle data set reached 90.2%, which was 10.0% higher than the traditional SSD algorithm. The detection accuracy of small objects was improved by 17.9%. The mAP on the PASCAL VOC 2012 dataset was 83.1%, which was 6.4% higher than the current mainstream YOLOv5 algorithm. The detection speed of proposed algorithm on the GTX1 660 Ti PC reached 25 frame/s, which satisfied the demand of real-time performance.

Key words: SSD FPN multi-scale fusion attention mechanism vehicle detection

收稿日期: 2021-11-23 出版日期: 2022-12-02

CLC:

TP 751

基金资助: 福建省科技重大事项（2019HZ07011）；福建省自然科学基金资助项目（2020J05029）

通讯作者: 林宇舜 E-mail: 15733152192@163.com;lshun@fafu.edu.cn

作者简介: 李凯（1995—），男，硕士生，从事计算视觉研究. orcid.org/0000-0001-6319-2984. E-mail: 15733152192@163.com

	服务
	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	作者相关文章
	李凯
	林宇舜
	吴晓琳
	廖飞宇

引用本文:

李凯,林宇舜,吴晓琳,廖飞宇. 基于多尺度融合与注意力机制的小目标车辆检测[J]. 浙江大学学报(工学版), 2022, 56(11): 2241-2250.

Kai LI,Yu-shun LIN,Xiao-lin WU,Fei-yu LIAO. Small target vehicle detection based on multi-scale fusion technology and attention mechanism. Journal of ZheJiang University (Engineering Science), 2022, 56(11): 2241-2250.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.11.015 或 https://www.zjujournals.com/eng/CN/Y2022/V56/I11/2241

图 1 基于SE模块与多尺度特征融合技术的小目标检测方法网络结构图

图 2 基于SENet的注意力机制模块

图 3 Conv7~Conv11_2特征融合图

图 4 Conv4_3特征融合

图 5 自建车辆数据集样例图

图 6 大、中、小目标检测示意图

表 1 自建车辆数据集各方法测试结果

表 2 自建车辆数据集各类方法测试性能对比

表 3 Conv4_3层不同融合方式性能对比

图 7 SSD与所提方法测试结果对比图

表 4 各类方法在PASCAL VOC数据测试中的AP值

图 8 SSD算法与本文方法对小目标的检测结果对比图

表 5 各类方法在PASCAL VOC数据性能对比结果

1	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection [C]// IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). San Diego: IEEE, 2005, 1: 886-893.
2	KUMAR P, HENIKOFF S, NG P C Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm[J]. Nature Protocols, 2009, 4 (8): 1073- 1081
3	CHERKASSKY V, MA Y Practical selection of SVM parameters and noise estimation for SVM regression[J]. Neural networks, 2004, 17 (1): 113- 126 doi: 10.1016/S0893-6080(03)00169-2
4	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 580-587.
5	GIRSHICK R. Fast r-cnn [C]// Proceedings of the IEEE International Conference on Computer Vision. Boston: IEEE, 2015: 1440-1448.
6	REN S, HE K, GIRSHICK R, et al Faster r-cnn: towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 1137- 1149
7	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society. Las Vegas: IEEE, 2016: 779-788.
8	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// European Conference on Computer Vision, LNCS 9905. Berlin: Springer, 2016: 21-37.
9	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector [EB/OL]. [2021-11-23]. https://arxiv.org/abs/1701.06659v1.
10	LI Z, ZHOU F. FSSD: feature fusion single shot multibox detector [EB/OL]. [2021-11-23]. https://arxiv.org/abs/1712.00960.
11	JEONG J, PARK H, KWAK N. Enhancement of SSD by concatenating feature maps for object detection [EB/OL]. [2021-11-23]. https://arxiv.org/abs/1705.09587.
12	李航, 朱明基于深度卷积神经网络的小目标检测算法[J]. 计算机工程与科学, 2020, 42 (4): 649- 657 LI Hang, ZHU Ming A small object detection algorithm based on deep convolutional neural network[J]. Computer Engineering and Science, 2020, 42 (4): 649- 657 doi: 10.3969/j.issn.1007-130X.2020.04.011
13	CHEN Yu-kang, ZHANG Pei-zhen, LI Ze-ming, et al. Dynamic scale training for object detection [EB/OL]. [2021-11-23].https://arxiv.org/abs/2004.12432v2.
14	LIU S, HUANG D, WANG Y. Learning spatial fusion for single shot object detection [EB/OL]. [2021-11-23]. https://arxiv.org/abs/1911.09516.
15	ZOPH B, CUBUK E D, GHIASI G, et al. Learning data augmentation strategies for object detection [C]// European Conference on Computer Vision. Springer, Cham, 2020: 566-583.
16	WANG T, ANWER R M, CHOLAKKAL H, et al. Learning rich features at high-speed for single-shot object detection [C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Long Beach: IEEE, 2019: 1971-1980.
17	HU J, SHEN L, SUN G. Squeeze and excitation networks [C]//Proceedings of the IEEE conference on computer vision and pattern recognition. Salt Lake City: IEEE, 2018: 7132-7141.
18	DUAN K, BAI S, XIE L, et al. Centernet: keypoint triplets for object detection [C]// Proceedings of the IEEE/CVF international conference on computer vision. Long Beach: IEEE, 2019: 6569-6578.
19	WANG C Y, BOCHKOVSKIY A, LIAO H Y M. Scaled-yolov4: scaling cross stage partial network [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 13029-13038.
20	YANG G, FENG W, JIN J, et al. Face mask recognition system with YOLOV5 based on image recognition [C]// IEEE 6th International Conference on Computer and Communications (ICCC). Seattle: IEEE, 2020: 1398-1404.
21	梁鸿, 李洋, 邵明文, 等基于残差网络和改进特征金字塔的油田作业现场目标检测算法[J]. 科学技术与工程, 2020, 20 (11): 4442- 4450 LIANG Hong, LI Yang, SHAO Ming-wen, et al Field object detection for oilfield operation based on residual network and improved feature pyramid networks[J]. Science Technology and Engineering, 2020, 20 (11): 4442- 4450 doi: 10.3969/j.issn.1671-1815.2020.11.035

[1]	郝琨,王阔,王贝贝. 基于改进Mobilenet-YOLOv3的轻量级水下生物检测算法[J]. 浙江大学学报(工学版), 2022, 56(8): 1622-1632.
[2]	莫仁鹏,司小胜,李天梅,朱旭. 基于多尺度特征与注意力机制的轴承寿命预测[J]. 浙江大学学报(工学版), 2022, 56(7): 1447-1456.
[3]	王友卫,童爽,凤丽洲,朱建明,李洋,陈福. 基于图卷积网络的归纳式微博谣言检测新方法[J]. 浙江大学学报(工学版), 2022, 56(5): 956-966.
[4]	鞠晓臣,赵欣欣,钱胜胜. 基于自注意力机制的桥梁螺栓检测算法[J]. 浙江大学学报(工学版), 2022, 56(5): 901-908.
[5]	张雪芹,李天任. 基于Cycle-GAN和改进DPN网络的乳腺癌病理图像分类[J]. 浙江大学学报(工学版), 2022, 56(4): 727-735.
[6]	许萌,王丹,李致远,陈远方. IncepA-EEGNet: 融合Inception网络和注意力机制的P300信号检测方法[J]. 浙江大学学报(工学版), 2022, 56(4): 745-753, 782.
[7]	柳长源,何先平,毕晓君. 融合注意力机制的高效率网络车型识别[J]. 浙江大学学报(工学版), 2022, 56(4): 775-782.
[8]	杨淑琴,马玉浩,方铭宇,钱伟行,蔡洁萱,刘童. 基于实例分割的复杂环境车道线检测方法[J]. 浙江大学学报(工学版), 2022, 56(4): 809-815, 832.
[9]	陈巧红,裴皓磊,孙麒. 基于视觉关系推理与上下文门控机制的图像描述[J]. 浙江大学学报(工学版), 2022, 56(3): 542-549.
[10]	农元君,王俊杰,陈红,孙文涵,耿慧,李书悦. 基于注意力机制和编码-解码架构的施工场景图像描述方法[J]. 浙江大学学报(工学版), 2022, 56(2): 236-244.
[11]	刘英莉,吴瑞刚,么长慧,沈韬. 铝硅合金实体关系抽取数据集的构建方法[J]. 浙江大学学报(工学版), 2022, 56(2): 245-253.
[12]	李飞,胡坤,张勇,王文善,蒋浩. 基于混合域注意力YOLOv4的输送带纵向撕裂多维度检测[J]. 浙江大学学报(工学版), 2022, 56(11): 2156-2167.
[13]	彭向东,潘从成,柯泽浚,朱华强,周肖. 基于并行架构和时空注意力机制的心电分类方法[J]. 浙江大学学报(工学版), 2022, 56(10): 1912-1923.
[14]	董红召,方浩杰,张楠. 旋转框定位的多尺度再生物品目标检测算法[J]. 浙江大学学报(工学版), 2022, 56(1): 16-25.
[15]	王鑫,陈巧红,孙麒,贾宇波. 基于关系推理与门控机制的视觉问答方法[J]. 浙江大学学报(工学版), 2022, 56(1): 36-46.

Viewed

Full text

Abstract

Cited

Shared

Discussed