A method based on an attention mechanism and multi-scale information fusion was proposed to resolve the problem of the low accuracy of the traditional single shot multibox detector (SSD) algorithm in detecting small targets, and the method was applied to the vehicle detection task. The feature maps of the target detection branches were fused with five branches and two branches respectively, combining the advantages of the shallow feature maps and the deep feature maps. An attention mechanism module was added between the basic network layers to make the model focus on the channels containing more information. Experimental results showed that the mean average precision (mAP) on the self-built vehicle dataset reached 90.2%, which was 10.0% higher than that of the traditional SSD algorithm, and the detection accuracy for small objects was improved by 17.9%. The mAP on the PASCAL VOC 2012 dataset was 83.1%, which was 6.4% higher than that of the mainstream YOLOv5 algorithm. The detection speed of the proposed algorithm on a GTX 1660 Ti PC reached 25 frames/s, which satisfied the demand for real-time performance.
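The channel-attention step described above follows the squeeze-and-excitation (SE) pattern: globally pool each channel, pass the pooled vector through a bottleneck of two fully connected layers, and use the resulting sigmoid gates to reweight the channels. The sketch below is a minimal NumPy illustration of that pattern, not the paper's implementation; the channel count, reduction ratio, and random weights are assumptions for demonstration only.

```python
import numpy as np

def se_block(fm, w1, w2):
    """Minimal squeeze-and-excitation over a feature map of shape (C, H, W)."""
    # Squeeze: global average pooling yields one descriptor per channel.
    z = fm.mean(axis=(1, 2))                      # (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gate.
    s = np.maximum(w1 @ z, 0.0)                   # (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))           # (C,), each gate in (0, 1)
    # Scale: channels with larger gates keep more of their activation,
    # so the network attends to the more informative channels.
    return fm * s[:, None, None]

rng = np.random.default_rng(0)
C, r = 64, 16                                     # channels, reduction ratio (assumed)
fm = rng.standard_normal((C, 8, 8))               # stand-in feature map
w1 = rng.standard_normal((C // r, C)) * 0.1       # squeeze FC weights (assumed)
w2 = rng.standard_normal((C, C // r)) * 0.1       # excitation FC weights (assumed)
out = se_block(fm, w1, w2)
print(out.shape)                                  # (64, 8, 8): shape is preserved
```

Because the gates lie strictly between 0 and 1, the block only rescales channels; it never changes the spatial resolution, which is why it can be inserted between existing backbone layers.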
Kai LI, Yu-shun LIN, Xiao-lin WU, Fei-yu LIAO. Small target vehicle detection based on multi-scale fusion technology and attention mechanism. Journal of Zhejiang University (Engineering Science), 2022, 56(11): 2241-2250.
Fig. 1 Network structure of the small target detection method based on the SE module and multi-scale feature fusion technology
Fig. 2 Attention mechanism module based on SENet
Fig. 3 Conv7~Conv11_2 feature fusion
Fig. 4 Conv4_3 feature fusion
Fig. 5 Sample images of the self-built vehicle dataset
Fig. 6 Schematic diagram of target detection
| Method | mAP/% | AP/% (small) | AP/% (medium) | AP/% (large) |
| --- | --- | --- | --- | --- |
| CenterNet | 73.5 | 52.3 | 82.1 | 86.1 |
| SSD | 78.5 | 65.6 | 88.2 | 81.7 |
| YOLOv4 | 79.3 | 67.2 | 83.2 | 87.5 |
| YOLOv5 | 85.7 | 74.8 | 86.8 | 95.5 |
| OURS | 90.2 | 83.5 | 91.2 | 95.9 |

Tab. 1 Test results of each method on the self-built vehicle dataset
| Method | mAP/% | AP/% (small) | AP/% (medium) | AP/% (large) | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| SSD+FPN | 85.1 | 74.6 | 87.3 | 93.4 | 32.0 |
| SSD+MSIF | 88.5 | 81.2 | 90.1 | 94.2 | 28.0 |
| SSD+MSIF+SE | 90.2 | 83.5 | 91.2 | 95.9 | 25.0 |

Tab. 2 Comparison of test performance of various methods on the self-built vehicle dataset
| Fusion method | mAP/% | v/(frame·s⁻¹) |
| --- | --- | --- |
| 3-branch fusion | 86.4 | 31 |
| 4-branch fusion | 87.2 | 30 |
| 5-branch fusion | 88.5 | 28 |
| 6-branch fusion | 88.6 | 25 |

Tab. 3 Performance comparison of different fusion methods at the Conv4_3 layer
Fig. 7 Comparison of SSD and proposed method test results
| Model | aero | bike | bird | boat | bottle | bus | car | cat | chair | cow |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FasterRcnn | 84.9 | 79.8 | 79.8 | 74.3 | 53.9 | 77.5 | 75.9 | 88.5 | 45.6 | 77.1 |
| CenterNet | 81.0 | 75.0 | 66.0 | 52.0 | 43.0 | 78.0 | 80.0 | 87.0 | 59.0 | 72.0 |
| SSD | 83.1 | 84.7 | 74.0 | 69.6 | 49.5 | 85.4 | 86.2 | 85.2 | 60.4 | 81.5 |
| RP-SSD | 88.0 | 83.8 | 74.8 | 73.2 | 48.9 | 83.9 | 86.8 | 91.0 | 63.2 | 81.9 |
| DSSD | 83.6 | 85.2 | 74.5 | 70.1 | 50.4 | 85.6 | 86.7 | 85.6 | 61.0 | 82.1 |
| FSSD | 84.9 | 86.4 | 74.8 | 63.3 | 50.6 | 84.6 | 87.9 | 86.9 | 63.1 | 83.2 |
| YOLOv4 | 83.6 | 84.0 | 73.8 | 59.2 | 72.2 | 91.0 | 90.0 | 70.7 | 60.9 | 64.9 |
| YOLOv5 | 84.2 | 87.6 | 65.9 | 63.3 | 77.0 | 80.2 | 91.5 | 83.7 | 66.5 | 66.4 |
| OURS | 89.8 | 89.8 | 85.4 | 75.5 | 61.5 | 82.5 | 87.5 | 90.5 | 73.9 | 95.6 |

| Model | table | dog | horse | mbike | person | plant | sheep | sofa | train | tv |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FasterRcnn | 55.3 | 86.9 | 81.7 | 80.9 | 79.6 | 40.1 | 72.6 | 60.9 | 81.2 | 61.5 |
| CenterNet | 54.0 | 81.0 | 70.0 | 68.0 | 74.0 | 41.0 | 71.0 | 58.0 | 82.0 | 70.0 |
| SSD | 75.1 | 82.0 | 85.9 | 85.3 | 77.7 | 49.6 | 76.1 | 80.0 | 87.4 | 74.4 |
| RP-SSD | 76.3 | 81.2 | 85.3 | 84.6 | 79.3 | 63.5 | 78.9 | 83.4 | 87.9 | 73.9 |
| DSSD | 75.4 | 82.5 | 86.2 | 85.4 | 78.6 | 51.2 | 75.9 | 80.5 | 86.7 | 75.1 |
| FSSD | 76.8 | 83.1 | 85.0 | 83.2 | 77.3 | 57.9 | 78.4 | 82.1 | 86.5 | 73.2 |
| YOLOv4 | 67.3 | 89.6 | 77.4 | 65.2 | 86.0 | 47.7 | 77.4 | 72.3 | 82.6 | 83.3 |
| YOLOv5 | 59.8 | 82.8 | 86.6 | 83.1 | 85.4 | 56.4 | 70.3 | 62.9 | 87.9 | 90.8 |
| OURS | 78.4 | 90.7 | 89.5 | 82.1 | 75.6 | 63.1 | 81.5 | 93.9 | 89.7 | 85.9 |

Tab. 4 AP values (%) of various methods on the PASCAL VOC dataset
Fig. 8 Comparison of detection results of the SSD algorithm and the proposed method on small targets
| Method | GPU model | Backbone network | mAP/% | v/(frame·s⁻¹) |
| --- | --- | --- | --- | --- |
| Faster Rcnn | Titan X | VGG-16 | 70.4 | 7.0 |
| YOLOv4 | 1660 Ti | CSPDarknet53 | 75.0 | 35.0 |
| YOLOv5 | 1660 Ti | FOCUS+CSP | 76.6 | 38.0 |
| SSD | Titan X | VGG-16 | 75.6 | 46.0 |
| RP-SSD | 1080 Ti | VGG-16 | 78.4 | 32.0 |
| OURS | 1660 Ti | VGG-16 | 83.1 | 25.0 |

Tab. 5 Comparison of detection speed of various methods