Object detection algorithm based on multi-azimuth perception deep fusion detection head
Xiao’an BAO1, Shuyou PENG1, Na ZHANG1, Xiaomei TU2, Qingqi ZHANG3, Biao WU4,*
1. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. School of Civil Engineering and Architecture, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang 322100, China
3. Graduate School of East Asian Studies, Yamaguchi University, Yamaguchi 753-8514, Japan
4. School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
Abstract Traditional object detection heads struggle to capture global information effectively. To address this problem, an object detection algorithm based on a multi-azimuth perception deep fusion detection head was proposed. An efficient dual-axial-window attention encoder (EDWE) module was designed in the detection head so that the network deeply fuses the global and local information it captures. A reparameterized large kernel convolution (RLK) module was placed after the feature pyramid structure to reduce the feature space discrepancies originating from the backbone network and to improve the network's adaptability to small and medium-sized datasets. An encoder selective-save module (ESM) was introduced to selectively accumulate the outputs of the EDWE module and to optimize backpropagation. On the larger-scale MS-COCO2017 dataset, applying the proposed algorithm to the common models RetinaNet, FCOS, and ATSS improved AP by 2.9, 2.6, and 3.4 percentage points, respectively. On the smaller-scale PASCAL VOC2007 dataset, the AP of the same three models improved by 1.3, 1.0, and 1.1 percentage points, respectively. Through the combined effect of the EDWE, RLK, and ESM modules, the proposed algorithm improves object detection accuracy on both the larger and the smaller dataset.
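The abstract does not describe the internals of the RLK module, so the following is only a minimal sketch of the general idea behind a reparameterized large kernel convolution: a small-kernel branch trained in parallel with a large-kernel branch can be folded into the large kernel for inference. The example works in one dimension with plain Python lists. The function names `conv1d_same` and `merge_kernels`, the kernel sizes, and all weights are assumptions made for illustration, and batch-normalization fusion, which such schemes usually include, is left out.

```python
# Illustrative sketch only: 1-D structural reparameterization.
# This is NOT the paper's RLK module; names and weights are hypothetical.


def conv1d_same(signal, kernel):
    """Cross-correlate `signal` with an odd-length `kernel`, zero-padded to keep the length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def merge_kernels(large, small):
    """Fold a small odd-length kernel into a large one by centering it and adding the weights."""
    offset = (len(large) - len(small)) // 2
    merged = list(large)
    for j, weight in enumerate(small):
        merged[offset + j] += weight
    return merged


# Hypothetical weights chosen for the example.
large_kernel = [0.1, -0.2, 0.3, 0.5, 0.3, -0.2, 0.1]  # size 7
small_kernel = [0.25, 0.5, 0.25]                       # size 3
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 4.0, -2.0, 0.5]

# Training-time form: two parallel branches whose outputs are summed.
two_branch = [
    a + b
    for a, b in zip(conv1d_same(signal, large_kernel),
                    conv1d_same(signal, small_kernel))
]

# Inference-time form: a single convolution with the merged kernel.
single_branch = conv1d_same(signal, merge_kernels(large_kernel, small_kernel))

# The two forms should agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(two_branch, single_branch))
print(f"max difference between the two forms: {max_gap:.2e}")
```

Because convolution is linear in the kernel weights, the merged kernel reproduces the sum of the two branches, which is why the extra branch adds no cost at inference time.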
Received: 11 December 2024
Published: 15 December 2025
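Fund: National Natural Science Foundation of China (6207050141); Key Research and Development Program of Zhejiang Province (2020C03094); General Scientific Research Project of the Department of Education of Zhejiang Province (Y202147659); Projects of the Department of Education of Zhejiang Province (Y202250706, Y202250677); Zhejiang Provincial Basic Public Welfare Research Program (QY19E050003).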
Corresponding Authors:
Biao WU
E-mail: baoxiaoan@zstu.edu.cn;biaowuzg@zstu.edu.cn
Keywords: detection head, object detection, Transformer encoder, deep fusion, large kernel convolution