Journal of Zhejiang University (Engineering Science)  2026, Vol. 60, Issue (1): 32-42    DOI: 10.3785/j.issn.1008-973X.2026.01.003
Computer Technology
Object detection algorithm based on multi-azimuth perception deep fusion detection head
Xiao’an BAO1, Shuyou PENG1, Na ZHANG1, Xiaomei TU2, Qingqi ZHANG3, Biao WU4,*
1. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. School of Civil Engineering and Architecture, Zhejiang Guangsha Vocational and Technical University of Construction, Dongyang 322100, China
3. Graduate School of East Asian Studies, Yamaguchi University, Yamaguchi 753-8514, Japan
4. School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
Abstract:

An object detection algorithm based on a multi-azimuth perception deep fusion detection head was proposed to address the difficulty that traditional detection heads have in effectively capturing global information. An efficient dual-axial-window attention encoder (EDWE) module was designed in the detection head to enable the network to deeply fuse the captured global and local information. A reparameterized large kernel convolution (RLK) module was employed after the feature pyramid structure to reduce feature space discrepancies inherited from the backbone network and to enhance the network's adaptability to small and medium-sized datasets. An encoder selective-save module (ESM) was introduced to selectively accumulate the outputs of the EDWE modules and optimize backpropagation. Experimental results showed that on the larger MS-COCO2017 dataset, the proposed algorithm improved the AP of the common models RetinaNet, FCOS, and ATSS by 2.9, 2.6, and 3.4 percentage points, respectively; on the smaller PASCAL VOC2007 dataset, it improved the AP of the three models by 1.3, 1.0, and 1.1 percentage points, respectively. Through the synergy of the EDWE, RLK, and ESM modules, the proposed algorithm effectively improves object detection accuracy and shows significant performance advantages across datasets of varying scales.

Key words: detection head    object detection    Transformer encoder    deep fusion    large kernel convolution
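
The EDWE module named in the abstract builds on dual-axial attention: self-attention is computed along the height axis and the width axis separately, which captures global context at O(HW(H+W)) cost instead of the O((HW)^2) of full self-attention. The paper's exact EDWE design, including its windowing, is not reproduced on this page; the following PyTorch sketch illustrates only the generic dual-axial idea, and all names and parameters are hypothetical.

import torch.nn as nn

class DualAxialAttentionSketch(nn.Module):
    """Minimal sketch of dual-axial attention (illustrative, not the
    paper's EDWE): attention is applied along each spatial axis in turn,
    so every position still aggregates information from its whole row
    and whole column."""

    def __init__(self, dim: int, heads: int = 8):  # dim must be divisible by heads
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, C, H, W) feature map from the neck
        b, c, h, w = x.shape
        # Width-axis attention: every row is treated as its own sequence.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        x = x + rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Height-axis attention: every column is treated as its own sequence.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return x + cols.reshape(b, w, h, c).permute(0, 3, 2, 1)

A full encoder block of this kind would add windowing along each axis plus the usual feed-forward and normalization layers of a Transformer encoder.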
Received: 2024-12-11. Published: 2025-12-15
CLC number: TP 391
Supported by: the National Natural Science Foundation of China (6207050141); the Key Research and Development Program of Zhejiang Province (2020C03094); the General Scientific Research Project of the Zhejiang Provincial Department of Education (Y202147659); Zhejiang Provincial Department of Education Projects (Y202250706, Y202250677); the Basic Public Welfare Research Program of Zhejiang Province (QY19E050003)
Corresponding author: Biao WU. E-mail: baoxiaoan@zstu.edu.cn; biaowuzg@zstu.edu.cn
About the first author: Xiao’an BAO (1973—), male, professor, engaged in machine vision research. orcid.org/0000-0001-8305-0369. E-mail: baoxiaoan@zstu.edu.cn
Cite this article:

Xiao’an BAO, Shuyou PENG, Na ZHANG, Xiaomei TU, Qingqi ZHANG, Biao WU. Object detection algorithm based on multi-azimuth perception deep fusion detection head. Journal of Zhejiang University (Engineering Science), 2026, 60(1): 32-42.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.01.003        https://www.zjujournals.com/eng/CN/Y2026/V60/I1/32

Fig. 1  Network structure of the object detection algorithm based on the multi-azimuth perception deep fusion detection head
Fig. 2  Schematic diagram of the efficient dual-axial attention module
Fig. 3  Structure of the efficient dual-axial-window attention encoder (EDWE) module
Fig. 4  Visualized heat maps of feature focus regions generated by different attention modules
Fig. 5  Structure of the reparameterized large kernel convolution (RLK) module
Fig. 6  Incorrect predictions arising in late encoding stages
Fig. 7  Schematics of the original module, the encoder dense-save module (EDM), and the encoder selective-save module (ESM)
Detector | Method | Np/10⁶ | FLOPs/10⁹ | AP/% | AP50/% | AP75/% | FPS/(frame·s⁻¹)
RetinaNet (R101) | Baseline | 56.961 | 282.91 | 38.5 | 57.6 | 41.0 | 23.2
RetinaNet (R101) | MdfHead | 58.013 | 283.42 | 41.4 | 60.7 | 43.6 | 22.9
FCOS (R101) | Baseline | 51.287 | 248.26 | 39.1 | 58.3 | 42.1 | 23.1
FCOS (R101) | MdfHead | 52.117 | 249.19 | 41.7 | 61.0 | 44.1 | 23.0
CenterNet (R50) | Baseline | 32.293 | 179.99 | 40.2 | 58.3 | 43.9 | 32.9
CenterNet (R50) | MdfHead | 33.311 | 183.37 | 42.0 | 60.4 | 45.8 | 32.5
ATSS (R101) | Baseline | 51.283 | 252.52 | 41.5 | 59.9 | 45.2 | 23.4
ATSS (R101) | MdfHead | 52.108 | 253.04 | 44.9 | 63.6 | 49.1 | 23.2
PAA (R101) | Baseline | 51.435 | 255.11 | 42.6 | 60.8 | 46.6 | 20.5
PAA (R101) | MdfHead | 52.303 | 255.84 | 45.2 | 63.2 | 48.4 | 20.3
Table 1  Results of different object detectors with MdfHead on the MS-COCO2017 dataset
RLK | EDWE | ESM | Np/10⁶ | FLOPs/10⁹ | AP/% | AP50/% | AP75/%
– | – | – | 56.961 | 282.91 | 38.5 | 57.6 | 41.0
✓ | – | – | 54.413 | 243.20 | 34.6 | 54.1 | 38.4
– | ✓ | – | 55.501 | 274.10 | 36.7 | 55.5 | 39.8
✓ | ✓ | – | 58.013 | 283.42 | 40.3 | 59.6 | 42.3
– | ✓ | ✓ | 55.501 | 274.10 | 39.7 | 58.7 | 42.1
✓ | ✓ | ✓ | 58.013 | 283.42 | 41.4 | 60.7 | 43.6
Table 2  Ablation experiments on different modules in MdfHead
Method | Np/10⁶ | FLOPs/10⁹ | AP/% | AP50/% | AP75/%
Standard encoder | 61.101 | 293.11 | 40.1 | 59.4 | 41.9
EDA | 57.462 | 281.92 | 39.3 | 59.0 | 41.2
W-MSA | 58.704 | 285.13 | 38.7 | 58.3 | 40.7
EDWE | 58.013 | 283.42 | 41.4 | 60.7 | 43.6
Table 3  Comparative experiments with different encoder modules
Ncls | Nreg | Np/10⁶ | FLOPs/10⁹ | AP/% | AP50/% | AP75/%
– | – | 56.961 | 282.91 | 38.5 | 57.6 | 41.0
1 | 1 | 55.313 | 253.24 | 37.7 | 57.0 | 40.1
2 | 2 | 56.213 | 263.31 | 38.4 | 57.8 | 40.7
3 | 3 | 57.113 | 273.36 | 39.3 | 58.0 | 41.8
4 | 4 | 58.013 | 283.42 | 41.4 | 60.7 | 43.6
5 | 5 | 60.021 | 292.18 | 41.7 | 60.8 | 43.8
Table 4  Selection experiments on the number of EDWE modules
Kernel sizes | Np/10⁶ | FLOPs/10⁹ | AP/% | AP50/% | AP75/%
3-3-3-3-3 | 55.742 | 272.45 | 39.9 | 59.3 | 42.3
7-7-7-7-7 | 56.313 | 272.88 | 40.5 | 60.0 | 42.8
13-13-9-9-7 | 58.013 | 283.42 | 41.4 | 60.7 | 43.6
13-13-13-13-13 | 58.518 | 284.01 | 41.3 | 60.9 | 42.9
25-25-25-25-13 | 63.729 | 285.83 | 41.6 | 61.0 | 43.7
Table 5  Selection experiments on kernel sizes in the RLK module
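
The best setting in Table 5, 13-13-9-9-7, assigns one kernel size to each of the five feature pyramid levels. The reparameterization behind large-kernel designs of this kind (cf. Ding et al. [21]) trains a large depthwise kernel together with a parallel small one and merges the two into a single convolution for inference. The PyTorch sketch below shows that merging step under the RepLKNet-style assumption; the class and its parameters are hypothetical, not the paper's RLK implementation.

import torch.nn as nn
import torch.nn.functional as F

class RLKStyleBlockSketch(nn.Module):
    """Illustrative reparameterized large-kernel block: large + small
    depthwise branches (each with BatchNorm) during training, a single
    fused large-kernel convolution at inference."""

    def __init__(self, channels: int, large_k: int = 13, small_k: int = 5):
        super().__init__()
        self.large_k, self.small_k = large_k, small_k
        self.large = nn.Conv2d(channels, channels, large_k,
                               padding=large_k // 2, groups=channels, bias=False)
        self.small = nn.Conv2d(channels, channels, small_k,
                               padding=small_k // 2, groups=channels, bias=False)
        self.bn_large = nn.BatchNorm2d(channels)
        self.bn_small = nn.BatchNorm2d(channels)
        self.merged = None  # set by reparameterize()

    @staticmethod
    def _fuse(conv, bn):
        # Fold BatchNorm statistics into the convolution weight and bias.
        std = (bn.running_var + bn.eps).sqrt()
        weight = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)
        bias = bn.bias - bn.running_mean * bn.weight / std
        return weight, bias

    def reparameterize(self):
        # Merge both branches into one large-kernel conv for deployment.
        wl, bl = self._fuse(self.large, self.bn_large)
        ws, bs = self._fuse(self.small, self.bn_small)
        pad = (self.large_k - self.small_k) // 2
        wl = wl + F.pad(ws, [pad] * 4)  # zero-pad small kernel to large size
        c = wl.size(0)
        self.merged = nn.Conv2d(c, c, self.large_k,
                                padding=self.large_k // 2, groups=c)
        self.merged.weight.data, self.merged.bias.data = wl, bl + bs

    def forward(self, x):
        if self.merged is not None:  # inference path after merging
            return self.merged(x)
        return self.bn_large(self.large(x)) + self.bn_small(self.small(x))

In eval mode the fused convolution is numerically equivalent to the two-branch sum, so the large effective receptive field adds no inference cost beyond a single convolution.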
Detector | Method | Np/10⁶ | FLOPs/10⁹ | mAP/%
RetinaNet | Baseline | 36.724 | 106.71 | 71.1
RetinaNet | Without RLK | 35.817 | 101.39 | 68.3
RetinaNet | With RLK | 37.901 | 108.05 | 72.4
FCOS | Baseline | 32.157 | 99.16 | 69.6
FCOS | Without RLK | 31.250 | 95.15 | 67.1
FCOS | With RLK | 33.334 | 100.72 | 70.6
ATSS | Baseline | 32.157 | 101.54 | 69.2
ATSS | Without RLK | 31.489 | 99.95 | 67.3
ATSS | With RLK | 32.677 | 102.36 | 70.3
Table 6  Effectiveness experiments of the RLK module on a smaller dataset
Method | Relative initial-stage training time | AP/% | AP50/% | AP75/%
Base | 1.00× | 40.3 | 59.6 | 42.3
EDM | 2.31× | 41.2 | 60.3 | 43.4
ESM1 | 1.67× | 41.4 | 60.7 | 43.6
ESM2 | 1.44× | 41.2 | 60.6 | 43.0
ESM3 | 1.28× | 41.0 | 60.0 | 42.9
ESM4 | 1.11× | 40.6 | 59.8 | 42.5
Table 7  Ablation experiments on ESM
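
In Table 7, EDM densely accumulates every encoder stage's output, which more than doubles the training time of the initial stage, while ESM1-ESM4 accumulate progressively fewer stages. This page does not spell out the exact selection rule, so the sketch below only illustrates the general idea of selectively saving encoder outputs; the stage indices and names are arbitrary assumptions.

import torch.nn as nn

class SelectiveSaveStackSketch(nn.Module):
    """Illustrative encoder stack with selective saving: only a chosen
    subset of intermediate outputs is accumulated and fused with the
    final output, giving early stages a shorter gradient path without
    the cost of dense accumulation."""

    def __init__(self, blocks: nn.ModuleList, save_ids=(1, 3)):
        super().__init__()
        self.blocks = blocks            # e.g. a stack of EDWE-like encoders
        self.save_ids = set(save_ids)   # stages whose outputs are saved

    def forward(self, x):
        saved = 0
        for i, block in enumerate(self.blocks):
            x = block(x)
            if i in self.save_ids:
                saved = saved + x       # selectively accumulate this stage
        return x + saved                # fused output fed to the detection head

Under this reading, the Table 7 trend is intuitive: ESM1 matches or slightly exceeds EDM's accuracy while requiring far less extra training time, which is the argument for selective rather than dense accumulation.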
Fig. 8  Visual comparison results of MdfHead and the original detection head
Fig. 9  Detection accuracy of MdfHead and the original detection head for different categories in the wildlife dataset
Method | Np/10⁶ | FLOPs/10⁹ | FPS/(frame·s⁻¹)
MdfHead | 37.697 | 106.03 | 41.3
Baseline | 36.518 | 104.62 | 41.5
Table 8  Comparative experiments of different detection heads on the wildlife dataset
1 REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149
doi: 10.1109/TPAMI.2016.2577031
2 LI W, ZHAO D, YUAN B, et al. PETDet: proposal enhancement for two-stage fine-grained object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 62: 5602214
3 LI H, SHI F. A DETR-like detector-based semi-supervised object detection method for Brassica Chinensis growth monitoring[J]. Computers and Electronics in Agriculture, 2024, 219: 108788
doi: 10.1016/j.compag.2024.108788
4 HOU X, LIU M, ZHANG S, et al. Relation DETR: exploring explicit position relation prior for object detection [C]// Proceedings of the European Conference on Computer Vision. Milan: Springer, 2024: 89–105.
5 ZHAO Y, LV W, XU S, et al. DETRs beat YOLOs on real-time object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 16965–16974.
6 CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers [C]// Proceedings of the European Conference on Computer Vision. Glasgow: Springer, 2020: 213–229.
7 LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2999–3007.
8 TIAN Z, SHEN C, CHEN H, et al. FCOS: fully convolutional one-stage object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9626–9635.
9 DUAN K, BAI S, XIE L, et al. CenterNet: keypoint triplets for object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6568–6577.
10 WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 3–19.
11 LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 936–944.
12 CHEN F, ZHANG H, HU K, et al. Enhanced training of query-based object detection via selective query recollection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 23756–23765.
13 REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. (2018−04−08) [2024−10−07]. https://arxiv.org/abs/1804.02767.
14 BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. (2020−04−23) [2024−10−07]. https://arxiv.org/abs/2004.10934.
15 TIAN Z, CHU X, WANG X, et al. Fully convolutional one-stage 3D object detection on LiDAR range images [EB/OL]. (2022−09−20) [2024−10−07]. https://arxiv.org/abs/2205.13764.
16 GE Z, LIU S, WANG F, et al. YOLOX: exceeding YOLO series in 2021 [EB/OL]. (2021−08−06) [2024−10−07]. https://arxiv.org/abs/2107.08430.
17 WU Y, CHEN Y, YUAN L, et al. Rethinking classification and localization for object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10183–10192.
18 DAI X, CHEN Y, XIAO B, et al. Dynamic head: unifying object detection heads with attentions [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 7369–7378.
19 LIANG J, SONG G, LENG B, et al. Unifying visual perception by dispersible points learning [C]// Proceedings of the European Conference on Computer Vision. Tel Aviv: Springer, 2022: 439–456.
20 HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132–7141.
21 DING X, ZHANG X, HAN J, et al. Scaling up your kernels to 31×31: revisiting large kernel design in CNNs [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11953–11965.
22 LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992–10002.
23 ZHOU H, YANG R, ZHANG Y, et al. UniHead: unifying multi-perception for detection heads[J]. IEEE Transactions on Neural Networks and Learning Systems, 2025, 36(5): 9565–9576
24 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: Curran Associates Inc, 2017: 6000–6010.
25 LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context [C]// Proceedings of the European Conference on Computer Vision. Zurich: Springer, 2014: 740–755.
26 EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338
doi: 10.1007/s11263-009-0275-4
27 CHEN K, WANG J, PANG J, et al. MMDetection: open MMLab detection toolbox and benchmark [EB/OL]. (2019−06−17) [2024−10−07]. https://arxiv.org/abs/1906.07155.
28 DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009: 248–255.
29 ZHANG S, CHI C, YAO Y, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9756–9765.
30 KIM K, LEE H S. Probabilistic anchor assignment with IoU prediction for object detection [C]// Proceedings of the European Conference on Computer Vision. Glasgow: Springer, 2020: 355–371.