Journal of Zhejiang University (Engineering Science), 2025, Vol. 59, Issue (12): 2545-2555    DOI: 10.3785/j.issn.1008-973X.2025.12.009
Computer Technology
Object detection for multi-source remote sensing fused images based on depthwise separable convolution
Jianghao CHEN 1,2,3, Jun YANG 1,2,3,4,*
1. Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China
2. National and Local Joint Engineering Research Center of Geographical Monitoring Technology Application, Lanzhou 730070, China
3. Gansu Provincial Engineering Laboratory of Geographical Monitoring, Lanzhou 730070, China
4. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
Abstract:

A multi-source remote sensing image fusion and object detection network combining an improved depthwise separable convolution with a multi-scale feature extraction module was proposed to address the limited feature extraction capability of convolutional downsampling in remote sensing image processing, as well as the failure of traditional feature-level fusion methods to fully exploit the complementary advantages of multi-source remote sensing data. A dual-branch separable convolution module was designed to enhance deep semantic feature representation through depthwise convolution and residual connections, improving discriminative performance in complex backgrounds. A global-local adaptive feature fusion module was constructed, in which separable convolution decomposes the feature maps into components of different dimensions that capture global structures and local details separately; these components were then fused by an adaptive mechanism to achieve cross-source information complementarity and multi-scale feature collaboration. Experiments on the VEDAI multi-source dataset showed that the proposed method achieved a mean average precision (mAP) of 82.80%, 2.00 percentage points higher than that of ICAfusion, and outperformed YOLOrs, YOLOfusion, SuperYOLO, and MF-YOLO. The proposed network demonstrated high effectiveness in feature-level fusion of multi-source remote sensing images and delivered significant performance gains in object detection tasks.

Key words: multi-source remote sensing image; feature extraction; feature-level fusion; depthwise separable convolution; multi-scale feature; object detection
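Note on the core primitive: the sketch below is not the authors' code; it only illustrates the depthwise separable convolution named in the title, the building block the paper's modules rely on. It assumes a PyTorch setting, and the BatchNorm/SiLU choices are illustrative.

```python
# A minimal, self-contained sketch of a depthwise separable convolution:
# a depthwise conv filters each channel independently, then a 1x1
# pointwise conv mixes information across channels.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # groups=in_ch: one k x k filter per input channel (spatial filtering)
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch, bias=False)
        # 1x1 conv recombines channels (cross-channel mixing)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Cost comparison for a 3x3 kernel, 64 -> 128 channels:
# standard conv: 64*128*3*3 = 73,728 weights;
# separable conv: 64*3*3 + 64*128 = 8,768 weights.
x = torch.randn(1, 64, 160, 160)                  # e.g. one feature map
print(DepthwiseSeparableConv(64, 128)(x).shape)   # torch.Size([1, 128, 160, 160])
```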
Received: 2025-01-23    Published online: 2025-11-25
CLC:  TP 751.1  
Funding: National Natural Science Foundation of China (42261067); 2025 Gansu Provincial Key Talent Project (2025RCXM031).
Corresponding author: Jun YANG. E-mail: 11220897@stu.lzjtu.edu.cn; yangj@mail.lzjtu.cn
About the author: Jianghao CHEN (1996—), male, master's student, researching intelligent interpretation of remote sensing imagery. orcid.org/0009-0002-3014-2424. E-mail: 11220897@stu.lzjtu.edu.cn
Cite this article:

Jianghao CHEN, Jun YANG. Object detection for multi-source remote sensing fused images based on depthwise separable convolution [J]. Journal of Zhejiang University (Engineering Science), 2025, 59(12): 2545-2555.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.12.009        https://www.zjujournals.com/eng/CN/Y2025/V59/I12/2545

Fig. 1  Overall framework of the proposed network
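As a reading aid for Fig. 1, here is a hedged sketch of how the abstract's pieces plug together: two backbones extract multi-scale features from the visible and infrared inputs, a fusion module combines them scale by scale, and a detection head predicts boxes. The class and argument names below are assumptions for illustration, not the published implementation.

```python
# Schematic wiring of a two-stream fusion detector, per the abstract's
# description. Backbones, fusers, and head are injected as components.
import torch.nn as nn

class FusionDetectorSketch(nn.Module):
    def __init__(self, backbone_rgb, backbone_ir, fuse_modules, head):
        super().__init__()
        self.backbone_rgb = backbone_rgb         # multi-scale RGB features
        self.backbone_ir = backbone_ir           # multi-scale IR features
        self.fuse = nn.ModuleList(fuse_modules)  # one fusion module per scale
        self.head = head                         # predicts boxes/classes per scale

    def forward(self, rgb, ir):
        feats_rgb = self.backbone_rgb(rgb)       # e.g. [P3, P4, P5] feature maps
        feats_ir = self.backbone_ir(ir)
        # fuse corresponding scales from the two sources
        fused = [f(r, t) for f, r, t in zip(self.fuse, feats_rgb, feats_ir)]
        return self.head(fused)
```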
Fig. 2  Dual-branch separable convolution module
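For Fig. 2, a plausible reconstruction of the dual-branch separable convolution module (DBSConv) from the abstract's description: a depthwise-separable branch deepens semantics, a lightweight pointwise branch preserves detail, and a residual connection merges them. The internals are assumed for illustration; only the dual-branch-plus-residual structure is stated in the abstract.

```python
# A hedged sketch of a dual-branch separable convolution block.
import torch
import torch.nn as nn

class DBSConvSketch(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        # branch 1: depthwise 3x3 then pointwise 1x1 (separable convolution)
        self.branch1 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False),
            nn.Conv2d(ch, ch, 1, bias=False),
            nn.BatchNorm2d(ch), nn.SiLU(),
        )
        # branch 2: pointwise-only path keeping fine local responses
        self.branch2 = nn.Sequential(
            nn.Conv2d(ch, ch, 1, bias=False), nn.BatchNorm2d(ch)
        )
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual connection retains shallow detail while deepening semantics
        return self.act(self.branch1(x) + self.branch2(x) + x)
```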
Fig. 3  Global-local adaptive feature fusion module
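For Fig. 3, a hedged sketch of the global-local adaptive feature fusion (GLAFF) idea from the abstract: one component captures global structure, one captures local detail, and an adaptive weight decides the mix across the two sources. All layer choices here are illustrative assumptions.

```python
# A hedged sketch of global-local adaptive fusion over two source features.
import torch
import torch.nn as nn

class GLAFFSketch(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        # global branch: squeeze spatial dims to capture scene-level structure
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid()
        )
        # local branch: depthwise conv keeps per-channel spatial detail
        self.local_branch = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)
        # adaptive gate: per-channel weight deciding the global/local mix
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid()
        )

    def forward(self, rgb_feat: torch.Tensor, ir_feat: torch.Tensor):
        x = rgb_feat + ir_feat             # cross-source aggregation
        g = x * self.global_branch(x)      # globally re-weighted features
        l = self.local_branch(x)           # locally refined features
        w = self.gate(x)                   # adaptive fusion weight in (0, 1)
        return w * g + (1.0 - w) * l
```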
| Category | AP50/% | AP75/% |
| --- | --- | --- |
| cars | 97.31 | 74.72 |
| pickup | 96.82 | 78.62 |
| camping | 93.21 | 76.28 |
| truck | 87.03 | 77.13 |
| tractor | 98.17 | 55.31 |
| boat | 59.43 | 48.67 |
| van | 32.04 | 28.91 |
| others | 88.25 | 59.53 |
| mAP | 82.80 | 63.41 |

Table 1  Object detection accuracy of the proposed algorithm on the VEDAI dataset
Fig. 4  Visual comparison of object detection results of the proposed algorithm and ICAfusion on the VEDAI dataset
Fig. 5  Comparison of object detection heatmap visualizations of the proposed algorithm and ICAfusion on the VEDAI dataset
| Method | mAP50/% | GFLOPs | Params/10⁶ | FPS/Hz |
| --- | --- | --- | --- | --- |
| YOLOrs[26] | 58.97 | 46.4 | 20.2 | 23.9 |
| YOLOfusion[27] | 78.60 | 27.3 | 12.5 | 18.2 |
| SuperYOLO[16] | 79.49 | 4.8 | 16.6 | 12.7 |
| MF-YOLO[17] | 76.62 | – | – | – |
| ICAfusion[21] | 80.80 | 58.2 | 120.2 | 28.4 |
| Ours | 82.80 | 60.7 | 139.3 | 30.2 |

Table 2  Comparison of object detection results between the proposed algorithm and other mainstream algorithms on the VEDAI dataset
| Method | cars | pickup | camping | truck | tractor | boat | van | others |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOrs[26] | 83.48 | 76.96 | 65.69 | 53.51 | 69.07 | 22.28 | 56.88 | 43.88 |
| YOLOfusion[27] | 91.72 | 85.91 | 78.94 | 78.15 | 71.96 | 71.14 | 75.23 | 54.77 |
| SuperYOLO[16] | 91.61 | 86.80 | 79.25 | 89.33 | 86.39 | 54.26 | 81.51 | 68.79 |
| MF-YOLO[17] | 92.03 | 86.61 | 78.19 | 72.58 | 82.88 | 64.64 | 78.66 | 57.36 |
| ICAfusion[21] | 97.05 | 96.21 | 89.64 | 92.66 | 94.50 | 64.53 | 28.33 | 83.40 |
| Ours | 97.31 | 96.82 | 93.21 | 87.03 | 98.17 | 59.43 | 32.04 | 88.25 |

Table 3  Comparison of per-category object detection accuracy (AP50/%) between the proposed algorithm and other mainstream algorithms on the VEDAI dataset
| Method | mAP50/% | GFLOPs | Params/10⁶ | FPS/Hz |
| --- | --- | --- | --- | --- |
| MMTOD-UNIT[28] | 61.50 | – | – | – |
| CFR[29] | 72.40 | – | – | – |
| BU-LTT[30] | 73.20 | 73.5 | 149.3 | 28.0 |
| CFT[27] | 78.30 | 224.4 | 206.0 | 32.3 |
| ICAfusion[21] | 79.20 | 58.2 | 120.2 | 27.7 |
| Ours | 80.16 | 60.7 | 139.3 | 27.9 |

Table 4  Comparison of object detection accuracy between the proposed algorithm and other mainstream algorithms on the FLIR dataset
| Method | MR/% | FPS/Hz |
| --- | --- | --- |
| MBNet[31] | 8.40 | 14.3 |
| MLPD[32] | 7.58 | – |
| MSDS-RCNN[33] | 8.23 | 4.6 |
| ICAfusion[21] | 7.17 | 38.9 |
| Ours | 7.14 | 34.3 |

Table 5  Comparison of object detection performance between the proposed algorithm and other mainstream algorithms on the KAIST dataset (MR is the miss rate; lower is better)
| Model | mAP50/% | mAP75/% | mAP50:95/% |
| --- | --- | --- | --- |
| Baseline (Model 1) | 80.80 | 54.13 | 48.33 |
| Baseline+DBSConv (Model 2) | 81.53 | 54.19 | 50.44 |
| Baseline+GLAFF (Model 3) | 81.04 | 55.28 | 49.14 |
| Baseline+DBSConv+GLAFF (Ours) | 82.80 | 63.41 | 53.31 |

Table 6  Comparison of object detection accuracy in the ablation experiments on the VEDAI dataset
| Model | cars | pickup | camping | truck | tractor | boat | van | others | mAP/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline (Model 1) | 97.05 | 96.21 | 89.64 | 92.66 | 94.50 | 64.53 | 28.33 | 83.40 | 80.80 |
| Baseline+DBSConv (Model 2) | 96.03 | 95.17 | 92.33 | 96.04 | 94.06 | 45.41 | 40.34 | 83.33 | 81.53 |
| Baseline+GLAFF (Model 3) | 96.91 | 95.93 | 89.13 | 96.62 | 91.97 | 53.01 | 28.89 | 88.21 | 81.04 |
| Baseline+DBSConv+GLAFF (Ours) | 97.31 | 96.82 | 93.21 | 87.03 | 98.17 | 59.43 | 32.04 | 88.25 | 82.80 |

Table 7  Comparison of per-category object detection accuracy (AP50/%) in the ablation experiments on the VEDAI dataset
| Image source | mAP50/% | mAP75/% | mAP50:95/% |
| --- | --- | --- | --- |
| Visible | 79.69 | 58.22 | 49.51 |
| Infrared | 77.12 | 55.84 | 48.90 |
| Visible+Infrared (Ours) | 82.80 | 63.41 | 53.31 |

Table 8  Object detection accuracy of the proposed algorithm on the VEDAI dataset with single-source versus fused remote sensing data
| Image source | cars | pickup | camping | truck | tractor | boat | van | others | mAP/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Visible | 96.73 | 94.70 | 89.83 | 86.59 | 92.18 | 48.04 | 28.69 | 77.44 | 77.12 |
| Infrared | 95.54 | 94.37 | 88.12 | 90.20 | 84.13 | 42.74 | 35.72 | 77.21 | 79.69 |
| Visible+Infrared (Ours) | 97.31 | 96.82 | 93.21 | 87.03 | 98.17 | 59.43 | 32.04 | 88.25 | 82.80 |

Table 9  Per-category object detection accuracy (AP50/%) on the VEDAI dataset with single-source versus fused remote sensing data
1 SUN X, TIAN Y, LU W, et al. From single- to multi-modal remote sensing imagery interpretation: a survey and taxonomy [J]. Science China Information Sciences, 2023, 66(4): 140301.
2 LI Shutao, LI Congyu, KANG Xudong. Development status and future prospects of multi-source remote sensing image fusion [J]. National Remote Sensing Bulletin, 2021, 25(1): 148-166. doi: 10.11834/jrs.20210259. (in Chinese)
3 WU Y, GUAN X, ZHAO B, et al. Vehicle detection based on adaptive multimodal feature fusion and cross-modal vehicle index using RGB-T images [J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023, 16: 8166-8177.
4 GÜNTHER A, NAJJAR H, DENGEL A. Explainable multimodal learning in remote sensing: challenges and future directions [J]. IEEE Geoscience and Remote Sensing Letters, 2024, 21: 1-5.
5 ZANG Y, WANG S, GUAN H, et al. VAM-Net: vegetation-attentive deep network for multi-modal fusion of visible-light and vegetation-sensitive images [J]. International Journal of Applied Earth Observation and Geoinformation, 2024, 127: 103642.
6 JIANG C, REN H, YANG H, et al. M2FNet: multi-modal fusion network for object detection from visible and thermal infrared images [J]. International Journal of Applied Earth Observation and Geoinformation, 2024, 130: 103918.
7 KULKARNI S C, REGE P P. Pixel level fusion techniques for SAR and optical images: a review [J]. Information Fusion, 2020, 59: 13-29.
8 WU J, HAO F, LIANG W, et al. Transformer fusion and pixel-level contrastive learning for RGB-D salient object detection [J]. IEEE Transactions on Multimedia, 2023, 26: 1011-1026.
9 FENG P, LIN Y, GUAN J, et al. Embranchment CNN based local climate zone classification using SAR and multispectral remote sensing data [C]// IEEE International Geoscience and Remote Sensing Symposium. Yokohama: IEEE, 2019: 6344-6347.
10 CAO Qiong, MA Ailong, ZHONG Yanfei, et al. Urban classification by multi-feature fusion of hyperspectral image and LiDAR data [J]. Journal of Remote Sensing, 2019, 23(5): 892-903. (in Chinese)
11 LI W, GAO Y, ZHANG M, et al. Asymmetric feature fusion network for hyperspectral and SAR image classification [J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 34(10): 8057-8070.
12 YE Y, ZHANG J, ZHOU L, et al. Optical and SAR image fusion based on complementary feature decomposition and visual saliency features [J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 1-15.
13 LI L, HAN L, DING M, et al. Multimodal image fusion framework for end-to-end remote sensing image registration [J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1-14.
14 DONG Hongzhao, LIN Shaoxuan, SHE Yini. Research progress of YOLO detection technology for traffic object [J]. Journal of Zhejiang University: Engineering Science, 2025, 59(2): 249-260. (in Chinese)
15 SONG Yaolian, WANG Can, LI Dayan, et al. UAV small target detection algorithm based on improved YOLOv5s [J]. Journal of Zhejiang University: Engineering Science, 2024, 58(12): 2417-2426. (in Chinese)
16 ZHANG J, LEI J, XIE W, et al. SuperYOLO: super resolution assisted object detection in multimodal remote sensing imagery [J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1-15.
17 LI W, LI A, KONG X, et al. MF-YOLO: multimodal fusion for remote sensing object detection based on YOLOv5s [C]// 27th International Conference on Computer Supported Cooperative Work in Design. Tianjin: IEEE, 2024: 897-903.
18 ULTRALYTICS. YOLOv5 [EB/OL]. (2024-04-01) [2025-01-16]. https://github.com/ultralytics/yolov5.
19 MA X, DAI X, BAI Y, et al. Rewrite the stars [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 5694-5703.
20 ZHENG M, SUN L, DONG J, et al. SMFANet: a lightweight self-modulation feature aggregation network for efficient image super-resolution [C]// European Conference on Computer Vision. Cham: Springer, 2024: 359-375.
21 SHEN J, CHEN Y, LIU Y, et al. ICAFusion: iterative cross-attention guided feature fusion for multispectral object detection [J]. Pattern Recognition, 2024, 145: 109913.
22 BOCHKOVSKIY A, WANG C, LIAO H Y. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. (2020-04-23) [2024-12-11]. https://arxiv.org/abs/2004.10934.
23 RAZAKARIVONY S, JURIE F. Vehicle detection in aerial imagery: a small target detection benchmark [J]. Journal of Visual Communication and Image Representation, 2016, 34: 187-203.
24 FLIR ADAS Team. FREE Teledyne FLIR thermal dataset for algorithm training [EB/OL]. (2024-05-01) [2025-01-21]. https://www.flir.com/oem/adas/adasdatasetform/.
25 HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline [C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 1037-1045.
26 SHARMA M, DHANARAJ M, KARNAM S, et al. YOLOrs: object detection in multimodal remote sensing imagery [J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 14: 1497-1508.
27 FANG Q, WANG Z. Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery [J]. Pattern Recognition, 2022, 130: 108786.
28 DEVAGUPTAPU C, AKOLEKAR N, SHARMA M M, et al. Borrow from anywhere: pseudo multi-modal object detection in thermal imagery [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Long Beach: IEEE, 2019: 1029-1038.
29 ZHANG H, FROMONT E, LEFEVRE S, et al. Multispectral fusion for object detection with cyclic fuse-and-refine blocks [C]// IEEE International Conference on Image Processing. Abu Dhabi: IEEE, 2020: 276-280.
30 KIEU M, BAGDANOV A D, BERTINI M. Bottom-up and layerwise domain adaptation for pedestrian detection in thermal images [J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021, 17(1): 1-19.
31 ZHOU K, CHEN L, CAO X. Improving multispectral pedestrian detection by addressing modality imbalance problems [C]// Computer Vision – ECCV 2020: 16th European Conference. Cham: Springer, 2020: 787-803.
32 KIM J, KIM H, KIM T, et al. MLPD: multi-label pedestrian detector in multispectral domain [J]. IEEE Robotics and Automation Letters, 2021, 6(4): 7846-7853.