Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (12): 2545-2555    DOI: 10.3785/j.issn.1008-973X.2025.12.009
    
Object detection for multi-source remote sensing fused images based on depthwise separable convolution
Jianghao CHEN1,2,3, Jun YANG1,2,3,4,*
1. Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China
2. National and Local Joint Engineering Research Center of Geographical Monitoring Technology Application, Lanzhou 730070, China
3. Gansu Provincial Engineering Laboratory of Geographical Monitoring, Lanzhou 730070, China
4. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

Abstract  

A multi-source remote sensing image fusion and object detection network based on improved depthwise separable convolution and a multi-scale feature extraction module was proposed to address the limited feature extraction capability of convolutional downsampling and the failure of traditional feature-level fusion methods to fully leverage the complementary advantages of multi-source remote sensing data. A dual-branch separable convolution module was designed to enhance deep semantic feature representation through depthwise convolution and residual connections, thereby improving discriminative performance under complex backgrounds. Furthermore, a global-local adaptive feature fusion module was constructed, in which separable convolution decomposed feature maps into components of different dimensions to capture global structures and local details separately; these features were then fused via an adaptive mechanism to achieve cross-source information complementarity and multi-scale feature collaboration. Experiments on the VEDAI multi-source dataset demonstrated that the proposed method achieved a mean average precision (mAP) of 82.80%, which was 2.00 percentage points higher than that of ICAFusion, and it also outperformed YOLOrs, YOLOfusion, SuperYOLO, and MF-YOLO. The network is highly effective for feature-level fusion of multi-source remote sensing images and yields significant performance improvements in object detection tasks.
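To make the first contribution concrete, the sketch below illustrates one plausible reading of the dual-branch separable convolution module (DBSConv) in PyTorch: two depthwise-separable branches of different kernel sizes fused over a residual connection. This is a minimal sketch reconstructed from the abstract and Fig.2 alone; the class name, kernel sizes, and activation choices are assumptions, not the published implementation.

```python
# Minimal sketch, not the authors' code: a dual-branch separable convolution
# block reconstructed from the abstract (depthwise convolution + residual
# connection). Kernel sizes, activations, and all names are assumptions.
import torch
import torch.nn as nn


class DualBranchSeparableConv(nn.Module):
    """Two parallel depthwise-separable branches merged over a residual path."""

    def __init__(self, channels: int):
        super().__init__()
        # Branch 1: 3x3 depthwise conv + 1x1 pointwise conv (local detail).
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )
        # Branch 2: 5x5 depthwise conv + 1x1 pointwise conv (wider context).
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )
        # 1x1 conv fuses the concatenated branch outputs back to C channels.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.fuse(torch.cat([self.branch1(x), self.branch2(x)], dim=1))
        return x + y  # residual connection keeps shallow features flowing


if __name__ == "__main__":
    block = DualBranchSeparableConv(64)
    print(block(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])
```

The depthwise-plus-pointwise factorization keeps the parameter count low while the residual path preserves shallow features, which is consistent with the abstract's stated goal of strengthening deep semantic representation without losing detail.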



Key words: multi-source remote sensing image; feature extraction; feature-level fusion; depthwise separable convolution; multi-scale feature; object detection
Received: 23 January 2025      Published: 25 November 2025
CLC:  TP 751.1  
Fund: National Natural Science Foundation of China (42261067); Gansu Province Key Talent Project for 2025 (2025RCXM031).
Corresponding author: Jun YANG     E-mail: 11220897@stu.lzjtu.edu.cn; yangj@mail.lzjtu.cn
Cite this article:

Jianghao CHEN, Jun YANG. Object detection for multi-source remote sensing fused images based on depthwise separable convolution. Journal of ZheJiang University (Engineering Science), 2025, 59(12): 2545-2555.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.12.009     OR     https://www.zjujournals.com/eng/Y2025/V59/I12/2545


Fig.1 Overall network framework
Fig.2 Dual-branch separable convolution module
Fig.3 Global-local adaptive feature fusion module
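The global-local adaptive feature fusion (GLAFF) module of Fig.3 can likewise be sketched from the abstract's description: one component summarizes global structure, another captures local detail, and an adaptive weight blends the visible and infrared feature maps. The gating design below (a pooled global gate, a depthwise local gate, and averaged weights) is a hedged guess for illustration only, not the published module.

```python
# Minimal sketch, not the authors' code: global-local adaptive fusion of two
# modality feature maps, following the abstract's description. The specific
# gates and their 0.5 averaging are illustrative assumptions.
import torch
import torch.nn as nn


class GlobalLocalAdaptiveFusion(nn.Module):
    """Blends visible and infrared feature maps with global and local gates."""

    def __init__(self, channels: int):
        super().__init__()
        # Global component: channel-wise weight from globally pooled statistics.
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )
        # Local component: per-pixel weight from a depthwise-separable conv.
        self.local_gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, 3, padding=1, groups=2 * channels),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        z = torch.cat([vis, ir], dim=1)  # stack the two sources
        g = self.global_gate(z)          # (N, C, 1, 1): global structure weight
        loc = self.local_gate(z)         # (N, C, H, W): local detail weight
        w = 0.5 * (g + loc)              # combine both kinds of evidence
        return w * vis + (1.0 - w) * ir  # adaptive cross-source blend


if __name__ == "__main__":
    fuse = GlobalLocalAdaptiveFusion(64)
    out = fuse(torch.randn(1, 64, 80, 80), torch.randn(1, 64, 80, 80))
    print(out.shape)  # torch.Size([1, 64, 80, 80])
```

The convex blend `w * vis + (1 - w) * ir` makes the cross-source complementarity explicit: wherever one modality carries stronger evidence, its contribution dominates at that channel and location.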
Category | AP50/% | AP75/%
cars | 97.31 | 74.72
pickup | 96.82 | 78.62
camping | 93.21 | 76.28
truck | 87.03 | 77.13
tractor | 98.17 | 55.31
boat | 59.43 | 48.67
van | 32.04 | 28.91
others | 88.25 | 59.53
mAP | 82.80 | 63.41
Tab.1 Object detection accuracy of proposed algorithm on VEDAI dataset
Fig.4 Comparison of object detection visualization results between proposed algorithm and ICAFusion on VEDAI dataset
Fig.5 Comparison of object detection heatmap visualization results between proposed algorithm and ICAFusion on VEDAI dataset
Method | mAP50/% | GFLOPs | Params/10⁶ | FPS/Hz
YOLOrs[26] | 58.97 | 46.4 | 20.2 | 23.9
YOLOfusion[27] | 78.60 | 27.3 | 12.5 | 18.2
SuperYOLO[16] | 79.49 | 4.8 | 16.6 | 12.7
MF-YOLO[17] | 76.62 | - | - | -
ICAFusion[21] | 80.80 | 58.2 | 120.2 | 28.4
Proposed algorithm | 82.80 | 60.7 | 139.3 | 30.2
Tab.2 Comparison of object detection results between proposed algorithm and state-of-the-art methods on VEDAI dataset
Method | cars | pickup | camping | truck | tractor | boat | van | others    (all values AP50/%)
YOLOrs[26] | 83.48 | 76.96 | 65.69 | 53.51 | 69.07 | 22.28 | 56.88 | 43.88
YOLOfusion[27] | 91.72 | 85.91 | 78.94 | 78.15 | 71.96 | 71.14 | 75.23 | 54.77
SuperYOLO[16] | 91.61 | 86.80 | 79.25 | 89.33 | 86.39 | 54.26 | 81.51 | 68.79
MF-YOLO[17] | 92.03 | 86.61 | 78.19 | 72.58 | 82.88 | 64.64 | 78.66 | 57.36
ICAFusion[21] | 97.05 | 96.21 | 89.64 | 92.66 | 94.50 | 64.53 | 28.33 | 83.40
Proposed algorithm | 97.31 | 96.82 | 93.21 | 87.03 | 98.17 | 59.43 | 32.04 | 88.25
Tab.3 Comparison of object detection accuracy between proposed algorithm and state-of-the-art methods on VEDAI dataset
Method | mAP50/% | GFLOPs | Params/10⁶ | FPS/Hz
MMTOD-UNIT[28] | 61.50 | - | - | -
CFR[29] | 72.40 | - | - | -
BU-LTT[30] | 73.20 | 73.5 | 149.3 | 28.0
CFT[27] | 78.30 | 224.4 | 206.0 | 32.3
ICAFusion[21] | 79.20 | 58.2 | 120.2 | 27.7
Proposed algorithm | 80.16 | 60.7 | 139.3 | 27.9
Tab.4 Comparison of object detection accuracy between proposed algorithm and state-of-the-art methods on FLIR dataset
Method | MR/% (miss rate, lower is better) | FPS/Hz
MBNet[31] | 8.40 | 14.3
MLPD[32] | 7.58 | -
MSDS-RCNN[33] | 8.23 | 4.6
ICAFusion[21] | 7.17 | 38.9
Proposed algorithm | 7.14 | 34.3
Tab.5 Comparison of object detection results between proposed algorithm and state-of-the-art methods on KAIST dataset
Model | mAP50/% | mAP75/% | mAP50:95/%
Baseline (Model 1) | 80.80 | 54.13 | 48.33
Baseline+DBSConv (Model 2) | 81.53 | 54.19 | 50.44
Baseline+GLAFF (Model 3) | 81.04 | 55.28 | 49.14
Baseline+DBSConv+GLAFF (proposed model) | 82.80 | 63.41 | 53.31
Tab.6 Comparison of object detection accuracy in ablation study on VEDAI dataset
Model | cars | pickup | camping | truck | tractor | boat | van | others | mAP/%    (per-category values are AP50/%)
Baseline (Model 1) | 97.05 | 96.21 | 89.64 | 92.66 | 94.50 | 64.53 | 28.33 | 83.40 | 80.80
Baseline+DBSConv (Model 2) | 96.03 | 95.17 | 92.33 | 96.04 | 94.06 | 45.41 | 40.34 | 83.33 | 81.53
Baseline+GLAFF (Model 3) | 96.91 | 95.93 | 89.13 | 96.62 | 91.97 | 53.01 | 28.89 | 88.21 | 81.04
Baseline+DBSConv+GLAFF (proposed model) | 97.31 | 96.82 | 93.21 | 87.03 | 98.17 | 59.43 | 32.04 | 88.25 | 82.80
Tab.7 Comparison of object detection accuracy for different objects in ablation study on VEDAI dataset
Image source | mAP50/% | mAP75/% | mAP50:95/%
Visible | 79.69 | 58.22 | 49.51
Infrared | 77.12 | 55.84 | 48.90
Visible+Infrared (proposed model) | 82.80 | 63.41 | 53.31
Tab.8 Comparison of object detection accuracy based on single-source remote sensing data on VEDAI dataset
Image source | cars | pickup | camping | truck | tractor | boat | van | others | mAP/%    (per-category values are AP50/%)
Visible | 96.73 | 94.70 | 89.83 | 86.59 | 92.18 | 48.04 | 28.69 | 77.44 | 77.12
Infrared | 95.54 | 94.37 | 88.12 | 90.20 | 84.13 | 42.74 | 35.72 | 77.21 | 79.69
Visible+Infrared (proposed model) | 97.31 | 96.82 | 93.21 | 87.03 | 98.17 | 59.43 | 32.04 | 88.25 | 82.80
Tab.9 Comparison of object detection accuracy for different objects based on single-source remote sensing data on VEDAI dataset
[1] SUN X, TIAN Y, LU W, et al. From single- to multi-modal remote sensing imagery interpretation: a survey and taxonomy [J]. Science China Information Sciences, 2023, 66(4): 140301.
[2] LI Shutao, LI Congyu, KANG Xudong. Development status and future prospects of multi-source remote sensing image fusion [J]. National Remote Sensing Bulletin, 2021, 25(1): 148-166. doi: 10.11834/jrs.20210259.
[3] WU Y, GUAN X, ZHAO B, et al. Vehicle detection based on adaptive multimodal feature fusion and cross-modal vehicle index using RGB-T images [J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023, 16: 8166-8177.
[4] GÜNTHER A, NAJJAR H, DENGEL A. Explainable multimodal learning in remote sensing: challenges and future directions [J]. IEEE Geoscience and Remote Sensing Letters, 2024, 21: 1-5.
[5] ZANG Y, WANG S, GUAN H, et al. VAM-Net: vegetation-attentive deep network for multi-modal fusion of visible-light and vegetation-sensitive images [J]. International Journal of Applied Earth Observation and Geoinformation, 2024, 127: 103642.
[6] JIANG C, REN H, YANG H, et al. M2FNet: multi-modal fusion network for object detection from visible and thermal infrared images [J]. International Journal of Applied Earth Observation and Geoinformation, 2024, 130: 103918.
[7] KULKARNI S C, REGE P P. Pixel level fusion techniques for SAR and optical images: a review [J]. Information Fusion, 2020, 59: 13-29.
[8] WU J, HAO F, LIANG W, et al. Transformer fusion and pixel-level contrastive learning for RGB-D salient object detection [J]. IEEE Transactions on Multimedia, 2023, 26: 1011-1026.
[9] FENG P, LIN Y, GUAN J, et al. Embranchment CNN based local climate zone classification using SAR and multispectral remote sensing data [C]// IEEE International Geoscience and Remote Sensing Symposium. Yokohama: IEEE, 2019: 6344-6347.
[10] CAO Qiong, MA Ailong, ZHONG Yanfei, et al. Urban classification by multi-feature fusion of hyperspectral image and LiDAR data [J]. Journal of Remote Sensing, 2019, 23(5): 892-903.
[11] LI W, GAO Y, ZHANG M, et al. Asymmetric feature fusion network for hyperspectral and SAR image classification [J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 34(10): 8057-8070.
[12] YE Y, ZHANG J, ZHOU L, et al. Optical and SAR image fusion based on complementary feature decomposition and visual saliency features [J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 1-15.
[13] LI L, HAN L, DING M, et al. Multimodal image fusion framework for end-to-end remote sensing image registration [J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1-14.
[14] DONG Hongzhao, LIN Shaoxuan, SHE Yini. Research progress of YOLO detection technology for traffic object [J]. Journal of Zhejiang University: Engineering Science, 2025, 59(2): 249-260.
[15] SONG Yaolian, WANG Can, LI Dayan, et al. UAV small target detection algorithm based on improved YOLOv5s [J]. Journal of Zhejiang University: Engineering Science, 2024, 58(12): 2417-2426.
[16] ZHANG J, LEI J, XIE W, et al. SuperYOLO: super resolution assisted object detection in multimodal remote sensing imagery [J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1-15.
[17] LI W, LI A, KONG X, et al. MF-YOLO: multimodal fusion for remote sensing object detection based on YOLOv5s [C]// 27th International Conference on Computer Supported Cooperative Work in Design. Tianjin: IEEE, 2024: 897-903.
[18] ULTRALYTICS. YOLOv5 [EB/OL]. (2024-04-01) [2025-01-16]. https://github.com/ultralytics/yolov5.
[19] MA X, DAI X, BAI Y, et al. Rewrite the Stars [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 5694-5703.
[20] ZHENG M, SUN L, DONG J, et al. SMFANet: a lightweight self-modulation feature aggregation network for efficient image super-resolution [C]// European Conference on Computer Vision. Cham: Springer, 2024: 359-375.
[21] SHEN J, CHEN Y, LIU Y, et al. ICAFusion: iterative cross-attention guided feature fusion for multispectral object detection [J]. Pattern Recognition, 2024, 145: 109913.
[22] BOCHKOVSKIY A, WANG C, LIAO H Y. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. (2020-04-23) [2024-12-11]. https://arxiv.org/abs/2004.10934.
[23] RAZAKARIVONY S, JURIE F. Vehicle detection in aerial imagery: a small target detection benchmark [J]. Journal of Visual Communication and Image Representation, 2016, 34: 187-203.
[24] FLIR ADAS Team. FREE Teledyne FLIR thermal dataset for algorithm training [EB/OL]. (2024-05-01) [2025-01-21]. https://www.flir.com/oem/adas/adasdatasetform/.
[25] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline [C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 1037-1045.
[26] SHARMA M, DHANARAJ M, KARNAM S, et al. YOLOrs: object detection in multimodal remote sensing imagery [J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 14: 1497-1508.
[27] FANG Q, WANG Z. Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery [J]. Pattern Recognition, 2022, 130: 108786.
[28] DEVAGUPTAPU C, AKOLEKAR N, SHARMA M M, et al. Borrow from anywhere: pseudo multi-modal object detection in thermal imagery [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Long Beach: IEEE, 2019: 1029-1038.
[29] ZHANG H, FROMONT E, LEFEVRE S, et al. Multispectral fusion for object detection with cyclic fuse-and-refine blocks [C]// IEEE International Conference on Image Processing. Abu Dhabi: IEEE, 2020: 276-280.
[30] KIEU M, BAGDANOV A D, BERTINI M. Bottom-up and layerwise domain adaptation for pedestrian detection in thermal images [J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021, 17(1): 1-19.
[31] ZHOU K, CHEN L, CAO X. Improving multispectral pedestrian detection by addressing modality imbalance problems [C]// Computer Vision - ECCV 2020: 16th European Conference. Cham: Springer, 2020: 787-803.
[32] KIM J, KIM H, KIM T, et al. MLPD: multi-label pedestrian detector in multispectral domain [J]. IEEE Robotics and Automation Letters, 2021, 6(4): 7846-7853.
[33] LI C, SONG D, TONG R, et al. Multispectral pedestrian detection via simultaneous detection and semantic segmentation [C]// British Machine Vision Conference. Newcastle: BMVA Press, 2018.