Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (7): 1411-1420    DOI: 10.3785/j.issn.1008-973X.2025.07.009
Computer Technology and Control Engineering
High-precision real-time semantic segmentation network for UAV remote sensing images
Xinyu WEI 1, Lei RAO 1,*, Guangyu FAN 1, Niansheng CHEN 1, Songlin CHENG 1, Dingyu YANG 2
1. School of Electronic Information Engineering, Shanghai Dianji University, Shanghai 201306, China
2. State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou 310058, China
Abstract:

A shared shallow feature network (SSFNet) was proposed to address the low inference efficiency and poor segmentation performance of semantic segmentation models for UAV images. The detail branch shared the 1/4 and 1/8 downsampling stages of the semantic branch, which simplified the detail branch and improved inference efficiency. In the semantic branch, an efficient receptive field block (ERFB) was constructed by combining stacked connections with a channel decomposition mechanism, enhancing multi-scale feature extraction at almost no additional inference cost. To integrate contextual information within the semantic branch, a fast aggregation of context (FAC) module was proposed, in which a gating mechanism controlled how the 1/16 and 1/32 downsampling stages supplemented semantic information for the final stage. In the decoding phase, a bilateral fusion module (BFM) built on a hybrid activation function was used to fully fuse detail and semantic information. Results show that SSFNet achieves mean intersection over union scores of 68.5%, 52.7%, and 87.1% on the UAVid, LoveDA, and Potsdam datasets, respectively, and reaches an inference speed of 131.1 frames per second at a 1 024×1 024 input resolution on an NVIDIA RTX 3090 GPU, demonstrating strong real-time segmentation performance.

Key words: real-time semantic segmentation; UAV images; remote sensing images; convolutional neural networks; multi-scale features
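The bilateral fusion described in the abstract can be illustrated with a small sketch. The exact formulation of the BFM is not given on this page, so the following is only a plausible reading of "hybrid activation function": a Sigmoid gate derived from one branch modulates the other, and vice versa with GELU. All function names, shapes, and the fusion formula are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    # Standard logistic function, used here as a soft gate in [0, 1]
    return 1.0 / (1.0 + np.exp(-x))

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def bilateral_fuse(detail, semantic):
    """Hypothetical bilateral fusion of a high-resolution detail map with an
    (already upsampled) semantic map of the same (C, H, W) shape: each branch
    gates the other through a different activation before summation."""
    gate_d = sigmoid(semantic)   # semantic features gate the detail branch
    gate_s = gelu(detail)        # detail features modulate the semantic branch
    return detail * gate_d + semantic * gate_s

rng = np.random.default_rng(0)
d = rng.standard_normal((8, 4, 4))   # toy detail features
s = rng.standard_normal((8, 4, 4))   # toy semantic features
out = bilateral_fuse(d, s)
print(out.shape)  # (8, 4, 4)
```

Compared with plain element-wise addition, such cross-branch gating lets each branch suppress or emphasize locations in the other, which is consistent with the feature-map comparison reported in Fig. 4.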
Received: 2024-06-25    Published: 2025-07-25
CLC:  TP 391.41  
Supported by: National Natural Science Foundation of China (61702320); Shanghai Chenguang Program (15CG62)
Corresponding author: Lei RAO     E-mail: 226003010214@sdju.edu.cn; raol@sdju.edu.cn
About the author: Xinyu WEI (b. 2000), male, master's student, engaged in research on computer vision and image processing. orcid.org/0009-0004-2085-2451. E-mail: 226003010214@sdju.edu.cn

Cite this article:


Xinyu WEI, Lei RAO, Guangyu FAN, Niansheng CHEN, Songlin CHENG, Dingyu YANG. High-precision real-time semantic segmentation network for UAV remote sensing images[J]. Journal of Zhejiang University (Engineering Science), 2025, 59(7): 1411-1420.

Link this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.07.009        https://www.zjujournals.com/eng/CN/Y2025/V59/I7/1411

Fig. 1  Comparison of different semantic segmentation network architectures
Fig. 2  Overall architecture of the shared shallow feature network
Fig. 3  Image processing pipelines of different modules
Fig. 4  Comparison of visualized feature maps between the bilateral fusion module and element-wise addition
Dataset  SSF  BFM  FAC  ERFB  mIoU/%  vf/(frame·s⁻¹)
UAVid    –    –    –    –     65.9    122.3
UAVid    ✓    –    –    –     66.1    142.9
UAVid    ✓    ✓    –    –     67.1    140.1
UAVid    ✓    ✓    ✓    –     68.3    136.2
UAVid    ✓    ✓    ✓    ✓     70.6    131.1
LoveDA   –    –    –    –     49.4    122.3
LoveDA   ✓    –    –    –     49.5    142.9
LoveDA   ✓    ✓    –    –     50.3    140.1
LoveDA   ✓    ✓    ✓    –     51.2    136.2
LoveDA   ✓    ✓    ✓    ✓     52.3    131.1
Table 1  Module ablation results on different validation sets
Sigmoid branch  GELU branch  mIoU/%
–               –            69.7
✓               –            70.2
–               ✓            70.3
✓               ✓            70.6
Table 2  Ablation results for each branch of the bilateral fusion module
ASPP  ERFB  mIoU/%  Np/10⁶  vf/(frame·s⁻¹)
–     –     68.3    12.9    136.2
✓     –     70.3    13.6    124.5
–     ✓     70.6    13.1    131.1
Table 3  Ablation results for receptive field modules on the UAVid validation set
Algorithm     Backbone          Np/10⁶  vf/(frame·s⁻¹)
FCN 8S        Vgg16             68.5    86.0
BiSeNet       ResNet18          12.9    121.9
PSPNet        ResNet50          47.1    52.2
DeepLab V3+   ResNet50          41.4    53.7
BANet         ResT-Lite         15.5    67.7
CoaT          ResNet50          30.2    10.6
SegFormer     MiT-B1            13.7    31.3
DC-Swin       Swin-Tiny         45.6    23.6
UNetFormer    ResNet18          11.7    115.3
MobileViT V3  MobileViT V3-1.0  13.6    87.9
RSSFormer     RSS-Base          30.8    28.5
SSFNet        ResNet18          13.1    131.1
Table 4  Comparison of parameter counts and inference speeds of segmentation algorithms
Fig. 5  Visual comparison of image segmentation results of different algorithms on the UAVid dataset
Algorithm     Clutter  Building  Road  Tree  Vegetation  Moving car  Static car  Human  mIoU
(all values are IoU/%)
FCN 8S        63.6     84.6      76.1  77.6  60.2        62.3        47.1        14.5   60.8
BiSeNet       64.7     85.7      77.3  78.3  61.1        63.4        48.6        17.5   62.1
PSPNet        65.4     85.7      79.5  79.2  61.5        72.6        49.4        19.4   64.1
DeepLab V3+   65.3     86.1      80.0  78.1  60.3        71.4        49.1        21.8   64.0
BANet         66.7     85.4      80.7  78.9  62.1        69.3        52.8        21.0   64.6
CoaT          69.0     88.5      80.0  79.3  62.0        70.0        59.1        18.9   65.8
SegFormer     66.6     86.3      80.1  79.6  62.3        72.5        52.5        28.5   66.0
DC-Swin       67.5     86.5      80.4  80.1  61.9        72.2        54.3        27.3   66.3
UNetFormer    68.4     87.4      81.5  80.2  63.5        73.6        56.4        31.0   67.8
MobileViT V3  67.7     87.3      81.4  80.1  63.6        73.8        54.8        29.7   67.3
RSSFormer     67.2     86.8      81.1  79.9  63.3        72.5        58.4        30.8   67.8
SSFNet        68.7     88.4      81.5  80.5  63.9        77.1        56.7        31.5   68.5
Table 5  Performance comparison of segmentation algorithms on the UAVid test set
Fig. 6  Visual comparison of image segmentation results of different algorithms on the LoveDA dataset
Algorithm     Background  Building  Road  Water  Barren  Forest  Agriculture  mIoU
(all values are IoU/%)
FCN 8S        42.6        49.5      48.1  73.1   11.8    43.5    58.3         46.7
BiSeNet       –           –         –     –      –       –       –            47.2
PSPNet        44.4        52.1      53.5  76.5   9.7     44.1    57.9         48.3
DeepLab V3+   43.0        50.9      52.0  74.4   10.4    44.2    58.5         47.6
BANet         43.7        51.5      51.1  76.9   16.6    44.9    62.5         49.6
CoaT          –           –         –     –      –       –       –            49.9
SegFormer     –           –         –     –      –       –       –            50.4
DC-Swin       41.3        54.5      56.2  78.1   14.5    47.2    62.4         50.6
UNetFormer    44.7        58.8      54.9  79.6   20.1    46.0    62.5         52.4
MobileViT V3  43.0        60.1      56.9  81.3   17.6    48.1    56.1         51.9
RSSFormer     52.4        60.7      55.2  76.3   18.7    45.4    58.3         52.4
SSFNet        45.6        57.4      56.9  81.5   18.4    45.5    63.4         52.7
Table 6  Performance comparison of segmentation algorithms on the LoveDA test set
Fig. 7  Visual comparison of image segmentation results of different algorithms on the Potsdam dataset
Algorithm     Impervious surface  Building  Low vegetation  Tree  Car   mIoU
(all values are IoU/%)
FCN 8S        86.1                91.5      76.3            77.4  90.7  84.4
BiSeNet       87.2                91.7      76.9            78.9  91.0  85.1
PSPNet        87.9                92.1      77.1            79.7  91.5  85.7
DeepLab V3+   87.7                92.2      77.4            79.4  91.2  85.5
BANet         88.9                92.7      78.2            79.8  91.4  86.2
CoaT          88.5                92.9      78.5            80.4  91.9  86.1
SegFormer     87.9                92.6      78.8            79.9  91.6  86.0
DC-Swin       88.7                93.3      78.0            80.1  92.0  86.5
UNetFormer    89.0                93.5      79.2            80.5  91.9  86.8
MobileViT V3  88.9                93.7      79.0            80.2  92.1  86.8
RSSFormer     89.1                93.7      79.8            80.4  91.7  86.9
SSFNet        89.3                93.9      79.1            80.9  92.2  87.1
Table 7  Performance comparison of segmentation algorithms on the Potsdam test set
Device    CUDA cores  Memory bus width/b  vf/(frame·s⁻¹)
RTX 3090  10 496      384                 131.1
RTX 3060  3 584       192                 47.6
Table 8  Inference speed of the shared shallow feature network on different hardware
1 LI R, ZHENG S, DUAN C, et al Land cover classification from remote sensing images based on multi-scale fully convolutional network[J]. Geo-spatial Information Science, 2022, 25 (2): 278- 294
doi: 10.1080/10095020.2021.2017237
2 SHI W, ZHANG M, KE H, et al Landslide recognition by deep convolutional neural network and change detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59 (6): 4654- 4672
doi: 10.1109/TGRS.2020.3015826
3 GRIFFITHS D, BOEHM J Improving public data for building segmentation from convolutional neural networks (CNNs) for fused airborne lidar and image data using active contours[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 154: 70- 83
doi: 10.1016/j.isprsjprs.2019.05.013
4 LIU Yi, CHEN Yidan, GAO Lin, et al Lightweight road extraction model based on multi-scale feature fusion[J]. Journal of Zhejiang University: Engineering Science, 2024, 58 (5): 951- 959
5 ZHU X X, TUIA D, MOU L, et al Deep learning in remote sensing: a comprehensive review and list of resources[J]. IEEE Geoscience and Remote Sensing Magazine, 2017, 5 (4): 8- 36
doi: 10.1109/MGRS.2017.2762307
6 WU Zekang, ZHAO Shan, LI Hongwei, et al Spatial global context information network for semantic segmentation of remote sensing image[J]. Journal of Zhejiang University: Engineering Science, 2022, 56 (4): 795- 802
7 LECUN Y, BENGIO Y, HINTON G Deep learning[J]. Nature, 2015, 521 (7553): 436- 444
doi: 10.1038/nature14539
8 LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3431–3440.
9 ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6230–6239.
10 CHEN L C, PAPANDREOU G, KOKKINOS I, et al DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40 (4): 834- 848
doi: 10.1109/TPAMI.2017.2699184
11 CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Computer Vision – ECCV 2018. [S.l.]: Springer, 2018: 833–851.
12 SHI X, HUANG H, PU C, et al CSA-UNet: channel-spatial attention-based encoder–decoder network for rural blue-roofed building extraction from UAV imagery[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 6514405
13 XU R, WANG C, ZHANG J, et al RSSFormer: foreground saliency enhancement for remote sensing land-cover segmentation[J]. IEEE Transactions on Image Processing, 2023, 32: 1052- 1064
doi: 10.1109/TIP.2023.3238648
14 YU C, WANG J, PENG C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation [C]// Computer Vision – ECCV 2018. [S.l.]: Springer, 2018: 334–349.
15 YU C, GAO C, WANG J, et al BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129 (11): 3051- 3068
doi: 10.1007/s11263-021-01515-2
16 WANG L, LI R, ZHANG C, et al UNetFormer: a UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 190: 196- 214
doi: 10.1016/j.isprsjprs.2022.06.008
17 WADEKAR S N, CHAURASIA A. MobileViTv3: mobile-friendly vision transformer with simple and effective fusion of local, global and input features [EB/OL]. (2022–10–06) [2024–08–31]. https://arxiv.org/abs/2209.15159.
18 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770–778.
19 LYU Y, VOSSELMAN G, XIA G S, et al UAVid: a semantic segmentation dataset for UAV imagery[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 165: 108- 119
doi: 10.1016/j.isprsjprs.2020.05.009
20 WANG J, ZHENG Z, MA A, et al. LoveDA: a remote sensing land-cover dataset for domain adaptive semantic segmentation [EB/OL]. (2021–10–17)[2024–08–31]. https://arxiv.org/abs/2110.08733.
21 LIU S, HUANG D, WANG Y. Receptive field block net for accurate and fast object detection [C]// Computer Vision – ECCV 2018. [S.l.]: Springer, 2018: 404–419.
22 WANG L, LI R, WANG D, et al Transformer meets convolution: a bilateral awareness network for semantic segmentation of very fine resolution urban scene images[J]. Remote Sensing, 2021, 13 (16): 3065
doi: 10.3390/rs13163065
23 XU W, XU Y, CHANG T, et al. Co-scale conv-attentional image transformers [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9961–9970.
24 STRUDEL R, GARCIA R, LAPTEV I, et al. Segmenter: transformer for semantic segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 7242–7252.