High-precision real-time semantic segmentation network for UAV remote sensing images
Xinyu WEI1, Lei RAO1,*, Guangyu FAN1, Niansheng CHEN1, Songlin CHENG1, Dingyu YANG2
1. School of Electronic Information Engineering, Shanghai Dianji University, Shanghai 201306, China
2. State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou 310058, China
Abstract: A shared shallow feature network (SSFNet) was proposed to address the low inference efficiency and poor segmentation performance of semantic segmentation models for UAV images. The detail branch shared the 1/4 and 1/8 downsampling stages of the semantic branch, which simplified the downsampling process of the detail branch and improved inference efficiency. In the semantic branch, a stacked connection approach was combined with a channel decomposition mechanism to construct an efficient receptive field block (ERFB), enhancing multi-scale feature extraction at almost no additional inference cost. To integrate contextual information within the semantic branch, a fast aggregation of context (FAC) module was proposed, in which a gated mechanism controlled how the 1/16 and 1/32 downsampling stages supplemented the semantic information of the final stage. During the decoding phase, a bilateral fusion module (BFM) was constructed using a hybrid activation function to fully integrate detail and semantic information. Results show that SSFNet achieves mean intersection over union (mIoU) scores of 68.5%, 52.7%, and 87.1% on the UAVid, LoveDA, and Potsdam datasets, respectively. At an input resolution of 1024×1024 on an NVIDIA RTX 3090 GPU, SSFNet reaches an inference speed of 131.1 frames per second, indicating strong real-time segmentation performance.
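The accuracy figures in the abstract are mean intersection over union scores. As a reminder of what that metric measures, the following is a minimal sketch in plain Python, assuming the common definition (per-class IoU = TP / (TP + FP + FN), averaged over the classes that occur). The function names and the toy label data are hypothetical and are not taken from the paper, whose evaluation code may differ in details such as ignored classes.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average the per-class IoU, skipping classes with an empty union."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]                              # correctly labelled pixels
        fn = sum(matrix[c]) - tp                       # class c pixels that were missed
        fp = sum(matrix[r][c] for r in range(n)) - tp  # pixels wrongly labelled c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened label maps with 3 classes (hypothetical data).
labels      = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
miou = mean_iou(confusion_matrix(labels, predictions, 3))
print(f"mIoU = {miou:.3f}")
```

For the speed figure, 131.1 frames per second corresponds to a per-image latency of 1000 / 131.1, or roughly 7.6 ms.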
Received: 25 June 2024
Published: 25 July 2025
Fund: National Natural Science Foundation of China (61702320); Shanghai Chenguang Program (15CG62).
Corresponding author: Lei RAO
E-mail: 226003010214@sdju.edu.cn;raol@sdju.edu.cn
Keywords: real-time semantic segmentation, UAV image, remote sensing image, convolutional neural network, multi-scale feature