Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (11): 2270-2279    DOI: 10.3785/j.issn.1008-973X.2024.11.008
Remote sensing image semantic segmentation network based on global information extraction and reconstruction
Longxue LIANG1, Chenglong HE1, Xiaosuo WU1,*, Haowen YAN2
1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. School of Surveying and Mapping and Geographic Information, Lanzhou Jiaotong University, Lanzhou 730070, China

Abstract  

A network for multi-scale attention extraction and global information reconstruction was proposed in order to improve the segmentation of remote sensing scene images for downstream tasks. A multi-scale convolutional attention backbone was introduced into the encoder of the remote sensing deep-learning semantic segmentation model. Multi-scale convolutional attention captures multi-scale information and provides the decoder with richer global deep and shallow information. A global multi-branch local Transformer block was designed in the decoder. Multi-scale channel-wise striped convolutions reconstructed multi-scale spatial context information, compensating for the spatial fragmentation in the global branch, and the global information segmentation map was reconstructed jointly with the global semantic context information. A polarized feature refinement head was designed at the end of the decoder. A combination of softmax and sigmoid was used to construct a probability distribution function over the channels, which fitted a better output distribution, repaired the potential loss of high-resolution information in the shallow layers, and guided and fused the deep information to obtain fine spatial texture. The experimental results showed that the network achieved high accuracy, with a mean intersection over union (mIoU) of 82.9% on the ISPRS Vaihingen dataset and 87.1% on the ISPRS Potsdam dataset.
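As a rough illustration of the encoder mechanism described above, the following PyTorch sketch shows a SegNeXt-style multi-scale convolutional attention unit of the kind the backbone builds on [8]. The kernel sizes (5, 7, 11, 21) and the single-stage layout are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of one multi-scale convolutional attention unit,
# assuming a SegNeXt-style design [8]; not the authors' code.
import torch
import torch.nn as nn

class MultiScaleConvAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)  # local context
        self.branches = nn.ModuleList([
            nn.Sequential(  # a 1xk + kx1 depth-wise pair approximates a kxk kernel
                nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim),
                nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim))
            for k in (7, 11, 21)])  # assumed multi-scale strip sizes
        self.proj = nn.Conv2d(dim, dim, 1)  # 1x1 channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.conv0(x)
        attn = attn + sum(branch(attn) for branch in self.branches)  # aggregate scales
        return self.proj(attn) * x  # attention map gates the input feature


x = torch.randn(2, 64, 128, 128)
print(MultiScaleConvAttention(64)(x).shape)  # torch.Size([2, 64, 128, 128])
```

Each strip-convolution pair approximates a large square receptive field at a fraction of its cost, which is what lets such a backbone pass multi-scale context to the decoder cheaply.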



Key words: semantic segmentation; Transformer; multi-scale convolutional attention; global multi-branch local attention; global information reconstruction
Received: 29 August 2023      Published: 23 October 2024
CLC:  TP 751  
Fund: National Key Research and Development Program of China (2022YFB3903604); Natural Science Foundation of Gansu Province (21JR7RA310); Youth Science Foundation of Lanzhou Jiaotong University (2021029).
Corresponding Authors: Xiaosuo WU     E-mail: 1367194087@qq.com;wuxs_laser@lzjtu.edu.cn
Cite this article:

Longxue LIANG, Chenglong HE, Xiaosuo WU, Haowen YAN. Remote sensing image semantic segmentation network based on global information extraction and reconstruction. Journal of ZheJiang University (Engineering Science), 2024, 58(11): 2270-2279.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.11.008     OR     https://www.zjujournals.com/eng/Y2024/V58/I11/2270


Fig.1 Overall network architecture diagram of Transformer for multi-scale attention extraction and global information reconstruction
Fig.2 Network architecture diagram of multi-scale convolutional attention module
Fig.3 Overall network architecture diagram of global multi-branch local Transformer block
Fig.4 Overall network architecture diagram of polarized feature refinement head
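To make the decoder description concrete, the two hedged PyTorch sketches below show (i) a global multi-branch local block that sums a global self-attention branch with multi-scale channel-wise strip-convolution branches, and (ii) a polarized channel refinement in the spirit of polarized self-attention [21], in which a softmax-normalized spatial distribution pools the shallow feature into a channel descriptor and a sigmoid turns that descriptor into a gate for the deep feature. Branch kernel sizes, head count, and the fusion rules are illustrative assumptions, not the paper's exact wiring (see Fig.3 and Fig.4 for that).

```python
# Hedged sketches of the two decoder components; module names, kernel
# sizes, and fusion rules are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class GlobalMultiBranchLocalBlock(nn.Module):
    """Global self-attention branch + multi-scale channel-wise strip-conv
    branches, summed to reconstruct global information (cf. Fig.3)."""
    def __init__(self, dim: int, heads: int = 8):  # dim must be divisible by heads
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = nn.ModuleList([
            nn.Sequential(  # 1xk then kx1 depth-wise strip convolutions
                nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim),
                nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim))
            for k in (3, 7, 11)])  # assumed branch scales
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        y = self.norm(x)
        t = y.flatten(2).transpose(1, 2)              # (b, hw, c) tokens
        g, _ = self.attn(t, t, t)                     # global semantic context
        g = g.transpose(1, 2).reshape(b, c, h, w)
        l = sum(branch(y) for branch in self.local)   # multi-scale spatial context
        return x + self.proj(g + l)                   # residual fusion

class PolarizedRefineHead(nn.Module):
    """Softmax builds a spatial probability distribution that pools the
    shallow feature into a channel descriptor; sigmoid turns the descriptor
    into a channel gate that guides the deep feature (cf. Fig.4, [21])."""
    def __init__(self, dim: int):
        super().__init__()
        self.wq = nn.Conv2d(dim, 1, 1)                # spatial query
        self.wv = nn.Conv2d(dim, dim // 2, 1)         # value projection
        self.up = nn.Conv2d(dim // 2, dim, 1)         # back to full channels

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        # assumes deep was already upsampled/projected to shallow's shape
        b, c, h, w = shallow.shape
        q = torch.softmax(self.wq(shallow).flatten(2), dim=-1)   # (b, 1, hw)
        v = self.wv(shallow).flatten(2)                          # (b, c/2, hw)
        z = torch.matmul(v, q.transpose(1, 2)).view(b, c // 2, 1, 1)
        gate = torch.sigmoid(self.up(z))                         # (b, c, 1, 1)
        return deep * gate + shallow                             # gated fusion
```

A quick shape check: with x = torch.randn(2, 64, 32, 32), both GlobalMultiBranchLocalBlock(64)(x) and PolarizedRefineHead(64)(x, x) return tensors of shape (2, 64, 32, 32).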
Dataset     Method                        OA/%   F1mean/%  mIoU/%
Vaihingen   Baseline                      89.32  89.30     80.87
            Baseline+MSCAN                89.76  89.46     81.12
            Baseline+MSCAN+GMSLTB         90.73  90.31     82.74
            Baseline+MSCAN+GMSLTB+PFRH    90.88  90.54     82.94
Tab.1 Ablation study of each component of GMSLTransFormer
Fig.5 Comparison of visualization results of model ablation experiments on Vaihingen dataset
Method             Backbone    C/MB    Np/10^6  Nf/10^9  F1mean/%  OA/%  mIoU/%
DANet (2019)       Resnet18    2024.9  12.6     120.24   90.7      90.2  83.1
BANet (2021)       ResT-Lite   3248.0  12.7     29.38    92.1      90.6  85.6
ABCNet (2021)      Resnet18    1573.2  14.0     62.16    92.2      90.8  85.8
MANet (2021)       Resnet18    2091.6  12.0     88.25    92.4      90.8  86.1
UnetFormer (2022)  Resnet18    1481.7  11.7     11.67    92.7      91.1  86.6
MAResUNet (2022)   Resnet18    638.51  16.2     25.29    92.7      91.3  86.7
DCswin (2022)      Swin-tiny   4265.9  45.6     89.30    92.9      91.3  86.9
MAGIFormer         MSCAN-tiny  5015.3  13.9     62.70    93.0      91.4  87.1
Tab.2 Comparison results on Potsdam test set with state-of-the-art remote sensing semantic segmentation networks
Method             Backbone    F1/%                                            F1mean/%  OA/%  mIoU/%
                               Imp. surf.  Building  Low veg.  Tree  Car
DANet (2019)       Resnet18    92.3        96.0      86.6      88.4  90.3      90.7      90.2  83.1
MAResUNet (2021)   Resnet18    93.3        96.8      87.9      89.0  96.6      92.7      91.3  86.7
ABCNet (2021)      Resnet18    93.0        96.5      87.5      88.2  96.2      92.2      90.8  85.8
BANet (2021)       ResT-Lite   92.5        96.1      87.1      88.8  96.0      92.1      90.6  85.6
MANet (2022)       Resnet18    92.9        96.1      87.5      88.8  96.6      92.4      90.9  86.1
UnetFormer (2022)  Resnet18    93.1        96.5      87.8      89.2  96.7      92.7      91.1  86.6
DCswin (2022)      Swin-tiny   93.3        96.7      88.1      89.7  96.6      92.9      91.3  86.9
MAGIFormer         MSCAN-tiny  93.3        97.1      88.1      89.5  96.9      93.0      91.4  87.1
Tab.3 Quantitative comparison results with advanced high-precision networks on Potsdam test set
Fig.6 Visualization comparison of experimental results of different segmentation networks on ISPRS Potsdam dataset
Method             Backbone    F1/%                                                     F1mean/%  OA/%  mIoU/%
                               Imp. surf.  Building  Low veg.  Tree  Car   Clutter
DANet (2019)       Resnet18    90.3        93.9      82.5      88.3  75.8  54.1          86.2      88.8  76.2
ABCNet (2021)      Resnet18    90.6        93.0      81.5      89.6  84.2  38.1          87.8      88.7  78.5
MANet (2022)       Resnet18    92.0        94.5      83.5      89.4  88.0  50.9          89.5      90.0  81.1
BANet (2021)       ResT-Lite   92.4        95.1      83.8      89.8  89.0  54.5          90.0      90.5  82.1
MAResUNet (2021)   Resnet18    92.2        95.1      84.3      90.0  88.5  50.9          90.0      90.5  82.0
UnetFormer (2022)  Resnet18    92.7        95.4      84.4      90.1  89.7  57.1          90.5      90.8  82.8
DCswin (2022)      Swin-tiny   92.5        95.5      84.7      90.2  88.8  44.9          90.4      90.8  82.6
MAGIFormer         MSCAN-tiny  92.7        95.3      84.7      90.3  89.8  53.7          90.6      90.9  82.9
(F1mean is averaged over the five foreground classes, excluding clutter.)
Tab.4 Quantitative comparison results with advanced high-precision networks on Vaihingen test set
Fig.7 Visualization comparison of experimental results of different segmentation networks on ISPRS Vaihingen dataset
[1] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3431-3440.
[2] RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation [C]// Proceedings of Medical Image Computing and Computer-Assisted Intervention. Munich: Springer, 2015: 234-241.
[3] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
[4] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Proceedings of the European Conference on Computer Vision. Munich: [s. n.], 2018: 801-818.
[5] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [J]. Advances in Neural Information Processing Systems, 2017, 30.
[6] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. [2023-08-01]. https://arxiv.org/abs/2010.11929.
[7] WANG L, LI R, DUAN C, et al. A novel transformer based semantic segmentation scheme for fine-resolution remote sensing images [J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.
[8] GUO M H, LU C Z, HOU Q, et al. SegNeXt: rethinking convolutional attention design for semantic segmentation [EB/OL]. [2023-08-01]. https://arxiv.org/abs/2209.08575.
[9] WANG L, LI R, ZHANG C, et al. UNetFormer: a UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 190: 196-214. doi: 10.1016/j.isprsjprs.2022.06.008
[10] DIAKOGIANNIS F I, WALDNER F, CACCETTA P, et al. ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 162: 94-114. doi: 10.1016/j.isprsjprs.2020.01.013
[11] SUN Y, TIAN Y, XU Y. Problems of encoder-decoder frameworks for high-resolution remote sensing image segmentation: structural stereotype and insufficient learning [J]. Neurocomputing, 2019, 330: 297-304. doi: 10.1016/j.neucom.2018.11.051
[12] WU Zekang, ZHAO Shan, LI Hongwei, et al. Remote sensing image semantic segmentation space global context information network [J]. Journal of Zhejiang University: Engineering Science, 2022, 56(4): 795-802. (in Chinese)
[13] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.
[14] FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3146-3154.
[15] HUANG Z, WANG X, HUANG L, et al. CCNet: criss-cross attention for semantic segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 603-612.
[16] LI R, ZHENG S, ZHANG C, et al. ABCNet: attentive bilateral contextual network for efficient semantic segmentation of fine-resolution remotely sensed imagery [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 181: 84-98. doi: 10.1016/j.isprsjprs.2021.09.005
[17] JADERBERG M, SIMONYAN K, ZISSERMAN A, et al. Spatial transformer networks [J]. Advances in Neural Information Processing Systems, 2015, 28.
[18] HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[19] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision. Munich: [s. n.], 2018: 3-19.
[20] HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 13713-13722.
[21] LIU H, LIU F, FAN X, et al. Polarized self-attention: towards high-quality pixel-wise mapping [J]. Neurocomputing, 2022, 506: 158-167. doi: 10.1016/j.neucom.2022.07.054
[22] WANG L, LI R, WANG D, et al. Transformer meets convolution: a bilateral awareness network for semantic segmentation of very fine resolution urban scene images [J]. Remote Sensing, 2021, 13(16): 3065. doi: 10.3390/rs13163065
[23] LI R, ZHENG S, ZHANG C, et al. Multiattention network for semantic segmentation of fine-resolution remote sensing images [J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-13.