Journal of Zhejiang University (Engineering Science)  2024, Vol. 58, Issue 11: 2270-2279    DOI: 10.3785/j.issn.1008-973X.2024.11.008
Computer Technology and Control Engineering
Remote sensing image semantic segmentation network based on global information extraction and reconstruction
Longxue LIANG1, Chenglong HE1, Xiaosuo WU1,*, Haowen YAN2
1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. School of Surveying and Mapping and Geographic Information, Lanzhou Jiaotong University, Lanzhou 730070, China
Abstract:

A multi-scale attention extraction and global information reconstruction network was proposed in order to segment remote sensing scene images more accurately for downstream tasks. In the encoder, a multi-scale convolutional attention backbone was introduced into the deep learning semantic segmentation model for remote sensing. Multi-scale convolutional attention captures information at several scales and provides the decoder with richer global deep and shallow features. In the decoder, a global multi-branch local Transformer block was designed. Multi-scale channel-wise strip convolutions reconstruct the multi-scale spatial context, compensating for the spatial fragmentation introduced by the global branch, and the global information segmentation map is reconstructed jointly with the global semantic context. A polarized feature refinement head was designed at the end of the decoder. Along the channel dimension, a combination of softmax and sigmoid constructs a probability distribution function that fits a better output distribution, repairs the potential loss of high-resolution information in the shallow layers, and guides the fusion of deep features to recover fine spatial texture. Experimental results showed that the network achieved high accuracy, with a mean intersection over union (MIoU) of 82.9% on the ISPRS Vaihingen dataset and 87.1% on the ISPRS Potsdam dataset.
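The multi-scale channel-wise strip convolutions named above can be illustrated with a short sketch. The PyTorch snippet below is a minimal rendering of the idea in the spirit of the multi-scale convolutional attention of SegNeXt [8]; the kernel sizes (7, 11, 21), the number of branches, and the name MultiScaleStripAttention are illustrative assumptions, not the exact configuration of the proposed network.

```python
# Minimal sketch: multi-scale depth-wise strip-convolution attention.
# A 1 x k followed by a k x 1 depth-wise convolution approximates a
# k x k kernel at much lower cost; several k values capture several scales.
import torch
import torch.nn as nn

class MultiScaleStripAttention(nn.Module):  # hypothetical name
    def __init__(self, dim):
        super().__init__()
        self.proj_in = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.strips = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim),
                nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim),
            )
            for k in (7, 11, 21)  # assumed scales
        )
        self.proj_out = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.proj_in(x)
        attn = attn + sum(strip(attn) for strip in self.strips)
        attn = self.proj_out(attn)
        return attn * x  # the aggregated map re-weights the input features

x = torch.randn(1, 64, 128, 128)
print(MultiScaleStripAttention(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```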

Key words: semantic segmentation; Transformer; multi-scale convolutional attention; global multi-branch local attention; global information reconstruction
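The softmax and sigmoid combination in the polarized feature refinement head can likewise be sketched. The snippet below follows the channel branch of polarized self-attention [21]: softmax forms a probability distribution over spatial positions, and sigmoid turns the pooled channel response into gates in (0, 1) that rescale the features. The layer widths and the name ChannelPolarizedGate are assumptions for illustration, not the paper's exact head.

```python
# Minimal sketch: softmax + sigmoid channel gating (polarized style).
import torch
import torch.nn as nn

class ChannelPolarizedGate(nn.Module):  # hypothetical name
    def __init__(self, dim, reduced=None):
        super().__init__()
        reduced = reduced or dim // 2
        self.to_q = nn.Conv2d(dim, 1, 1)        # spatial query map
        self.to_v = nn.Conv2d(dim, reduced, 1)  # reduced channel values
        self.expand = nn.Conv2d(reduced, dim, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Softmax branch: probability distribution over the H*W positions.
        q = torch.softmax(self.to_q(x).view(b, 1, h * w), dim=-1)
        v = self.to_v(x).view(b, -1, h * w)
        pooled = torch.bmm(v, q.transpose(1, 2)).unsqueeze(-1)  # (B, C', 1, 1)
        # Sigmoid branch: per-channel gates in (0, 1) rescale the input.
        gate = torch.sigmoid(self.expand(pooled))               # (B, C, 1, 1)
        return x * gate

x = torch.randn(2, 64, 32, 32)
print(ChannelPolarizedGate(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```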
Received: 2023-08-29    Published: 2024-10-23
CLC:  TP 751  
Fund program: National Key Research and Development Program of China (2022YFB3903604); Natural Science Foundation of Gansu Province (21JR7RA310); Youth Science Foundation of Lanzhou Jiaotong University (2021029).
Corresponding author: Xiaosuo WU. E-mail: 1367194087@qq.com; wuxs_laser@lzjtu.edu.cn
About the author: LIANG Longxue (1965—), male, associate professor, engaged in research on electronic devices and remote sensing image processing. https://orcid.org/0000-0002-3938-7359. E-mail: 1367194087@qq.com

Cite this article:

Longxue LIANG, Chenglong HE, Xiaosuo WU, Haowen YAN. Remote sensing image semantic segmentation network based on global information extraction and reconstruction. Journal of Zhejiang University (Engineering Science), 2024, 58(11): 2270-2279.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.11.008        https://www.zjujournals.com/eng/CN/Y2024/V58/I11/2270

Fig. 1  Overall network architecture of the multi-scale attention extraction and global information reconstruction Transformer
Fig. 2  Network architecture of the multi-scale convolutional attention module
Fig. 3  Overall network architecture of the global multi-branch local Transformer block
Fig. 4  Overall network architecture of the polarized feature refinement head
Method                          OA/%    F1mean/%   mIoU/%
Baseline                        89.32   89.30      80.87
Baseline+MSCAN                  89.76   89.46      81.12
Baseline+MSCAN+GMSLTB           90.73   90.31      82.74
Baseline+MSCAN+GMSLTB+PFRH      90.88   90.54      82.94
Table 1  Ablation study of each component of GMSLTransFormer on the Vaihingen dataset
Fig. 5  Visual comparison of the ablation experiments on the Vaihingen dataset
Method             Backbone     C/MB     Np/10^6   Nf/10^9   F1mean/%   OA/%    mIoU/%
DANet(2019)        ResNet18     2024.9   12.61     20.24     90.7       90.2    83.1
BANet(2021)        ResT-Lite    3248.0   12.72     9.38      92.1       90.6    85.6
ABCNet(2021)       ResNet18     1573.2   14.06     2.16      92.2       90.8    85.8
MANet(2021)        ResNet18     2091.6   12.08     8.25      92.4       90.8    86.1
UnetFormer(2022)   ResNet18     1481.7   11.71     1.67      92.7       91.1    86.6
MAResUNet(2022)    ResNet18     638.51   16.22     5.29      92.7       91.3    86.7
DCswin(2022)       Swin-Tiny    4265.9   45.68     9.30      92.9       91.3    86.9
MAGIFormer         MSCAN-Tiny   5015.3   13.96     2.70      93.0       91.4    87.1
Table 2  Comparison with state-of-the-art remote sensing semantic segmentation networks on the Potsdam test set
Method             Backbone     F1/%                                             F1mean/%   OA/%    mIoU/%
                                Imp. surf.   Building   Low veg.   Tree    Car
DANet(2019)        ResNet18     92.3         96.0       86.6       88.4    90.3   90.7       90.2    83.1
MAResUNet(2021)    ResNet18     93.3         96.8       87.9       89.0    96.6   92.7       91.3    86.7
ABCNet(2021)       ResNet18     93.0         96.5       87.5       88.2    96.2   92.2       90.8    85.8
BANet(2021)        ResT-Lite    92.5         96.1       87.1       88.8    96.0   92.1       90.6    85.6
MANet(2022)        ResNet18     92.9         96.1       87.5       88.8    96.6   92.4       90.9    86.1
UnetFormer(2022)   ResNet18     93.1         96.5       87.8       89.2    96.7   92.7       91.1    86.6
DCswin(2022)       Swin-Tiny    93.3         96.7       88.1       89.7    96.6   92.9       91.3    86.9
MAGIFormer         MSCAN-Tiny   93.3         97.1       88.1       89.5    96.9   93.0       91.4    87.1
Table 3  Quantitative comparison with state-of-the-art high-accuracy networks on the Potsdam test set (per-class F1 for impervious surface, building, low vegetation, tree, and car)
Fig. 6  Visual comparison of different segmentation networks on the ISPRS Potsdam dataset
Method             Backbone     F1/%                                                       F1mean/%   OA/%    mIoU/%
                                Imp. surf.   Building   Low veg.   Tree    Car    Clutter
DANet(2019)        ResNet18     90.3         93.9       82.5       88.3    75.8   54.1      86.2       88.8    76.2
ABCNet(2021)       ResNet18     90.6         93.0       81.5       89.6    84.2   38.1      87.8       88.7    78.5
MANet(2022)        ResNet18     92.0         94.5       83.5       89.4    88.0   50.9      89.5       90.0    81.1
BANet(2021)        ResT-Lite    92.4         95.1       83.8       89.8    89.0   54.5      90.0       90.5    82.1
MAResUNet(2021)    ResNet18     92.2         95.1       84.3       90.0    88.5   50.9      90.0       90.5    82.0
UnetFormer(2022)   ResNet18     92.7         95.4       84.4       90.1    89.7   57.1      90.5       90.8    82.8
DCswin(2022)       Swin-Tiny    92.5         95.5       84.7       90.2    88.8   44.9      90.4       90.8    82.6
MAGIFormer         MSCAN-Tiny   92.7         95.3       84.7       90.3    89.8   53.7      90.6       90.9    82.9
Table 4  Quantitative comparison with state-of-the-art high-accuracy networks on the Vaihingen test set (per-class F1 for impervious surface, building, low vegetation, tree, car, and clutter; F1mean is averaged over the five foreground classes)
Fig. 7  Visual comparison of different segmentation networks on the ISPRS Vaihingen dataset
1 LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3431-3440.
2 RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]// Proceedings of the Medical Image Computing and Computer-Assisted Intervention. Munich: Springer, 2015: 234-241.
3 CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
4 CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Proceedings of the European Conference on Computer Vision. Munich: [s. n.], 2018: 801-818.
5 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [J]. Advances in Neural Information Processing Systems, 2017, 30.
6 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. [2023-08-01]. https://arxiv.org/abs/2010.11929.
7 WANG L, LI R, DUAN C, et al. A novel transformer based semantic segmentation scheme for fine-resolution remote sensing images [J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.
8 GUO M H, LU C Z, HOU Q, et al. SegNeXt: rethinking convolutional attention design for semantic segmentation [EB/OL]. [2023-08-01]. https://arxiv.org/abs/2209.08575.
9 WANG L, LI R, ZHANG C, et al. UNetFormer: a UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 190: 196-214. doi: 10.1016/j.isprsjprs.2022.06.008.
10 DIAKOGIANNIS F I, WALDNER F, CACCETTA P, et al. ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 162: 94-114. doi: 10.1016/j.isprsjprs.2020.01.013.
11 SUN Y, TIAN Y, XU Y. Problems of encoder-decoder frameworks for high-resolution remote sensing image segmentation: structural stereotype and insufficient learning [J]. Neurocomputing, 2019, 330: 297-304. doi: 10.1016/j.neucom.2018.11.051.
12 WU Zekang, ZHAO Shan, LI Hongwei, et al. Remote sensing image semantic segmentation space global context information network [J]. Journal of Zhejiang University: Engineering Science, 2022, 56(4): 795-802. (in Chinese)
13 LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.
14 FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3146-3154.
15 HUANG Z, WANG X, HUANG L, et al. CCNet: criss-cross attention for semantic segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 603-612.
16 LI R, ZHENG S, ZHANG C, et al. ABCNet: attentive bilateral contextual network for efficient semantic segmentation of fine-resolution remotely sensed imagery [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 181: 84-98. doi: 10.1016/j.isprsjprs.2021.09.005.
17 JADERBERG M, SIMONYAN K, ZISSERMAN A. Spatial transformer networks [J]. Advances in Neural Information Processing Systems, 2015, 28: 1-19.
18 HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
19 WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision. Munich: [s. n.], 2018: 3-19.
20 HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 13713-13722.
21 LIU H, LIU F, FAN X, et al. Polarized self-attention: towards high-quality pixel-wise mapping [J]. Neurocomputing, 2022, 506: 158-167. doi: 10.1016/j.neucom.2022.07.054.
22 WANG L, LI R, WANG D, et al. Transformer meets convolution: a bilateral awareness network for semantic segmentation of very fine resolution urban scene images [J]. Remote Sensing, 2021, 13(16): 3065. doi: 10.3390/rs13163065.
23 LI R, ZHENG S, ZHANG C, et al. Multiattention network for semantic segmentation of fine-resolution remote sensing images [J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-13.