Journal of Zhejiang University (Science Edition), 2024, Vol. 51, Issue 2: 131-142    DOI: 10.3785/j.issn.1008-9497.2024.02.001
Geographic Information Systems (GIS) Column
结合通道交互空间组注意力与金字塔池化的高分影像语义分割网络
汪超宇1,2, 杜震洪1,2, 汪愿愿2,3
1.浙江大学 地球科学学院 地理信息科学研究所, 浙江 杭州 310058
2.浙江大学 浙江省资源与环境信息系统重点实验室,浙江 杭州 310028
3.浙江大学 海洋研究院,浙江 舟山 316021
High-resolution image semantic segmentation network combining channel interaction spatial group attention and pyramid pooling
Chaoyu WANG1,2, Zhenhong DU1,2, Yuanyuan WANG2,3
1. Department of Geographic Information Science, Zhejiang University, Hangzhou 310058, China
2. Zhejiang Provincial Key Lab of Geographic Information Science, Zhejiang University, Hangzhou 310028, China
3. Ocean Academy, Zhejiang University, Zhoushan 316021, Zhejiang Province, China
摘要:

高空间分辨率(高分)遥感影像中存在海量信息,因此对高分影像的语义分割研究十分重要。传统机器学习方法的语义分割精度和效率均不高,近年来,深度学习方法迅速发展,逐渐成为影像语义分割领域的常用方法,已有研究将SegNet、Deeplabv3+、U-Net等神经网络引入遥感影像语义分割,但效果有限。考虑高分影像的特性,对用于遥感影像语义分割的U-Net网络进行了改进。首先,在U-Net网络特征提取过程中使用通道交互空间组注意力模块(channel interaction and spatial group attention module,CISGAM),使得网络能够获取更多有效特征。其次,在编码过程中将普通卷积层变换为残差模块,并在U-Net的编码器和解码器之间用加入了CISGAM的注意力金字塔池化模块(attention pyramid pooling module,APPM)连接,以加强网络对多尺度特征的提取。最后,在0.3 m分辨率的UC Merced数据集和1 m分辨率的GID数据集上进行实验,与U-Net、Deeplabv3+等原始网络相比,在UC Merced数据集上的平均交并比(mean intersection over union,MIoU)分别提升了14.56%和8.72%,平均像素准确率(mean pixel accuracy,MPA)分别提升了12.71%和8.24%。在GID数据集的分割结果中,水体、建筑物等地物的综合分割精度大幅提升,在平均分割精度上,CISGAM和APPM较常用的CBAM和PPM有一定提升。实验结果表明,加入CISGAM和APPM的网络可行性与鲁棒性均较传统网络强,其较强的特征提取能力有利于提升高分辨率遥感影像语义分割的精度,为高分辨率遥感影像智能解译提供新方案。

关键词: 高分辨率遥感影像; 深度学习; 语义分割; 注意力机制; 金字塔池化
Abstract:

High spatial resolution remote sensing images contain rich information, so studying their semantic segmentation is of great importance. Traditional machine learning methods achieve only low accuracy and efficiency when used to segment high-resolution remote sensing images. In recent years, deep learning has developed rapidly and has become the mainstream approach to image semantic segmentation; some scholars have introduced SegNet, Deeplabv3+, U-Net and other neural networks into remote sensing image semantic segmentation, but with limited effect. This paper improves the U-Net network for semantic segmentation of remote sensing images. First, an improved convolutional attention module, the channel interaction and spatial group attention module (CISGAM), is embedded in the feature extraction stage of the U-Net network so that the network can obtain more effective features. Second, in the encoding stage, residual modules replace the ordinary convolutional layers to avoid model degradation. In addition, an attention pyramid pooling module (APPM) incorporating CISGAM connects the encoder and decoder of U-Net to enhance the extraction of multi-scale features. Finally, experiments are carried out on the UC Merced dataset with 0.3 m resolution and the GID dataset with 1 m resolution. Compared with original networks such as U-Net and Deeplabv3+, the mean intersection over union (MIoU) of the proposed method on the UC Merced dataset increases by 14.56% and 8.72%, and the mean pixel accuracy (MPA) increases by 12.71% and 8.24%, respectively. In the segmentation results on the GID dataset, the accuracy for water, buildings and other ground objects is also greatly improved, and CISGAM and APPM achieve a further gain over the original CBAM and PPM. The experimental results show that the proposed model is more feasible and robust than traditional networks, and that its stronger feature extraction capability improves the accuracy of semantic segmentation of high-resolution remote sensing images, providing a new approach for the intelligent interpretation of high-resolution remote sensing images.

Key words: high-resolution remote sensing images; deep learning; semantic segmentation; attention mechanism; pyramid pooling
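The abstract describes a U-Net whose plain convolutions are replaced by residual blocks, with an attention module (CISGAM) inserted along the feature-extraction path and an APPM bridging encoder and decoder. The paper's own module designs are given in Figs. 2-7; purely as a hedged illustration of how an attention-augmented residual block can be wired in PyTorch, the sketch below uses a generic squeeze-and-excitation style channel attention (`SqueezeExciteAttention` and `ResidualAttentionBlock` are hypothetical names, and this is not the authors' CISGAM):

```python
import torch
import torch.nn as nn

class SqueezeExciteAttention(nn.Module):
    """Generic channel attention placeholder (squeeze-and-excitation style), not CISGAM."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # reweight channels by their learned importance

class ResidualAttentionBlock(nn.Module):
    """Residual block with an attention module on the residual branch."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.attn = SqueezeExciteAttention(out_ch)
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        # Attention-weighted residual branch plus identity/1x1 skip connection.
        return torch.relu(self.attn(self.body(x)) + self.skip(x))

# Hypothetical usage on a 3-band remote sensing patch
x = torch.randn(1, 3, 256, 256)
print(ResidualAttentionBlock(3, 64)(x).shape)   # torch.Size([1, 64, 256, 256])
```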
Received: 2022-09-21    Published: 2024-03-08
CLC: P 208
Funding: High-Resolution Integrated Transportation Remote Sensing Application Demonstration System (Phase II) (07-Y30B30-9001-19/21); Key R&D Program of Zhejiang Province (2021C01031)
Corresponding author: Zhenhong DU, E-mail: duzhenhong@zju.edu.cn
About the first author: Chaoyu WANG (1998—), ORCID: https://orcid.org/0000-0002-4286-3379, male, master's student, mainly engaged in high-resolution remote sensing image processing.

Cite this article:

汪超宇,杜震洪,汪愿愿. 结合通道交互空间组注意力与金字塔池化的高分影像语义分割网络[J]. 浙江大学学报(理学版), 2024, 51(2): 131-142.

Chaoyu WANG,Zhenhong DU,Yuanyuan WANG. High-resolution image semantic segmentation network combining channel interaction spatial group attention and pyramid pooling. Journal of Zhejiang University (Science Edition), 2024, 51(2): 131-142.

Link to this article:

https://www.zjujournals.com/sci/CN/10.3785/j.issn.1008-9497.2024.02.001        https://www.zjujournals.com/sci/CN/Y2024/V51/I2/131

Fig. 1  Schematic of the model structure
Fig. 2  Channel attention module of CBAM
Fig. 3  Channel attention module of CISGAM
Fig. 4  Spatial attention module of CBAM
Fig. 5  Spatial attention module of CISGAM
Fig. 6  Operation process of CISGAM
Fig. 7  APPM
Fig. 8  Overview of the UC Merced dataset
Fig. 9  Overview of the GID dataset
OS: Windows 11; CPU: Intel Core i7-10700; RAM: 32 GB; Disk: 2 TB; GPU: NVIDIA RTX 3070
Table 1  Basic system platform configuration
GPU driver: 512.15; CUDA: 11.3; Python: 3.9; PyTorch: 1.0.11; cuDNN: 8.3.2
Table 2  Key software configuration
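To check that a local setup matches the environment in Tables 1 and 2, the versions PyTorch was built against can be printed directly (a minimal sketch; the output naturally reflects whatever is installed locally):

```python
import torch

# Compare the local environment against the configuration listed in Table 2.
print("PyTorch:", torch.__version__)
print("CUDA   :", torch.version.cuda)               # e.g. "11.3"
print("cuDNN  :", torch.backends.cudnn.version())   # e.g. 8302 for cuDNN 8.3.2
if torch.cuda.is_available():
    print("GPU    :", torch.cuda.get_device_name(0))  # e.g. an RTX 3070
```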
Model | MPA | MIoU
PSPNet | 0.677 1 | 0.693 7
Deeplabv3+ | 0.726 3 | 0.735 8
U-Net | 0.681 6 | 0.677 4
U-Net + CISGAM | 0.707 5 | 0.697 0
U-Net + CISGAM + PPM | 0.724 8 | 0.734 8
U-Net + CISGAM + APPM | 0.735 0 | 0.746 7
ResUnet | 0.737 3 | 0.686 9
ResUnet + CBAM | 0.739 0 | 0.713 6
ResUnet + CISGAM | 0.758 5 | 0.767 2
ResUnet + CBAM + PPM | 0.761 8 | 0.779 1
ResUnet + CISGAM + PPM | 0.791 5 | 0.805 7
ResUnet + CBAM + APPM | 0.775 3 | 0.781 8
Proposed method | 0.808 7 | 0.823 0
Table 3  Segmentation performance of each model on the UC Merced dataset
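MPA and MIoU in Tables 3-5 are the usual segmentation metrics: class-averaged pixel accuracy and class-averaged intersection over union, both derived from a confusion matrix. The sketch below shows one standard way to compute them with NumPy; it is a generic illustration, not the authors' evaluation code, and the function names and the 5-class setting are assumptions:

```python
import numpy as np

def confusion_matrix(pred, label, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix (rows = ground truth)."""
    mask = (label >= 0) & (label < num_classes)
    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mpa_miou(cm):
    """Mean pixel accuracy and mean IoU from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    per_class_acc = tp / np.maximum(cm.sum(axis=1), 1)                     # per-class accuracy
    per_class_iou = tp / np.maximum(cm.sum(axis=1) + cm.sum(axis=0) - tp, 1)  # per-class IoU
    return per_class_acc.mean(), per_class_iou.mean()

# Hypothetical usage with 5 GID classes (water, building, grassland, farmland, forest)
pred = np.random.randint(0, 5, size=(256, 256)).ravel()
label = np.random.randint(0, 5, size=(256, 256)).ravel()
mpa, miou = mpa_miou(confusion_matrix(pred, label, num_classes=5))
print(f"MPA = {mpa:.4f}, MIoU = {miou:.4f}")
```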
Fig. 10  Segmentation results on the UC Merced dataset
Model | MIoU | IoU (water) | IoU (building) | IoU (grassland) | IoU (farmland) | IoU (forest) | MPA
PSPNet | 0.632 | 0.595 | 0.724 | 0.556 | 0.684 | 0.734 | 0.660
Deeplabv3+ | 0.691 | 0.705 | 0.740 | 0.684 | 0.695 | 0.711 | 0.681
U-Net | 0.636 | 0.714 | 0.759 | 0.615 | 0.650 | 0.646 | 0.663
U-Net + CISGAM | 0.647 | 0.735 | 0.770 | 0.621 | 0.671 | 0.665 | 0.666
U-Net + CISGAM + PPM | 0.666 | 0.772 | 0.779 | 0.668 | 0.693 | 0.682 | 0.679
U-Net + CISGAM + APPM | 0.715 | 0.802 | 0.799 | 0.696 | 0.700 | 0.721 | 0.695
ResUnet | 0.676 | 0.756 | 0.781 | 0.628 | 0.667 | 0.674 | 0.678
ResUnet + CBAM | 0.689 | 0.779 | 0.794 | 0.660 | 0.669 | 0.701 | 0.694
ResUnet + CISGAM | 0.703 | 0.797 | 0.801 | 0.673 | 0.688 | 0.727 | 0.718
ResUnet + CBAM + PPM | 0.724 | 0.825 | 0.820 | 0.721 | 0.705 | 0.746 | 0.735
ResUnet + CISGAM + PPM | 0.738 | 0.829 | 0.828 | 0.724 | 0.725 | 0.758 | 0.749
ResUnet + CBAM + APPM | 0.732 | 0.833 | 0.823 | 0.719 | 0.712 | 0.747 | 0.730
Proposed method | 0.762 | 0.836 | 0.822 | 0.731 | 0.734 | 0.755 | 0.756
Table 4  Segmentation performance of each model on the GID dataset
Fig. 11  Comparison of segmentation results on the GID dataset
Fig. 12  Comparison of segmentation results with different pooling windows
Pooling window size combination | MPA | MIoU
1, 2, 3 | 0.766 2 | 0.756 3
1, 4, 8 | 0.785 6 | 0.778 2
1, 2, 3, 6 | 0.805 4 | 0.809 7
1, 3, 6, 9 | 0.794 2 | 0.808 6
1, 4, 8, 12 | 0.788 2 | 0.805 3
1, 2, 4, 8, 12 | 0.779 4 | 0.778 2
1, 3, 6, 10, 16 | 0.773 5 | 0.770 4
1, 2, 4, 8, 12, 16 | 0.753 7 | 0.747 4
Table 5  Effect of different pooling window size combinations
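Table 5 varies the combination of pooling window sizes in the pyramid pooling branch, with (1, 2, 3, 6) performing best. As a hedged illustration of how such a multi-window pooling head is typically parameterized, the sketch below implements a generic PSPNet-style pyramid pooling module in PyTorch; it is not the authors' APPM, which additionally embeds CISGAM attention, and `PyramidPooling` is a hypothetical name:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling with a configurable list of window (bin) sizes."""
    def __init__(self, in_channels, bins=(1, 2, 3, 6)):
        super().__init__()
        # One branch per pooling window: pool, reduce channels with a 1x1 conv, normalize.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),
                nn.Conv2d(in_channels, in_channels // len(bins), kernel_size=1, bias=False),
                nn.BatchNorm2d(in_channels // len(bins)),
                nn.ReLU(inplace=True),
            )
            for b in bins
        )

    def forward(self, x):
        h, w = x.shape[2:]
        # Pool at each window size, upsample back to the input size, and concatenate.
        pooled = [F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
                  for branch in self.branches]
        return torch.cat([x, *pooled], dim=1)

# Hypothetical usage with the best window combination from Table 5: (1, 2, 3, 6)
feats = torch.randn(1, 512, 32, 32)
ppm = PyramidPooling(512, bins=(1, 2, 3, 6))
print(ppm(feats).shape)   # torch.Size([1, 1024, 32, 32])
```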