Journal of Zhejiang University (Science Edition)  2024, Vol. 51 Issue (2): 131-142    DOI: 10.3785/j.issn.1008-9497.2024.02.001
Geographic Information Science     
High-resolution image semantic segmentation network combining channel interaction spatial group attention and pyramid pooling
Chaoyu WANG1,2, Zhenhong DU1,2, Yuanyuan WANG2,3
1. Department of Geographic Information Science, Zhejiang University, Hangzhou 310058, China
2. Zhejiang Provincial Key Lab of Geographic Information Science, Zhejiang University, Hangzhou 310028, China
3. Ocean Academy, Zhejiang University, Zhoushan 316021, Zhejiang Province, China

Abstract  

High-spatial-resolution remote sensing images contain rich information, so their semantic segmentation is an important research topic. Traditional machine learning methods yield low accuracy and efficiency when segmenting high-resolution remote sensing images. In recent years, deep learning has developed rapidly and has become the mainstream approach to image semantic segmentation. Some researchers have introduced SegNet, Deeplabv3+, U-Net and other neural networks into remote sensing image semantic segmentation, but these networks achieve only limited effect. This paper improves the U-Net network for semantic segmentation of remote sensing images. First, an improved convolutional attention module, the channel interaction and spatial group attention module (CISGAM), is embedded in the feature extraction stage of U-Net so that the network can obtain more effective features. Second, a residual module replaces the ordinary convolutional layers in the decoding layer to avoid model degradation. In addition, an attention pyramid pooling module (APPM) incorporating CISGAM connects the encoder and decoder of U-Net to strengthen the network's extraction of multi-scale features. Finally, experiments are carried out on the 0.3 m resolution UC Merced dataset and the 1 m resolution GID dataset. Compared with the original U-Net and Deeplabv3+ networks, the mean intersection over union (MIoU) of our method on the UC Merced dataset increases by 14.56% and 8.72%, and the mean pixel accuracy (MPA) increases by 12.71% and 8.24%, respectively. On the GID dataset, the classification accuracy of water, buildings and other objects also improves markedly, and CISGAM and APPM achieve measurable gains over the original CBAM and PPM.
The experimental results show that the model is more feasible and robust than traditional networks and that its stronger feature extraction capability improves the accuracy of semantic segmentation of high-resolution remote sensing images, providing a new approach for their intelligent interpretation.
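The paper's exact equations are not reproduced on this page, so the following PyTorch sketch is only an assumption-laden illustration of the idea the abstract describes: channel attention with local cross-channel interaction (in the spirit of ECA-Net's 1-D convolution over channel descriptors), followed by spatial attention computed independently per channel group. The class name, kernel sizes, and group count are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CISGAM(nn.Module):
    """Illustrative channel-interaction and spatial-group attention module.

    Hypothetical sketch: channel interaction via a 1-D convolution over
    globally pooled channel descriptors, then spatial attention applied
    separately to each channel group.
    """

    def __init__(self, channels: int, groups: int = 4, k: int = 3):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        # Channel interaction: a small 1-D conv captures local
        # cross-channel dependencies without full-connection overhead.
        self.channel_conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        # Spatial attention: 7x7 conv over per-group avg/max maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Channel attention with cross-channel interaction.
        desc = x.mean(dim=(2, 3)).unsqueeze(1)        # (B, 1, C) descriptor
        ca = self.sigmoid(self.channel_conv(desc))    # (B, 1, C) weights
        x = x * ca.transpose(1, 2).unsqueeze(-1)      # broadcast (B, C, 1, 1)
        # Spatial attention computed per channel group.
        g = x.view(b * self.groups, c // self.groups, h, w)
        avg_map = g.mean(dim=1, keepdim=True)
        max_map = g.amax(dim=1, keepdim=True)
        sa = self.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return (g * sa).view(b, c, h, w)
```

The module is shape-preserving, so it could be dropped after any encoder block of a U-Net-style network without further changes.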



Key words: high-resolution remote sensing images; deep learning; semantic segmentation; attention mechanism; pyramid pooling
Received: 21 September 2022      Published: 08 March 2024
CLC:  P 208  
Corresponding Authors: Zhenhong DU     E-mail: duzhenhong@zju.edu.cn
Cite this article:

Chaoyu WANG, Zhenhong DU, Yuanyuan WANG. High-resolution image semantic segmentation network combining channel interaction spatial group attention and pyramid pooling. Journal of Zhejiang University (Science Edition), 2024, 51(2): 131-142.

URL:

https://www.zjujournals.com/sci/EN/Y2024/V51/I2/131


Fig.1 Model structure diagram
Fig.2 Channel attention in CBAM
Fig.3 Channel attention in CISGAM
Fig.4 Spatial attention in CBAM
Fig.5 Spatial attention in CISGAM
Fig.6 The operation process of CISGAM
Fig.7 Attention pyramid pooling module
Fig.8 Schematic of the UC Merced dataset
Fig.9 Schematic of the GID dataset
Item          | OS         | CPU                 | RAM   | Disk | GPU
Configuration | Windows 11 | Intel Core i7-10700 | 32 GB | 2 TB | Nvidia RTX 3070
Table 1 Basic system platform configuration
Item          | GPU driver | CUDA | Python | PyTorch | cuDNN
Configuration | 512.15     | 11.3 | 3.9    | 1.0.11  | 8.3.2
Table 2 Important software configuration
Model                  | MPA    | MIoU
PSPNet                 | 0.6771 | 0.6937
Deeplabv3+             | 0.7263 | 0.7358
U-Net                  | 0.6816 | 0.6774
U-Net + CISGAM         | 0.7075 | 0.6970
U-Net + CISGAM + PPM   | 0.7248 | 0.7348
U-Net + CISGAM + APPM  | 0.7350 | 0.7467
ResUnet                | 0.7373 | 0.6869
ResUnet + CBAM         | 0.7390 | 0.7136
ResUnet + CISGAM       | 0.7585 | 0.7672
ResUnet + CBAM + PPM   | 0.7618 | 0.7791
ResUnet + CISGAM + PPM | 0.7915 | 0.8057
ResUnet + CBAM + APPM  | 0.7753 | 0.7818
Ours                   | 0.8087 | 0.8230
Table 3 Comparison of segmentation performance on UC Merced dataset
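MPA and MIoU, the two metrics reported in the tables, can both be computed from a per-pixel class confusion matrix. The sketch below is a standard formulation of these metrics, not code taken from the paper:

```python
import numpy as np

def mpa_miou(pred: np.ndarray, label: np.ndarray, num_classes: int):
    """Mean pixel accuracy (MPA) and mean IoU (MIoU) over classes.

    pred, label: integer class maps of the same shape.
    """
    mask = (label >= 0) & (label < num_classes)
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(
        num_classes * label[mask].astype(int) + pred[mask],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    tp = np.diag(cm)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Per-class accuracy: TP / pixels of that class in the label.
        mpa = np.nanmean(tp / cm.sum(axis=1))
        # Per-class IoU: TP / (label pixels + predicted pixels - TP).
        miou = np.nanmean(tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp))
    return mpa, miou
```

Classes absent from both prediction and label produce NaN entries that `nanmean` ignores, so the metrics remain defined on tiles that lack some categories.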
Fig.10 Comparison of the results on UC Merced dataset
Model                  | MIoU  | IoU (Water) | IoU (Building) | IoU (Grassland) | IoU (Farmland) | IoU (Forest) | MPA
PSPNet                 | 0.632 | 0.595       | 0.724          | 0.556           | 0.684          | 0.734        | 0.660
Deeplabv3+             | 0.691 | 0.705       | 0.740          | 0.684           | 0.695          | 0.711        | 0.681
U-Net                  | 0.636 | 0.714       | 0.759          | 0.615           | 0.650          | 0.646        | 0.663
U-Net + CISGAM         | 0.647 | 0.735       | 0.770          | 0.621           | 0.671          | 0.665        | 0.666
U-Net + CISGAM + PPM   | 0.666 | 0.772       | 0.779          | 0.668           | 0.693          | 0.682        | 0.679
U-Net + CISGAM + APPM  | 0.715 | 0.802       | 0.799          | 0.696           | 0.700          | 0.721        | 0.695
ResUnet                | 0.676 | 0.756       | 0.781          | 0.628           | 0.667          | 0.674        | 0.678
ResUnet + CBAM         | 0.689 | 0.779       | 0.794          | 0.660           | 0.669          | 0.701        | 0.694
ResUnet + CISGAM       | 0.703 | 0.797       | 0.801          | 0.673           | 0.688          | 0.727        | 0.718
ResUnet + CBAM + PPM   | 0.724 | 0.825       | 0.820          | 0.721           | 0.705          | 0.746        | 0.735
ResUnet + CISGAM + PPM | 0.738 | 0.829       | 0.828          | 0.724           | 0.725          | 0.758        | 0.749
ResUnet + CBAM + APPM  | 0.732 | 0.833       | 0.823          | 0.719           | 0.712          | 0.747        | 0.730
Ours                   | 0.762 | 0.836       | 0.822          | 0.731           | 0.734          | 0.755        | 0.756
Table 4 Comparison of segmentation performance on GID
Fig.11 Comparison of the results on GID
Fig.12 Comparison of the results with different pooling sizes
Pooling window sizes | MPA    | MIoU
1, 2, 3              | 0.7662 | 0.7563
1, 4, 8              | 0.7856 | 0.7782
1, 2, 3, 6           | 0.8054 | 0.8097
1, 3, 6, 9           | 0.7942 | 0.8086
1, 4, 8, 12          | 0.7882 | 0.8053
1, 2, 4, 8, 12       | 0.7794 | 0.7782
1, 3, 6, 10, 16      | 0.7735 | 0.7704
1, 2, 4, 8, 12, 16   | 0.7537 | 0.7474
Table 5 Comparison of segmentation performance with different pooling sizes on UC Merced dataset
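Table 5 indicates that the pooling window combination (1, 2, 3, 6) performs best. A minimal PPM-style sketch of such multi-scale pooling (as in PSPNet) is given below; it omits the attention step of the paper's APPM, and the branch channel split is an illustrative choice:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Multi-scale pooling sketch with configurable window grid sizes.

    Each branch pools the feature map to a fixed grid, reduces channels
    with a 1x1 conv, upsamples back, and all branches are concatenated
    with the input feature map.
    """

    def __init__(self, channels: int, sizes=(1, 2, 3, 6)):
        super().__init__()
        self.sizes = sizes
        branch_c = channels // len(sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, branch_c, kernel_size=1, bias=False)
            for _ in sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        outs = [x]
        for size, conv in zip(self.sizes, self.branches):
            p = F.adaptive_avg_pool2d(x, size)  # pool to a size x size grid
            p = F.interpolate(conv(p), (h, w), mode="bilinear",
                              align_corners=False)
            outs.append(p)
        # Output channels: C + len(sizes) * (C // len(sizes)).
        return torch.cat(outs, dim=1)
```

With four branches the output doubles the input channel count, which is why such a module is typically followed by a 1x1 fusion convolution before decoding.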