Journal of Zhejiang University (Science Edition)  2024, Vol. 51 Issue (2): 131-142    DOI: 10.3785/j.issn.1008-9497.2024.02.001
Geographic Information Science     
High-resolution image semantic segmentation network combining channel interaction spatial group attention and pyramid pooling
Chaoyu WANG1,2, Zhenhong DU1,2, Yuanyuan WANG2,3
1. Department of Geographic Information Science, Zhejiang University, Hangzhou 310058, China
2. Zhejiang Provincial Key Lab of Geographic Information Science, Zhejiang University, Hangzhou 310028, China
3. Ocean Academy, Zhejiang University, Zhoushan 316021, Zhejiang Province, China

Abstract  

High-spatial-resolution remote sensing images contain rich information, so their semantic segmentation is an important research topic. Traditional machine learning methods yield low accuracy and efficiency when segmenting high-resolution remote sensing images. In recent years, deep learning has developed rapidly and has become the mainstream approach to image semantic segmentation. Some researchers have introduced SegNet, Deeplabv3+, U-Net and other neural networks into remote sensing image semantic segmentation, but these networks achieve only limited effect. This paper improves the U-Net network for semantic segmentation of remote sensing images. First, an improved convolutional attention module, the channel interaction and spatial group attention module (CISGAM), is embedded in the feature extraction stage of U-Net so that the network can obtain more effective features. Second, a residual module replaces the ordinary convolutional layers in the decoding layer to avoid model degradation. In addition, an attention pyramid pooling module (APPM) incorporating CISGAM connects the encoder and decoder of U-Net to strengthen the network's extraction of multi-scale features. Finally, experiments are carried out on the 0.3 m resolution UC Merced dataset and the 1 m resolution GID dataset. Compared with the original U-Net and Deeplabv3+ networks, the mean intersection over union (MIoU) of our method on the UC Merced dataset increases by 14.56% and 8.72%, and the mean pixel accuracy (MPA) increases by 12.71% and 8.24%, respectively. On the GID dataset, the classification accuracy of water, buildings and other objects also improves markedly, and CISGAM and APPM achieve measurable gains over the original CBAM and PPM.
The experimental results show that the model is more feasible and robust than traditional networks and that its stronger feature extraction capability improves the accuracy of semantic segmentation of high-resolution remote sensing images, providing a new approach for their intelligent interpretation.
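The paper's exact equations are not reproduced on this page, so the following PyTorch sketch is only an assumption-laden illustration of the idea the abstract describes: channel attention with local cross-channel interaction (in the spirit of ECA-Net's 1-D convolution over channel descriptors), followed by spatial attention computed independently per channel group. The class name, kernel sizes, and group count are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CISGAM(nn.Module):
    """Illustrative channel-interaction and spatial-group attention module.

    Hypothetical sketch: channel interaction via a 1-D convolution over
    globally pooled channel descriptors, then spatial attention applied
    separately to each channel group.
    """

    def __init__(self, channels: int, groups: int = 4, k: int = 3):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        # Channel interaction: a small 1-D conv captures local
        # cross-channel dependencies without full-connection overhead.
        self.channel_conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        # Spatial attention: 7x7 conv over per-group avg/max maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Channel attention with cross-channel interaction.
        desc = x.mean(dim=(2, 3)).unsqueeze(1)        # (B, 1, C) descriptor
        ca = self.sigmoid(self.channel_conv(desc))    # (B, 1, C) weights
        x = x * ca.transpose(1, 2).unsqueeze(-1)      # broadcast (B, C, 1, 1)
        # Spatial attention computed per channel group.
        g = x.view(b * self.groups, c // self.groups, h, w)
        avg_map = g.mean(dim=1, keepdim=True)
        max_map = g.amax(dim=1, keepdim=True)
        sa = self.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return (g * sa).view(b, c, h, w)
```

The module is shape-preserving, so it could be dropped after any encoder block of a U-Net-style network without further changes.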



Key words: high-resolution remote sensing images; deep learning; semantic segmentation; attention mechanism; pyramid pooling
Received: 21 September 2022      Published: 08 March 2024
CLC:  P 208  
Corresponding Authors: Zhenhong DU     E-mail: duzhenhong@zju.edu.cn
Cite this article:

Chaoyu WANG, Zhenhong DU, Yuanyuan WANG. High-resolution image semantic segmentation network combining channel interaction spatial group attention and pyramid pooling. Journal of Zhejiang University (Science Edition), 2024, 51(2): 131-142.

URL:

https://www.zjujournals.com/sci/EN/Y2024/V51/I2/131


Fig.1 Model structure diagram
Fig.2 Channel attention in CBAM
Fig.3 Channel attention in CISGAM
Fig.4 Spatial attention in CBAM
Fig.5 Spatial attention in CISGAM
Fig.6 The operation process of CISGAM
Fig.7 Attention pyramid pooling module
Fig.8 Schematic of the UC Merced dataset
Fig.9 Schematic of the GID dataset
Item          | OS         | CPU                 | RAM   | Disk | GPU
Configuration | Windows 11 | Intel Core i7-10700 | 32 GB | 2 TB | Nvidia RTX 3070
Table 1 Basic system platform configuration
Item          | GPU driver | CUDA | Python | PyTorch | cuDNN
Configuration | 512.15     | 11.3 | 3.9    | 1.0.11  | 8.3.2
Table 2 Important software configuration
Model                  | MPA    | MIoU
PSPNet                 | 0.6771 | 0.6937
Deeplabv3+             | 0.7263 | 0.7358
U-Net                  | 0.6816 | 0.6774
U-Net + CISGAM         | 0.7075 | 0.6970
U-Net + CISGAM + PPM   | 0.7248 | 0.7348
U-Net + CISGAM + APPM  | 0.7350 | 0.7467
ResUnet                | 0.7373 | 0.6869
ResUnet + CBAM         | 0.7390 | 0.7136
ResUnet + CISGAM       | 0.7585 | 0.7672
ResUnet + CBAM + PPM   | 0.7618 | 0.7791
ResUnet + CISGAM + PPM | 0.7915 | 0.8057
ResUnet + CBAM + APPM  | 0.7753 | 0.7818
Ours                   | 0.8087 | 0.8230
Table 3 Comparison of segmentation performance on UC Merced dataset
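MPA and MIoU, the two metrics reported in the tables, can both be computed from a per-pixel class confusion matrix. The sketch below is a standard formulation of these metrics, not code taken from the paper:

```python
import numpy as np

def mpa_miou(pred: np.ndarray, label: np.ndarray, num_classes: int):
    """Mean pixel accuracy (MPA) and mean IoU (MIoU) over classes.

    pred, label: integer class maps of the same shape.
    """
    mask = (label >= 0) & (label < num_classes)
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(
        num_classes * label[mask].astype(int) + pred[mask],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    tp = np.diag(cm)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Per-class accuracy: TP / pixels of that class in the label.
        mpa = np.nanmean(tp / cm.sum(axis=1))
        # Per-class IoU: TP / (label pixels + predicted pixels - TP).
        miou = np.nanmean(tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp))
    return mpa, miou
```

Classes absent from both prediction and label produce NaN entries that `nanmean` ignores, so the metrics remain defined on tiles that lack some categories.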
Fig.10 Comparison of the results on UC Merced dataset
Model                  | MIoU  | IoU (Water) | IoU (Building) | IoU (Grassland) | IoU (Farmland) | IoU (Forest) | MPA
PSPNet                 | 0.632 | 0.595       | 0.724          | 0.556           | 0.684          | 0.734        | 0.660
Deeplabv3+             | 0.691 | 0.705       | 0.740          | 0.684           | 0.695          | 0.711        | 0.681
U-Net                  | 0.636 | 0.714       | 0.759          | 0.615           | 0.650          | 0.646        | 0.663
U-Net + CISGAM         | 0.647 | 0.735       | 0.770          | 0.621           | 0.671          | 0.665        | 0.666
U-Net + CISGAM + PPM   | 0.666 | 0.772       | 0.779          | 0.668           | 0.693          | 0.682        | 0.679
U-Net + CISGAM + APPM  | 0.715 | 0.802       | 0.799          | 0.696           | 0.700          | 0.721        | 0.695
ResUnet                | 0.676 | 0.756       | 0.781          | 0.628           | 0.667          | 0.674        | 0.678
ResUnet + CBAM         | 0.689 | 0.779       | 0.794          | 0.660           | 0.669          | 0.701        | 0.694
ResUnet + CISGAM       | 0.703 | 0.797       | 0.801          | 0.673           | 0.688          | 0.727        | 0.718
ResUnet + CBAM + PPM   | 0.724 | 0.825       | 0.820          | 0.721           | 0.705          | 0.746        | 0.735
ResUnet + CISGAM + PPM | 0.738 | 0.829       | 0.828          | 0.724           | 0.725          | 0.758        | 0.749
ResUnet + CBAM + APPM  | 0.732 | 0.833       | 0.823          | 0.719           | 0.712          | 0.747        | 0.730
Ours                   | 0.762 | 0.836       | 0.822          | 0.731           | 0.734          | 0.755        | 0.756
Table 4 Comparison of segmentation performance on GID
Fig.11 Comparison of the results on GID
Fig.12 Comparison of the results with different pooling sizes
Pooling window sizes | MPA    | MIoU
1, 2, 3              | 0.7662 | 0.7563
1, 4, 8              | 0.7856 | 0.7782
1, 2, 3, 6           | 0.8054 | 0.8097
1, 3, 6, 9           | 0.7942 | 0.8086
1, 4, 8, 12          | 0.7882 | 0.8053
1, 2, 4, 8, 12       | 0.7794 | 0.7782
1, 3, 6, 10, 16      | 0.7735 | 0.7704
1, 2, 4, 8, 12, 16   | 0.7537 | 0.7474
Table 5 Comparison of segmentation performance with different pooling sizes on UC Merced dataset
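Table 5 indicates that the pooling window combination (1, 2, 3, 6) performs best. A minimal PPM-style sketch of such multi-scale pooling (as in PSPNet) is given below; it omits the attention step of the paper's APPM, and the branch channel split is an illustrative choice:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Multi-scale pooling sketch with configurable window grid sizes.

    Each branch pools the feature map to a fixed grid, reduces channels
    with a 1x1 conv, upsamples back, and all branches are concatenated
    with the input feature map.
    """

    def __init__(self, channels: int, sizes=(1, 2, 3, 6)):
        super().__init__()
        self.sizes = sizes
        branch_c = channels // len(sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, branch_c, kernel_size=1, bias=False)
            for _ in sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        outs = [x]
        for size, conv in zip(self.sizes, self.branches):
            p = F.adaptive_avg_pool2d(x, size)  # pool to a size x size grid
            p = F.interpolate(conv(p), (h, w), mode="bilinear",
                              align_corners=False)
            outs.append(p)
        # Output channels: C + len(sizes) * (C // len(sizes)).
        return torch.cat(outs, dim=1)
```

With four branches the output doubles the input channel count, which is why such a module is typically followed by a 1x1 fusion convolution before decoding.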