Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (7): 1335-1344    DOI: 10.3785/j.issn.1008-973X.2023.07.008
    
Semantic segmentation network for remote sensing image based on multi-scale mutual attention
Chun-juan LIU1, Ze QIAO1, Hao-wen YAN2,3, Xiao-suo WU1,3,*, Jia-wei WANG1, Yu-qiang XIN1
1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. School of Surveying, Mapping and Geographic Information, Lanzhou Jiaotong University, Lanzhou 730070, China
3. Academician Expert Workstation of Gansu Dayu Jiuzhou Space Information Technology Limited Company, Lanzhou 730070, China

Abstract  

A network with multi-scale mutual attention and guided upsampling was proposed to address the loss of segmentation accuracy in remote sensing images that is caused by large scale differences between target objects and by the loss of spatial details. The multi-scale mutual attention module was used to capture the pixel relations between images at different scales and to balance the weights of objects of different scales, which improves the segmentation of small-scale objects. In the coding guidance upsampling module, information from the encoder was used to guide the upsampling process, and spatial details were incorporated to improve the classification of pixels on object boundaries. The proposed network achieved mIoU scores of 85.52% and 86.59% on the Potsdam dataset and the Jiage dataset, respectively, which are 1.32 and 1.46 percentage points higher than the second-best network on each dataset.
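The abstract does not give the internals of the multi-scale mutual attention module, so the following is only a generic sketch of the underlying idea: pixels of one scale attend to pixels of another scale, and vice versa. It uses plain scaled dot-product attention in NumPy. The function names (`cross_scale_attention`, `mutual_attention`), the residual addition, and the toy feature-map sizes are all assumptions made for illustration, not the authors' design.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def cross_scale_attention(query_feats, context_feats):
    """Attend from the pixels of one scale to the pixels of another.

    query_feats:   (Nq, C) array, one row per pixel of the querying scale.
    context_feats: (Nk, C) array, one row per pixel of the other scale.
    Returns the attended features (Nq, C) and the attention weights (Nq, Nk).
    """
    channels = query_feats.shape[1]
    scores = query_feats @ context_feats.T / np.sqrt(channels)
    weights = softmax(scores, axis=1)  # each query pixel's weights sum to 1
    return weights @ context_feats, weights

def mutual_attention(feats_a, feats_b):
    """Hypothetical mutual step: each scale attends to the other (residual add)."""
    a_from_b, _ = cross_scale_attention(feats_a, feats_b)
    b_from_a, _ = cross_scale_attention(feats_b, feats_a)
    return feats_a + a_from_b, feats_b + b_from_a

# Toy inputs: a 4x4 and a 2x2 feature map with 8 channels, flattened to rows.
rng = np.random.default_rng(0)
full_scale = rng.normal(size=(16, 8))
half_scale = rng.normal(size=(4, 8))
out_full, out_half = mutual_attention(full_scale, half_scale)
_, attn = cross_scale_attention(full_scale, half_scale)
```

Each scale keeps its own resolution in this sketch; only the feature content is enriched with information from the other scale. A small object that occupies few pixels at one scale can in this way draw on context from a scale where it is better represented.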



Key words: remote sensing image      semantic segmentation      multi-scale mutual attention      small-scale object      coding guidance upsampling
Received: 25 July 2022      Published: 17 July 2023
CLC:  TP 751  
Fund:  Natural Science Foundation of Gansu Province (21JR7RA289); Key Research and Development Program of Gansu Province (20YF8GA035)
Corresponding Authors: Xiao-suo WU     E-mail: liuchj@mail.lzjtu.cn;43452740@qq.com
Cite this article:

Chun-juan LIU,Ze QIAO,Hao-wen YAN,Xiao-suo WU,Jia-wei WANG,Yu-qiang XIN. Semantic segmentation network for remote sensing image based on multi-scale mutual attention. Journal of ZheJiang University (Engineering Science), 2023, 57(7): 1335-1344.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.07.008     OR     https://www.zjujournals.com/eng/Y2023/V57/I7/1335


Fig.1 Multi-scale mutual attention and guided upsampling network structure
Fig.2 Structure of multi-scale mutual attention module
Fig.3 Structure of code-guided upsampling module
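Abbreviation     Description
DCED     Deep convolutional encoder-decoder network with single-scale input and a VGG16 backbone
DCED-MMA     DCED with the multi-scale mutual attention (MMA) module added
DCED-CGU     DCED with the coding guidance upsampling (CGU) module added
DCED-MMA-CGU     DCED with both the MMA and CGU modules added
Tab.1 Abbreviations of all experimental strategies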
Model F1/% mIoU/% PA/%
DCED 84.85 74.33 86.29
DCED-MMA 91.36 84.21 91.39
DCED-CGU 90.56 82.87 90.92
DCED-MMA-CGU 92.15 85.52 92.33
Tab.2 Results of ablation experiments on Potsdam dataset
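The F1, mIoU and PA columns are standard segmentation metrics that can all be derived from a confusion matrix. The sketch below uses the usual textbook definitions: pixel accuracy as the fraction of correctly labeled pixels, and class-averaged IoU and F1. The page does not say how the authors averaged F1, so the macro average used here is an assumption, as are the function name `segmentation_metrics` and the toy two-class matrix.

```python
def segmentation_metrics(conf):
    """Compute PA, mIoU and macro F1 from a square confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Every class is assumed to occur at least once,
    so no denominator is zero.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))
    ious, f1s = [], []
    for i in range(n):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                       # true class i, predicted otherwise
        fp = sum(conf[j][i] for j in range(n)) - tp  # predicted class i, true otherwise
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return {
        "PA": correct / total,
        "mIoU": sum(ious) / n,
        "F1": sum(f1s) / n,
        "IoU": ious,
    }

# Toy two-class example with 100 pixels (not data from the paper).
toy_conf = [[50, 10],
            [5, 35]]
metrics = segmentation_metrics(toy_conf)
```

For the toy matrix, PA is 0.85 and the per-class IoU values are 50/65 and 35/50. IoU penalizes both false positives and false negatives for each class, which is why mIoU is typically lower than PA and more sensitive to small classes.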
Fig.4 Local visual comparison results of ablation experiments on Potsdam dataset
Model     IoU/%
Background     Car     Impervious surface     Low vegetation     Building
DCED 54.57 76.05 79.37 72.02 74.97 89.02
DCED-MMA 81.96 81.22 87.87 79.37 81.92 92.91
DCED-CGU 81.26 77.39 86.04 79.19 81.64 91.71
DCED-MMA-CGU 83.21 82.42 87.89 83.09 83.79 92.71
Tab.3 Results of ablation experiments of various categories on Potsdam dataset
Model F1/% mIoU/% PA/%
DCED 84.71 75.25 91.89
DCED-MMA 91.34 84.50 94.73
DCED-CGU 90.93 83.86 94.44
DCED-MMA-CGU 92.66 86.59 95.13
Tab.4 Results of ablation experiments on Jiage dataset
Model     IoU/%
Background     Vegetation     Road     Building
DCED 77.04 92.84 43.91 83.91 78.56
DCED-MMA 83.11 95.84 69.38 90.34 83.82
DCED-CGU 82.57 95.41 67.68 90.64 82.99
DCED-MMA-CGU 84.15 95.51 75.31 91.53 86.44
Tab.5 Results of ablation experiments of various categories on Jiage dataset
Fig.5 Local visual comparison results of ablation experiments on Jiage dataset
Model     IoU/%     mIoU/%
Background     Car     Impervious surface     Low vegetation     Building
SegNet 69.49 59.85 83.44 52.97 79.26 80.36 70.90
PSPNet 78.33 65.84 86.78 56.21 81.55 88.32 76.17
DeeplabV3 78.86 67.57 85.63 60.38 80.57 87.51 76.75
MSRF 77.22 73.86 85.56 73.40 79.60 90.66 80.05
EMANet 77.40 75.60 85.60 80.70 82.10 89.30 81.80
CCNet 76.39 78.79 87.60 79.62 82.24 89.71 82.39
DANNet 82.19 77.35 87.28 82.57 82.62 92.51 84.09
MagNet 79.54 82.09 88.67 79.85 83.00 92.07 84.20
DCED-MMA-CGU 83.21 82.42 87.89 83.09 83.79 92.71 85.52
Tab.6 Quantitative comparison with 8 state-of-the-art methods on Potsdam dataset
Fig.6 Local visual comparison results of PSPNet, CCNet, MagNet and DCED-MMA-CGU on Potsdam dataset
Model     IoU/%     mIoU/%
Background     Vegetation     Road     Building
SegNet 61.42 87.27 91.44 45.42 66.58 70.42
PSPNet 79.08 89.91 96.25 48.81 81.27 79.06
DeeplabV3 80.83 88.67 95.27 56.51 78.66 79.99
EMANet 81.93 88.37 95.13 63.88 82.52 82.37
MSRF 80.62 87.49 94.19 69.51 81.37 82.64
CCNet 81.29 90.86 95.30 67.06 81.64 83.23
MagNet 82.37 91.31 95.70 70.47 82.78 84.52
DANNet 81.33 90.51 94.58 75.28 83.96 85.13
DCED-MMA-CGU 84.15 91.53 95.51 75.31 86.44 86.59
Tab.7 Quantitative comparison with 8 state-of-the-art methods on Jiage dataset
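The mIoU column in Tab.6 and Tab.7 is the arithmetic mean of the per-class IoU values in the same row, and the gains quoted in the abstract are differences between the proposed network and the second-best one on each dataset. The short check below reproduces both from the table rows. The helper names `mean_iou` and `gain` are illustrative; the numbers are copied from the tables.

```python
def mean_iou(per_class_iou):
    """Mean of the per-class IoU values, rounded to two decimals as in the tables."""
    return round(sum(per_class_iou) / len(per_class_iou), 2)

def gain(table, proposed, baseline):
    """Difference in mIoU, in percentage points, between two rows of a table."""
    return round(mean_iou(table[proposed]) - mean_iou(table[baseline]), 2)

# Per-class IoU rows from Tab.6 (Potsdam): proposed network and second-best (MagNet).
potsdam = {
    "MagNet": [79.54, 82.09, 88.67, 79.85, 83.00, 92.07],
    "DCED-MMA-CGU": [83.21, 82.42, 87.89, 83.09, 83.79, 92.71],
}

# Per-class IoU rows from Tab.7 (Jiage): proposed network and second-best (DANNet).
jiage = {
    "DANNet": [81.33, 90.51, 94.58, 75.28, 83.96],
    "DCED-MMA-CGU": [84.15, 91.53, 95.51, 75.31, 86.44],
}

potsdam_gain = gain(potsdam, "DCED-MMA-CGU", "MagNet")
jiage_gain = gain(jiage, "DCED-MMA-CGU", "DANNet")
```

The computed means match the mIoU columns (85.52 and 84.20 on Potsdam, 86.59 and 85.13 on Jiage), and the resulting gains are 1.32 and 1.46 percentage points.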
Fig.7 Local visual comparison results of PSPNet, CCNet, MagNet and DCED-MMA-CGU on Jiage dataset