Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (10): 1924-1934    DOI: 10.3785/j.issn.1008-973X.2022.10.004
    
Building extraction based on multiple multiscale-feature fusion attention network
Dong-jie YANG1(),Xian-jun GAO1,*(),Shu-hao RAN1,Guang-bin ZHANG1,Ping WANG2,3,Yuan-wei YANG1,4,5
1. School of Geosciences, Yangtze University, Wuhan 430100, China
2. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
3. Key Laboratory of Earth Observation of Hainan Province, Sanya 572029, China
4. Hunan Provincial Key Laboratory of Geo-Information Engineering in Surveying, Mapping and Remote Sensing, Hunan University of Science and Technology, Xiangtan 411201, China
5. Beijing Key Laboratory of Urban Spatial Information Engineering, Beijing Institute of Surveying and Mapping, Beijing 100045, China

Abstract  

A novel neural network named multiple multiscale-feature fusion attention network (MMFA-Net) was proposed for building extraction from high-resolution remote sensing images, addressing the over-segmentation and internal-hole problems of fully convolutional networks. U-Net was used as the backbone, combined with two modules: multiple efficient channel attention (MECA) and multiscale-feature fusion attention (MFA). The MECA module, placed in the skip connections, strengthened effective feature information through weighting and avoided excessive allocation of attention to invalid features; multiple feature extraction was adopted to reduce the loss of effective features. The MFA module was embedded at the bottom of the model, combining parallel continuous medium- and small-scale atrous convolutions with channel attention to obtain different spatial and spectral-dimension features, which alleviated the pixel loss of large buildings caused by atrous convolution. By fusing the MECA and MFA modules, MMFA-Net improved the completeness and precision of building extraction results. The proposed MMFA-Net was verified on the WHU, Massachusetts, and self-annotated building datasets and outperformed five comparison methods. The F1-score and IoU of MMFA-Net reached 93.33% and 87.50% on the WHU dataset, 85.38% and 74.49% on the Massachusetts dataset, and 88.46% and 79.31% on the self-annotated dataset, respectively.
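The abstract describes the two modules only at a high level, so the PyTorch sketch below is one plausible reading of that description rather than the authors' implementation: an ECA-style channel-attention block that fuses several independent one-dimensional convolutions (the MECA idea, with two extractions reported best in Tab.6), and a bottleneck block that runs parallel small- and medium-scale atrous convolution branches before channel attention (the MFA idea). The kernel size, dilation rates, and fusion operations are illustrative assumptions.

import torch
import torch.nn as nn


class MECABlock(nn.Module):
    # ECA-style channel attention; several independent 1-D convolutions over the
    # pooled channel descriptor are fused by summation (assumed fusion rule).
    def __init__(self, channels: int, k: int = 3, num_extract: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.convs = nn.ModuleList(
            [nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
             for _ in range(num_extract)]
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = self.pool(x).view(b, 1, c)            # (B, 1, C) channel descriptor
        y = sum(conv(y) for conv in self.convs)   # fuse the multiple extractions
        w = self.sigmoid(y).view(b, c, 1, 1)      # per-channel weights
        return x * w                              # re-weight skip-connection features


class MFABlock(nn.Module):
    # Parallel small/medium-scale atrous convolutions followed by channel
    # attention, intended for the bottom (bottleneck) of a U-Net-like encoder.
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1, bias=False)
        self.attn = MECABlock(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.attn(self.fuse(feats))

Following the abstract, such blocks would sit on the skip connections (MECA) and at the encoder bottleneck (MFA) of a U-Net backbone.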



Key words: deep learning, high-resolution remote sensing image, building extraction, multiscale-feature fusion, efficient channel attention module, U-Net
Received: 05 January 2022      Published: 25 October 2022
CLC:  TP 753  
Fund:  Open Fund of the Key Laboratory of Earth Observation of Hainan Province (2020LDE001); Open Fund of the Key Laboratory of National Geographic Census and Monitoring, Ministry of Natural Resources (2020NGCM07); Open Fund of the National Engineering Laboratory for Digital Construction and Evaluation Technology of Urban Rail Transit (2021ZH02); Open Fund of the Hunan Provincial Key Laboratory of Geo-Information Engineering in Surveying, Mapping and Remote Sensing, Hunan University of Science and Technology (E22133); Fund of the Beijing Key Laboratory of Urban Spatial Information Engineering, Beijing Institute of Surveying and Mapping (20210205)
Corresponding Authors: Xian-jun GAO     E-mail: 2021710420@yangtzeu.edu.cn;junxgao@yangtzeu.edu.cn
Cite this article:

Dong-jie YANG,Xian-jun GAO,Shu-hao RAN,Guang-bin ZHANG,Ping WANG,Yuan-wei YANG. Building extraction based on multiple multiscale-feature fusion attention network. Journal of ZheJiang University (Engineering Science), 2022, 56(10): 1924-1934.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.10.004     OR     https://www.zjujournals.com/eng/Y2022/V56/I10/1924


Fig.1 MECA module diagram
Fig.2 MFA module diagram
Fig.3 MMFA-Net model diagram
Fig.4 Building dataset images and corresponding label images
Parameter  Value
Input image size/pixel  256×256
Optimizer  Adam [28]
Learning rate  0.0001
Batch size  6
Training epochs on Massachusetts dataset  200
Training epochs on WHU dataset  50
Training epochs on self-annotated dataset  200
Tab.1 Training parameters
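Tab.1 fully specifies the optimizer, learning rate, batch size, and per-dataset epoch counts; the minimal training-loop sketch below only wires those values together. The model, dataset, and loss function are placeholders (a plain binary cross-entropy is assumed here), not the authors' released training code.

import torch
from torch.utils.data import DataLoader


def train(model, train_set, epochs, device="cuda"):
    # Hyper-parameters from Tab.1: batch size 6, Adam with learning rate 1e-4;
    # epochs = 200 (Massachusetts, self-annotated) or 50 (WHU).
    loader = DataLoader(train_set, batch_size=6, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = torch.nn.BCEWithLogitsLoss()   # assumed loss for binary building masks
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:          # images: (B, 3, 256, 256) tiles
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()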
Network model OA/% P/% R/% IoU/% F1/%
U-Net 94.01 85.33 82.06 71.91 83.66
SegNet 93.42 81.24 84.22 70.51 82.70
DeepLabV3+ 93.25 81.17 83.15 69.70 82.15
MAP-Net 93.88 84.10 82.91 71.68 83.50
BRRNet 94.01 84.10 83.77 72.31 83.93
MMFA-Net 94.45 84.01 86.79 74.49 85.38
Tab.2 Quantitative evaluation of MMFA-Net and five comparison networks on the Massachusetts building dataset
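Tab.2 to Tab.6 report overall accuracy (OA), precision (P), recall (R), intersection over union (IoU), and F1-score as percentages. The sketch below computes these pixel-level metrics from binary prediction and ground-truth masks using their standard definitions (the tables list the resulting fractions multiplied by 100); it is a reference implementation assumed here, not code from the paper.

import numpy as np


def evaluate(pred: np.ndarray, gt: np.ndarray) -> dict:
    # pred and gt are binary building masks of the same shape (1 = building);
    # assumes each mask contains at least one positive and one negative pixel.
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    oa = (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    p = tp / (tp + fp)                     # precision
    r = tp / (tp + fn)                     # recall
    iou = tp / (tp + fp + fn)              # intersection over union
    f1 = 2 * p * r / (p + r)               # F1-score (harmonic mean of P and R)
    return {"OA": oa, "P": p, "R": r, "IoU": iou, "F1": f1}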
Fig.5 Building extraction results of various methods on Massachusetts building dataset
Network model OA/% P/% R/% IoU/% F1/%
U-Net 98.20 90.25 94.00 85.34 92.09
SegNet 98.21 91.38 92.69 85.24 92.03
DeepLabV3+ 98.12 90.14 93.28 84.64 91.68
MAP-Net 98.36 92.91 92.35 86.27 92.63
BRRNet 98.33 91.52 93.68 86.19 92.58
MMFA-Net 98.51 93.04 93.63 87.50 93.33
Tab.3 Quantitative evaluation of MMFA-Net and five comparison networks on the WHU building dataset
Fig.6 Building extraction results of various methods on WHU building dataset
Network model OA/% P/% R/% IoU/% F1/%
U-Net 94.63 88.85 80.48 73.10 84.46
SegNet 95.31 89.62 83.88 76.45 86.65
DeepLabV3+ 94.75 92.93 76.90 72.65 84.16
MAP-Net 95.69 92.55 82.93 77.74 87.48
BRRNet 95.51 91.24 83.26 77.10 87.07
MMFA-Net 95.94 91.40 85.71 79.31 88.46
Tab.4 Quantitative evaluation of MMFA-Net and five comparison networks on the self-annotated building dataset
Fig.7 Building extraction results by different methods on self-annotated building dataset
Fig.8 Large building extraction results by different methods on self-annotated building dataset
Fig.9 Total parameters and training time
Network model OA/% P/% R/% IoU/% F1/%
U-Net 94.01 85.33 82.06 71.91 83.66
U-Net+ECA 94.26 86.58 82.01 72.76 84.23
U-Net+MECA 94.38 85.52 84.19 73.69 84.85
U-Net+MECA+MFA(MMFA) 94.45 84.01 86.79 74.49 85.38
Tab.5 Quantitative evaluation results with different fusion modules on the Massachusetts building dataset
Network model OA/% P/% R/% IoU/% F1/%
MMFA (2 extractions) 94.45 84.01 86.79 74.49 85.38
MMFA (3 extractions) 94.37 86.81 82.38 73.22 84.54
MMFA (4 extractions) 93.64 81.78 84.88 71.38 83.30
MMFA (5 extractions) 94.17 85.43 82.92 72.65 84.16
Tab.6 Quantitative comparison with different numbers of independent one-dimensional convolution fusions on the Massachusetts building dataset
[1]   FAN Rong-shuang, CHEN Yang, XU Qi-heng, et al A high-resolution remote sensing image building extraction method based on deep learning[J]. Acta Geodaetica et Cartographica Sinica, 2019, 48 (1): 34- 41
doi: 10.11947/j.AGCS.2019.20170638
[2]   BLASCHKE T Object based image analysis for remote sensing[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2010, 65 (1): 2- 16
doi: 10.1016/j.isprsjprs.2009.06.004
[3]   RAN Shu-hao, HU Yu-long, YANG Yuan-wei, et al Building extraction from high resolution remote sensing image based on sample morphological transformation[J]. Journal of Zhejiang University: Engineering Science, 2020, 54 (5): 996- 1006
[4]   JUNG C R, SCHRAMM R. Rectangle detection based on a windowed Hough transform [C]// Proceedings of 17th Brazilian Symposium on Computer Graphics and Image Processing. Curitiba: IEEE, 2004: 113-120.
[5]   JI Shun-ping, WEI Shi-qing Building extraction via convolutional neural networks from an open remote sensing building dataset[J]. Acta Geodaetica et Cartographica Sinica, 2019, 48 (4): 448- 459
doi: 10.11947/j.AGCS.2019.20180206
[6]   BOULILA W, SELLAMI M, DRISS M, et al RS-DCNN: a novel distributed convolutional-neural-networks based-approach for big remote-sensing image classification[J]. Computers and Electronics in Agriculture, 2021, 182: 106014
doi: 10.1016/j.compag.2021.106014
[7]   HAN W, FENG R, WANG L, et al A semi-supervised generative framework with deep learning features for high-resolution remote sensing image scene classification[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 145: 23- 43
doi: 10.1016/j.isprsjprs.2017.11.004
[8]   AILONG M, YUTING W, YANFEI Z, et al SceneNet: remote sensing scene classification deep learning network using multi-objective neural evolution architecture search[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 172: 171- 188
doi: 10.1016/j.isprsjprs.2020.11.025
[9]   SAITO S, YAMASHITA T, AOKI Y Multiple object extraction from aerial imagery with convolutional neural networks[J]. Electronic Imaging, 2016, 2016 (10): 1- 9
[10]   BALL J E, ANDERSON D T, CHAN C S Comprehensive survey of deep learning in remote sensing: theories, tools, and challenges for the community[J]. Journal of Applied Remote Sensing, 2017, 11 (4): 042609
[11]   MNIH V. Machine learning for aerial image labeling [D]. Canada: University of Toronto, 2013.
[12]   SHELHAMER E, LONG J, DARRELL T Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39 (4): 640- 651
[13]   RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation [C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241.
[14]   YI Y, ZHANG Z, ZHANG W, et al Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network[J]. Remote Sensing, 2019, 11 (15): 1774
doi: 10.3390/rs11151774
[15]   SHAO Z, TANG P, WANG Z, et al BRRNet: a fully convolutional neural network for automatic building extraction from high-resolution remote sensing images[J]. Remote Sensing, 2020, 12 (6): 1050
doi: 10.3390/rs12061050
[16]   CHEN L C, PAPANDREOU G, KOKKINOS I, et al DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40 (4): 834- 848
[17]   CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [EB/OL]. (2017-12-05)[2022-01-05]. https://arxiv.53yu.com/abs/1706.05587.
[18]   RAN S H, GAO X J, YANG Y W, et al Building multi-feature fusion refined network for building extraction from high-resolution remote sensing images[J]. Remote Sensing, 2021, 13 (14): 2794
doi: 10.3390/rs13142794
[19]   HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[20]   GAO Z, XIE J, WANG Q, et al. Global second-order pooling convolutional networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3024-3033.
[21]   FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3146-3154.
[22]   WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision. Munich: [s. n. ], 2018: 3-19.
[23]   WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S. l. ]: IEEE, 2020.
[24]   LIN M, CHEN Q, YAN S. Network in network [EB/OL]. (2014-03-04)[2022-01-05]. https://arxiv.org/abs/1312.4400.
[25]   IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift [C]// International Conference on Machine Learning. Lille: PMLR, 2015: 448-456.
[26]   WANG P, CHEN P, YUAN Y, et al. Understanding convolution for semantic segmentation [C]// 2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe: IEEE, 2018: 1451-1460.
[27]   JI S, WEI S, MENG L Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 57 (1): 574- 586
[28]   KINGMA D P, BA J. Adam: a method for stochastic optimization [EB/OL]. (2017-01-30)[2022-01-05]. https://arxiv.org/abs/1412.6980.
[29]   MILLETARI F, NAVAB N, AHMADI S A. V-net: fully convolutional neural networks for volumetric medical image segmentation [C]// 2016 4th International Conference on 3D Vision. Stanford: IEEE, 2016: 565-571.
[30]   BADRINARAYANAN V, KENDALL A, CIPOLLA R Segnet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39 (12): 2481- 2495
doi: 10.1109/TPAMI.2016.2644615
[31]   CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Proceedings of the European Conference on Computer Vision. Munich: [s. n. ], 2018: 801-818.