Journal of Zhejiang University (Engineering Science), 2022, Vol. 56, Issue 10: 1924-1934    DOI: 10.3785/j.issn.1008-973X.2022.10.004
Automation Technology, Information Engineering
Building extraction based on multiple multiscale-feature fusion attention network
Dong-jie YANG1, Xian-jun GAO1,*, Shu-hao RAN1, Guang-bin ZHANG1, Ping WANG2,3, Yuan-wei YANG1,4,5
1. School of Geosciences, Yangtze University, Wuhan 430100, China
2. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
3. Key Laboratory of Earth Observation of Hainan Province, Sanya 572029, China
4. Hunan Provincial Key Laboratory of Geo-Information Engineering in Surveying, Mapping and Remote Sensing, Hunan University of Science and Technology, Xiangtan 411201, China
5. Beijing Key Laboratory of Urban Spatial Information Engineering, Beijing Institute of Surveying and Mapping, Beijing 100045, China
Abstract:

A neural network named the multiple multiscale-feature fusion attention network (MMFA-Net) was proposed for building extraction from high-resolution remote sensing images, addressing the over-segmentation and internal cavities that fully convolutional networks tend to produce. U-Net was used as the backbone, combined with two modules: multiple-extraction efficient channel attention (MECA) and multiscale-feature fusion attention (MFA). The MECA module was placed in the skip connections, where it strengthens effective feature information through weight ratios and thus avoids over-allocating attention to invalid features; multiple feature extraction was adopted to reduce the loss of effective features. The MFA module was positioned at the bottom of the model; by combining parallel, consecutive small- and medium-scale atrous convolutions with channel attention, it captures different spatial and spectral-dimension features and alleviates the loss of large-building pixels caused by atrous convolution. By integrating MECA and MFA, MMFA-Net improves the completeness and precision of building extraction results. The proposed network was verified on the WHU, Massachusetts, and a self-annotated building dataset, where it outperformed five comparison methods in quantitative evaluation: the F1-score and IoU reached 93.33% and 87.50% on the WHU dataset, 85.38% and 74.49% on the Massachusetts dataset, and 88.46% and 79.31% on the self-annotated dataset, respectively.

Key words: deep learning    high-resolution remote sensing image    building extraction    multiscale-feature fusion    efficient channel attention module    U-Net
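
No reference implementation accompanies the paper, so the following PyTorch sketch only illustrates how the two modules might be realized, building MECA on the efficient channel attention layer of Wang et al. [23]. The branch count, the averaging fusion, and the dilation rates (1, 2) and (3, 5) are our assumptions for illustration, not the authors' published configuration; Table 6 below varies the branch count.

```python
# A minimal PyTorch sketch of the MECA and MFA modules described in the abstract.
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """Efficient channel attention [23]: global average pooling followed by a
    1D convolution over the channel descriptor and a sigmoid gate."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.avg_pool(x)                                   # (B, C, 1, 1)
        y = y.squeeze(-1).transpose(-1, -2)                    # (B, 1, C)
        y = self.conv(y)                                       # local cross-channel interaction
        y = torch.sigmoid(y.transpose(-1, -2).unsqueeze(-1))   # (B, C, 1, 1)
        return x * y                                           # re-weight channels

class MECA(nn.Module):
    """Multiple-extraction ECA: several independent ECA branches whose outputs
    are fused (here by averaging); intended for the skip connections."""
    def __init__(self, num_branches: int = 2, k_size: int = 3):
        super().__init__()
        self.branches = nn.ModuleList(ECALayer(k_size) for _ in range(num_branches))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return sum(branch(x) for branch in self.branches) / len(self.branches)

class MFA(nn.Module):
    """Multiscale-feature fusion attention for the model bottom: parallel stacks
    of consecutive small/medium-dilation 3x3 atrous convolutions, each gated by
    channel attention, concatenated and fused by a 1x1 convolution."""
    def __init__(self, channels: int, rates=((1, 2), (3, 5))):
        super().__init__()
        def stack(rs):
            layers = []
            for r in rs:
                layers += [nn.Conv2d(channels, channels, 3, padding=r, dilation=r, bias=False),
                           nn.BatchNorm2d(channels),
                           nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)
        self.branches = nn.ModuleList(stack(rs) for rs in rates)
        self.attention = nn.ModuleList(ECALayer() for _ in rates)
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [attn(branch(x)) for branch, attn in zip(self.branches, self.attention)]
        return self.fuse(torch.cat(feats, dim=1))
```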
Received: 2022-01-05    Published: 2022-10-25
CLC:  TP 753  
Funding: Open Fund of the Key Laboratory of Earth Observation of Hainan Province (2020LDE001); Open Fund of the Key Laboratory of National Geographical Census and Monitoring, Ministry of Natural Resources (2020NGCM07); Open Fund of the National Engineering Laboratory for Digital Construction and Evaluation Technology of Urban Rail Transit (2021ZH02); Open Fund of the Hunan Provincial Key Laboratory of Geo-Information Engineering in Surveying, Mapping and Remote Sensing, Hunan University of Science and Technology (E22133); Beijing Key Laboratory of Urban Spatial Information Engineering (20210205)
Corresponding author: Xian-jun GAO     E-mail: 2021710420@yangtzeu.edu.cn; junxgao@yangtzeu.edu.cn
About the author: Dong-jie YANG (1990—), male, master's student, engaged in research on intelligent interpretation of high-resolution remote sensing imagery. orcid.org/0000-0001-7815-3523. E-mail: 2021710420@yangtzeu.edu.cn

Cite this article:

Dong-jie YANG, Xian-jun GAO, Shu-hao RAN, Guang-bin ZHANG, Ping WANG, Yuan-wei YANG. Building extraction based on multiple multiscale-feature fusion attention network. Journal of Zhejiang University (Engineering Science), 2022, 56(10): 1924-1934.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.10.004        https://www.zjujournals.com/eng/CN/Y2022/V56/I10/1924

Fig. 1  Diagram of the MECA module
Fig. 2  Diagram of the MFA module
Fig. 3  Architecture of the MMFA-Net model
Fig. 4  Images and corresponding labels from the building datasets
Parameter  Value
Input image size/pixel  256×256
Optimizer  Adam [28]
Learning rate  0.0001
Batch size  6
Training epochs on the Massachusetts dataset  200
Training epochs on the WHU dataset  50
Training epochs on the self-annotated dataset  200
Table 1  Training parameters
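
Only the hyperparameters in Table 1 are reported, so the sketch below fills the remaining gaps with placeholders: a trivial stand-in model, random tensors in place of the datasets, and an assumed binary cross-entropy loss (the paper also cites the Dice loss of V-Net [29], which is equally plausible).

```python
# A self-contained sketch of a training loop matching Table 1; illustrative only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Conv2d(3, 1, kernel_size=1)                      # stand-in for MMFA-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # Table 1: Adam, lr 0.0001
criterion = nn.BCEWithLogitsLoss()                          # assumed loss function

# 256x256 tiles, batch size 6 (Table 1); random data stand in for the real tiles
train_set = TensorDataset(torch.randn(12, 3, 256, 256),
                          torch.randint(0, 2, (12, 1, 256, 256)).float())
loader = DataLoader(train_set, batch_size=6, shuffle=True)

for epoch in range(200):            # 200 epochs on Massachusetts, 50 on WHU
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```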
Model  OA/%  P/%  R/%  IoU/%  F1/%
U-Net 94.01 85.33 82.06 71.91 83.66
SegNet 93.42 81.24 84.22 70.51 82.70
DeepLabV3+ 93.25 81.17 83.15 69.70 82.15
MAP-Net 93.88 84.10 82.91 71.68 83.50
BRRNet 94.01 84.10 83.77 72.31 83.93
MMFA-Net 94.45 84.01 86.79 74.49 85.38
Table 2  Quantitative comparison with five other building extraction networks on the Massachusetts building dataset
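
For reference, every metric in Tables 2 to 5 follows from the pixel-level confusion matrix of the binary building mask; a minimal NumPy sketch (the function name and interface are ours, not from the paper):

```python
# Overall accuracy, precision, recall, IoU, and F1 from binary masks.
import numpy as np

def binary_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """pred, gt: boolean arrays of the same shape; True marks building pixels."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    oa = (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    p = tp / (tp + fp)                     # precision
    r = tp / (tp + fn)                     # recall
    iou = tp / (tp + fp + fn)              # intersection over union
    f1 = 2 * p * r / (p + r)               # harmonic mean of P and R
    return {"OA": oa, "P": p, "R": r, "IoU": iou, "F1": f1}
```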
Fig. 5  Building extraction results of different methods on the Massachusetts building dataset
Model  OA/%  P/%  R/%  IoU/%  F1/%
U-Net 98.20 90.25 94.00 85.34 92.09
SegNet 98.21 91.38 92.69 85.24 92.03
DeepLabV3+ 98.12 90.14 93.28 84.64 91.68
MAP-Net 98.36 92.91 92.35 86.27 92.63
BRRNet 98.33 91.52 93.68 86.19 92.58
MMFA-Net 98.51 93.04 93.63 87.50 93.33
Table 3  Quantitative comparison with five other state-of-the-art building extraction networks on the WHU building dataset
Fig. 6  Building extraction results of different methods on the WHU building dataset
Model  OA/%  P/%  R/%  IoU/%  F1/%
U-Net 94.63 88.85 80.48 73.10 84.46
SegNet 95.31 89.62 83.88 76.45 86.65
DeepLabV3+ 94.75 92.93 76.90 72.65 84.16
MAP-Net 95.69 92.55 82.93 77.74 87.48
BRRNet 95.51 91.24 83.26 77.10 87.07
MMFA-Net 95.94 91.40 85.71 79.31 88.46
Table 4  Quantitative comparison with five other state-of-the-art building extraction networks on the self-annotated building dataset
Fig. 7  Building extraction results of different methods on the self-annotated building dataset
Fig. 8  Large-building extraction results of different methods on the self-annotated building dataset
Fig. 9  Total number of parameters and training time
Model  OA/%  P/%  R/%  IoU/%  F1/%
U-Net 94.01 85.33 82.06 71.91 83.66
U-Net+ECA 94.26 86.58 82.01 72.76 84.23
U-Net+MECA 94.38 85.52 84.19 73.69 84.85
U-Net+MECA+MFA(MMFA) 94.45 84.01 86.79 74.49 85.38
Table 5  Quantitative evaluation of ablation experiments on the Massachusetts building dataset
Model  OA/%  P/%  R/%  IoU/%  F1/%
MMFA (2 convolutions)  94.45  84.01  86.79  74.49  85.38
MMFA (3 convolutions)  94.37  86.81  82.38  73.22  84.54
MMFA (4 convolutions)  93.64  81.78  84.88  71.38  83.30
MMFA (5 convolutions)  94.17  85.43  82.92  72.65  84.16
Table 6  Quantitative comparison of fusing different numbers of independent one-dimensional convolutions on the Massachusetts building dataset
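
Table 6 indicates that fusing two independent one-dimensional convolutions performs best. With the hypothetical MECA sketch given after the abstract, the number of fused branches is a single constructor argument:

```python
# Illustrative only: vary the number of fused ECA branches as in Table 6.
x = torch.randn(6, 64, 32, 32)    # a batch of feature maps
for n in (2, 3, 4, 5):
    y = MECA(num_branches=n)(x)
    print(n, y.shape)             # channel attention preserves the feature shape
```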
1 FAN Rong-shuang, CHEN Yang, XU Qi-heng, et al. A high-resolution remote sensing image building extraction method based on deep learning [J]. Acta Geodaetica et Cartographica Sinica, 2019, 48(1): 34-41. doi: 10.11947/j.AGCS.2019.20170638
2 BLASCHKE T. Object based image analysis for remote sensing [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2010, 65(1): 2-16. doi: 10.1016/j.isprsjprs.2009.06.004
3 RAN Shu-hao, HU Yu-long, YANG Yuan-wei, et al. Building extraction from high resolution remote sensing image based on sample morphological transformation [J]. Journal of Zhejiang University: Engineering Science, 2020, 54(5): 996-1006
4 JUNG C R, SCHRAMM R. Rectangle detection based on a windowed Hough transform [C]// Proceedings of the 17th Brazilian Symposium on Computer Graphics and Image Processing. Curitiba: IEEE, 2004: 113-120.
5 JI Shun-ping, WEI Shi-qing. Building extraction via convolutional neural networks from an open remote sensing building dataset [J]. Acta Geodaetica et Cartographica Sinica, 2019, 48(4): 448-459. doi: 10.11947/j.AGCS.2019.20180206
6 BOULILA W, SELLAMI M, DRISS M, et al. RS-DCNN: a novel distributed convolutional-neural-networks based-approach for big remote-sensing image classification [J]. Computers and Electronics in Agriculture, 2021, 182: 106014. doi: 10.1016/j.compag.2021.106014
7 HAN W, FENG R, WANG L, et al. A semi-supervised generative framework with deep learning features for high-resolution remote sensing image scene classification [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 145: 23-43. doi: 10.1016/j.isprsjprs.2017.11.004
8 AILONG M, YUTING W, YANFEI Z, et al. SceneNet: remote sensing scene classification deep learning network using multi-objective neural evolution architecture search [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 172: 171-188. doi: 10.1016/j.isprsjprs.2020.11.025
9 SAITO S, YAMASHITA T, AOKI Y. Multiple object extraction from aerial imagery with convolutional neural networks [J]. Electronic Imaging, 2016, 2016(10): 1-9
10 BALL J E, ANDERSON D T, CHAN C S. Comprehensive survey of deep learning in remote sensing: theories, tools, and challenges for the community [J]. Journal of Applied Remote Sensing, 2017, 11(4): 042609
11 MNIH V. Machine learning for aerial image labeling [D]. Toronto: University of Toronto, 2013.
12 SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(4): 640-651
13 RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation [C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241.
14 YI Y, ZHANG Z, ZHANG W, et al. Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network [J]. Remote Sensing, 2019, 11(15): 1774. doi: 10.3390/rs11151774
15 SHAO Z, TANG P, WANG Z, et al. BRRNet: a fully convolutional neural network for automatic building extraction from high-resolution remote sensing images [J]. Remote Sensing, 2020, 12(6): 1050. doi: 10.3390/rs12061050
16 CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848
17 CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [EB/OL]. (2017-12-05)[2022-01-05]. https://arxiv.53yu.com/abs/1706.05587.
18 RAN S H, GAO X J, YANG Y W, et al. Building multi-feature fusion refined network for building extraction from high-resolution remote sensing images [J]. Remote Sensing, 2021, 13(14): 2794. doi: 10.3390/rs13142794
19 HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
20 GAO Z, XIE J, WANG Q, et al. Global second-order pooling convolutional networks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3024-3033.
21 FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3146-3154.
22 WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision. Munich: [s. n.], 2018: 3-19.
23 WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S. l.]: IEEE, 2020.
24 LIN M, CHEN Q, YAN S. Network in network [EB/OL]. (2014-03-04)[2022-01-05]. https://arxiv.org/abs/1312.4400.
25 IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift [C]// International Conference on Machine Learning. Lille: PMLR, 2015: 448-456.
26 WANG P, CHEN P, YUAN Y, et al. Understanding convolution for semantic segmentation [C]// 2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe: IEEE, 2018: 1451-1460.
27 JI S, WEI S, MENG L. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set [J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 57(1): 574-586
28 KINGMA D P, BA J. Adam: a method for stochastic optimization [EB/OL]. (2017-01-30)[2022-01-05]. https://arxiv.org/abs/1412.6980.
29 MILLETARI F, NAVAB N, AHMADI S A. V-net: fully convolutional neural networks for volumetric medical image segmentation [C]// 2016 4th International Conference on 3D Vision. Stanford: IEEE, 2016: 565-571.
30 BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
31 CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Proceedings of the European Conference on Computer Vision. Munich: [s. n.], 2018: 801-818.