Building extraction from remote sensing images with global-local feature fusion

Guoyan LI1, Wei YU1, Yupeng MEI1,*, Minghui ZHANG1, Xinqiang WANG2

1. School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
2. Software & Communication School, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China

Abstract Existing methods struggle to capture global semantic information and local detail features at the same time, which often leads to missed and false detections of buildings in complex scenes. To address this problem, a dual-branch feature fusion and mutual enhancement network (DFFME-Net) based on BuildFormer was proposed. Firstly, a VGG13 backbone was introduced to extract local features, and a multi-scale global-local feature fusion method (MGLFF) was designed to fuse the features of the two branches at each stage, yielding richer feature representations. Secondly, a dual-branch feature fusion module (DFM) was proposed to break the independence of the two branches so that they reinforce each other during feature extraction. Finally, to strengthen the network's focus on regions of interest, a channel prior convolutional attention mechanism (CPCA) was incorporated into the DFM, which enhances feature representation and dynamically allocates attention weights. Experiments were conducted on the WHU and Massachusetts building datasets to verify the effectiveness and applicability of the proposed network. On the WHU test set, DFFME-Net achieved an IoU of 91.81% and an F1-score of 95.73%; on the Massachusetts test set, it achieved an IoU of 77.01% and an F1-score of 87.01%, outperforming several advanced models. The results indicate that the proposed network has good practical value for building extraction from high-resolution remote sensing imagery.
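
The two reported metrics can be illustrated with a minimal sketch, assuming they are the usual pixel-level IoU and F1-score computed from true-positive, false-positive, and false-negative counts for the building class. The abstract does not describe the evaluation protocol, so the function names (`confusion_counts`, `iou_and_f1`, `f1_from_iou`) and the empty-mask convention below are illustrative assumptions, not the authors' code. Under this assumption the two metrics are linked by F1 = 2·IoU / (1 + IoU), and the reported pairs are consistent with that identity.

```python
import numpy as np


def confusion_counts(pred, target):
    """Count TP, FP, FN for binary masks (1 = building, 0 = background)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))    # building predicted, building present
    fp = int(np.sum(pred & ~target))   # building predicted, background present
    fn = int(np.sum(~pred & target))   # background predicted, building present
    return tp, fp, fn


def iou_and_f1(pred, target):
    """Return (IoU, F1) for the building class."""
    tp, fp, fn = confusion_counts(pred, target)
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


def f1_from_iou(iou):
    """F1 implied by IoU when both come from the same TP/FP/FN counts."""
    return 2 * iou / (1 + iou)


# Toy 3x3 example: TP = 2, FP = 1, FN = 1.
pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
target = np.array([[1, 1, 0],
                   [0, 0, 0],
                   [0, 1, 0]])
iou, f1 = iou_and_f1(pred, target)

# Consistency check against the figures reported in the abstract.
whu_f1 = round(100 * f1_from_iou(0.9181), 2)            # WHU, IoU 91.81%
massachusetts_f1 = round(100 * f1_from_iou(0.7701), 2)  # Massachusetts, IoU 77.01%
```

For the toy masks this gives an IoU of 0.5 and an F1-score of about 0.667. Applying `f1_from_iou` to the reported IoU values gives 95.73 and 87.01, which match the F1-scores stated in the abstract.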

Received: 30 May 2025
Published: 06 May 2026

Fund: Tianjin Science and Technology Commissioner Project (24YDTPJC00410).

Corresponding Authors: Yupeng MEI
E-mail: ligy@tcu.edu.cn; myp@tcu.edu.cn

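Keywords: high-resolution remote sensing imagery, building extraction, global and local features, multi-scale feature fusion, attention mechanism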