Building extraction from remote sensing images with global-local feature fusion

doi:10.3785/j.issn.1008-973X.2026.05.019

Journal of ZheJiang University (Engineering Science)

2026, Vol. 60, Issue (5): 1100-1108

Guoyan LI1, Wei YU1, Yupeng MEI1,*, Minghui ZHANG1, Xinqiang WANG2

1. School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
2. Software & Communication School, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China

Abstract

Existing methods are limited in capturing global semantic information and local detail features at the same time, so missed and false detections of buildings often occur in complex scenes. To address this problem, a dual-branch feature fusion and mutual enhancement network (DFFME-Net) based on BuildFormer was proposed. First, VGG13 was employed as the backbone to extract local features, and a multi-scale global-local feature fusion method (MGLFF) was designed to fuse the features of both branches at each stage, generating enriched feature representations. Second, to break the independence between the two branches and let them reinforce each other during feature extraction, a dual-branch feature fusion module (DFM) was introduced. Finally, to strengthen the network's focus on regions of interest, a channel prior convolutional attention mechanism (CPCA) was incorporated into the DFM, which enhances feature representation and dynamically allocates attention weights. Experiments were conducted on the WHU and Massachusetts building datasets to verify the effectiveness and applicability of the proposed network. DFFME-Net achieved an IoU of 91.81% and an F1-score of 95.73% on the WHU test set, and an IoU of 77.01% and an F1-score of 87.01% on the Massachusetts test set, outperforming several advanced models. The results indicate that the proposed network has strong practical application value in high-resolution remote sensing building extraction tasks.
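For readers less familiar with the two reported metrics, the following is a minimal sketch of how IoU and F1-score can be computed for the building class from a pair of binary masks. It is not the authors' evaluation code: the abstract does not describe the evaluation protocol, and the function names `iou_and_f1` and `f1_from_iou` as well as the toy masks are hypothetical, introduced only for illustration. The sketch also shows that, when both metrics come from the same pixel counts, F1 = 2·IoU / (1 + IoU), which is consistent with the reported pairs (91.81% and 95.73%; 77.01% and 87.01%).

```python
from typing import Sequence, Tuple


def iou_and_f1(pred: Sequence[Sequence[int]],
               truth: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (IoU, F1) for the building class of two binary masks (1 = building)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1          # building pixel correctly predicted
            elif p == 1 and t == 0:
                fp += 1          # false detection
            elif p == 0 and t == 1:
                fn += 1          # missed detection
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


def f1_from_iou(iou: float) -> float:
    """F1 implied by IoU when both are computed from the same pixel counts."""
    return 2 * iou / (1 + iou)


# Hypothetical 4x4 masks: 4 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
truth = [[1, 1, 0, 0],
         [1, 1, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

iou, f1 = iou_and_f1(pred, truth)
print(f"IoU = {iou:.4f}, F1 = {f1:.4f}")
print(f"F1 implied by IoU 0.9181: {f1_from_iou(0.9181):.4f}")
print(f"F1 implied by IoU 0.7701: {f1_from_iou(0.7701):.4f}")
```

The identity holds only when IoU and F1 are derived from the same aggregated counts; averaging per-image scores would generally break it.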

Key words: high-resolution remote sensing image, building extraction, global and local features, multi-scale feature fusion, attention mechanism

Received: 30 May 2025 Published: 06 May 2026

CLC: P 237

Fund: Tianjin Science and Technology Commissioner Project (24YDTPJC00410).

Corresponding Authors: Yupeng MEI E-mail: ligy@tcu.edu.cn;myp@tcu.edu.cn

Cite this article:

Guoyan LI, Wei YU, Yupeng MEI, Minghui ZHANG, Xinqiang WANG. Building extraction from remote sensing images with global-local feature fusion. Journal of ZheJiang University (Engineering Science), 2026, 60(5): 1100-1108.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.05.019 OR https://www.zjujournals.com/eng/Y2026/V60/I5/1100

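Building extraction from remote sensing images with global-local feature fusion (translated from the Chinese-language abstract)

Existing methods cannot adequately capture global semantics and local detail features at the same time, which leads to missed and false detections of buildings in complex scenes. To address this, a dual-branch feature fusion and mutual enhancement network (DFFME-Net) based on BuildFormer was proposed. A VGG13 backbone was introduced to extract local features, and a multi-scale global-local feature fusion method (MGLFF) was designed to fuse the features of the two branches at each stage, yielding rich feature representations. To break the independence of the two branches so that they promote each other's feature extraction, a dual-branch feature fusion module (DFM) was proposed. To strengthen the model's attention to regions of interest, a channel prior convolutional attention mechanism (CPCA) was introduced into the DFM, enhancing feature representation and dynamically allocating attention weights. To verify the effectiveness and applicability of the proposed network, experiments were conducted on the WHU and Massachusetts building datasets. On the WHU test set, DFFME-Net achieved an IoU of 91.81% and an F1-score of 95.73%; on the Massachusetts test set, the values were 77.01% and 87.01%, respectively, outperforming several advanced models. The results show that the proposed network has good engineering application value for building extraction from high-resolution remote sensing imagery.

Key words: high-resolution remote sensing imagery, building extraction, global and local features, multi-scale feature fusion, attention mechanism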

Fig.1 Dual-branch feature fusion and mutual enhancement network

Fig.2 Dual-branch feature fusion module

Fig.3 Channel prior convolutional attention mechanism

Tab.1 Quantitative comparison of different image segmentation methods on WHU building dataset %

Fig.4 Qualitative comparison of different image segmentation methods on WHU building dataset

Tab.2 Quantitative comparison of different image segmentation methods on Massachusetts building dataset %

Fig.5 Qualitative comparison of different image segmentation methods on Massachusetts building dataset

Tab.3 Ablation study results of multi-scale global-local fusion method and dual-branch feature fusion module on WHU building dataset %

Tab.4 Ablation study results of residual convolution module on WHU building dataset %

Tab.5 Ablation study results of channel prior convolutional attention mechanism on WHU building dataset %

