Journal of Zhejiang University (Engineering Science), 2026, Vol. 60, Issue 5: 1100-1108. DOI: 10.3785/j.issn.1008-973X.2026.05.019
Computer Technology, Control Engineering
Building extraction from remote sensing images with global-local feature fusion
Guoyan LI1, Wei YU1, Yupeng MEI1,*, Minghui ZHANG1, Xinqiang WANG2
1. School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
2. Software & Communication School, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China
Abstract:

Existing methods struggle to capture global semantic information and local detail at the same time, which leads to missed and false detections of buildings in complex scenes. To address this problem, a dual-branch feature fusion and mutual enhancement network (DFFME-Net) based on BuildFormer was proposed. First, a VGG13 backbone was introduced to extract local features, and a multi-scale global-local feature fusion method (MGLFF) was designed to fuse the features of the two branches at each stage, yielding richer feature representations. Second, to break the independence of the two branches so that each promotes feature extraction in the other, a dual-branch feature fusion module (DFM) was proposed. Third, to strengthen the network's focus on regions of interest, a channel prior convolutional attention mechanism (CPCA) was incorporated into the DFM to enhance feature representation and dynamically allocate attention weights. Experiments were conducted on the WHU and Massachusetts building datasets to verify the effectiveness and applicability of the proposed network. DFFME-Net achieved an IoU of 91.81% and an F1-score of 95.73% on the WHU test set, and 77.01% and 87.01%, respectively, on the Massachusetts test set, outperforming several advanced models. The results indicate that the proposed network has good engineering application value for building extraction from high-resolution remote sensing imagery.

Key words: high-resolution remote sensing image; building extraction; global and local features; multi-scale feature fusion; attention mechanism
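
The abstract reports IoU and F1-score, and the result tables add columns P, R and OA. The sketch below shows how such pixel-wise metrics are commonly computed for a binary building mask. It assumes the standard definitions (P as precision, R as recall, OA as overall accuracy), which this page does not spell out. The function name `building_metrics` and the toy masks are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def building_metrics(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Pixel-wise metrics for binary masks (1 = building, 0 = background).

    Assumes the standard confusion-matrix definitions; the paper's exact
    evaluation protocol is not given on this page.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building predicted, building present
            elif p:
                fp += 1          # building predicted, background present
            elif t:
                fn += 1          # building missed
            else:
                tn += 1          # background correctly predicted

    def ratio(num: int, den: int) -> float:
        # Guard against empty denominators (e.g. a mask with no buildings).
        return num / den if den else 0.0

    return {
        "IoU": ratio(tp, tp + fp + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
        "P": ratio(tp, tp + fp),
        "R": ratio(tp, tp + fn),
        "OA": ratio(tp + tn, tp + tn + fp + fn),
    }


# Toy 3x4 masks, invented purely for illustration.
pred = [[1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 1]]
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 1]]

metrics = building_metrics(pred, truth)
print({name: round(100 * value, 2) for name, value in metrics.items()})
```

In practice the counts would be accumulated over every pixel of the test set before the ratios are taken, rather than averaged per image. Which of the two the authors used is not stated here.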
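Received: 2025-05-30    Published: 2026-05-06
CLC: P 237
Funding: Tianjin Science and Technology Commissioner Project (24YDTPJC00410).
Corresponding author: Yupeng MEI    E-mail: ligy@tcu.edu.cn; myp@tcu.edu.cn
About the author: Guoyan LI (born 1984), female, associate professor, Ph.D., works on machine vision and next-generation network technology. orcid.org/0000-0003-3224-2824. E-mail: ligy@tcu.edu.cn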

Cite this article:

Guoyan LI, Wei YU, Yupeng MEI, Minghui ZHANG, Xinqiang WANG. Building extraction from remote sensing images with global-local feature fusion[J]. Journal of Zhejiang University (Engineering Science), 2026, 60(5): 1100-1108.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.05.019
https://www.zjujournals.com/eng/CN/Y2026/V60/I5/1100

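Fig. 1  Dual-branch feature fusion and mutual enhancement network
Fig. 2  Dual-branch feature fusion module
Fig. 3  Channel prior convolutional attention mechanism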
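Method         IoU     F1      P       R       OA
Trans-UNet     90.21   94.85   94.75   94.95   98.85
Swin-Unet      89.67   94.56   94.67   94.44   98.79
DCSwin         88.83   94.09   94.48   93.70   98.69
MANet          90.62   95.08   95.46   94.70   98.91
BuildFormer    90.84   95.20   94.86   95.54   98.93
DFFME-Net      91.81   95.73   95.93   95.53   99.05
Table 1  Quantitative comparison of different image segmentation methods on the WHU building dataset (all values in %)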
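
The columns of Table 1 can be cross-checked against one another. If P and R are precision and recall (an assumption, since this page does not define the column headers), then F1 should equal their harmonic mean, and for a single binary class IoU should equal F1 / (2 - F1). The sketch below applies both identities to the reported values. The helper names are hypothetical, and small deviations are expected because the published figures are rounded to two decimals.

```python
# Reported values from Table 1 (WHU building dataset), in percent:
# method -> (IoU, F1, P, R)
TABLE_1 = {
    "Trans-UNet": (90.21, 94.85, 94.75, 94.95),
    "Swin-Unet": (89.67, 94.56, 94.67, 94.44),
    "DCSwin": (88.83, 94.09, 94.48, 93.70),
    "MANet": (90.62, 95.08, 95.46, 94.70),
    "BuildFormer": (90.84, 95.20, 94.86, 95.54),
    "DFFME-Net": (91.81, 95.73, 95.93, 95.53),
}


def f1_from_pr(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * p * r / (p + r)


def iou_from_f1(f1: float) -> float:
    """IoU implied by F1 for one binary class (input and output in percent)."""
    f = f1 / 100
    return 100 * f / (2 - f)


def max_deviation(table: dict) -> tuple:
    """Largest absolute gap between reported and recomputed F1 and IoU."""
    f1_gap = max(abs(f1 - f1_from_pr(p, r)) for _, f1, p, r in table.values())
    iou_gap = max(abs(iou - iou_from_f1(f1)) for iou, f1, _, _ in table.values())
    return f1_gap, iou_gap


f1_gap, iou_gap = max_deviation(TABLE_1)
print(f"max F1 gap: {f1_gap:.4f}, max IoU gap: {iou_gap:.4f} (percentage points)")
```

Gaps on the order of 0.01 percentage points are what two-decimal rounding alone would produce, so agreement at that level supports reading P and R as precision and recall.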
Fig. 4  Qualitative comparison of different image segmentation methods on the WHU building dataset
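Method         IoU     F1      P       R       OA
Trans-UNet     75.26   85.88   87.61   84.22   95.09
Swin-Unet      74.45   85.36   86.75   84.01   94.89
DCSwin         72.77   84.24   86.53   82.06   94.55
MANet          74.97   85.69   88.06   83.45   95.06
BuildFormer    75.72   86.19   88.06   84.39   95.20
DFFME-Net      77.01   87.01   88.51   85.56   95.47
Table 2  Quantitative comparison of different image segmentation methods on the Massachusetts building dataset (all values in %)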
Fig. 5  Qualitative comparison of different image segmentation methods on the Massachusetts building dataset
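Method         IoU     F1      P       R       OA
Baseline       90.84   95.20   94.86   95.54   98.93
+MGLFF         91.27   95.44   95.52   95.35   98.99
+MGLFF+DFM     91.81   95.73   95.93   95.53   99.05
Table 3  Ablation results for the multi-scale global-local feature fusion method and the dual-branch feature fusion module on the WHU building dataset (all values in %)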
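
To read Table 3 as incremental contributions, the rows can be differenced step by step. The sketch below does only that arithmetic on the reported values. The names `ABLATION` and `stepwise_gains` are hypothetical, and the differences are rounded to two decimals to match the precision of the table.

```python
# Reported values from Table 3 (WHU building dataset), in percent.
METRICS = ("IoU", "F1", "P", "R", "OA")
ABLATION = [
    ("Baseline", 90.84, 95.20, 94.86, 95.54, 98.93),
    ("+MGLFF", 91.27, 95.44, 95.52, 95.35, 98.99),
    ("+MGLFF+DFM", 91.81, 95.73, 95.93, 95.53, 99.05),
]


def stepwise_gains(rows):
    """Change in each metric relative to the previous row, in percentage points."""
    gains = []
    for previous, current in zip(rows, rows[1:]):
        deltas = {
            name: round(cur - prev, 2)
            for name, prev, cur in zip(METRICS, previous[1:], current[1:])
        }
        gains.append((current[0], deltas))
    return gains


gains = stepwise_gains(ABLATION)
for label, deltas in gains:
    print(label, deltas)
```

Differencing this way shows that adding MGLFF raises precision while recall drops slightly, and that adding DFM then recovers most of that recall. These are single-run differences taken from the table, so they say nothing about run-to-run variance.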
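Method         IoU     F1      P       R       OA
Without RCM    91.05   95.32   95.21   95.42   98.96
With 1 RCM     91.75   95.70   95.92   95.48   99.04
With 2 RCMs    91.78   95.72   95.94   95.49   99.04
With 4 RCMs    91.81   95.73   95.93   95.53   99.05
Table 4  Ablation results for the residual convolution module (RCM) on the WHU building dataset (all values in %)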
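Method         IoU     F1      P       R       OA
Without CPCA   91.61   95.62   95.83   95.41   99.03
With CPCA      91.81   95.73   95.93   95.53   99.05
Table 5  Ablation results for the channel prior convolutional attention mechanism on the WHU building dataset (all values in %)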