Journal of Zhejiang University (Engineering Science)  2024, Vol. 58 Issue (1): 40-49    DOI: 10.3785/j.issn.1008-973X.2024.01.005
Computer Technology
Urban road network extraction method based on lightweight Transformer
Zhicheng FENG1,2, Jie YANG1,2,*, Zhichao CHEN1,2
1. School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
2. Jiangxi Provincial Key Laboratory of Maglev Technology, Ganzhou 341000, China
Abstract:

RoadViT, a road network extraction method based on a lightweight Transformer, was proposed to address the imprecise road region extraction and limited real-time performance of existing methods. MobileViT, a hybrid architecture of convolutional neural networks and the Transformer, was used as the encoder to extract high-level context information efficiently. A pyramid decoder was proposed to extract and fuse multi-scale features and to generate the probability distribution over pixel categories. Mosaic was combined with multi-scale scaling and random cropping for data augmentation, producing fine-grained and diverse remote sensing images. A dynamic weighted loss function was proposed to mitigate the imbalance between the road category and the background category in urban remote sensing images. The experimental results show that RoadViT has only 1.25 × 10^6 parameters, reaches an inference speed of up to 10 frames/s on the Jetson TX2, and achieves an accuracy of up to 57.0% on the CHN6-CUG dataset. The proposed method is an effective exploration of the lightweight Transformer for urban remote sensing images: it improves road extraction accuracy while keeping inference real-time.

Key words: urban road network extraction    Transformer    MobileViT    semantic segmentation of remote sensing image    lightweight model
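The class imbalance described in the abstract can be illustrated with a weighted cross-entropy whose class weights are recomputed for every batch. The sketch below is only a generic inverse-frequency scheme: the paper's own dynamic weighting function is the one shown in Fig. 4, which is not reproduced on this page, so the weighting rule, the function name `dynamic_weighted_bce`, and the `min_weight` floor are all assumptions made for illustration.

```python
import numpy as np


def dynamic_weighted_bce(prob_road, target, min_weight=0.05, eps=1e-7):
    """Binary cross-entropy with class weights recomputed per batch.

    prob_road: predicted road probabilities in (0, 1).
    target:    same shape, 1 = road pixel, 0 = background pixel.
    ASSUMPTION: inverse-frequency weights, not the paper's Fig. 4 function.
    """
    prob_road = np.clip(np.asarray(prob_road, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)

    # Fraction of road pixels in this batch (hypothetical imbalance measure).
    road_frac = target.mean()
    # The rarer class gets the larger weight; the floor keeps both weights > 0.
    w_road = float(np.clip(1.0 - road_frac, min_weight, 1.0 - min_weight))
    w_bg = 1.0 - w_road

    per_pixel = -(w_road * target * np.log(prob_road)
                  + w_bg * (1.0 - target) * np.log(1.0 - prob_road))
    return float(per_pixel.mean()), w_road, w_bg


# One road pixel among four: the road class receives weight 0.75.
loss, w_road, w_bg = dynamic_weighted_bce([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0])
```

With weights tied to the batch statistics, a missed road pixel costs more than a false alarm on background whenever roads are the minority class, which is the effect a class-balancing loss is meant to have.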
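Received: 2023-06-01    Published: 2023-11-07
CLC:  TP 751
Supported by: National Natural Science Foundation of China (62063009)
Corresponding author: Jie YANG     E-mail: fengzhichengai@163.com;yangjie@jxust.edu.cn
About the author: Zhicheng FENG (born 2000), male, master's student, engaged in research on remote sensing image processing. orcid.org/0000-0001-7887-4566. E-mail: fengzhichengai@163.com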

Cite this article:

Zhicheng FENG, Jie YANG, Zhichao CHEN. Urban road network extraction method based on lightweight Transformer. Journal of Zhejiang University (Engineering Science), 2024, 58(1): 40-49.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.01.005        https://www.zjujournals.com/eng/CN/Y2024/V58/I1/40

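Fig. 1  Structure of the proposed urban road network extraction model RoadViT
Fig. 2  Implementation of the MV2 module and the multi-head attention mechanism
Fig. 3  Structure of the pyramid decoder
Fig. 4  Graph and expression of the dynamic weighting function
Fig. 5  Implementation process of data augmentation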
Encoder             Decoder          Data augmentation                                     P/10^6  FLOPs/10^9  RIoU/%
MobileViT           FCNHead                                                                1.06    1.0         47.6
                                     Cutout                                                1.06    1.0         47.9
                                     Cutmix                                                1.06    1.0         48.7
                                     Mosaic                                                1.06    1.0         49.9
                                     Multi-scale scaling and random cropping               1.06    1.0         52.2
                                     Mosaic with multi-scale scaling and random cropping   1.06    1.0         53.6
                    SPP+FCNHead                                                            1.92    1.89        49.5
                    Pyramid decoder                                                        1.25    1.18        50.4
MobileViT (no MHA)  FCNHead                                                                0.17    0.49        41.5
Table 1  Effect of different techniques on model performance
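The best-scoring augmentation in Table 1 combines Mosaic with multi-scale scaling and random cropping. A minimal NumPy sketch of that combination follows. It assumes four equally sized image/mask pairs tiled 2x2, a scale range of 0.5 to 2.0, and nearest-neighbour resizing; none of these settings comes from the paper, whose actual procedure is the one shown in Fig. 5, and the helper names are hypothetical.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize for an HxW or HxWxC array (safe for label masks)."""
    rows = np.arange(out_h) * arr.shape[0] // out_h
    cols = np.arange(out_w) * arr.shape[1] // out_w
    return arr[rows][:, cols]


def mosaic(images, masks, rng):
    """Tile four equally sized image/mask pairs into one 2x2 image/mask pair."""
    order = rng.permutation(4)
    img = [images[i] for i in order]
    msk = [masks[i] for i in order]
    big_img = np.concatenate([np.concatenate(img[:2], axis=1),
                              np.concatenate(img[2:], axis=1)], axis=0)
    big_msk = np.concatenate([np.concatenate(msk[:2], axis=1),
                              np.concatenate(msk[2:], axis=1)], axis=0)
    return big_img, big_msk


def scale_and_crop(image, mask, crop, rng, scales=(0.5, 2.0)):
    """Rescale by a random factor, then cut the same crop x crop window from both."""
    s = rng.uniform(*scales)
    # Never shrink below the crop size, so a full window always fits.
    h = max(crop, int(round(image.shape[0] * s)))
    w = max(crop, int(round(image.shape[1] * s)))
    image, mask = resize_nearest(image, h, w), resize_nearest(mask, h, w)
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return (image[top:top + crop, left:left + crop],
            mask[top:top + crop, left:left + crop])


# Toy data: image i is filled with the value i, its mask with i % 2.
rng = np.random.default_rng(0)
images = [np.full((64, 64, 3), i, dtype=np.uint8) for i in range(4)]
masks = [np.full((64, 64), i % 2, dtype=np.uint8) for i in range(4)]
big_img, big_msk = mosaic(images, masks, rng)
img, msk = scale_and_crop(big_img, big_msk, crop=64, rng=rng)
```

The image and its mask go through the same permutation, the same index mapping, and the same crop window, so every pixel keeps its label. Nearest-neighbour resizing is used for the mask because interpolating class labels would create values that are not valid classes.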
Model      Encoder       Data augmentation  Dynamic weighted loss  Pyramid decoder  P/10^6  FLOPs/10^9  RIoU/%
RoadViT    MobileViT                                                                1.06    1.00        47.6
                                                                                    1.06    1.00        53.6
                                                                                    1.06    1.00        49.5
                                                                                    1.25    1.18        50.5
                                                                                    1.25    1.18        51.4
                                                                                    1.06    1.00        55.7
                                                                                    1.25    1.18        56.5
                                                                                    1.25    1.18        57.0
RoadViT-m  MobileViT-xs                                                             2.35    3.02        58.7
RoadViT-l  MobileViT-s                                                              5.97    6.01        59.7
Table 2  Ablation experiments of RoadViT
Model                        P/10^6  FLOPs/10^9  RIoU/% (CHN6-CUG dataset)  RIoU/% (DeepGlobe dataset)
PSPNet[16] (ResNet18[11])    12.92   67.51       57.1                       57.7
DeepLab V3[15] (ResNet18)    13.60   85.97       57.6                       58.6
PSPNet (MobileNet V2[9])     2.65    10.72       55.3                       54.5
DeepLab V3 (MobileNet V2)    3.23    22.61       53.6                       55.4
LRASPP[10]                   3.22    1.98        51.1                       51.1
BiseNet V2[19]               3.62    12.80       56.4                       51.8
STDC[18]                     14.23   23.51       60.7                       54.6
DDRNet[17]                   20.15   17.87       61.0                       54.8
PIDNet[20]                   7.62    5.89        60.0                       52.6
RoadViT                      1.25    1.18        57.0                       52.3
RoadViT-m                    2.35    3.02        58.7                       53.7
RoadViT-l                    5.97    6.01        59.7                       54.3
D-LinkNet[22] 55.7
HsgNet[22] 57.7
Table 3  Comparison of RoadViT and mainstream models on different datasets
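The tables report accuracy as RIoU/%. This page does not define the metric, so the sketch below assumes it is the intersection over union of the road class, that is, TP / (TP + FP + FN) over road pixels, expressed in percent. The function name `road_iou` and the label convention (1 = road) are likewise assumptions.

```python
import numpy as np


def road_iou(pred, target, road_label=1):
    """IoU of the road class, in percent, for two integer label maps."""
    pred_road = np.asarray(pred) == road_label
    true_road = np.asarray(target) == road_label
    union = np.logical_or(pred_road, true_road).sum()
    if union == 0:
        return float("nan")  # neither map contains any road pixel
    inter = np.logical_and(pred_road, true_road).sum()
    return 100.0 * inter / union


# One pixel is road in both maps, three pixels are road in at least one map.
pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [1, 0, 0]]
score = road_iou(pred, target)
```

Because background pixels that both maps agree on never enter the ratio, this kind of metric is not inflated by the large background area typical of urban remote sensing images.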
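Fig. 6  Comparison of model inference speed
Fig. 7  Comparison of model size and segmentation accuracy
Fig. 8  Effect of different techniques on segmentation results
Fig. 9  Comparison of actual road extraction results of RoadViT and mainstream models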
1 WU S, DU C, CHEN H, et al. Road extraction from very high resolution images using weakly labeled OpenStreetMap centerline[J]. ISPRS International Journal of Geo-Information, 2019, 8(11): 478.
doi: 10.3390/ijgi8110478
2 CLAUSSMANN L, REVILLOUD M, GRUYER D, et al. A review of motion planning for highway autonomous driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(5): 1826-1848.
doi: 10.1109/TITS.2019.2913998
3 YIN W, QIAN M, WANG L, et al. Road extraction from satellite images with iterative cross-task feature enhancement[J]. Neurocomputing, 2022, 506: 300-310.
doi: 10.1016/j.neucom.2022.07.086
4 MA Y, WU H, WANG L, et al. Remote sensing big data computing: challenges and opportunities[J]. Future Generation Computer Systems, 2015, 51: 47-60.
doi: 10.1016/j.future.2014.10.029
5 LIU Chunjuan, QIAO Ze, YAN Haowen, et al. Semantic segmentation network for remote sensing image based on multi-scale mutual attention[J]. Journal of Zhejiang University: Engineering Science, 2023, 57(7): 1335-1344.
6 BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
doi: 10.1109/TPAMI.2016.2644615
7 DAI J, ZHU T, ZHANG Y, et al. Lane-level road extraction from high-resolution optical satellite images[J]. Remote Sensing, 2019, 11(22): 2672.
doi: 10.3390/rs11222672
8 CHEN L, ZHU Q, XIE X, et al. Road extraction from VHR remote-sensing imagery via object segmentation constrained by Gabor features[J]. ISPRS International Journal of Geo-Information, 2018, 7(9): 362.
doi: 10.3390/ijgi7090362
9 CHEN Zhichao, JIAO Haining, YANG Jie, et al. Garbage image classification algorithm based on improved MobileNet v2[J]. Journal of Zhejiang University: Engineering Science, 2021, 55(8): 1490-1499.
10 HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]// Proceedings of the IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 1314-1324.
11 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
12 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. (2021-06-03) [2023-08-05]. https://arxiv.org/pdf/2010.11929.pdf.
13 MEHTA S, RASTEGARI M. MobileViT: light-weight, general-purpose, and mobile-friendly vision Transformer[EB/OL]. (2022-03-04) [2023-08-05]. https://arxiv.org/pdf/2110.02178.pdf.
14 SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
doi: 10.1109/TPAMI.2016.2572683
15 DU B, ZHAO Z, HU X, et al. Landslide susceptibility prediction based on image semantic segmentation[J]. Computers and Geosciences, 2021, 155: 104860.
doi: 10.1016/j.cageo.2021.104860
16 ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6230-6239.
17 PAN H, HONG Y, SUN W, et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(3): 3448-3460.
doi: 10.1109/TITS.2022.3228042
18 FAN M, LAI S, HUANG J, et al. Rethinking BiSeNet for real-time semantic segmentation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 9711-9720.
19 YU C, GAO C, WANG J, et al. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068.
20 XU J, XIONG Z, BHATTACHARYYA S P. PIDNet: a real-time semantic segmentation network inspired by PID controllers[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [S. l.]: IEEE, 2023: 19529-19539.
21 ZHOU L, ZHANG C, WU M. D-LinkNet: LinkNet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 192-196.
22 ZHU Q, ZHANG Y, WANG L, et al. A global context-aware and batch-independent network for road extraction from VHR satellite imagery[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 175: 353-365.
doi: 10.1016/j.isprsjprs.2021.03.016
23 DIAKOGIANNIS F, WALDNER F, CACCETTA P, et al. ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 162: 94-114.
doi: 10.1016/j.isprsjprs.2020.01.013
24 WU Renzhe, CAI Jialun, LIU Guoxiang, et al. Rural road network extraction for high resolution imagery using RDU-Net deep learning method[J]. Remote Sensing Information, 2021, 36(1): 29-36.
25 LI J, SUN B, LI S, et al. Semisupervised semantic segmentation of remote sensing images with consistency self-training[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-11.
26 YOU Z, WANG J, CHEN S, et al. FMWDCT: foreground mixup into weighted dual-network cross training for semisupervised remote sensing road extraction[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15: 5570-5579.
doi: 10.1109/JSTARS.2022.3188025
27 SONG J, LI J, CHEN H, et al. MapGen-GAN: a fast translator for remote sensing image to map via unsupervised adversarial learning[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 14: 2341-2357.
doi: 10.1109/JSTARS.2021.3049905
28 YANG Y, ZHOU L. SRP-YOLOX: an improved deep convolutional neural network for automated via detection[J]. Microelectronics Reliability, 2023, 147: 115069.
doi: 10.1016/j.microrel.2023.115069
29 SHI M, XIE F, YANG J, et al. Cutout with patch-loss augmentation for improving generative adversarial networks against instability[J]. Computer Vision and Image Understanding, 2023, 234: 103761.
doi: 10.1016/j.cviu.2023.103761
30 YUN S, HAN D, CHEN S, et al. CutMix: regularization strategy to train strong classifiers with localizable features[C]// Proceedings of the IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 6022-6031.