Urban road network extraction method based on lightweight Transformer

doi:10.3785/j.issn.1008-973X.2024.01.005

Journal of ZheJiang University (Engineering Science)

2024, Vol. 58, Issue (1): 40-49

Zhicheng FENG1,2, Jie YANG1,2,*, Zhichao CHEN1,2

1. School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
2. Jiangxi Provincial Key Laboratory of Maglev Technology, Ganzhou 341000, China

Abstract

To address limitations of existing methods, such as imprecise extraction of road regions and insufficient real-time performance, a road network extraction method based on a lightweight Transformer, named RoadViT, was proposed. The MobileViT architecture, which mixes convolutional neural networks with the Transformer, was used as the encoder to efficiently extract high-level context information. A pyramid decoder was proposed to extract and fuse multi-scale features and to generate the probability distribution of pixel categories. The Mosaic method was combined with multi-scale scaling and random cropping strategies for data enhancement, constructing fine and diverse remote sensing images. A dynamic weighting loss function was proposed to mitigate the imbalance between the road category and the background category in urban remote sensing images. The experimental results show that RoadViT, with only 1.25 × 10⁶ parameters, achieves an inference speed of up to 10 frames per second on the Jetson TX2 and an accuracy of up to 57.0% on the CHN6-CUG dataset. The proposed method is an effective exploration of the lightweight Transformer for urban remote sensing images, improving road extraction accuracy while maintaining real-time inference.

Key words: urban road network extraction, Transformer, MobileViT, semantic segmentation of remote sensing images, lightweight model
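
The abstract describes a dynamic weighting loss for the imbalance between road and background pixels, but the weighting function itself (Fig.4) is not reproduced on this page. As a rough illustration only, the sketch below assumes inverse-frequency class weights recomputed from each batch's label mask and applied to a binary cross-entropy. The function names `dynamic_class_weights` and `weighted_bce`, the normalisation, and the `eps` constant are hypothetical and are not the authors' formulation.

```python
import numpy as np


def dynamic_class_weights(mask, eps=1e-6):
    """Return (w_background, w_road) computed from one batch's label mask.

    Assumption: weights are inversely proportional to class frequency and
    normalised to sum to 2, so a perfectly balanced mask gives (1, 1).
    """
    road_frac = float(np.mean(mask))           # share of road pixels in the batch
    w_road = 1.0 / (road_frac + eps)
    w_bg = 1.0 / (1.0 - road_frac + eps)
    total = w_road + w_bg
    return 2.0 * w_bg / total, 2.0 * w_road / total


def weighted_bce(prob, mask, eps=1e-6):
    """Binary cross-entropy where each class is scaled by its dynamic weight.

    prob: predicted road probability per pixel, values in [0, 1]
    mask: ground-truth labels, 1 for road and 0 for background
    """
    w_bg, w_road = dynamic_class_weights(mask, eps)
    prob = np.clip(prob, eps, 1.0 - eps)        # avoid log(0)
    per_pixel = -(w_road * mask * np.log(prob)
                  + w_bg * (1.0 - mask) * np.log(1.0 - prob))
    return float(per_pixel.mean())


if __name__ == "__main__":
    # Toy mask in which only 10% of the pixels are road.
    mask = np.zeros((10, 10))
    mask[0, :] = 1.0
    print("weights (background, road):", dynamic_class_weights(mask))
```

With these assumed weights, a missed road pixel contributes more to the loss than an equally confident false alarm on a background pixel, which is the general effect a class-balancing loss aims for.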

Received: 01 June 2023 Published: 07 November 2023

CLC: TP 751; U 212

Fund: National Natural Science Foundation of China (62063009)

Corresponding Authors: Jie YANG. E-mail: fengzhichengai@163.com; yangjie@jxust.edu.cn

Cite this article:

Zhicheng FENG, Jie YANG, Zhichao CHEN. Urban road network extraction method based on lightweight Transformer. Journal of ZheJiang University (Engineering Science), 2024, 58(1): 40-49.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.01.005 OR https://www.zjujournals.com/eng/Y2024/V58/I1/40

Fig.1 Structure of proposed urban road network extraction model RoadViT

Fig.2 Implementation process of MV2 module and multi-head attention mechanism

Fig.3 Structure of pyramid decoder

Fig.4 Graph and expression of dynamic weighting function

Fig.5 Implementation process of data enhancement

Tab.1 Effect of different techniques on model performance

Tab.2 Ablation experiments of RoadViT

Tab.3 Comparison of RoadViT and mainstream models on different datasets

Fig.6 Comparison of inference speed for models

Fig.7 Comparison of volume and segmentation accuracy for models

Fig.8 Impact of different techniques on segmentation effect

Fig.9 Comparison of actual road extraction results between RoadViT and mainstream models
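
The data enhancement summarized in the abstract (Mosaic combined with multi-scale scaling and random cropping, Fig.5) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the four tiles are assumed to share one size, scaling uses nearest-neighbour sampling so label values stay discrete, and the function names and the `scale_range` default are hypothetical.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    rows = np.arange(out_h) * arr.shape[0] // out_h
    cols = np.arange(out_w) * arr.shape[1] // out_w
    return arr[rows][:, cols]


def mosaic_augment(images, masks, out_size, rng, scale_range=(0.5, 1.5)):
    """Tile four samples into a 2x2 mosaic, rescale it, then crop a square.

    images: four arrays of identical shape (H, W, C)
    masks:  four label arrays of identical shape (H, W)
    """
    assert len(images) == 4 and len(masks) == 4
    h, w = images[0].shape[:2]

    # 1) Mosaic: stitch the four tiles into one (2H, 2W) canvas.
    big_img = np.concatenate([
        np.concatenate([images[0], images[1]], axis=1),
        np.concatenate([images[2], images[3]], axis=1),
    ], axis=0)
    big_mask = np.concatenate([
        np.concatenate([masks[0], masks[1]], axis=1),
        np.concatenate([masks[2], masks[3]], axis=1),
    ], axis=0)

    # 2) Multi-scale scaling: one random factor, never below the crop size.
    scale = rng.uniform(*scale_range)
    new_h = max(out_size, int(round(2 * h * scale)))
    new_w = max(out_size, int(round(2 * w * scale)))
    big_img = resize_nearest(big_img, new_h, new_w)
    big_mask = resize_nearest(big_mask, new_h, new_w)

    # 3) Random crop: the same window for image and mask keeps them aligned.
    y0 = int(rng.integers(0, new_h - out_size + 1))
    x0 = int(rng.integers(0, new_w - out_size + 1))
    window = (slice(y0, y0 + out_size), slice(x0, x0 + out_size))
    return big_img[window], big_mask[window]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = [np.full((8, 8, 3), i, dtype=np.uint8) for i in range(4)]
    lbls = [np.full((8, 8), i, dtype=np.uint8) for i in range(4)]
    img, lbl = mosaic_augment(imgs, lbls, out_size=8, rng=rng)
    print(img.shape, lbl.shape)
```

Applying the identical index mapping and crop window to the image and its mask is the key design point: any geometric change made to the pixels must be mirrored in the labels.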


