Method for augmenting crack image datasets via fine-tuning of stable diffusion models

Journal of ZheJiang University (Engineering Science), 2026, Vol. 60, Issue (5): 926-934. DOI: 10.3785/j.issn.1008-973X.2026.05.002


Jie WU1, Beilin HAN1, Yihang ZHANG1, Chao ZOU2, Lifeng XIN3, Shiping HUANG4,*

1. School of Civil Engineering and Architecture, Wuhan Polytechnic University, Wuhan 430023, China
2. School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China
3. School of Mechanics and Transportation Engineering, Northwestern Polytechnical University, Xi’an 710072, China
4. School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510641, China


Abstract

To address data scarcity and class imbalance in crack image datasets, a crack image generation and dataset augmentation method based on low-rank adaptation (LoRA) fine-tuning of stable diffusion models was proposed. The backbone of stable diffusion was frozen, and low-rank adaptation matrices were inserted into the attention layers of the U-Net model to fine-tune the attention weights, which achieved efficient transfer and precise modeling of crack semantic features. Comparative experiments with mainstream generative models (DCGAN, WGAN-GP, and StyleGAN) and the original stable diffusion model demonstrated that the proposed method achieved superior crack clarity, texture fidelity, and background consistency, with significant improvements in multiple image quality metrics. When the generated images were combined with the DeepCrack dataset for mixed training, the segmentation performance of U-Net, TransUNet, and MobileViT improved significantly. In particular, on TransUNet, precision, recall, F1-score, and intersection over union (IoU) improved by 5.9, 7.2, 6.4, and 5.6 percentage points, respectively. The proposed method effectively generated crack images with realistic structures and diverse morphologies, enhanced the robustness and generalization ability of segmentation models, and demonstrated strong potential in scenarios with limited data, difficult annotation, and high-risk environments.
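The low-rank adaptation step described in the abstract can be illustrated with a minimal sketch. It assumes the standard LoRA formulation of Hu et al., in which a frozen weight matrix W is augmented by a trainable product B·A scaled by alpha/rank, with B initialized to zero. The class name `LoRALinear`, the helper `matmul`, and the `rank` and `alpha` values are hypothetical choices for illustration; the rank, scaling, and exact attention projections used by the authors are not stated on this page, and this is not their implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """One linear projection with a frozen weight and a low-rank update.

    Effective weight: W + (alpha / rank) * B @ A
    W is d_out x d_in and stays frozen; only A (rank x d_in) and
    B (d_out x rank) would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen backbone weight, never modified here
        self.scale = alpha / rank
        # A starts with small random values and B with zeros, so the
        # adapted layer initially reproduces the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W plus the scaled low-rank update, without changing W."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted projection to one input vector of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Count the entries of A and B, the only parameters that would be updated."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a d_out x d_in projection, the adapter adds rank x (d_in + d_out) trainable values, which is far fewer than d_out x d_in when the rank is small relative to the layer width. This is the usual argument for the efficiency of LoRA fine-tuning.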

Key words: diffusion model; crack segmentation; deep learning; structural health monitoring; crack dataset

Received: 03 September 2025 Published: 06 May 2026

CLC: U 446

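Fund: Supported by the National Natural Science Foundation of China (52208445, 52572372) and the Scientific Research Funding Project of Wuhan Polytechnic University (2025Y009).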

Corresponding Author: Shiping HUANG. E-mail: wujiemc@whpu.edu.cn; ctasihuang@scut.edu.cn


Cite this article:

Jie WU,Beilin HAN,Yihang ZHANG,Chao ZOU,Lifeng XIN,Shiping HUANG. Method for augmenting crack image datasets via fine-tuning of stable diffusion models. Journal of ZheJiang University (Engineering Science), 2026, 60(5): 926-934.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.05.002 OR https://www.zjujournals.com/eng/Y2026/V60/I5/926

Crack image dataset augmentation method based on fine-tuning stable diffusion models

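To address the scarcity and class imbalance of crack image datasets, a crack image generation and dataset augmentation method based on low-rank adaptation (LoRA) fine-tuning of the stable diffusion model is proposed. With the backbone weights of the stable diffusion model frozen, low-rank adaptation matrices are inserted into the attention layers of the U-Net model to fine-tune the attention weights, which achieves efficient transfer and accurate modeling of crack semantic features. Comparative experiments against mainstream generative models (DCGAN, WGAN-GP, and StyleGAN) and the stable diffusion model without fine-tuning show that the proposed method performs best in crack structure clarity, texture fidelity, and background consistency, and achieves significant improvements on all evaluation metrics for generated image quality. The generated crack images were mixed with the real DeepCrack dataset for training, and performance comparison experiments were conducted on three typical segmentation models (U-Net, TransUNet, and MobileViT). The results show that the proposed method significantly outperforms the baseline models in precision, recall, F1-score, and intersection over union, with improvements on TransUNet of 5.9, 7.2, 6.4, and 5.6 percentage points, respectively. The proposed method effectively generates crack images with realistic structures and diverse morphologies and significantly improves the robustness and generalization ability of segmentation models. It has broad application potential in scenarios with scarce data, difficult annotation, and high-risk environments.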
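The four segmentation metrics reported above can be sketched as follows. The example assumes the standard pixel-wise definitions for a binary crack mask, with the crack class as the positive class. The function name `segmentation_metrics` and the mask layout (nested lists of 0/1) are hypothetical, and the paper may average over classes or images in a way this page does not specify.

```python
def segmentation_metrics(pred, target):
    """Compute pixel-wise precision, recall, F1-score, and IoU.

    pred and target are 2D lists of 0/1 with the same shape,
    where 1 marks a crack pixel (assumed positive class).
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # crack pixel predicted correctly
            elif p and not t:
                fp += 1  # background predicted as crack
            elif t and not p:
                fn += 1  # crack pixel missed
    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}
```

A gain reported in percentage points is the plain difference between two such scores multiplied by 100, not a relative change.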

Key words: diffusion model, crack segmentation, deep learning, structural health monitoring, crack dataset

Fig.1 Diffusion process of denoising diffusion probabilistic model

Fig.2 Denoising diffusion probabilistic model

Fig.3 Latent diffusion model

Fig.4 Framework diagram of stable diffusion model

Fig.5 Crack images generated using stable diffusion model

Fig.6 Flowchart for fine-tuning stable diffusion model via low-rank adaptation technology

Fig.7 Flowchart for fine-tuning stable diffusion model via low-rank adaptation technology for crack image generation

Fig.8 Crack images generated by fine-tuning stable diffusion model via low-rank adaptation technology

Fig.9 Crack images before and after Labelme annotation

Fig.10 Examples of crack images generated by different models

Tab.1 Quantitative analysis of different models for crack image generation

Tab.2 Sample distribution of different datasets in comparative experiments

Tab.3 Segmentation performance comparison of different models on mixed dataset (%)


