Journal of Zhejiang University (Engineering Science)  2026, Vol. 60 Issue (5): 926-934    DOI: 10.3785/j.issn.1008-973X.2026.05.002
Civil and Architectural Engineering
Method for augmenting crack image datasets via fine-tuning of stable diffusion models
Jie WU1, Beilin HAN1, Yihang ZHANG1, Chao ZOU2, Lifeng XIN3, Shiping HUANG4,*
1. School of Civil Engineering and Architecture, Wuhan Polytechnic University, Wuhan 430023, China
2. School of Civil and Transportation Engineering, Guangdong University of Technology, Guangzhou 510006, China
3. School of Mechanics and Transportation Engineering, Northwestern Polytechnical University, Xi’an 710072, China
4. School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510641, China
Abstract:

To address data scarcity and class imbalance in crack image datasets, a crack image generation and dataset augmentation method based on low-rank adaptation (LoRA) fine-tuning of the stable diffusion model was proposed. With the backbone weights of stable diffusion frozen, low-rank adaptation matrices were inserted into the attention layers of the U-Net to fine-tune the attention weights, which enabled efficient transfer and accurate modeling of crack semantic features. Comparative experiments against mainstream generative models (DCGAN, WGAN-GP, and StyleGAN) and the stable diffusion model without fine-tuning showed that the proposed method performed best in crack structure clarity, texture fidelity, and background consistency, and achieved the lowest FID and LPIPS among the compared models (116.624 and 0.390, Table 1). The generated crack images were mixed with the real DeepCrack dataset for training, and comparative experiments were conducted on three typical segmentation models (U-Net, TransUNet, and MobileViT). Mixed training improved precision, recall, F1-score, and intersection over union (IoU) over the baseline on all three models; on TransUNet, the gains were 5.9, 7.2, 6.4, and 5.6 percentage points, respectively. The proposed method generated crack images with realistic structures and diverse morphologies, improved the robustness and generalization ability of segmentation models, and showed strong potential in scenarios with scarce data, difficult annotation, and high-risk environments.

Key words: diffusion model    crack segmentation    deep learning    structural health monitoring    crack dataset
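The LoRA mechanism named in the abstract (a frozen weight plus a trainable low-rank update) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LoRALinear`, the layer sizes, the rank `r`, the scaling `alpha`, and the random stand-in for the pretrained weight are all assumptions chosen for illustration, and plain Python lists are used instead of a deep learning framework. The zero initialization of `B` follows the original LoRA formulation, so the adapted layer starts out identical to the frozen one.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, d_in, d_out, r=4, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weight, shape d_out x d_in (random stand-in, hypothetical).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A is r x d_in (small random), B is d_out x r (zeros).
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=64, d_out=64, r=4)
x = [1.0] * 64
y = layer.forward(x)
print(layer.trainable_params(), "trainable vs", layer.frozen_params(), "frozen")
```

In this toy setting only `A` and `B` would receive gradient updates, which is why the number of trainable parameters grows with `r * (d_in + d_out)` rather than `d_in * d_out`.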
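Received: 2025-09-03    Published: 2026-05-06
CLC:  U 446
Funding: National Natural Science Foundation of China (52208445, 52572372); Research Funding Project of Wuhan Polytechnic University (2025Y009).
Corresponding author: Shiping HUANG     E-mail: wujiemc@whpu.edu.cn; ctasihuang@scut.edu.cn
About the author: Jie WU (born 1988), male, lecturer, engaged in research on structural health monitoring. orcid.org/0000-0003-1069-9372. E-mail: wujiemc@whpu.edu.cn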
Cite this article:

Jie WU, Beilin HAN, Yihang ZHANG, Chao ZOU, Lifeng XIN, Shiping HUANG. Method for augmenting crack image datasets via fine-tuning of stable diffusion models. Journal of Zhejiang University (Engineering Science), 2026, 60(5): 926-934.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.05.002
https://www.zjujournals.com/eng/CN/Y2026/V60/I5/926

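Fig. 1  Diffusion process of the denoising diffusion probabilistic model
Fig. 2  Denoising diffusion probabilistic model
Fig. 3  Latent diffusion model
Fig. 4  Framework of the stable diffusion model
Fig. 5  Crack images generated by the stable diffusion model
Fig. 6  Flowchart of fine-tuning the stable diffusion model with low-rank adaptation
Fig. 7  Flowchart of generating crack images with the stable diffusion model fine-tuned by low-rank adaptation
Fig. 8  Crack images generated by the stable diffusion model fine-tuned with low-rank adaptation
Fig. 9  Crack images before and after annotation with Labelme
Fig. 10  Examples of crack images generated by different models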
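Model                   IS(↑)    FID(↓)     KID(↓)    LPIPS(↓)
Training set            2.996    -          -         -
DCGAN                   1.232    314.225    0.355     0.646
WGAN-GP                 1.510    162.939    0.143     0.420
StyleGAN                2.016    150.253    0.120     0.418
SD before fine-tuning   2.993    292.201    0.009     0.582
This study              2.989    116.624    0.055     0.390
Table 1  Quantitative analysis results of different models on the crack image generation task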
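Group    N_S (training set)    N_S (test set)    P_T/%    N_ST (non-empty cells among DCA, DUA, IGC, in source order)
1        300                   237               55.8     300
2        360                   237               60.3     300, 60
3        420                   237               63.9     300, 120
4        360                   237               60.3     300, 60
5        420                   237               63.9     300, 120
Table 2  Sample distribution of the datasets used in the comparative experiments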
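The P_T column of Table 2 is not defined on this page. A plausible reading, stated here as an assumption, is that it gives the training set's share of all samples, N_train / (N_train + N_test). The sketch below checks that reading against the tabulated values; the function name `training_share` is hypothetical.

```python
# (training samples, test samples, P_T as reported in Table 2)
TABLE2 = {
    1: (300, 237, 55.8),
    2: (360, 237, 60.3),
    3: (420, 237, 63.9),
    4: (360, 237, 60.3),
    5: (420, 237, 63.9),
}


def training_share(n_train, n_test):
    """Training-set share of all samples, in percent (assumed meaning of P_T)."""
    return 100.0 * n_train / (n_train + n_test)


shares = {g: training_share(tr, te) for g, (tr, te, _) in TABLE2.items()}

for group, (tr, te, reported) in TABLE2.items():
    print(f"group {group}: computed {shares[group]:.3f} %, reported {reported} %")
```

Under this assumption, groups 2 to 5 agree with the table to one decimal place. Group 1 differs by less than 0.1 percentage point, which would be consistent with truncation rather than rounding in the table.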
                        U-Net                         TransUNet                     MobileViT
Group                   P      R      F1     IoU      P      R      F1     IoU      P      R      F1     IoU
1 (baseline dataset)    78.8   76.2   77.5   70.6     80.2   79.5   79.8   70.5     81.2   79.0   80.0   71.1
2                       79.4   76.5   77.9   70.9     83.1   83.5   83.2   72.8     81.4   79.2   80.2   71.6
3                       80.5   77.1   78.7   71.5     83.2   85.2   84.1   74.5     82.9   81.4   82.1   72.6
4                       79.6   77.4   78.5   71.4     83.2   84.8   83.9   73.5     82.5   83.8   83.1   73.0
5                       81.5   78.6   80.5   72.8     86.1   86.7   86.2   76.1     85.9   84.7   85.3   74.2
Table 3  Segmentation performance of the mixed datasets on different models
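The percentage-point gains quoted in the abstract can be recomputed from Table 3 by subtracting the group 1 (baseline) row from the group 5 row. The sketch below does this for all three models; the dictionary layout and the function name `gains` are illustrative choices, while the numbers are copied from Table 3.

```python
METRICS = ("P", "R", "F1", "IoU")

# Rows for group 1 (baseline) and group 5, copied from Table 3 (values in %).
TABLE3 = {
    "U-Net":     {1: (78.8, 76.2, 77.5, 70.6), 5: (81.5, 78.6, 80.5, 72.8)},
    "TransUNet": {1: (80.2, 79.5, 79.8, 70.5), 5: (86.1, 86.7, 86.2, 76.1)},
    "MobileViT": {1: (81.2, 79.0, 80.0, 71.1), 5: (85.9, 84.7, 85.3, 74.2)},
}


def gains(model, group=5, baseline=1):
    """Percentage-point gain of `group` over `baseline` for each metric."""
    old_row = TABLE3[model][baseline]
    new_row = TABLE3[model][group]
    return {m: round(new - old, 1) for m, old, new in zip(METRICS, old_row, new_row)}


for model in TABLE3:
    print(model, gains(model))
```

For TransUNet this gives gains of 5.9, 7.2, 6.4, and 5.6 percentage points in precision, recall, F1-score, and IoU, matching the figures stated in the abstract.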