Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (5): 1016-1026    DOI: 10.3785/j.issn.1008-973X.2026.05.011
    
Wind power data cleaning method based on improved imputation diffusion model and LSTM
Wenyuan BIAN1, Jiuyuan HUO1,2,*, Chen CHANG1
1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. National Cryosphere Desert Data Center, Lanzhou 730000, China

Abstract  

To address the poor quality of wind turbine operational data collected by supervisory control and data acquisition (SCADA) systems, a method combining an improved imputation diffusion model with long short-term memory (IDM-LSTM) was proposed. A dual-mask collaborative strategy was employed when training the imputation diffusion model, which helped the model focus on key abnormal distribution regions and enhanced its robustness against abnormal disturbances. A hierarchical residual inverted Transformer (HRIformer), which combines the iTransformer with residual connections, was used as the denoising model to improve the ability to capture complex features. During the inference phase of the imputation diffusion model, the periodic visibility reconstruction mask (PVRM) strategy was applied: the mask range was controlled by setting an appropriate mask period, which ensured consistent sequence reconstruction and temporal integrity. The imputation diffusion model performed anomaly detection and the LSTM performed correction, forming an integrated data cleaning framework for unlabeled wind power data. Experimental results on data from a real wind farm showed that, after IDM-LSTM cleaning, the Pearson correlation coefficients for wind speed-power and rotational speed-power increased by 3.78% and 3.43%, respectively, relative to the original data, indicating improved wind power data quality.
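The evaluation metric named in the abstract, the Pearson correlation coefficient between wind speed and power, can be illustrated with a minimal sketch. The data below are synthetic stand-ins rather than the wind farm records, and the cubic power curve, the noise level, and the 5% share of zero-power anomalies are all assumptions chosen only for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Synthetic stand-in for SCADA records (hypothetical, not the paper's data).
rng = np.random.default_rng(0)
n = 2000
wind_speed = rng.uniform(3.0, 12.0, size=n)                        # m/s
power = (wind_speed / 12.0) ** 3 + rng.normal(0.0, 0.02, size=n)   # normalized power

# Inject curtailment-like anomalies: power drops to zero at 5% of the samples.
is_anomaly = np.zeros(n, dtype=bool)
is_anomaly[rng.choice(n, size=n // 20, replace=False)] = True
power_raw = np.where(is_anomaly, 0.0, power)

# Compare the coefficient on the raw data and after dropping the flagged samples.
rho_raw = pearson(wind_speed, power_raw)
rho_clean = pearson(wind_speed[~is_anomaly], power_raw[~is_anomaly])
relative_gain = (rho_clean - rho_raw) / rho_raw * 100.0
print(f"raw: {rho_raw:.4f}, cleaned: {rho_clean:.4f}, relative gain: {relative_gain:.2f}%")
```

In this toy setting the anomaly labels are known by construction; in the paper they come from the imputation diffusion model, and flagged values are corrected by the LSTM rather than simply dropped.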



Key words: wind power data cleaning; imputation diffusion model; Transformer; long short-term memory (LSTM); mask strategy
Received: 09 June 2025      Published: 06 May 2026
CLC:  TM 614  
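Fund: Gansu Provincial Key Research and Development Program, Industrial Field (25YFGA045); National Natural Science Foundation of China (62262038); Gansu Provincial Science and Technology Innovation Guidance Program, Science and Technology Commissioner Special Project (25CXGA030); Gansu Provincial Education Science and Technology Innovation Program (2025CXZX-634).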
Corresponding Authors: Jiuyuan HUO     E-mail: bwy0927@163.com;huojy@mail.lzjtu.cn
Cite this article:

Wenyuan BIAN,Jiuyuan HUO,Chen CHANG. Wind power data cleaning method based on improved imputation diffusion model and LSTM. Journal of ZheJiang University (Engineering Science), 2026, 60(5): 1016-1026.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.05.011     OR     https://www.zjujournals.com/eng/Y2026/V60/I5/1016


Fig.1 Wind speed-power anomalous data
Fig.2 Forward and reverse processes of denoising diffusion probabilistic models
Fig.3 Pearson correlation coefficient heat map of wind turbine characteristic parameters
Fig.4 Architecture of hierarchical residual inverted Transformer
Fig.5 Schematic diagram of periodic visibility reconstruction mask strategy
Fig.6 Training and inference process of imputation diffusion model
Fig.7 Workflow of wind power data cleaning method based on improved imputation diffusion model and LSTM
| Hyperparameter | Value | Hyperparameter | Value |
|---|---|---|---|
| Eps1 | 0.07 | Eps2 | 0.04 |
| MinPts1 | 13 | MinPts2 | 4 |
Tab.1 Hyperparameter optimization results via grid search
Fig.8 Preliminary anomaly detection results
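| Hyperparameter | Value | Hyperparameter | Value |
|---|---|---|---|
| Diffusion steps | 100 | Detection window size | 90 |
| PVRM mask steps | 5 | Number of hierarchical layers in HRIformer | 2 |
| PVRM mask period | 3 | iTransformer hidden dimension | 128 |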
Tab.2 Main hyperparameter settings of imputation diffusion model
Fig.9 Anomaly detection results of imputation diffusion model
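| Method | Total samples | Normal samples | $\varphi$/% | $\rho_{v\text{-}P}$ | $\rho_{n\text{-}P}$ |
|---|---|---|---|---|---|
| None (raw data) | 6 300 | 6 300 | 0.00 | 0.9342 | 0.9423 |
| LOF | 6 300 | 6 048 | 4.00 | 0.9394 | 0.9448 |
| DBSCAN | 6 300 | 5 882 | 6.63 | 0.9439 | 0.9481 |
| DBSCAN+IF | 6 300 | 5 921 | 6.01 | 0.9473 | 0.9522 |
| IMDiffusion | 6 300 | 5 986 | 4.98 | 0.9479 | 0.9577 |
| TranAD | 6 300 | 5 991 | 4.90 | 0.9473 | 0.9533 |
| TimeADDM | 6 300 | 5 967 | 5.29 | 0.9501 | 0.9592 |
| IDM | 6 300 | 5 954 | 5.49 | 0.9511 | 0.9604 |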
Tab.3 Performance comparison of different methods in outlier detection
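The $\varphi$ column of Tab.3 appears to be the share of samples flagged as anomalous, that is, (total − normal) / total × 100. This reading is an assumption inferred from the numbers, since the page does not define $\varphi$. A short check under that assumption, using only values copied from the table:

```python
# Values copied from Tab.3: method -> (total samples, normal samples, reported phi in %).
table3 = {
    "LOF": (6300, 6048, 4.00),
    "DBSCAN": (6300, 5882, 6.63),
    "DBSCAN+IF": (6300, 5921, 6.01),
    "IMDiffusion": (6300, 5986, 4.98),
    "TranAD": (6300, 5991, 4.90),
    "TimeADDM": (6300, 5967, 5.29),
    "IDM": (6300, 5954, 5.49),
}

def flagged_share(total, normal):
    """Assumed definition of phi: percentage of samples not kept as normal."""
    return (total - normal) / total * 100.0

# Absolute gap between the recomputed share and the reported phi, per method.
deviation = {
    method: abs(flagged_share(total, normal) - reported)
    for method, (total, normal, reported) in table3.items()
}
max_deviation = max(deviation.values())
print(f"largest gap to the reported phi: {max_deviation:.4f} percentage points")
```

Under this assumed definition every row is reproduced to within 0.01 percentage points, which is consistent with the reported values being rounded or truncated to two decimals.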
Fig.10 Data distribution projection after anomaly detection
| s | $\rho_{v\text{-}P}$ | $\rho_{n\text{-}P}$ | s | $\rho_{v\text{-}P}$ | $\rho_{n\text{-}P}$ |
|---|---|---|---|---|---|
| 2 | 0.9417 | 0.9499 | 6 | 0.9463 | 0.9552 |
| 3 | 0.9511 | 0.9604 | 9 | 0.9425 | 0.9534 |
Tab.4 Results of mask period sensitivity experiments
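This page does not spell out how the periodic visibility reconstruction mask is built, so the following is only one plausible reading, offered as a sketch: with mask period s, pass k hides every time step t with t mod s = k, so that across s passes each step is hidden and reconstructed exactly once. The window length 90 and period 3 are taken from Tab.2; the function names are hypothetical, the "PVRM mask steps" setting is not modeled, and the linear-interpolation imputer is a placeholder for the diffusion model.

```python
import numpy as np

def periodic_masks(length, period):
    """Return `period` boolean masks; True marks hidden time steps.

    Mask k hides the steps t with t % period == k, so each step is hidden exactly once.
    """
    t = np.arange(length)
    return [(t % period) == k for k in range(period)]

def interp_impute(series, hidden):
    """Placeholder imputer (NOT the diffusion model): interpolate from visible steps."""
    t = np.arange(len(series))
    filled = series.copy()
    filled[hidden] = np.interp(t[hidden], t[~hidden], series[~hidden])
    return filled

def reconstruct(series, period, impute):
    """Rebuild every time step from the pass in which that step was hidden."""
    series = np.asarray(series, dtype=float)
    recon = np.empty_like(series)
    for hidden in periodic_masks(len(series), period):
        recon[hidden] = impute(series, hidden)[hidden]
    return recon

window, period = 90, 3                      # values listed in Tab.2
x = np.sin(np.linspace(0.0, 4.0 * np.pi, window))
x[40] += 2.0                                # injected spike standing in for an anomaly

recon = reconstruct(x, period, interp_impute)
score = np.abs(x - recon)                   # reconstruction error used as anomaly score
print("largest reconstruction error at index", int(score.argmax()))
```

Because the masks are complementary, the reconstruction covers the whole window without gaps, which is one way to read the "temporal integrity" mentioned in the abstract. All four periods tested in Tab.4 divide the window length of 90 evenly.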
Fig.11 Data distribution after outlier correction
Fig.12 Time series of wind power data before and after outlier correction
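| Validation category | Implementation | $\rho_{v\text{-}P}$ | $\rho_{n\text{-}P}$ |
|---|---|---|---|
| Dual-mask collaborative strategy | Random mask | 0.9248 | 0.9342 |
| Dual-mask collaborative strategy | Quartile method + random mask | 0.9404 | 0.9536 |
| Denoising model | U-Net | 0.9363 | 0.9428 |
| IDM | Without IDM | 0.9457 | 0.9513 |
| LSTM | Linear interpolation correction | 0.9578 | 0.9649 |
| IDM-LSTM | Full pipeline, anomaly detection evaluation | 0.9511 | 0.9604 |
| IDM-LSTM | Full pipeline, anomaly correction evaluation | 0.9695 | 0.9746 |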
Tab.5 Module ablation results of wind power data cleaning method based on improved imputation diffusion model and LSTM
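The 3.78% and 3.43% improvements quoted in the abstract can be recovered as relative increases from the raw-data row of Tab.3 to the full-pipeline correction values in Tab.5. That the percentages are relative rather than absolute differences is an inference from the numbers, not something stated on this page. A short check:

```python
# Raw-data coefficients (Tab.3) and full-pipeline values after correction (Tab.5).
raw = {"wind speed-power": 0.9342, "rotational speed-power": 0.9423}
cleaned = {"wind speed-power": 0.9695, "rotational speed-power": 0.9746}

def relative_gain_percent(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

gains = {pair: round(relative_gain_percent(raw[pair], cleaned[pair]), 2) for pair in raw}
print(gains)  # {'wind speed-power': 3.78, 'rotational speed-power': 3.43}
```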
Fig.13 Comparison of probability density curves for data corrected by different methods
| Method | $s_1$ | $s_2$ |
|---|---|---|
| Linear interpolation | 0.028 | 0.032 |
| LSTM | 0.035 | 0.049 |
Tab.6 Comparison of sample probabilities across different methods