Journal of Zhejiang University (Engineering Science)  2026, Vol. 60 Issue (5): 1016-1026    DOI: 10.3785/j.issn.1008-973X.2026.05.011
Energy and Power Engineering
Wind power data cleaning method based on improved imputation diffusion model and LSTM
Wenyuan BIAN1, Jiuyuan HUO1,2,*, Chen CHANG1
1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. National Cryosphere Desert Data Center, Lanzhou 730000, China
Abstract:

To address the poor quality of wind turbine operational data collected by supervisory control and data acquisition (SCADA) systems, a method combining an improved imputation diffusion model with a long short-term memory network (IDM-LSTM) was proposed. During training of the imputation diffusion model, a dual-mask collaborative strategy was employed, which helped the model focus on key regions of the abnormal distribution and made it more robust to abnormal disturbances. A hierarchical residual inverted Transformer (HRIformer), which combines iTransformer with residual connections, served as the denoising model to improve the modeling of complex features. During inference, a periodic visibility reconstruction mask (PVRM) strategy was applied; setting an appropriate mask period controls the masked range and preserves the consistency of sequence reconstruction and temporal integrity. With the imputation diffusion model identifying anomalies and the LSTM correcting them, an integrated data cleaning framework for unlabeled wind power data was constructed. Experiments on real data from a wind farm showed that, after IDM-LSTM cleaning, the Pearson correlation coefficients of wind speed-power and rotational speed-power were 3.78% and 3.43% higher, respectively, than those of the original data, effectively improving wind power data quality.

Key words: wind power data cleaning; imputation diffusion model; Transformer; long short-term memory (LSTM); mask strategy
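
The dual-mask collaborative strategy is only named in the abstract; the ablation table further down lists its configuration as "quartile method + random mask". A minimal sketch of one way to combine the two masks follows. The 1.5×IQR fences, the 10% random ratio, the union rule, and all function names are assumptions for illustration, not the authors' implementation.

```python
import random
import statistics


def quartile_mask(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def random_mask(length, ratio=0.1, seed=0):
    """Flag a random subset of positions (ratio and seed are assumed values)."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(length), int(length * ratio)))
    return [i in chosen for i in range(length)]


def dual_mask(values, k=1.5, ratio=0.1, seed=0):
    """Union of both masks: True means the point is hidden from the model."""
    qm = quartile_mask(values, k)
    rm = random_mask(len(values), ratio, seed)
    return [a or b for a, b in zip(qm, rm)]


# Hypothetical power readings with one obvious outlier at index 5.
power = [1.0, 1.1, 0.9, 1.2, 1.0, 9.0, 1.1, 0.95, 1.05, 1.0]
mask = dual_mask(power)
```

In this reading, the quartile mask hides suspected outliers so they do not serve as conditioning context, while the random mask supplies ordinary points for the model to learn to reconstruct.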
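Received: 2025-06-09    Published: 2026-05-06
CLC:  TM 614
Funding: Key Research and Development Program of Gansu Province, Industrial Field (25YFGA045); National Natural Science Foundation of China (62262038); Science and Technology Innovation Guidance Program of Gansu Province, Science and Technology Commissioner Special Project (25CXGA030); Education Science and Technology Innovation Program of Gansu Province (2025CXZX-634).
Corresponding author: Jiuyuan HUO     E-mail: bwy0927@163.com; huojy@mail.lzjtu.cn
About the author: Wenyuan BIAN (born 2001), male, master's student, working on new energy power forecasting. orcid.org/0009-0008-5485-7946. E-mail: bwy0927@163.com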

Cite this article:

Wenyuan BIAN, Jiuyuan HUO, Chen CHANG. Wind power data cleaning method based on improved imputation diffusion model and LSTM[J]. Journal of Zhejiang University (Engineering Science), 2026, 60(5): 1016-1026.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.05.011
https://www.zjujournals.com/eng/CN/Y2026/V60/I5/1016

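Fig. 1  Abnormal wind speed-power data
Fig. 2  Forward and reverse generation processes of the denoising diffusion probabilistic model
Fig. 3  Heatmap of Pearson correlation coefficients among wind turbine feature parameters
Fig. 4  Structure of the hierarchical residual inverted Transformer
Fig. 5  Schematic of the periodic visibility reconstruction mask strategy
Fig. 6  Training and inference processes of the imputation diffusion model
Fig. 7  Workflow of the wind power data cleaning method based on the improved imputation diffusion model and LSTM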
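| Hyperparameter | Value | Hyperparameter | Value |
| --- | --- | --- | --- |
| Eps1 | 0.07 | Eps2 | 0.04 |
| MinPts1 | 13 | MinPts2 | 4 |

Table 1  Hyperparameters determined by grid search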
Fig. 8  Preliminary anomaly identification results
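| Hyperparameter | Value | Hyperparameter | Value |
| --- | --- | --- | --- |
| Diffusion steps | 100 | Detection window size | 90 |
| PVRM mask steps | 5 | Number of hierarchical layers in HRIformer | 2 |
| PVRM mask period | 3 | iTransformer hidden dimension | 128 |

Table 2  Main hyperparameter settings of the imputation diffusion model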
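
Table 2 sets the PVRM mask period to 3 within a detection window of 90. This page does not define the mask itself, so the sketch below shows only one plausible reading: in each pass every s-th time step is hidden and reconstructed while the rest stay visible, and shifting the offset over s passes covers the whole window exactly once. The function name, the offset scheme, and this interpretation are assumptions; the "mask steps" setting is not modeled.

```python
def periodic_masks(window, period):
    """Return `period` boolean masks; masks[p][t] is True when step t is hidden in pass p."""
    return [[t % period == offset for t in range(window)] for offset in range(period)]


masks = periodic_masks(window=90, period=3)

# Each time step is hidden in exactly one pass, so every point is reconstructed once,
# while two thirds of the window remain visible as context in any single pass.
hidden_per_step = [sum(m[t] for m in masks) for t in range(90)]
visible_share = 1 - sum(masks[0]) / 90
```

Under this reading, a larger period leaves more context visible per pass but requires more passes to cover the window, which is the kind of trade-off a mask-period sensitivity study would probe.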
Fig. 9  Outlier identification results of the imputation diffusion model
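| Method | Total samples | Normal samples | $\varphi$/% | $\rho_{v\text{-}P}$ | $\rho_{n\text{-}P}$ |
| --- | --- | --- | --- | --- | --- |
| None (original data) | 6 300 | 6 300 | 0.00 | 0.9342 | 0.9423 |
| LOF | 6 300 | 6 048 | 4.00 | 0.9394 | 0.9448 |
| DBSCAN | 6 300 | 5 882 | 6.63 | 0.9439 | 0.9481 |
| DBSCAN+IF | 6 300 | 5 921 | 6.01 | 0.9473 | 0.9522 |
| IMDiffusion | 6 300 | 5 986 | 4.98 | 0.9479 | 0.9577 |
| TranAD | 6 300 | 5 991 | 4.90 | 0.9473 | 0.9533 |
| TimeADDM | 6 300 | 5 967 | 5.29 | 0.9501 | 0.9592 |
| IDM | 6 300 | 5 954 | 5.49 | 0.9511 | 0.9604 |

Table 3  Comparison of outlier identification performance of different methods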
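
The φ column is consistent with the share of records flagged as abnormal, (total − normal) / total × 100. This reading is inferred from the numbers in Table 3 rather than stated on this page; the short check below recomputes it for several rows. The function name is hypothetical.

```python
def abnormal_share(total, normal):
    """Percentage of records flagged as abnormal, rounded to two decimals."""
    return round((total - normal) / total * 100, 2)


# (total, normal) pairs copied from Table 3.
rows = {
    "LOF": (6300, 6048),
    "DBSCAN": (6300, 5882),
    "IMDiffusion": (6300, 5986),
    "TranAD": (6300, 5991),
    "TimeADDM": (6300, 5967),
    "IDM": (6300, 5954),
}
shares = {name: abnormal_share(t, n) for name, (t, n) in rows.items()}
```

Recomputing the DBSCAN+IF row the same way gives 6.02 rather than the tabulated 6.01, which may reflect truncation or a rounding difference in the source.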
Fig. 10  Projection of the data distribution after anomaly identification
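| s | $\rho_{v\text{-}P}$ | $\rho_{n\text{-}P}$ | s | $\rho_{v\text{-}P}$ | $\rho_{n\text{-}P}$ |
| --- | --- | --- | --- | --- | --- |
| 2 | 0.9417 | 0.9499 | 6 | 0.9463 | 0.9552 |
| 3 | 0.9511 | 0.9604 | 9 | 0.9425 | 0.9534 |

Table 4  Results of the mask period sensitivity experiment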
Fig. 11  Data distribution after outlier correction
Fig. 12  Time series of wind power data before and after outlier correction
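| Validation category | Implementation | $\rho_{v\text{-}P}$ | $\rho_{n\text{-}P}$ |
| --- | --- | --- | --- |
| Dual-mask collaborative strategy | Random mask | 0.9248 | 0.9342 |
| Dual-mask collaborative strategy | Quartile method + random mask | 0.9404 | 0.9536 |
| Denoising model | U-Net | 0.9363 | 0.9428 |
| IDM | Without IDM | 0.9457 | 0.9513 |
| LSTM | Linear interpolation correction | 0.9578 | 0.9649 |
| IDM-LSTM | Full pipeline (anomaly identification evaluation) | 0.9511 | 0.9604 |
| IDM-LSTM | Full pipeline (anomaly correction evaluation) | 0.9695 | 0.9746 |

Table 5  Module ablation results of the wind power data cleaning method based on the improved imputation diffusion model and LSTM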
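
The evaluation metric throughout is the Pearson correlation coefficient. The 3.78% and 3.43% gains quoted in the abstract follow from the original-data values in Table 3 (0.9342, 0.9423) and the full-pipeline correction values in Table 5 (0.9695, 0.9746) when read as relative improvements. The sketch below shows the standard coefficient and that arithmetic; the function names and the toy samples are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relative_gain(before, after):
    """Relative improvement in percent, rounded to two decimals."""
    return round((after - before) / before * 100, 2)


# Toy wind speed / power samples (hypothetical), perfectly linear, so r is 1.
r = pearson([3.0, 5.0, 7.0, 9.0], [100.0, 200.0, 300.0, 400.0])

gain_v_p = relative_gain(0.9342, 0.9695)  # wind speed-power
gain_n_p = relative_gain(0.9423, 0.9746)  # rotational speed-power
```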
Fig. 13  Comparison of probability density curves of data corrected by different methods
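| Method | $s_1$ | $s_2$ |
| --- | --- | --- |
| Linear interpolation | 0.028 | 0.032 |
| LSTM | 0.035 | 0.049 |

Table 6  Comparison of sample probabilities for different methods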
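
Linear interpolation is the baseline correction that Table 6 compares against LSTM. As a minimal sketch of that baseline, the function below replaces flagged points by interpolating between the nearest unflagged neighbours. The edge handling (copying the nearest valid value) and all names are assumptions, and the LSTM corrector itself is not reproduced here.

```python
def interpolate_flagged(values, flagged):
    """Replace flagged points by linear interpolation between valid neighbours."""
    valid = [i for i, f in enumerate(flagged) if not f]
    if not valid:
        raise ValueError("at least one unflagged point is required")
    out = list(values)
    for i, f in enumerate(flagged):
        if not f:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:        # leading gap: copy the first valid value
            out[i] = values[right]
        elif right is None:     # trailing gap: copy the last valid value
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


# Hypothetical power series with two consecutive abnormal readings.
series = [10.0, 12.0, 99.0, 99.0, 18.0]
fixed = interpolate_flagged(series, [False, False, True, True, False])
```

Because interpolation draws a straight line across each gap, it ignores the other measured variables at those time steps, which is one reason a learned corrector may be preferred.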