Journal of Zhejiang University (Engineering Science), 2021, Vol. 55, Issue 8: 1566-1575. DOI: 10.3785/j.issn.1008-973X.2021.08.018
Energy Engineering
Application of ensemble learning based on preferred sample selection in desulfurization optimization process
Zhi-hui GE, Jiang-kuan XING, Kun LUO*, Jian-ren FAN
State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, China
Abstract:

Ensemble learning models based on random sampling and on cluster sampling were developed to predict the number of circulating pumps in operation in a desulfurization tower. The database was built from a large volume of desulfurization operating data from a real power plant: outliers and unsteady-state values were deleted from the initial database, and the parameters with high correlation coefficients with the output were selected as model inputs. Three problems were addressed: class imbalance in classification, the lack of evaluation criteria for preferred samples, and desulfurization optimization. Compared with the original model, the improved ensemble learning models increased the overall prediction accuracy by 33%, and cluster sampling performed slightly better than random sampling. The recall of each individual class was also analyzed, and the recall of the different algorithms was compared for the minority and majority classes. Both improved sampling methods greatly improved prediction of the minority classes, with recall above 90%, and also gave some improvement for the majority class. Finally, the differences in sample distribution and model accuracy when the pump combination is used as the model output were discussed.

Key words: clustering; sampling; ensemble learning; desulfurization system; preferred sample selection
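Received: 2020-08-10; Published: 2021-09-01
CLC number: TK 01+8
Foundation item: National Key Research and Development Program of China (2017YFB0601805)
Corresponding author: Kun LUO. E-mail: 21827006@zju.edu.cn; zjulk@zju.edu.cn
About the first author: Zhi-hui GE (born 1995), male, master's student, working on big data for power plants. orcid.org/0000-0002-3689-534X. E-mail: 21827006@zju.edu.cn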
Cite this article:

Zhi-hui GE, Jiang-kuan XING, Kun LUO, Jian-ren FAN. Application of ensemble learning based on preferred sample selection in desulfurization optimization process. Journal of Zhejiang University (Engineering Science), 2021, 55(8): 1566-1575.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2021.08.018        https://www.zjujournals.com/eng/CN/Y2021/V55/I8/1566

Fig. 1  Schematic diagram of the confusion matrix
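Overall accuracy and the per-class recall compared in Fig. 8 both come straight from the confusion matrix. The NumPy sketch below assumes rows are true classes and columns are predicted classes (the figure's own orientation is not stated here); the function names and toy labels are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: row = true class, column = predicted class (assumed layout)."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

def per_class_recall(cm):
    """Recall of class i = samples correctly predicted as i / all samples whose true class is i."""
    return np.diag(cm) / cm.sum(axis=1)

def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    return np.trace(cm) / cm.sum()

# Toy labels: each class is a number of circulating pumps in operation.
labels = [2, 3, 4]
y_true = [2, 2, 3, 3, 3, 4]
y_pred = [2, 3, 3, 3, 4, 4]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm)
print(per_class_recall(cm))   # recall for classes 2, 3, 4
print(overall_accuracy(cm))
```

A class with no true samples has a zero row sum, so its recall is undefined (NumPy returns nan and warns).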
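Table 1  Correlation coefficient between each input parameter and the output (r_x) and importance of each input to the model (f_imp)

| No. | Input parameter | r_x | f_imp |
|---|---|---|---|
| 1 | Generator active power/MW | 0.352400 | 0.089773 |
| 2 | No. 1 desulfurization raw flue gas O2 volume fraction/% | -0.194900 | 0.064280 |
| 3 | No. 1 desulfurization clean flue gas O2 volume fraction/% | -0.299390 | 0.071927 |
| 4 | No. 1 desulfurization raw flue gas dust mass concentration/(mg·m⁻³) | 0.055873 | 0.091056 |
| 5 | No. 1 desulfurization clean flue gas dust mass concentration/(mg·m⁻³) | 0.117241 | 0.076682 |
| 6 | No. 1 absorber gypsum slurry mass concentration/(kg·m⁻³) | 0.100041 | 0.048398 |
| 7 | No. 1 desulfurization raw flue gas SO2 mass concentration/(mg·m⁻³) | 0.274171 | 0.141799 |
| 8 | No. 1 desulfurization clean flue gas SO2 volume fraction/10⁻⁶ | 0.130771 | 0.064986 |
| 9 | No. 1 absorber limestone slurry supply volume flow rate/(m³·h⁻¹) | 0.227209 | 0.038395 |
| 10 | Flue gas mass flow rate/(t·h⁻¹) | 0.305443 | 0.129984 |
| 11 | Slurry pH | -0.003030 | 0.080751 |
| 12 | Inlet flue gas temperature/℃ | 0.250639 | 0.101970 |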
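Table 1 ranks candidate inputs by how strongly they correlate with the output. Assuming r_x is the Pearson correlation coefficient (the table does not name the coefficient), it can be computed for all inputs at once as sketched below; the column names and toy values are hypothetical, and the separate importance score f_imp is not reproduced.

```python
import numpy as np

def pearson_with_output(x, y):
    """Pearson r between each column of x (n_samples, n_inputs) and y (n_samples,)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    num = (xc * yc[:, None]).sum(axis=0)
    den = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return num / den

def rank_by_abs_r(names, r):
    """Sort inputs by |r|, strongest first: a strong negative r is as informative as a positive one."""
    order = np.argsort(-np.abs(r), kind="stable")
    return [(names[i], float(r[i])) for i in order]

names = ["power", "o2_clean", "ph"]      # hypothetical input names
x = np.array([[1.0, 4.0, 1.0],
              [2.0, 3.0, 0.0],
              [3.0, 2.0, 1.0],
              [4.0, 1.0, 0.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])       # toy output values
r = pearson_with_output(x, y)
for name, value in rank_by_abs_r(names, r):
    print(f"{name}: {value:+.3f}")
```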
Fig. 2  Heat map of the pairwise correlation coefficients among the model input variables
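Where Table 1 correlates each input with the output, Fig. 2 correlates the inputs with one another, which exposes redundant (strongly collinear) inputs. A minimal sketch with `np.corrcoef`; the 0.9 cut-off and the column names are assumptions chosen only for illustration.

```python
import numpy as np

def correlated_input_pairs(x, names, threshold=0.9):
    """Return the input correlation matrix and the input pairs with |r| >= threshold."""
    r = np.corrcoef(x, rowvar=False)          # columns are variables, rows are samples
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):    # upper triangle only, skip self-pairs
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return r, pairs

a = np.array([1.0, 2.0, 3.0, 4.0])
x = np.column_stack([a, 2 * a + 1, [1.0, 0.0, 1.0, 0.0]])   # toy inputs
r, pairs = correlated_input_pairs(x, ["flue_gas_flow", "power", "ph"])  # hypothetical names
print(np.round(r, 3))   # the matrix a heat map like Fig. 2 would display
print(pairs)
```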
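Table 2  Characteristics of the sample input parameters after outlier removal

| No. | Input parameter | Mean | Minimum | Maximum |
|---|---|---|---|---|
| 1 | No. 1 desulfurization raw flue gas O2 volume fraction/% | 6.5 | 3.1 | 10.6 |
| 2 | No. 1 desulfurization raw flue gas dust mass concentration/(mg·m⁻³) | 21.6 | 15.1 | 34.0 |
| 3 | No. 1 desulfurization clean flue gas dust mass concentration/(mg·m⁻³) | 1.8 | 0.2 | 3.4 |
| 4 | No. 1 absorber gypsum slurry mass concentration/(kg·m⁻³) | 1132.0 | 1091.7 | 1171.3 |
| 5 | No. 1 desulfurization raw flue gas SO2 mass concentration/(mg·m⁻³) | 1582.0 | 710.7 | 2434.0 |
| 6 | No. 1 desulfurization clean flue gas SO2 volume fraction/% | 5.9 | 1.0 | 11.1 |
| 7 | No. 1 absorber limestone slurry supply volume flow rate/(m³·h⁻¹) | 14.7 | 0.0 | 32.2 |
| 8 | Total air mass flow rate/(t·h⁻¹) | 1.8 | 0.2 | 5.0 |
| 9 | pH | 5.5 | 4.8 | 6.2 |
| 10 | Inlet flue gas temperature/℃ | 95.2 | 80.8 | 109.3 |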
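Table 2 summarizes the inputs after outlier removal, but the removal criterion is not given in this section. The sketch below assumes the common 3σ rule (drop any sample with an input more than three standard deviations from that input's mean) and then computes the mean, minimum and maximum columns that Table 2 reports. The data are toy values.

```python
import numpy as np

def remove_outliers_3sigma(x):
    """Keep rows whose every input lies within mean +/- 3 std (assumed criterion)."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    mask = np.all(np.abs(x - mu) <= 3 * sd, axis=1)
    return x[mask], mask

def summarize(x):
    """Mean, minimum and maximum of each input, as in Table 2."""
    return x.mean(axis=0), x.min(axis=0), x.max(axis=0)

# Toy data: 20 plausible readings plus one spike in the first input.
col0 = np.append(np.linspace(9.0, 11.0, 20), 1000.0)
col1 = np.linspace(0.0, 1.0, 21)
x = np.column_stack([col0, col1])
clean, mask = remove_outliers_3sigma(x)
mean, lo, hi = summarize(clean)
print(mask.sum(), mean.round(2), lo, hi)
```

Because a single extreme value also inflates the standard deviation, this rule needs enough samples to catch it; with 21 samples the spike here is still well beyond 3σ.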
Fig. 3  Sample distributions of the input parameters after normalization
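Normalization puts inputs with very different units and ranges on one scale (compare the gypsum slurry concentration and pH ranges in Table 2). The normalization scheme behind Fig. 3 is not stated here; the sketch assumes per-input min-max scaling to [0, 1], using toy rows built from the Table 2 ranges.

```python
import numpy as np

def min_max_normalize(x):
    """Scale each column to [0, 1]; a constant column maps to 0 instead of dividing by zero."""
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span

# Toy rows: gypsum slurry concentration (kg/m^3) and pH, min / max / midpoint.
x = np.array([[1091.7, 4.8],
              [1171.3, 6.2],
              [1131.5, 5.5]])
print(min_max_normalize(x))
```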
Fig. 4  Variation of unit power over time
Fig. 5  Division of unit power data into steady-state and unsteady-state data
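This section does not describe how the steady and unsteady data in Fig. 5 were separated. One established on-line test is Cao and Rhinehart's R-statistic. It compares two variance estimates obtained with exponential filters: R stays near 1 when the signal is steady and grows during a ramp. The sketch assumes that method with the often-used filter factors λ1 = 0.2 and λ2 = λ3 = 0.1, plus an illustrative critical value of 2.0. The proper critical value depends on the filter factors and the chosen significance level.

```python
import numpy as np

def r_statistic(x, lam1=0.2, lam2=0.1, lam3=0.1):
    """Cao-Rhinehart ratio R_i for a 1-D signal (NaN where undefined).

    xf: exponentially filtered mean
    v2: filtered squared deviation of x_i from the previous filtered mean
    d2: filtered squared difference of successive samples
    R = (2 - lam1) * v2 / d2, close to 1 for a steady signal
    """
    x = np.asarray(x, dtype=float)
    r = np.full(x.shape, np.nan)
    xf, v2, d2 = x[0], 0.0, 0.0
    for i in range(1, len(x)):
        v2 = lam2 * (x[i] - xf) ** 2 + (1 - lam2) * v2    # uses the previous xf
        xf = lam1 * x[i] + (1 - lam1) * xf
        d2 = lam3 * (x[i] - x[i - 1]) ** 2 + (1 - lam3) * d2
        if d2 > 0:
            r[i] = (2 - lam1) * v2 / d2
    return r

def steady_mask(x, r_crit=2.0, **filter_factors):
    """True where R is below the (illustrative) critical value."""
    return r_statistic(x, **filter_factors) < r_crit

rng = np.random.default_rng(0)
steady = 300 + rng.normal(0.0, 1.0, 2000)   # toy: flat load with noise, MW
ramp = np.linspace(300.0, 500.0, 200)       # toy: load ramp, MW
print(steady_mask(steady)[100:].mean())     # share of samples flagged steady
print(r_statistic(ramp)[-1])                # far above the critical value
```

The filters start from zero, so the first few dozen R values are a warm-up transient and are skipped above.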
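Table 3  Number of samples N selected for each class by the different algorithms

| Class (number of circulating pumps in operation) | N, AdaBoost | N, RUSBoost | N, CUSBoost |
|---|---|---|---|
| 2 | 10977 | 5857 | 5857 |
| 3 | 22781 | 5857 | 5857 |
| 4 | 5857 | 5857 | 5857 |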
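Table 3 shows both improved algorithms cutting every class down to the size of the smallest one (5857 samples). For RUSBoost this is random undersampling. Below is a stand-alone sketch of that sampling step, applied to labels that have the AdaBoost-column class counts from Table 3. Only labels are needed for this step, so feature values are omitted. The function name is illustrative.

```python
import numpy as np

def random_undersample(y, seed=0):
    """Indices keeping a random subset of every class, all of the smallest class's size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    return np.sort(np.concatenate(keep))

# Labels with the AdaBoost-column counts of Table 3 (class = number of pumps).
y = np.repeat([2, 3, 4], [10977, 22781, 5857])
idx = random_undersample(y)
print({int(c): int(n) for c, n in zip(*np.unique(y[idx], return_counts=True))})
```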
Fig. 6  Model accuracy for different numbers of clusters
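Fig. 6 varies the number of clusters used by the cluster-based sampler. The CUSBoost idea is to cluster the larger class with k-means and draw samples from every cluster, so the reduced set still covers the different operating regions. The simplified sketch below makes several assumptions. Each class larger than the smallest one is clustered separately. Draws are split across clusters in proportion to cluster size. A plain NumPy k-means stands in for a library implementation. The parameter k plays the role of the cluster count varied in Fig. 6.

```python
import numpy as np

def kmeans_labels(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for every row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_undersample(x, y, k=5, seed=0):
    """Reduce every class to the smallest class size, drawing from each
    k-means cluster in proportion to its size (simplified CUSBoost-style step)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if len(idx) <= target:                  # smallest class: keep everything
            keep.append(idx)
            continue
        kk = min(k, len(idx))
        labels = kmeans_labels(x[idx], kk, seed=seed)
        sizes = np.bincount(labels, minlength=kk)
        quota = sizes * target / len(idx)       # proportional share of each cluster
        take = np.floor(quota).astype(int)
        extra = int(target - take.sum())        # leftovers go to the largest remainders
        take[np.argsort(quota - take)[::-1][:extra]] += 1
        for j in range(kk):
            if take[j] > 0:
                members = idx[labels == j]
                keep.append(rng.choice(members, size=take[j], replace=False))
    return np.sort(np.concatenate(keep))

rng = np.random.default_rng(1)
x = rng.normal(size=(190, 2))                   # toy inputs
y = np.repeat([2, 3, 4], [30, 60, 100])         # toy imbalanced labels
idx = cluster_undersample(x, y, k=4)
print(np.unique(y[idx], return_counts=True))
```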
Fig. 7  Model prediction accuracy on the test and training sets for different numbers of classifiers
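Fig. 7 tracks accuracy as the number of base classifiers grows. To show where that number and the per-round sampling enter, here is a compact multi-class AdaBoost (the SAMME variant) with decision stumps, written in NumPy. Passing a sampler gives a RUSBoost-style loop: each stump is trained on a class-balanced subset, while the error and the weight update use the full training set. This is a simplified stand-in, not the authors' implementation; RUSBoost as originally published builds on AdaBoost.M2 rather than SAMME, and the stump base learner and all names here are assumptions.

```python
import numpy as np

def fit_stump(x, onehot, w):
    """Best weighted multi-class decision stump; w must sum to 1.
    Returns (error, feature, threshold, left_class, right_class) as column indices of onehot."""
    totals = w @ onehot                           # weighted mass of each class
    best = (1.0 - totals.max(), 0, np.inf, totals.argmax(), totals.argmax())
    for f in range(x.shape[1]):
        for t in np.unique(x[:, f])[:-1]:         # keep the right side non-empty
            left = x[:, f] <= t
            wl = w[left] @ onehot[left]
            wr = totals - wl
            err = 1.0 - wl.max() - wr.max()       # mass outside both side majorities
            if err < best[0]:
                best = (err, f, t, wl.argmax(), wr.argmax())
    return best

def adaboost_samme(x, y, n_estimators=50, sampler=None, seed=0):
    """SAMME boosting; sampler(y, rng) may return a per-round training subset."""
    rng = np.random.default_rng(seed)
    classes, y_idx = np.unique(y, return_inverse=True)
    k = len(classes)
    onehot = np.eye(k)[y_idx]
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(n_estimators):
        sub = np.arange(len(y)) if sampler is None else sampler(y, rng)
        _, f, t, cl, cr = fit_stump(x[sub], onehot[sub], w[sub] / w[sub].sum())
        miss = np.where(x[:, f] <= t, cl, cr) != y_idx
        err = w[miss].sum()                       # weighted error on the full training set
        if err >= 1.0 - 1.0 / k:                  # no better than random guessing
            break
        err = max(err, 1e-10)
        alpha = np.log((1.0 - err) / err) + np.log(k - 1.0)
        model.append((alpha, f, t, cl, cr))
        w = w * np.exp(alpha * miss)              # up-weight misclassified samples
        w /= w.sum()
    return classes, model

def predict(classes, model, x):
    """Alpha-weighted vote of the stumps."""
    scores = np.zeros((len(x), len(classes)))
    rows = np.arange(len(x))
    for alpha, f, t, cl, cr in model:
        scores[rows, np.where(x[:, f] <= t, cl, cr)] += alpha
    return classes[scores.argmax(axis=1)]

def rus_sampler(y, rng):
    """Random undersampling to the smallest class size (RUSBoost-style round subset)."""
    classes, counts = np.unique(y, return_counts=True)
    return np.concatenate([rng.choice(np.flatnonzero(y == c), size=counts.min(), replace=False)
                           for c in classes])

# Toy data: one input, three classes occupying consecutive intervals.
x = (np.arange(90) / 30.0).reshape(-1, 1)
y = np.repeat([2, 3, 4], 30)
for m in (1, 2, 5, 20):                           # accuracy vs number of base classifiers
    classes, model = adaboost_samme(x, y, n_estimators=m)
    print(m, (predict(classes, model, x) == y).mean())
```

For the RUSBoost-style variant, call `adaboost_samme(x, y, sampler=rus_sampler)`; a cluster-based sampler with the same `(y, rng)` signature would give a CUSBoost-style loop.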
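Fig. 8  Recall of the different algorithms for each number of circulating pumps
Fig. 9  Number of samples for each pump combination
Fig. 10  Number of samples for each pump combination after simplification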
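Figs. 9 and 10 change the output from the number of running pumps to the specific pump combination. The sketch below derives both labels from on/off status columns and counts the samples for each combination. The four pump names and the status rows are hypothetical. The simplification behind Fig. 10 is not described in this section, so it is not reproduced.

```python
from collections import Counter

def pump_labels(status_rows, pump_names):
    """Map each row of on/off flags to (number of running pumps, combination label)."""
    counts, combos = [], []
    for row in status_rows:
        running = [name for name, on in zip(pump_names, row) if on]
        counts.append(len(running))
        combos.append("+".join(running) if running else "none")
    return counts, combos

pump_names = ["A", "B", "C", "D"]      # hypothetical pump identifiers
status_rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]
counts, combos = pump_labels(status_rows, pump_names)
print(counts)             # number of running pumps per sample
print(Counter(combos))    # samples per pump combination
```

Several combinations can share the same pump count. Using the combination as the output therefore spreads the same samples over more, smaller classes.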