The ensemble learning approach based on random sampling or cluster sampling was developed to predict the number of desulfurization tower circulating pumps opened. The database was constructed from a realistic power plants, the outliers and unsteady values in the initial one were deleted, and the input parameters of the ensemble learning model with high correlation coefficients with the output were extracted. The problems of imbalanced samples in classification, missing evaluation criteria for optimal samples and desulfurization optimization were solved. Results showed that the improved ensemble learning model had a 33% increase in overall prediction accuracy compared with the original model. In addition, the cluster sampling was slightly better than the random sampling. Furthermore, the recall of a single category prediction was analyzed, and the recall values of different algorithms for the minority category and the majority category were compared. Results showed that the two improved sampling methods had greatly improved the minority category prediction, and the recall reached more than 90%, besides it also had certain improvement on the majority. Finally, the difference in sample distribution and model accuracy was discussed when the pump combination was used as the model’s output.
Zhi-hui GE,Jiang-kuan XING,Kun LUO,Jian-ren FAN. Application of ensemble learning based on preferred sample selection in desulfurization optimization process. Journal of ZheJiang University (Engineering Science), 2021, 55(8): 1566-1575.
Tab.1Correlation coefficient with output and feature importance to model of different input parameters
Fig.2Heat map of correlation coefficient between model input variables
编号
输入参数
平均值
最小值
最大值
1
1号脱硫原烟气O2体积分数/%
6.5
3.1
10.6
2
1号脱硫原烟气含尘质量浓度/(mg·m?3)
21.6
15.1
34.0
3
1号脱硫净烟气含尘质量浓度/(mg·m?3)
1.8
0.2
3.4
4
1号吸收塔石膏浆液质量浓度/(kg·m?3)
1 132.0
1 091.7
1 171.3
5
1号脱硫原烟气SO2质量浓度/(mg·m?3)
1 582.0
710.7
2 434.0
6
1号脱硫净烟气SO2体积分数/%
5.9
1.0
11.1
7
1号吸收塔石灰石供浆体积流量/(m3·h?1)
14.7
0.0
32.2
8
总风质量流量/(t·h?1)
1.8
0.2
5.0
9
pH值
5.5
4.8
6.2
10
进口烟气温度/℃
95.2
80.8
109.3
Tab.2Characteristic of samples after outlier removal
Fig.3Sample distribution map after normalization for different input features
Fig.4Change of unit power over time
Fig.5Division of steady-state data and unsteady-state data of unit power
类别
N
AdaBoost
RUSBoost
CUSBoost
2
10 977
5 857
5 857
3
22 781
5 857
5 857
4
5 857
5 857
5 857
Tab.3Number of samples selected by different algorithms for different categories
Fig.6Line chart of model’s accuracy variation for different cluster numbers
Fig.7Model prediction accuracy changes for different numbers of classify in train and test set
Fig.8Recall of different algorithms for different numbers of circulating pump
Fig.9Number of samples for different pump combinations
Fig.10Number of samples for different pump combinations after simplification
[1]
中华人民共和国统计局. 中国统计年鉴[M]. 北京: 中国统计出版社, 2019.
[2]
BARMA M C, SAIDUR R, RAHMAN S M, et al A review on boilers energy use, energy savings, and emissions reductions[J]. Renewable and Sustainable Energy Reviews, 2017, 79: 970- 983
doi: 10.1016/j.rser.2017.05.187
[3]
赵顺毅, 陈子豪, 张瑾, 等 现代流程工业的机器学习建模[J]. 自动化仪表, 2019, 40 (9): 1- 7 ZHAO Shun-yi, CHEN Zi-hao, ZHANG Jin, et al Modeling based on machine learning for modern process industry[J]. Process Automation Instrumentation, 2019, 40 (9): 1- 7
[4]
向鸿鑫, 杨云 不平衡数据挖掘方法综述[J]. 计算机工程与应用, 2019, 55 (4): 1- 16 XIANG Hong-xin, YANG Yun Survey on imbalanced data mining methods[J]. Computer Engineering and Applications, 2019, 55 (4): 1- 16
[5]
张洋. SMOTE算法的改进与应用[D]. 重庆: 重庆大学, 2019. ZHANG Yang. Improvement and application of SMOTE algorithm[D]. Chongqing: Chongqing University, 2019.
[6]
LIN W, TSAI C, HU Y, et al Clustering-based undersampling in class-imbalanced data[J]. Information Sciences, 2017, 409: 17- 26
[7]
SEIFFERT C, KHOSHGOFTAAR T M, VAN H J, et al Rusboost: a hybrid approach to alleviating class imbalance[J]. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 2009, 40 (1): 185- 197
[8]
RAYHAN F, AHMED S, MAHBUB A, et al. Cusboost: cluster-based under-sampling with boosting for imbalanced classification[C]// 2nd International Conference on Computational Systems and Information Technology for Sustainable Solution. Bangalore: IEEE, 2017: 70-75.
[9]
WANG R AdaBoost for feature selection, classification and its relation with SVM, a review[J]. Physics Procedia, 2012, 25: 800- 807
doi: 10.1016/j.phpro.2012.03.160
[10]
丁伟. 基于数据聚类的机组优化运行目标值研究[D]. 南京: 东南大学, 2019. DING Wei. Research on target value of unit optimal operation based on data clustering[D]. Nanjing: Southeast University, 2019.
[11]
刘晓洋. 风速预测中数椐和样本的有效处理及其模型优化研究[D]. 太原: 太原理工大学, 2016. LIU Xiao-yang. Research on effective processing of data and samples of wind speed forecasting and its model optimization [D]. Taiyuan: Taiyuan University of Technology, 2016.
[12]
RANA M, RAHMAN A Multiple steps ahead solar photovoltaic power forecasting based on univariate machine learning models and data re-sampling[J]. Sustainable Energy, 2020, 21: 100286
[13]
纪雪, 周兴华, 唐秋华, 等 多波束测深异常数据检测与剔除方法研究综述[J]. 测绘科学, 2018, 43 (1): 38- 44 JI Xue, ZHOU Xing-hua, TANG Qiu-hua, et al A survey offiltering methods in multibeam bathymetry outliers data[J]. Science of Surveying and Mapping, 2018, 43 (1): 38- 44
[14]
刘吉臻, 高萌, 吕游, 等 过程运行数据的稳态检测方法综述[J]. 仪器仪表学报, 2013, 34 (8): 1739- 1748 LIU Ji-zhen, GAO Meng, LV You, et al Overview on the steady-state detection methods of process operating data[J]. Chinese Journal of Scientific Instrument, 2013, 34 (8): 1739- 1748
doi: 10.3969/j.issn.0254-3087.2013.08.009
[15]
CAO S, RHINEHART R R An efficient method for on-line identification of steady state[J]. Journal of Process Control, 1995, 5 (6): 363- 374
doi: 10.1016/0959-1524(95)00009-F
[16]
CAO S, RHINEHART R R Critical values for a steady-state identifier[J]. Journal of Process Control, 1997, 7 (2): 149- 152
doi: 10.1016/S0959-1524(96)00026-1
高新. 一种改进K-means聚类算法与新的聚类有效性指标研究[D]. 合肥: 安徽大学, 2020. GAO Xin. Research on improved K-means algorithm and new cluster validity index[D]. Hefei: Anhui University, 2020.
[19]
BELGIU M, DRAGUT L Random forest in remote sensing: a review of applications and future directions[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 114: 24- 31
doi: 10.1016/j.isprsjprs.2016.01.011
[20]
胡蕊. 燃煤电厂湿法脱硫塔能效评价研究[D]. 济南: 山东大学, 2020. HU Rui. Study on energy efficiency evaluation of wet flue gas desulphurization tower in coal-fired power plant[D]. Jinan: Shandong University, 2020.