Journal of ZheJiang University (Engineering Science)  2021, Vol. 55 Issue (8): 1566-1575    DOI: 10.3785/j.issn.1008-973X.2021.08.018
    
Application of ensemble learning based on preferred sample selection in desulfurization optimization process
Zhi-hui GE, Jiang-kuan XING, Kun LUO*, Jian-ren FAN
State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, China

Abstract  

An improved ensemble learning approach based on random sampling and cluster sampling was developed to predict the number of circulating pumps in operation in a desulfurization tower. The database was built from a large volume of desulfurization data from an operating power plant: outliers and unsteady-state values were removed from the initial database, and the parameters with high correlation coefficients with the output were extracted as model inputs. The problems of imbalanced samples in classification, the lack of evaluation criteria for preferred samples, and desulfurization optimization were studied. Results showed that the overall prediction accuracy of the improved ensemble learning model was 33% higher than that of the original model, and that cluster sampling was slightly better than random sampling. Furthermore, the recall for individual categories was analyzed, and the recall of the different algorithms was compared for the minority and majority categories. The two improved sampling methods greatly improved prediction of the minority category, with recall exceeding 90%, and also gave some improvement for the majority category. Finally, the differences in sample distribution and model accuracy were discussed for the case in which the pump combination was used as the model output.



Key words: clustering, sampling, ensemble learning, desulfurization system, preferred sample selection
Received: 10 August 2020      Published: 01 September 2021
CLC:  TK 01+8  
Fund:  National Key Research and Development Program of China (2017YFB0601805)
Corresponding Authors: Kun LUO     E-mail: 21827006@zju.edu.cn;zjulk@zju.edu.cn
Cite this article:

Zhi-hui GE,Jiang-kuan XING,Kun LUO,Jian-ren FAN. Application of ensemble learning based on preferred sample selection in desulfurization optimization process. Journal of ZheJiang University (Engineering Science), 2021, 55(8): 1566-1575.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2021.08.018     OR     https://www.zjujournals.com/eng/Y2021/V55/I8/1566


Fig.1 Schematic diagram of confusion matrix
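As a minimal sketch of how a confusion matrix yields the per-category recall and the overall accuracy discussed in the abstract, the following Python example assumes that rows index the true class and columns index the predicted class. The function names and the counts are illustrative assumptions, not values from the paper.

```python
from typing import List


def per_class_recall(confusion: List[List[int]]) -> List[float]:
    """Recall of each class: correct predictions / all true samples of that class.

    Assumes confusion[i][j] counts samples whose true class is i and whose
    predicted class is j (rows = true class, columns = predicted class).
    """
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def overall_accuracy(confusion: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal of the matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0


# Hypothetical counts for three categories (2, 3 and 4 pumps in operation).
example = [
    [90, 10, 0],
    [5, 180, 15],
    [0, 4, 46],
]
recalls = per_class_recall(example)
accuracy = overall_accuracy(example)
```

Reporting recall per category alongside overall accuracy matters for imbalanced data, because a model can reach high accuracy while still missing most minority-category samples.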
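| No. | Input parameter | r_x | f_imp |
| --- | --- | --- | --- |
| 1 | Generator active power/MW | 0.352 400 | 0.089 773 |
| 2 | No. 1 desulfurization raw flue gas O₂ volume fraction/% | −0.194 900 | 0.064 280 |
| 3 | No. 1 desulfurization clean flue gas O₂ volume fraction/% | −0.299 390 | 0.071 927 |
| 4 | No. 1 desulfurization raw flue gas dust mass concentration/(mg·m⁻³) | 0.055 873 | 0.091 056 |
| 5 | No. 1 desulfurization clean flue gas dust mass concentration/(mg·m⁻³) | 0.117 241 | 0.076 682 |
| 6 | No. 1 absorber gypsum slurry mass concentration/(kg·m⁻³) | 0.100 041 | 0.048 398 |
| 7 | No. 1 desulfurization raw flue gas SO₂ mass concentration/(mg·m⁻³) | 0.274 171 | 0.141 799 |
| 8 | No. 1 desulfurization clean flue gas SO₂ volume fraction/10⁻⁶ | 0.130 771 | 0.064 986 |
| 9 | No. 1 absorber limestone slurry feed volume flow rate/(m³·h⁻¹) | 0.227 209 | 0.038 395 |
| 10 | Flue gas mass flow rate/(t·h⁻¹) | 0.305 443 | 0.129 984 |
| 11 | Slurry pH | −0.003 030 | 0.080 751 |
| 12 | Inlet flue gas temperature/℃ | 0.250 639 | 0.101 970 |
Tab.1 Correlation coefficient with the output (r_x) and feature importance in the model (f_imp) for each input parameter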
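The ranking of input parameters by their correlation with the output can be sketched as follows. This example assumes the Pearson coefficient; the page does not state which correlation measure the authors used, and the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (norm_x * norm_y)


def rank_features(
    features: Dict[str, List[float]], target: List[float]
) -> List[Tuple[str, float]]:
    """Sort features by the absolute value of their correlation with the target."""
    scores = {name: pearson(column, target) for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data (not from the paper): the target is the number of pumps in operation.
toy_features = {
    "power": [300.0, 350.0, 400.0, 450.0, 500.0],
    "ph": [5.5, 5.4, 5.6, 5.5, 5.5],
    "o2": [8.0, 7.5, 6.0, 5.5, 5.0],
}
toy_target = [2.0, 2.0, 3.0, 3.0, 4.0]
ranking = rank_features(toy_features, toy_target)
```

Sorting by absolute value keeps strongly negative correlations, such as those of the O₂ volume fractions in Tab.1, near the top of the ranking.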
Fig.2 Heat map of correlation coefficient between model input variables
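| No. | Input parameter | Mean | Minimum | Maximum |
| --- | --- | --- | --- | --- |
| 1 | No. 1 desulfurization raw flue gas O₂ volume fraction/% | 6.5 | 3.1 | 10.6 |
| 2 | No. 1 desulfurization raw flue gas dust mass concentration/(mg·m⁻³) | 21.6 | 15.1 | 34.0 |
| 3 | No. 1 desulfurization clean flue gas dust mass concentration/(mg·m⁻³) | 1.8 | 0.2 | 3.4 |
| 4 | No. 1 absorber gypsum slurry mass concentration/(kg·m⁻³) | 1 132.0 | 1 091.7 | 1 171.3 |
| 5 | No. 1 desulfurization raw flue gas SO₂ mass concentration/(mg·m⁻³) | 1 582.0 | 710.7 | 2 434.0 |
| 6 | No. 1 desulfurization clean flue gas SO₂ volume fraction/% | 5.9 | 1.0 | 11.1 |
| 7 | No. 1 absorber limestone slurry feed volume flow rate/(m³·h⁻¹) | 14.7 | 0.0 | 32.2 |
| 8 | Total air mass flow rate/(t·h⁻¹) | 1.8 | 0.2 | 5.0 |
| 9 | pH | 5.5 | 4.8 | 6.2 |
| 10 | Inlet flue gas temperature/℃ | 95.2 | 80.8 | 109.3 |
Tab.2 Characteristics of samples after outlier removal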
Fig.3 Sample distribution map after normalization for different input features
Fig.4 Change of unit power over time
Fig.5 Division of steady-state data and unsteady-state data of unit power
| Category | N (AdaBoost) | N (RUSBoost) | N (CUSBoost) |
| --- | --- | --- | --- |
| 2 | 10 977 | 5 857 | 5 857 |
| 3 | 22 781 | 5 857 | 5 857 |
| 4 | 5 857 | 5 857 | 5 857 |
Tab.3 Number of samples selected by different algorithms for different categories
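The balanced counts in Tab.3 can be reproduced with a simple under-sampling sketch. The first function down-samples every category at random to the size of the smallest one. The second spreads the draws for one category across clusters in proportion to cluster size; this allocation rule is an assumption for illustration and is not confirmed as the sampling rule used in the paper. Cluster labels are taken as given (for example from k-means), and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def random_undersample(labels: Sequence[Hashable], seed: int = 0) -> List[int]:
    """Return sample indices after down-sampling each class to the minority size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(members) for members in by_class.values())
    selected = []
    for label in sorted(by_class):
        selected.extend(rng.sample(by_class[label], target))
    return sorted(selected)


def cluster_undersample(
    indices: Sequence[int], cluster_ids: Sequence[int], target: int, seed: int = 0
) -> List[int]:
    """Draw `target` indices of one class, allocated to clusters by cluster size."""
    rng = random.Random(seed)
    total = len(indices)
    if target >= total:
        return sorted(indices)
    clusters = defaultdict(list)
    for idx, cid in zip(indices, cluster_ids):
        clusters[cid].append(idx)
    keys = sorted(clusters)
    # Largest-remainder allocation in integer arithmetic, so quotas sum to target.
    quota = {k: target * len(clusters[k]) // total for k in keys}
    remainder = {k: target * len(clusters[k]) % total for k in keys}
    leftover = target - sum(quota.values())
    for k in sorted(keys, key=lambda key: remainder[key], reverse=True)[:leftover]:
        quota[k] += 1
    selected = []
    for k in keys:
        selected.extend(rng.sample(clusters[k], quota[k]))
    return sorted(selected)


# Class sizes from Tab.3 (AdaBoost column): categories 2, 3 and 4.
labels = [2] * 10977 + [3] * 22781 + [4] * 5857
picked = random_undersample(labels)
balanced_counts = Counter(labels[i] for i in picked)

# Hypothetical cluster labels for 100 samples of one majority category.
picked_by_cluster = cluster_undersample(
    list(range(100)), [0] * 70 + [1] * 30, target=10
)
```

Drawing from every cluster keeps the reduced majority set spread over the operating conditions the clusters represent, whereas purely random draws may leave small clusters unrepresented.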
Fig.6 Line chart of model’s accuracy variation for different cluster numbers
Fig.7 Model prediction accuracy for different numbers of classifiers on the training and test sets
Fig.8 Recall of different algorithms for different numbers of circulating pump
Fig.9 Number of samples for different pump combinations
Fig.10 Number of samples for different pump combinations after simplification