Application of ensemble learning based on preferred sample selection in desulfurization optimization process

doi:10.3785/j.issn.1008-973X.2021.08.018

Journal of ZheJiang University (Engineering Science)

2021, Vol. 55

Issue (8): 1566-1575

Zhi-hui GE, Jiang-kuan XING, Kun LUO*, Jian-ren FAN

State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, China


Abstract

An ensemble learning approach based on random sampling or cluster sampling was developed to predict the number of desulfurization tower circulating pumps in operation. The database was built from the operating data of a real power plant; outliers and unsteady-state values were removed from the initial database, and the parameters with high correlation coefficients with the output were extracted as inputs to the ensemble learning model. The problems of imbalanced samples in classification, missing evaluation criteria for preferred samples, and desulfurization optimization were addressed. Results showed that the overall prediction accuracy of the improved ensemble learning model was 33% higher than that of the original model, and that cluster sampling was slightly better than random sampling. Furthermore, the recall of single-category prediction was analyzed, and the recall values of different algorithms for the minority and majority categories were compared. The two improved sampling methods greatly improved the prediction of the minority category, with recall exceeding 90%, and also brought a certain improvement for the majority category. Finally, the differences in sample distribution and model accuracy were discussed for the case in which the pump combination was used as the model’s output.

Key words: clustering, sampling, ensemble learning, desulfurization system, preferred sample selection
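The abstract names two ways of selecting training samples for an imbalanced classification problem: random sampling and cluster sampling. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes undersampling down to the size of the rarest class, a plain k-means on the over-represented classes, and round-robin draws from the resulting clusters; the function names, the `k` parameter, and the toy labels (3 and 4 pumps in operation) are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_class(X, y):
    """Group feature rows by their class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_undersample(X, y, seed=0):
    """Keep, for every class, as many random rows as the rarest class has."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    Xb, yb = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, n_min):
            Xb.append(row)
            yb.append(label)
    return Xb, yb


def kmeans_labels(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        for i, row in enumerate(rows):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centers[c])),
            )
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_undersample(X, y, k=3, seed=0):
    """Undersample larger classes by drawing round-robin from k-means clusters."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    Xb, yb = [], []
    for label, rows in groups.items():
        if len(rows) == n_min:
            kept = list(rows)  # rarest class is kept whole
        else:
            labels = kmeans_labels(rows, min(k, len(rows)), seed=seed)
            clusters = defaultdict(list)
            for row, lab in zip(rows, labels):
                clusters[lab].append(row)
            # Shuffle each cluster, then take one row per cluster in turn.
            pools = [rng.sample(c, len(c)) for c in clusters.values()]
            kept = []
            while len(kept) < n_min:
                for pool in pools:
                    if pool and len(kept) < n_min:
                        kept.append(pool.pop())
        Xb.extend(kept)
        yb.extend([label] * len(kept))
    return Xb, yb


# Toy data: 30 samples labelled "3 pumps" and 5 samples labelled "4 pumps".
X = [[float(i), float(i % 7)] for i in range(30)] + [[100.0 + i, 1.0] for i in range(5)]
y = [3] * 30 + [4] * 5
Xr, yr = random_undersample(X, y)
Xc, yc = cluster_undersample(X, y, k=3)
```

In an ensemble setting, a balanced subset of this kind would typically be redrawn for each base learner, so that different learners see different majority-class samples.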

Received: 10 August 2020 Published: 01 September 2021

CLC: TK 01+8

Fund: National Key Research and Development Program of China (2017YFB0601805)

Corresponding Authors: Kun LUO E-mail: 21827006@zju.edu.cn;zjulk@zju.edu.cn


Cite this article:

Zhi-hui GE,Jiang-kuan XING,Kun LUO,Jian-ren FAN. Application of ensemble learning based on preferred sample selection in desulfurization optimization process. Journal of ZheJiang University (Engineering Science), 2021, 55(8): 1566-1575.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2021.08.018 OR https://www.zjujournals.com/eng/Y2021/V55/I8/1566

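Application of ensemble learning based on preferred sample selection in desulfurization optimization

Based on a large volume of desulfurization data from an operating power plant, outliers and unsteady-state values were removed from the initial desulfurization database, and the ensemble learning model input parameters with high correlation coefficients with the output were extracted. An improved ensemble learning algorithm based on random sampling and cluster sampling was used to build an ensemble learning model for predicting the number of desulfurization tower circulating pumps in operation, and the problems of sample imbalance in classification, missing evaluation criteria for preferred samples, and desulfurization optimization were studied. Results show that, compared with the model before improvement, the overall prediction accuracy of the improved ensemble learning model increased by 33%, and cluster-based sampling was slightly better than random sampling. In addition, the recall of single-category prediction was analyzed and the recall of different algorithms for the minority and majority classes was compared; the two improved sampling methods considerably improved the prediction of the minority class, with recall above 90%, and also improved the prediction of the majority class to some extent. The differences in sample distribution and model accuracy when the pump combination is used as the model output are discussed.

Key words: clustering, sampling, ensemble learning, desulfurization system, preferred sample selection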

Fig.1 Schematic diagram of confusion matrix

Tab.1 Correlation coefficient with output and feature importance to model of different input parameters

Fig.2 Heat map of correlation coefficient between model input variables

Tab.2 Characteristics of samples after outlier removal

Fig.3 Sample distribution map after normalization for different input features

Fig.4 Change of unit power over time

Fig.5 Division of steady-state data and unsteady-state data of unit power

Tab.3 Number of samples selected by different algorithms for different categories

Fig.6 Line chart of model’s accuracy variation for different cluster numbers

Fig.7 Model prediction accuracy changes for different numbers of classifiers in training and test sets

Fig.8 Recall of different algorithms for different numbers of circulating pump

Fig.9 Number of samples for different pump combinations

Fig.10 Number of samples for different pump combinations after simplification
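Fig.1 and Fig.8 refer to the confusion matrix and to per-class recall, the metric the abstract uses to compare minority and majority categories. A minimal sketch of how both quantities can be computed is given here, assuming rows of the matrix are true classes and columns are predicted classes; the function names and the toy labels are hypothetical and not taken from the paper.

```python
def confusion_matrix(y_true, y_pred):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix


def recall_per_class(labels, matrix):
    """Recall of a class = correctly predicted samples / all true samples of it."""
    recalls = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        recalls[label] = matrix[i][i] / total if total else float("nan")
    return recalls


def accuracy(matrix):
    """Overall accuracy = diagonal sum / total number of samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy example: labels stand for the number of circulating pumps in operation.
y_true = [3, 3, 3, 3, 4, 4]
y_pred = [3, 3, 3, 4, 4, 3]
labels, matrix = confusion_matrix(y_true, y_pred)
recalls = recall_per_class(labels, matrix)
overall = accuracy(matrix)
```

Reporting recall per class alongside overall accuracy matters for imbalanced data, because a model can reach high accuracy while rarely predicting the minority class correctly.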



