Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (2): 278-288    DOI: 10.3785/j.issn.1008-973X.2025.02.006
    
Offline small data-driven evolutionary algorithm based on multi-kernel data synthesis
Erchao LI, Yun LIU
College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China

Abstract  

An offline data-driven evolutionary algorithm based on multi-kernel data synthesis (DDEA-MKDS) was proposed to enhance the performance of offline data-driven evolutionary algorithms in small-data scenarios and to weaken the surrogate model's dependence on the size of the data set. Considering that the surrogate model is prone to overfitting on small data, an empirical formula combined with a traversal method was used to determine the optimal number of hidden layer nodes for the offline data set, thereby simplifying the model structure. To compensate for the shortage of data, three radial basis function networks with different kernel functions were trained to generate synthetic data. Part of the synthetic data was selected by roulette wheel selection and merged with the original data, and the new data set was used to train the surrogate model. DDEA-MKDS was compared with five state-of-the-art offline data-driven evolutionary algorithms on six single-objective benchmark problems. The experimental results showed that DDEA-MKDS achieved good performance with extremely small data sets, and its optimization efficiency was significantly better than that of the other algorithms.



Key words: offline data-driven, evolutionary algorithm, small data, surrogate model, hidden layer node, synthetic data
Received: 13 December 2023      Published: 11 February 2025
CLC:  TP 181  
Fund: National Natural Science Foundation of China (62063019); Natural Science Foundation of Gansu Province (24JRRA173, 22JR5RA241).
Cite this article:

Erchao LI, Yun LIU. Offline small data-driven evolutionary algorithm based on multi-kernel data synthesis. Journal of ZheJiang University (Engineering Science), 2025, 59(2): 278-288.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.02.006     OR     https://www.zjujournals.com/eng/Y2025/V59/I2/278


Fig.1 Structure of radial basis function network
Fig.2 Flow chart of DDEA-MKDS
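Fig. 1 shows the structure of the radial basis function network used as the surrogate model. For reference, a minimal Python sketch of such a surrogate is given below; it assumes a Gaussian default kernel, hidden-layer centers drawn from the training inputs, and output weights fitted by least squares. It is an illustrative stand-in, not the paper's exact implementation.

import numpy as np

class RBFN:
    """Minimal RBF-network surrogate: kernel hidden layer plus linear least-squares output.
    Illustrative only; not the paper's exact implementation."""

    def __init__(self, n_hidden, kernel=None, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)
        # Default Gaussian kernel; other kernels (multiquadric, inverse multiquadric, ...)
        # can be supplied as kernel(r, width).
        self.kernel = kernel or (lambda r, w: np.exp(-(r / w) ** 2))

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, float)
        # Hidden-layer centers: a random subset of the training inputs.
        idx = self.rng.choice(len(X), size=min(self.n_hidden, len(X)), replace=False)
        self.centers = X[idx]
        # One shared kernel width derived from the maximum pairwise center distance.
        d = np.linalg.norm(self.centers[:, None] - self.centers[None, :], axis=2)
        self.width = d.max() / np.sqrt(2.0 * len(self.centers)) + 1e-12
        # Output-layer weights by linear least squares.
        self.w, *_ = np.linalg.lstsq(self._hidden(X), y, rcond=None)
        return self

    def _hidden(self, X):
        r = np.linalg.norm(X[:, None, :] - self.centers[None, :, :], axis=2)
        return self.kernel(r, self.width)

    def predict(self, X):
        return self._hidden(np.asarray(X, float)) @ self.w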
Algorithm 1  Overall framework of DDEA-MKDS
Input: offline data set D, initial population pop, maximum number of iterations G
Output: best solution sbest
1) Search for the best number of hidden layer nodes, obtaining hB for D.
2) Train the multi-kernel models MRBFN1, MRBFN2 and MRBFN3 on D, each with hB hidden layer nodes.
3) Perform multi-kernel data synthesis to obtain the synthetic data set DS.
4) Train a new model MRBFN4 on DS, with hB hidden layer nodes.
5) Iteratively optimize pop with a genetic algorithm for G iterations, using the prediction of MRBFN4 for each individual in place of its fitness.
6) Record the best individual of the final population as sbest.
7) return sbest
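A compact Python sketch of this framework follows, reusing the RBFN class above. The helpers select_hidden_nodes and synthesize_data are sketched after Algorithms 2 and 3, the kernel callables passed in stand for Eqs. (3)–(5) (not reproduced on this page), and the genetic-algorithm loop is reduced to a generic real-coded GA for illustration; all names are assumptions, not the paper's API.

import numpy as np

def ddea_mkds(D_x, D_y, pop, G, kernels, d_m=60, seed=0):
    """Offline DDEA-MKDS framework (Algorithm 1), sketched with illustrative helpers."""
    rng = np.random.default_rng(seed)
    # 1) Best number of hidden layer nodes for the offline data set D (Algorithm 2).
    h_b = select_hidden_nodes(D_x, D_y)
    # 2)-3) Multi-kernel data synthesis (Algorithm 3).
    S_x, S_y = synthesize_data(D_x, D_y, h_b, kernels, d_m, rng)
    # 4) Final surrogate (MRBFN4) trained on the enlarged data set.
    surrogate = RBFN(h_b).fit(S_x, S_y)
    # 5) Real-coded GA driven purely by surrogate predictions (minimization).
    lo, hi = D_x.min(axis=0), D_x.max(axis=0)
    n_pop = len(pop)
    for _ in range(G):
        order = np.argsort(surrogate.predict(pop))
        parents = pop[order[: n_pop // 2]]
        # Arithmetic crossover plus Gaussian mutation as simple variation operators.
        n_child = n_pop - len(parents)
        a = rng.integers(0, len(parents), n_child)
        b = rng.integers(0, len(parents), n_child)
        alpha = rng.random((n_child, 1))
        children = alpha * parents[a] + (1 - alpha) * parents[b]
        children += rng.normal(0.0, 0.05 * (hi - lo), children.shape)
        pop = np.clip(np.vstack([parents, children]), lo, hi)
    # 6)-7) Best individual of the final population according to the surrogate.
    return pop[np.argmin(surrogate.predict(pop))]

A call such as ddea_mkds(D_x, D_y, pop0, G=100, kernels=[k1, k2, k3]) then returns the surrogate's predicted optimum; pop0 and the kernel names here are placeholders.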
 
Algorithm 2  Hidden-layer-node optimization in DDEA-MKDS
Input: offline data set D, number of input layer nodes n, number of output layer nodes m
Output: best number of hidden layer nodes hB
1) Compute the upper bound hT of the search interval by Eq. (6).
2) for i = 2 : hT
  Draw one sample ptest = (px, preal) from D; if preal = 0, re-draw. Denote the remaining data set as DR.
  Train the model MRBFNi on DR, where MRBFNi has i hidden layer nodes.
  Record the prediction of MRBFNi for px as ppre.
  Compute ei by Eq. (7).
 end for
3) Denote the i that minimizes ei as is, and let hB = is.
4) return hB
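The node-count search can be sketched in Python as below. Since Eqs. (6) and (7) are not reproduced on this page, the upper bound hT defaults to an assumed rule of thumb and the error ei is approximated by the relative error on a single held-out sample; both are placeholders for the paper's formulas.

import numpy as np

def select_hidden_nodes(D_x, D_y, h_t=None, seed=0):
    """Pick the hidden-layer size with the smallest held-out error (Algorithm 2)."""
    rng = np.random.default_rng(seed)
    n, m = D_x.shape[1], 1                      # input / output layer node counts
    if h_t is None:
        # Assumed stand-in for the empirical upper bound of Eq. (6).
        h_t = int(np.sqrt(n + m)) + 10
    # Hold out one sample whose true value is nonzero as the test point ptest.
    test = next(i for i in rng.permutation(len(D_x)) if D_y[i] != 0)
    mask = np.arange(len(D_x)) != test
    best_h, best_e = 2, np.inf
    for h in range(2, h_t + 1):
        model = RBFN(h).fit(D_x[mask], D_y[mask])
        p_pre = model.predict(D_x[test:test + 1])[0]
        # Relative prediction error as a stand-in for Eq. (7).
        e = abs(p_pre - D_y[test]) / abs(D_y[test])
        if e < best_e:
            best_h, best_e = h, e
    return best_h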
 
Algorithm 3  Multi-kernel data synthesis in DDEA-MKDS
Input: offline data set D, unlabeled-data sampling multiple dm
Output: synthetic data set DS
1) DS = D.
2) Sample an unlabeled data set DU = {(xa, ·), a = 1, 2, …, n} whose size is dm times that of D.
3) Train the models MRBFN1, MRBFN2 and MRBFN3 on D, using the kernel functions given by Eqs. (3)–(5), respectively.
4) Predict with the models to obtain the pseudo-labeled data set DF = {(xa, ya), a = 1, 2, …, n}, where ya is the average of the predictions of MRBFN1, MRBFN2 and MRBFN3 for xa.
5) Use Eqs. (8) and (9) to obtain the data set with selection probabilities DF = {(xa, ya, qa), a = 1, 2, …, n}.
6) for a = 1 : n
  if qa > rand
   add (xa, ya) to DS.
  end if
 end for
7) return DS
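A Python sketch of the synthesis step is given below. Three RBFN surrogates with different kernels label uniformly sampled points with the mean of their predictions; because Eqs. (8) and (9) are not reproduced on this page, the selection probability qa is approximated by the agreement among the three models (higher agreement, higher probability), which is an assumption rather than the paper's definition.

import numpy as np

def synthesize_data(D_x, D_y, h_b, kernels, d_m, rng):
    """Multi-kernel data synthesis with roulette-style selection (Algorithm 3)."""
    # 2) Unlabeled samples DU: d_m times the size of D, drawn uniformly within the data range.
    n_u = d_m * len(D_x)
    lo, hi = D_x.min(axis=0), D_x.max(axis=0)
    X_u = rng.uniform(lo, hi, size=(n_u, D_x.shape[1]))
    # 3)-4) Pseudo-labels: mean prediction of three RBFNs with different kernels.
    preds = np.stack([RBFN(h_b, kernel=k).fit(D_x, D_y).predict(X_u) for k in kernels])
    y_u = preds.mean(axis=0)
    # 5) Selection probabilities qa; assumed form (higher model agreement -> higher qa),
    #    standing in for Eqs. (8) and (9).
    spread = preds.std(axis=0)
    q = 1.0 - spread / (spread.max() + 1e-12)
    # 6) Roulette-style acceptance: keep (xa, ya) when qa exceeds a random number.
    keep = q > rng.random(n_u)
    # 1) + merge: DS starts from D and is extended by the accepted pseudo-labeled samples.
    return np.vstack([D_x, X_u[keep]]), np.concatenate([D_y, y_u[keep]])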
 
Problem | d | DDEA-MKDS | TT-DDEA | SRK-DDEA | CC-DDEA | CL-DDEA | DDEA-SE
Ellipsoid | 10 | 0.79±0.73 | 4.52±6.36(+) | 1.91±2.52(≈) | 1.91±1.12(+) | 114.41±97.15(+) | 2.81±1.38(+)
Ellipsoid | 30 | 5.58±2.99 | 18.68±14.99(+) | 11.40±11.59(+) | 9.05±5.60(+) | 451.17±394.59(+) | 48.57±21.00(+)
Ellipsoid | 50 | 27.88±9.13 | 110.19±84.30(+) | 23.99±14.62(−) | 32.19±12.61(+) | 1181.39±1035.79(+) | 195.13±57.46(+)
Ellipsoid | 100 | 252.72±52.58 | 1380.95±1237.13(+) | 269.84±54.97(≈) | 49.60±22.66(−) | 4908.28±5817.24(+) | 1399.45±377.21(+)
Rosenbrock | 10 | 16.22±7.00 | 20.84±10.33(≈) | 23.90±12.02(+) | 31.33±15.88(+) | 941.78±796.76(+) | 31.81±14.69(+)
Rosenbrock | 30 | 39.82±9.80 | 57.50±25.14(+) | 65.48±29.53(+) | 61.12±16.51(+) | 1170.28±1082.81(+) | 80.95±28.13(+)
Rosenbrock | 50 | 69.24±18.95 | 125.76±37.03(+) | 85.74±20.65(≈) | 104.91±51.58(+) | 1298.44±1265.39(+) | 217.48±61.36(+)
Rosenbrock | 100 | 209.61±36.64 | 373.37±68.49(+) | 189.10±24.13(≈) | 178.94±42.90(≈) | 1907.06±1789.96(+) | 714.23±121.78(+)
Ackley | 10 | 5.78±1.73 | 9.71±4.75(+) | 6.42±1.90(≈) | 7.57±2.24(+) | 10.27±4.71(≈) | 6.35±1.63(≈)
Ackley | 30 | 4.25±0.69 | 6.91±1.98(+) | 6.10±2.17(+) | 8.19±4.71(+) | 16.19±2.45(+) | 8.96±1.35(+)
Ackley | 50 | 4.73±0.47 | 7.63±0.81(+) | 5.94±1.65(≈) | 5.73±1.98(≈) | 11.51±3.09(+) | 10.25±0.86(+)
Ackley | 100 | 7.20±0.57 | 11.02±2.01(+) | 7.28±0.60(≈) | 4.95±0.85(−) | 13.01±4.70(+) | 11.59±0.52(+)
Levy | 10 | 1.60±0.36 | 3.19±3.00(≈) | 2.30±1.22(≈) | 2.48±2.42(≈) | 18.56±9.85(+) | 2.95±2.53(≈)
Levy | 30 | 3.57±0.76 | 10.18±10.03(+) | 4.39±1.33(≈) | 4.02±0.87(≈) | 51.71±39.59(+) | 8.62±3.44(+)
Levy | 50 | 5.94±0.77 | 16.57±11.72(+) | 6.01±1.06(≈) | 10.30±4.54(+) | 109.68±64.49(+) | 20.91±6.25(+)
Levy | 100 | 13.70±3.59 | 106.97±66.70(+) | 13.88±3.64(≈) | 11.03±1.97(≈) | 154.60±79.65(+) | 73.42±17.79(+)
Griewank | 10 | 1.21±0.28 | 3.51±3.64(+) | 1.12±0.11(≈) | 1.19±0.21(≈) | 26.32±25.28(+) | 2.26±0.54(+)
Griewank | 30 | 3.32±1.00 | 5.72±3.12(+) | 1.47±0.14(−) | 1.70±0.33(−) | 96.19±122.65(+) | 12.34±4.19(+)
Griewank | 50 | 6.24±1.81 | 9.23±6.83(≈) | 3.04±0.51(−) | 2.21±0.40(−) | 136.93±114.01(+) | 22.87±6.96(+)
Griewank | 100 | 21.39±4.12 | 61.68±30.82(+) | 18.56±3.96(≈) | 3.61±0.74(−) | 148.10±168.05(+) | 90.66±20.69(+)
Rastrigin | 10 | 42.23±26.46 | 58.47±28.90(≈) | 64.47±35.35(+) | 89.87±37.44(+) | 138.64±34.53(+) | 76.92±26.52(≈)
Rastrigin | 30 | 95.41±61.47 | 242.85±162.46(+) | 152.73±104.34(≈) | 195.63±60.41(+) | 322.84±68.42(+) | 219.82±36.95(+)
Rastrigin | 50 | 184.75±61.41 | 397.02±89.28(+) | 245.34±101.93(+) | 280.60±89.96(+) | 570.56±47.71(+) | 431.38±57.44(+)
Rastrigin | 100 | 692.34±122.68 | 1061.33±190.85(+) | 697.66±128.78(≈) | 319.72±271.23(−) | 1307.19±388.38(+) | 953.94±51.30(+)
+/≈/− |  | NA | 20/4/0 | 6/15/3 | 12/6/6 | 23/1/0 | 21/3/0
Friedman Rank |  | 1.63 | 4.04 | 2.40 | 2.40 | 6.00 | 4.54
Tab.1 Optimization results of DDEA-MKDS and other comparison algorithms on six test problems
Fig.3 Convergence curves of DDEA-MKDS and other comparison algorithms on Ellipsoid and Rastrigin
Fig.4 Average running time of DDEA-MKDS and other comparison algorithms on all problems
Problem | d | DDEA-MKDS | DDEA-MKDS(h/n) | DDEA-RBFN(h) | DDEA-MKDS(g) | DDEA-RBFN
Ellipsoid | 10 | 0.79±0.73 | 60.57±35.39(+) | 2.83±2.18(+) | 3.10±1.97(+) | 50.96±54.94(+)
Ellipsoid | 30 | 5.58±2.99 | 422.51±574.92(+) | 10.23±5.03(≈) | 36.28±14.82(+) | 4506.62±3160.73(+)
Ellipsoid | 50 | 27.88±9.13 | 1317.62±1044.82(+) | 44.91±29.69(≈) | 63.95±26.79(+) | 19728.63±3086.11(+)
Ellipsoid | 100 | 252.72±52.58 | 6072.01±2636.78(+) | 377.01±240.00(≈) | 370.92±133.13(+) | 68220.71±5090.82(+)
Rastrigin | 10 | 42.23±26.46 | 138.32±30.58(+) | 70.23±29.99(+) | 67.95±24.70(≈) | 142.56±51.07(+)
Rastrigin | 30 | 95.41±61.47 | 331.56±32.75(+) | 233.53±93.44(+) | 178.98±85.91(≈) | 665.82±133.87(+)
Rastrigin | 50 | 184.75±61.41 | 540.96±69.72(+) | 393.08±159.99(+) | 300.36±68.50(+) | 1215.14±55.33(+)
Rastrigin | 100 | 692.34±122.68 | 1255.65±147.53(+) | 1120.25±492.24(+) | 759.84±59.12(+) | 2703.70±228.01(+)
+/≈/− |  | NA | 8/0/0 | 5/3/0 | 6/2/0 | 8/0/0
Tab.2 Optimization results of DDEA-MKDS and its variant algorithms on Ellipsoid and Rastrigin
Fig.5 Average running time of DDEA-MKDS and its variant algorithms on all problems
Problem | d | DDEA-MKDS | DDEA-MKDS(k3b) | DDEA-MKDS(k3c) | DDEA-MKDS(k3d) | DDEA-MKDS(k4)
Ellipsoid | 10 | 0.79±0.73 | 0.87±0.74 | 0.62±0.40 | 1.35±1.54 | 1.02±1.08
Ellipsoid | 50 | 27.88±9.13 | 35.22±29.27 | 21.75±8.84 | 30.53±13.86 | 28.66±10.13
Ellipsoid | 100 | 252.72±52.58 | 265.44±34.73 | 314.06±123.93 | 270.38±60.79 | 307.81±86.81
Rastrigin | 10 | 42.23±26.46 | 42.78±32.77 | 37.91±26.06 | 45.81±36.64 | 42.30±26.75
Rastrigin | 50 | 184.75±61.41 | 180.45±88.49 | 190.50±110.37 | 225.95±108.19 | 221.98±107.42
Rastrigin | 100 | 692.34±122.68 | 695.47±127.53 | 725.05±134.91 | 706.52±134.54 | 704.88±81.15
Friedman Rank |  | 1.67 | 2.83 | 2.67 | 4.33 | 3.50
Tab.3 Optimization results of DDEA-MKDS with different kernel functions on Ellipsoid and Rastrigin
Problem | d | DDEA-MKDS(k3) | DDEA-MKDS(k4)
Ellipsoid | 10 | 2.67±0.20 | 2.81±0.22
Ellipsoid | 50 | 5.34±0.49 | 5.43±0.46
Ellipsoid | 100 | 8.31±0.68 | 8.63±0.69
Rastrigin | 10 | 2.62±0.21 | 2.71±0.21
Rastrigin | 50 | 5.23±0.42 | 5.43±0.48
Rastrigin | 100 | 8.26±0.70 | 8.30±0.86
Tab.4 Average running time of DDEA-MKDS with different numbers of kernel functions
Problem | d | dm=5 | dm=20 | dm=40 | dm=60 | dm=80 | dm=100 | dm=120
Ellipsoid | 10 | 2.90±1.92 | 1.56±1.44 | 0.92±0.96 | 0.79±0.73 | 1.24±1.60 | 0.93±1.02 | 1.24±1.34
Ellipsoid | 50 | 78.96±26.00 | 38.55±20.31 | 33.67±15.68 | 27.88±9.13 | 30.44±13.83 | 30.71±19.06 | 25.34±16.28
Ellipsoid | 100 | 533.97±134.03 | 360.59±84.39 | 265.83±53.41 | 252.72±52.58 | 277.75±63.42 | 262.33±46.02 | 264.12±81.76
Rastrigin | 10 | 80.36±24.35 | 46.61±31.06 | 40.37±29.60 | 42.23±46.46 | 49.75±35.83 | 40.66±28.60 | 30.73±21.73
Rastrigin | 50 | 381.77±75.94 | 279.04±115.55 | 202.29±63.64 | 184.75±61.41 | 188.77±68.81 | 186.22±120.27 | 218.12±101.16
Rastrigin | 100 | 857.95±112.39 | 811.04±131.71 | 777.63±161.02 | 692.34±122.68 | 670.59±108.23 | 684.92±108.10 | 675.76±126.05
Friedman Rank |  | 7.00 | 5.83 | 3.67 | 2.17 | 3.75 | 2.83 | 2.75
p |  | 0.000 | 0.003 | 0.229 | NA | 0.204 | 0.593 | 0.640
Tab.5 Optimization results of DDEA-MKDS on Ellipsoid and Rastrigin when dm takes different values
Fig.6 Variation trend of optimization result of DDEA-MKDS when dm takes different values