Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (7): 1354-1364    DOI: 10.3785/j.issn.1008-973X.2023.07.010
Automation Technology
Time-series gene driven feature representation model
Jian-ping HUANG1, Ke CHEN2, Jian-song ZHANG1, Si-qi SHEN1
1. State Grid Zhejiang Electric Power Co., Ltd., Hangzhou 310063, China
2. Information and Communication Branch, State Grid Zhejiang Electric Power Co., Ltd., Hangzhou 310016, China
Abstract:

The concept of "evolutionary genes" is defined to capture the user behaviors underlying a time series and to describe how these behaviors give rise to the series. A unified framework is proposed in which a classifier learns to identify the evolutionary gene of each segment, and an adversarial generator implements the genes by estimating the distribution of the segments. The model has three main components: gene identification, which learns the gene corresponding to each segment; gene generation, which learns to generate segments from genes; and gene application, which models behavioral evolution and applies the learned genes to the prediction of future values and events. Experiments on one synthetic dataset and five real-world datasets show that the method not only achieves good prediction results but also provides effective explanations for them.

Key words: time series    evolutionary gene    generative model    adversarial generator    representation learning
Received: 2022-07-11    Published: 2023-07-17
CLC:  TP 399  
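About the author: Jian-ping HUANG (born 1972), male, senior engineer, engaged in research on data governance, enterprise operations management, data analysis applications, big data, cloud computing, artificial intelligence, and electromechanical engineering. orcid.org/0000-0002-2319-1968. E-mail: huang_jianping@zj.sgcc.com.cn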

Cite this article:

Jian-ping HUANG, Ke CHEN, Jian-song ZHANG, Si-qi SHEN. Time-series gene driven feature representation model. Journal of ZheJiang University (Engineering Science), 2023, 57(7): 1354-1364.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.07.010        https://www.zjujournals.com/eng/CN/Y2023/V57/I7/1354

Fig. 1  Structure of the GeNE model
Dataset       N            T     Pt    V
Synthetic     50,000       10    20    3
Earthquake    461          21    24    1
WebTraffic    142,753      12    30    1
INS           241,045      15    24    2
TMP           16,792       3     30    12
MCE           3,833,213    12    4     2
Table 1  Detailed statistics of the six datasets used
Method     H        Co
K-means    0.546    0.091
Agglo      0.533    0.089
Birch      0.537    0.092
HMM        0.612    0.101
GMM        0.637    0.112
GeNE       0.674    0.158
Table 2  Identification performance of different methods on the synthetic data
              MAPE
Dataset       ARIMA    LSTM     TRMF     CVAE     GeNE
Earthquake    0.343    0.314    0.222    0.258    0.221
WebTraffic    4.438    3.937    3.091    3.166    2.945
MCE           0.782    0.694    0.574    0.581    0.539
INS           3.654    3.247    2.935    2.797    2.751
TMP           4.715    4.501    3.977    3.981    3.742
Table 3  Regression performance (MAPE) of different methods on the five datasets
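For readers unfamiliar with the metric in Table 3, the following is a minimal sketch of the standard mean absolute percentage error. It assumes the usual definition (mean of |actual - predicted| / |actual|); the page does not state whether the tabulated values are fractions or percentages, so the function name `mape` and the choice to return a fraction are illustrative assumptions, not the authors' implementation.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction.

    Assumption: standard MAPE definition; multiply by 100 for a percentage.
    Every actual value must be non-zero, otherwise the ratio is undefined.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return total / len(actual)


# Illustrative values only (not taken from the paper's datasets).
example = mape([100.0, 200.0], [110.0, 180.0])
```

A lower value means the forecasts are closer to the observed values, which is the sense in which the GeNE column of Table 3 is compared against the baselines.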
Unit: %
Method      A (Earthquake)    A (WebTraffic)
NN-ED       68.22             73.40
NN-DTW      70.31             74.03
NN-CID      69.41             74.26
FS          74.66             73.89
TSF         74.67             75.38
SAX-VSM     73.76             74.91
MC-DCNN     70.29             75.29
LSTM        68.35             73.15
CVAE        74.82             75.17
GeNE        75.54             75.91
Table 4  Classification performance of different methods on the Earthquake and WebTraffic datasets
Unit: %
Dataset    Method      P        R        F1       F0.5
MCE        NN-ED       59.90    34.82    44.01    52.38
MCE        NN-DTW      60.17    41.41    49.04    55.15
MCE        NN-CID      57.12    40.86    47.55    52.93
MCE        FS          54.34    43.54    48.34    51.74
MCE        TSF         76.80    52.61    62.50    70.30
MCE        SAX-VSM     65.12    59.96    62.44    64.01
MCE        MC-DCNN     78.94    49.27    60.70    70.43
MCE        LSTM        79.69    53.56    64.10    72.58
MCE        CVAE        77.92    54.12    64.32    72.02
MCE        GeNE        80.33    58.17    67.45    74.61
INS        NN-ED       28.51    19.33    23.01    26.01
INS        NN-DTW      27.14    21.73    24.13    25.84
INS        NN-CID      52.65    10.25    17.05    28.75
INS        FS          31.66    16.73    21.84    26.85
INS        TSF         48.11    21.04    29.13    38.20
INS        SAX-VSM     62.71    28.41    40.11    50.51
INS        MC-DCNN     53.77    5.79     10.38    20.06
INS        LSTM        60.25    28.01    38.23    48.93
INS        CVAE        63.27    26.78    37.57    49.67
INS        GeNE        71.50    33.15    45.34    58.01
TMP        NN-ED       54.43    47.88    50.95    52.92
TMP        NN-DTW      51.95    52.43    52.14    52.04
TMP        NN-CID      56.12    49.26    52.44    54.61
TMP        FS          65.17    58.82    61.85    63.76
TMP        TSF         54.20    60.94    57.42    55.47
TMP        SAX-VSM     72.22    59.05    64.94    69.10
TMP        MC-DCNN     76.79    66.13    71.06    74.37
TMP        LSTM        56.21    53.15    54.63    55.69
TMP        CVAE        74.86    59.22    66.14    71.15
TMP        GeNE        80.23    64.57    71.55    76.51
Table 5  Classification performance of different methods on the MCE, INS, and TMP datasets
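The F1 and F0.5 columns of Table 5 can be related to the P and R columns with the standard F-beta formula. The sketch below assumes that P and R denote precision and recall (the page does not expand the abbreviations) and uses a hypothetical helper named `f_beta`; it is a reading aid, not the authors' evaluation code.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    beta < 1 weights precision more heavily than recall; beta = 1 gives F1.
    Inputs may be fractions or percentages; the result uses the same scale.
    """
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # convention when both precision and recall are zero
    return (1 + b2) * precision * recall / denominator


# GeNE on the MCE dataset, values copied from Table 5 (in %).
p, r = 80.33, 58.17
f1 = f_beta(p, r)             # compare with the tabulated F1 of 67.45
f05 = f_beta(p, r, beta=0.5)  # compare with the tabulated F0.5 of 74.61
```

Recomputing from the rounded P and R values agrees with the tabulated scores to within about 0.05 percentage points, a gap that can arise when scores are averaged or rounded before reporting. Because F0.5 favors precision, methods with high P but low R (for example MC-DCNN on INS) score noticeably better on F0.5 than on F1.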
Fig. 2  Real-world application of GeNE on a dataset provided by State Grid