Please wait a minute...
JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
Computer Technology, Information Engineering     
Two-step filtering based time series similarity search
CAI Qing lin, CHEN Ling, MEI Han lei, SUN Jian ling
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Download:   PDF(926KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

A two-step filtering based search model was proposed in order to solve the problms that existing approximated search models have the poor controllability on the accuracy and have the low efficiency at the post-processing step. The symbolic aggregate approximation (SAX) method was employed to extract the symbolic feature vectors from time series. The high-dimensional time series was projected into the low-dimensional feature space. The symbolic feature vectors were saved as the form of vector approximation file (VA-File), and the efficient inverted index was introduced. A heuristic query and filtering algorithm was proposed at the procedure of searching in order to perform the first filtering, and an efficient boundary pruning method was proposed over the VA-File to reach the second filtering. The model can effectively control the search accuracy by the SAX feature vectors with multiple precision. Experimental results show that the proposed model has the efficient performance and the robust scalability on the series length, the kNN query scale and the dataset size.



Published: 23 July 2016
CLC:  TP 39  
Cite this article:

CAI Qing lin, CHEN Ling, MEI Han lei, SUN Jian ling. Two-step filtering based time series similarity search. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2016, 50(7): 1290-1297.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2016.07.010     OR     http://www.zjujournals.com/eng/Y2016/V50/I7/1290


基于两级过滤的时间序列近似查询

针对现有的近似查询模型对查询精度的可控性较差,后续处理效率较低的问题,提出基于两级过滤的查询模型.通过采用不同粒度的SAX表示方法提取时间序列的字符型特征向量,可以将高维的时间序列映射到低维的特征空间;将不同粒度的特征向量以向量近似文件(VA-File)的结构进行存储,有效引入了倒排索引.在查询过程中,设计了启发式的查询过滤算法,根据粗粒度特征向量查询细粒度特征向量,实现第一级过滤;针对VA-File设计了高效的边界剪枝算法,实现第二级过滤.模型基于多粒度的SAX特征向量进行构建,可以对查询精度进行有效控制;在第二级过滤中采用的边界剪枝算法可以有效地提高后续处理的执行效率.实验结果表明,提出的查询模型具有较高的性能,对时间序列长度、kNN查询规模及数据集规模具有稳定的扩展性.

[1] ESLING P, AGON C. Timeseries data mining [J]. ACM Computing Surveys, 2012, 45(1): 134.
[2] CONG F, CHEN J, DONG G, et al. Shorttime matrix series based singular value decomposition for rolling bearing fault diagnosis [J]. Mechanical Systems and Signal Processing, 2013, 34(1): 218-230.
[3] AGRAWAL R, FALOUTSOS C, SWAMI A. Efficient similarity search in sequence databases [J]. Foundations of Data Organization and Algorithms, 1993, 730(1): 69-84.
[4] CHAOVALIT P, GANGOPADHYAY A, KARABATIS G, et al. Discrete wavelet transformbased time series analysis and mining [J]. ACM Computing Surveys, 2011, 43(2): 137.
[5] KEOGH E, CHAKRABARTI K, PAZZANI M, et al. Dimensionality reduction for fast similarity search in large time series databases [J]. Knowledge and Information Systems, 2001, 3(3): 263-286.
[6] LIN J, KEOGH E, WEI L, et al. Experiencing SAX: a novel symbolic representation of time series [J]. Data Mining and Knowledge Discovery, 2007, 15(2): 107-144.
[7] FALOUTSOS C, RANGANATHAN M, MANOLOPOULOS Y. Fast subsequence matching in timeseries databases [C]∥Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data. New York: ACM, 1994: 419-429.
[8] ATHITSOS V, PAPAPETROU P, POTAMIAS M, et al. Approximate embeddingbased subsequence matching of time series [C]∥ Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2008: 365-378.
[9] SHIEH J, KEOGH E. iSAX: diskaware mining and indexing of massive time series datasets [J]. Data Mining and Knowledge Discovery, 2009, 19(1): 24-57.
[10] CAMERRA A, SHIEH J, PALPANAS T, et al. Beyond one billion time series: indexing and mining very large time series collections with iSAX2+ [J]. Knowledge and Information Systems, 2013, 39(1): 129.
[11] LI Y, YIU M L, GONG Z. Discovering longestlasting correlation in sequence databases [J]. Proceedings of the VLDB Endowment, 2013, 6(14): 1666-1677.
[12] KEOGH E, CHAKRABARTI K, PAZZANI M, et al. Locally adaptive dimensionality reduction for indexing large time series databases [J]. ACM SIGMOD Record, 2001, 30(2): 151-162.
[13] KEOGH E, CHU S, HART D, et al. Segmenting time series: a survey and novel approach [J]. Data Mining in Time Series Databases, 2004, 57(1): 122.
[14] GULLO F, PONTI G, TAGARELLI A, et al. A time series representation model for accurate and fast similarity detection [J]. Pattern Recognition, 2009, 42(11): 2998-3014.
[15] PAPADIMITRIOU S, SUN J, FALOUTSOS C. Streaming pattern discovery in multiple timeseries [C]∥ Proceedings of the 31st International Conference on Very Large Data Bases. New York: ACM, 2005: 697-708.
[16] RAKTHANMANON T, CAMPANA B, MUEEN A, et al. Searching and mining trillions of time series subsequences under dynamic time warping [C]∥ Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2012: 262-270.
[17] MANNING, CHRISTOPHER D, PRABHAKAR R, et al. Introduction to information retrieval [M]. Cambridge: Cambridge University Press, 2008.
[18] BLOTT S, WEBER R. A simple vectorapproximation file for similarity search in highdimensional vector spaces [R].Zurich: ETH Zentrum, 1997.
[19] YAHOO! Finance. [2015-05-01]. http:∥finance.yahoo.com/.
[20] Monte Carlo simulated stock price generator. [2015-05-01]. http:∥25yearsofprogramm- ing.com/.

[1] HE Xue-jun, WANG Jin, LU Guo-dong, LIU Zhen-yu, CHEN Li, JIN Jing. 3D head portrait sculpture by industrial robot based on triangular mesh slicing and collision detection[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(6): 1104-1110.
[2] WANG Hua, HAN Tong-yang, ZHOU Ke. KeyGraph-based community detection algorithm for public security intelligence[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(6): 1173-1180.
[3] YOU Lu-jin, LU Xing-jian, HE Gao-qi. Research on sub-health in cloud environment[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(6): 1181-1189.
[4] YOU Hai-hui, MA Zeng-yi, TANG Yi-jun, WANG Yue-lan, ZHENG Lin, YU Zhong, JI Cheng-jun. Soft measurement of heating value of burning municipal solid waste for circulating fluidized bed[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(6): 1163-1172.
[5] ZHANG Xin-xin, XU Ke, ZHONG Yi-Feng, SU Hui. Evolutionary game analysis on cooperative behaviors of  internet service providers[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(6): 1214-1224.
[6] LI Jian-li, DING Ding, LI Tao. Multi-objective hybrid cloud task scheduling using twice clustering[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(6): 1233-1241.
[7] BI Xiao-jun, WANG Jia-hui. Teaching-learning-based optimization algorithm with hybrid learning strategy[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(5): 1024-1031.
[8] LIAO Miao, ZHAO Yu-qian, ZENG Ye-zhan, HUANG Zhong-chao, ZHANG Bing-kui, ZOU Bei-ji. Automatic segmentation for cell images based on support vector machine and ellipse fitting[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(4): 722-728.
[9] MU Jing-jing, ZHAO Xin-yue, HE Zai-xing, ZHANG Shu-you. Contour reconstruction of overlapped bubbles based on concave-convex transformation and circle fitting[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(4): 714-721.
[10] HUANG Zheng-yu, JIANG Xin-long, LIU Jun-fa, CHEN Yi-qiang, GU Yang. Fusion feature based semi-supervised manifold localization method[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(4): 655-662.
[11] JIANG Xin-long, CHEN Yi-qiang, LIU Jun-fa, HU Li-sha, SHEN Jian-fei. Wearable system to support proximity awareness for people with autism[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(4): 637-647.
[12] WANG Yu-xiang, LI Sheng-jie, WANG Hao, MA Jun-yi, WANG Ya-sha, ZHANG Da-qing. Survey on Wi-Fi based contactless activity recognition[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(4): 648-654.
[13] GUO Tong, GUO Bin, ZHANG Jia-fan, YU Zhi-wen, ZHOU Xing-she. CrowdTravel: leveraging heterogeneous crowdsourced data for scenic spot profiling[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(4): 663-668.
[14] WANG Liang, YU Zhi-wen, GUO Bin. Moving trajectory prediction model based on double layer multi-granularity knowledge discovery[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(4): 669-674.
[15] JING Yao, GUO Bin, WANG Zhu, YU Zhi-wen, ZHOU Xing-she. CrowdReview: personalized product review presentation based on crowd intelligence mining[J]. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(4): 675-681.