J4  2012, Vol. 46 Issue (5): 858-865    DOI: 10.3785/j.issn.1008-973X.2012.05.014
    
A new algorithm for frequency tendency prediction over data streams
GUO Li-chao1, SU Hong-ye1, GOU Qian-wen2
1. Institute of Cyber-System and Control, Zhejiang University, Hangzhou 310027, China;
2. College of Public Administration, Zhejiang University, Hangzhou 310027, China

Abstract  

To predict the frequency tendency of itemsets over data streams, a novel max-min-frequency tendency prediction (MM-FTP) algorithm is proposed, based on the max-min frequency window model. A max-min-frequency pattern tree (MMFP-Tree) structure is designed to store summary information about the stream, and a new measure, the frequency changing rate (FCR), is introduced to describe the frequency tendency of itemsets quantitatively. The MM-FTP algorithm can also be applied to predicting index trends and the trend of classification confidence. A case study on a web log data stream indicates that the MM-FTP algorithm predicts frequency tendency efficiently and effectively.



Published: 01 May 2012
CLC:  TP 311.13  
Cite this article:

GUO Li-chao, SU Hong-ye, GOU Qian-wen. A new algorithm for frequency tendency prediction over data streams. J4, 2012, 46(5): 858-865.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2012.05.014     OR     http://www.zjujournals.com/eng/Y2012/V46/I5/858


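A new algorithm for predicting frequency tendency over data streams

To address the problem of predicting how the frequency of data objects in a data stream changes over time, a max-min frequency tendency prediction algorithm (MM-FTP) is proposed, based on the max-min frequency time window model. A new max-min frequent pattern tree structure (MMFP-Tree) is designed to store summary information about the data stream, and a new measure of the frequency tendency of data objects, the frequency changing rate (FCR), is proposed to describe that tendency quantitatively. The algorithm can also effectively predict the trend of classification confidence over data streams as well as conventional index trends. Results show that, on a real web click data stream, the algorithm predicts the frequency tendency of data objects quickly and accurately.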
