Please wait a minute...
J4
计算机科学技术     
增量式频繁闭合序列挖掘算法
石怀东1,蔡铭1,吴洪森2,董金祥1,富浩1
(1.浙江大学 计算机科学与技术学院, 浙江 杭州 310027; 2.浙江警察学院, 浙江 杭州 310053)
An incremental algorithm for mining frequent closed patterns
SHI Huai-dong1, CAI Ming1, WU Hong-sen2, DONG Jin-xiang1, FU Hao1
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. Zhejiang Police College, Hangzhou 310053, China
 全文: PDF(795 KB)   HTML
摘要:

在许多场合挖掘频繁闭合序列时,输入串数据库呈现实时动态增长的特点.分析Bide算法,给出并证明了闭合序列前缀中任意一个项目的后向扩展事件(BEE)项目交集随前缀的生长单调不增的定理,据此对BEE累计操作进行了优化,使其性能平均提高了48%.定义了闭合序列树作为频繁闭合序列的表示形式,并阐述了它的3个性质.分析发现,当新增输入串不同时包含前缀串和频繁项目时,两次连续挖掘的结果是相同的,给出了相应的定理和证明,据此实现了增量式频繁闭合序列挖掘算法BideInc.实验验证了BideInc算法的正确性,使用该算法后挖掘性能平均提高了47%.

Abstract:

While mining frequent closed patterns (FCP), the input sequence database dynamically increases in many situations. By analyzing Bide algorithm, the theorem of backward-extension event (BEE) detection was proposed and proved. It shows that the BEE set of any prefix item is non-increasing with the extension of the prefix. Based on the theorem, the accumulation performance of the BEE set was optimized by 4.8% averagely. The FCP tree was defined to represent the final result of FCP mining and its three characteristics were demonstrated. When the frequent item and the prefix are not coexistent in the new input sequence, the results of contiguous FCP mining are equal. And the corresponding theorem was proved. The BideInc algorithm was proposed to incrementally mine FCPs. The experiments validated the algorithm, and the performance was improved by 47% averagely.

出版日期: 2009-09-28
:  TP 311  
基金资助:

教育部新世纪优秀人才支持计划资助项目(NCET-06-051).

通讯作者: 蔡铭,男,副教授.     E-mail: cm@cs.zju.edu.cn
作者简介: 石怀东(1979-),男,浙江宁海人,博士生,从事普适计算、操作系统研究.
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章  

引用本文:

石怀东,蔡铭,吴洪森,董金祥,富浩. 增量式频繁闭合序列挖掘算法[J]. J4, 10.3785/j.issn.1008-973X.2009..

DAN Fu-Dong, CA Ming, TUN Hong-Sen, DONG Jin-Xiang, FU Gao. An incremental algorithm for mining frequent closed patterns. J4, 10.3785/j.issn.1008-973X.2009..

链接本文:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2009.        http://www.zjujournals.com/eng/CN/Y2009/V43/I8/1389


[1] WEISER M. The computer for the twenty-first century
[J]. Scientific American, 1991, 265(3): 94-104.


[2] NORMAN D. The invisible computer
[M]. Cambridge: MIT Press, 1999.


[3] AGRAWAL R, SRIKANT R. Mining sequential patterns
[C]∥ Proceedings of the 11th International Conference on Data Engineering. Taipei: IEEE, 1995: 3-14.


[4] SRIKANT R, AGRAWAL R. Mining sequential patterns: generalizations and performance improvements
[C]∥ Proceedings of the 5th International Conference on Extending Database Technology. Avignon: Springer, 1996: 3-17.


[5] ZAKI M J. SPADE: an efficient algorithm for mining frequent sequences
[J]. Machine Learning, 2001, 42(1): 31-60.


[6] PEI J, HAN J, MORTAZAVI-ASL B, et al. Prefix- Span: mining sequential patterns efficiently by prefix-projected pattern growth
[C]∥ Proceedings of the 17th International Conference on Data Engineering. Heidelberg: IEEE, 2001: 215-224.


[7] AYRES J, GEHRKE J, YIU T, et al. Sequential pattern mining using a bitmap representation
[C]∥ Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining. Edmonton: ACM, 2002: 429-435.


[8] PASQUIER N, BASTIDE Y, TAOUIL R, et al. Discovering frequent closed itemsets for association rules
[C]∥ Proceedings of the 7th International Conference on Database Theory. Jerusalem: Springer, 1999: 398-416.


[9] PEI J, HAN J, MAO R. CLOSET: an efficient algorithm for mining frequent closed itemsets
[C]∥ Proceedings of the 2000 ACM SIGMOD International Workshop Data Mining and Knowledge Discovery. Dallas: ACM, 2001: 11-20.


[10] WANG J, HAN J, PEI J. CLOSET+: searching for the best strategies for mining frequent closed itemsets
[C]∥ Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington: ACM, 2003: 236-245.


[11] ZAKI M J, HSIAO C. CHARM: an efficient algorithm for closed itemset mining
[C]∥ Proceedings of the Second SIAM International Conference on Data Mining. Arlington: SIAM, 2002: 457-473.


[12] HAN J, WANG J, LU Y, et al. Mining Top-K frequent closed patterns without minimum support
[C]∥ Proceedings of the 2002 IEEE International Conference on Data Mining. Maebashi: IEEE, 2002: 211-218.


[13] YAN X, HAN J, AFSHAR R, CloSpan: mining closed sequential patterns in large databases
[C]∥ Proceedings of the 3th SIAM International Conference on Data Mining. San Francisco: SIAM, 2003: 166-177.


[14] WANG J, HAN J. BIDE: efficient mining of frequent closed sequences
[C]∥ Proceedings of the 20th International Conference on Data Engineering. Boston: IEEE, 2004: 79-90.


[15] TZVETKOV P, YAN X, HAN J. TSP: mining top-k closed sequential patterns
[C]∥ Proceedings of the Third IEEE International Conference on Data Mining. Melbourne: IEEE, 2003: 347-354.


[16] 刘君强,孙晓莹,庄越挺,等. 挖掘闭合模式的高性能算法
[J]. 软件学报, 2004, 15(1): 94-102.

LIU Jun-qiang, SUN Xiao-ying, ZHUANG Yue-ting, et al. Mining frequent closed patterns by adaptive pruning
[J]. Journal of Software, 2004, 15(1): 94-102.


[17] MUNGUIA E T, STEPHEN S. Activity recognition in the home using simple and ubiquitous sensors
[C]∥ Proceedings of Pervasive Computing: Second International Conference. Vienna: Springer, 2004: 158-175. 

[1] 柯海丰,应晶. 基于R-ELM的实时车牌字符识别技术[J]. J4, 2014, 48(2): 0-0.
[2] 金苍宏,吴明晖,应晶. 一种基于上下文索引的文本匹配框架[J]. J4, 2013, 47(9): 1537-1546.
[3] 朱凡微, 吴明晖, 应晶. 面向大规模无结构数据的Web方面搜索方法[J]. J4, 2013, 47(6): 990-999.
[4] 冯培恩, 刘屿, 邱清盈, 李立新. 提高Eclat算法效率的策略[J]. J4, 2013, 47(2): 223-230.
[5] 刘颖, 陈岭, 陈根才, 赵江奇, 王敬昌. 基于历史点击数据的集合选择方法[J]. J4, 2013, 47(1): 23-28.
[6] 殷婷,肖敏,陈岭,赵江奇,王敬昌. 基于CQPM的OLAP查询日志挖掘及推荐[J]. J4, 2012, 46(11): 2052-2060.
[7] 肖敏, 陈岭, 夏海元, 陈根才. 基于数据仓库内在特征的OLAP关键词查询[J]. J4, 2012, 46(6): 974-979.
[8] 张丽平,李松,郝晓红,郝忠孝. Jrv粗糙Vague区域关系[J]. J4, 2012, 46(1): 105-111.
[9] 陈岭,许晓龙,杨清,陈根才. 基于三次样条插值的无线信号强度衰减模型[J]. J4, 2011, 45(9): 1521-1527.
[10] 吴明晖, 应晶. 业务过程建模及其形式化验证[J]. J4, 2011, 45(2): 280-287.
[11] 傅朝阳, 高济, 周尤明. 词法多重散列与包容语义相结合的服务查找[J]. J4, 2010, 44(12): 2274-2283.
[12] 杨清, 陈岭, 陈根才. 基于单加速度传感器的行走距离估计[J]. J4, 2010, 44(9): 1681-1686.
[13] 熊伟, 王晓暾. 基于质量功能展开的可信软件需求映射方法[J]. J4, 2010, 44(5): 881-886.
[14] 张引, 何浩, 赵丽娜, 张三元. 网构软件模型中的抽象状态机设计[J]. J4, 2010, 44(5): 923-929.
[15] 蒋涛, 应晶, 吴明晖, 等. 一种面向特征增量的软件产品线分析方法[J]. J4, 2009, 43(12): 2142-2148.