J4  2013, Vol. 47 Issue (2): 223-230    DOI: 10.3785/j.issn.1008-973X.2013.02.005
    
Strategies of efficiency improvement for Eclat algorithm
FENG Pei-en, LIU Yu, QIU Qing-ying, LI Li-xin
State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou, 310027, China

Abstract  

To improve its efficiency, the Eclat algorithm was optimized in three respects: pruning, itemset joining, and Tid-list intersection. First, equivalence classes were formed by grouping itemsets that share the same suffix, which allows more thorough pruning, and a double layer hash table was used to speed up the search for subsets of candidate itemsets during pruning. Second, a partition list over the set of itemsets was introduced to eliminate the check of whether two itemsets can be joined. Third, a transaction id (Tid) lost threshold was introduced to speed up intersection. Combining these three strategies yields the Eclat_opt algorithm. In comparison experiments against the original Eclat algorithm (ZAKI) and two other improved Eclat algorithms, Diffset (ZAKI) and hEclat (XIONG Zhong-yang), Eclat_opt was the most efficient of the four algorithms on sparse datasets and had the best overall time performance.
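
The intersection strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Eclat_opt: the abstract does not define the Tid lost threshold, so the sketch assumes one plausible reading, namely that a merge of two sorted Tid lists is abandoned as soon as either list has skipped ("lost") more Tids than it can spare while still reaching the minimum support. The function names intersect_with_loss_bound and eclat, the sample data, and the plain prefix-based search are assumptions made for illustration; the suffix-based equivalence classes, double layer hash table, and partition list are not modeled.

```python
from typing import Dict, List, Optional, Tuple


def intersect_with_loss_bound(
    a: List[int], b: List[int], min_sup: int
) -> Optional[List[int]]:
    """Intersect two sorted Tid lists; return None once min_sup is unreachable."""
    # Assumed reading of the threshold: each list may skip at most
    # len(list) - min_sup Tids before the result must fall below min_sup.
    spare_a = len(a) - min_sup
    spare_b = len(b) - min_sup
    if spare_a < 0 or spare_b < 0:
        return None
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot occur in b: one Tid lost from a
            spare_a -= 1
            if spare_a < 0:
                return None
        else:
            j += 1  # b[j] cannot occur in a: one Tid lost from b
            spare_b -= 1
            if spare_b < 0:
                return None
    return out if len(out) >= min_sup else None


def eclat(
    tidlists: Dict[str, List[int]], min_sup: int
) -> Dict[Tuple[str, ...], int]:
    """Plain prefix-based Eclat (hypothetical helper, not Eclat_opt)."""
    frequent: Dict[Tuple[str, ...], int] = {}

    def extend(cls: List[Tuple[Tuple[str, ...], List[int]]]) -> None:
        # cls holds itemsets that share a common prefix, each with its Tid list.
        for idx, (items_a, tids_a) in enumerate(cls):
            frequent[items_a] = len(tids_a)
            next_cls = []
            for items_b, tids_b in cls[idx + 1:]:
                tids = intersect_with_loss_bound(tids_a, tids_b, min_sup)
                if tids is not None:
                    next_cls.append((items_a + (items_b[-1],), tids))
            if next_cls:
                extend(next_cls)

    start = [
        ((item,), tids)
        for item, tids in sorted(tidlists.items())
        if len(tids) >= min_sup
    ]
    extend(start)
    return frequent


# Made-up vertical database: item -> sorted list of transaction ids.
sample = {"a": [1, 2, 3, 5], "b": [1, 2, 4, 5], "c": [1, 3, 4, 5]}
result = eclat(sample, min_sup=3)
```

With this sample and a minimum support of 3, the candidate {a, b, c} is rejected after a single lost Tid, because the Tid lists of {a, b} and {a, c} each contain exactly three entries and so have no Tids to spare.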



Published: 01 February 2013
CLC:  TP 311  
Cite this article:

FENG Pei-en, LIU Yu, QIU Qing-ying, LI Li-xin. Strategies of efficiency improvement for Eclat algorithm. J4, 2013, 47(2): 223-230.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2013.02.005     OR     http://www.zjujournals.com/eng/Y2013/V47/I2/223



[1] AGRAWAL R, SRIKANT R. Fast algorithms for mining association rules [C]// Proceedings of the 20th International Conference on Very Large Data Bases. Santiago, Chile: Morgan Kaufmann, 1994: 487-499.
[2] HAN J, PEI J, YIN Y. Mining frequent patterns without candidate generation [C]// Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, United States: ACM, 2000: 1-12.
[3] FENG Pei-en, ZHANG Hui, QIU Qing-ying, et al. PCAR: an efficient approach for mining association rules [C]// Proceedings of the ICNC-FSKD 2008 International Conference on Fuzzy Systems and Knowledge Discovery. Jinan: IEEE, 2008: 605-609.
[4] ZAKI M J. Scalable algorithms for association mining [J]. IEEE Transactions on Knowledge and Data Engineering, 2000, 12(3): 372-390.
[5] SONG Chang-xin, MA Ke. Research on the improved Eclat data mining algorithm [J]. Control & Automation, 2008, 24(8): 92-94.
[6] ZAKI M J. Fast vertical mining using diffsets [R]. Technical Report 01-1. Troy, New York: Rensselaer Polytechnic Institute, 2001.
[7] XIONG Zhong-yang, CHEN Pei-en, ZHANG Yu-fang. Improvement of Eclat algorithm for association rules based on hash Boolean matrix [J]. Application Research of Computers, 2010, 27(4): 1323-1325.
[8] LI Min, LI Chun-ping. Analysis and comparison of frequent pattern mining algorithms [J]. Journal of Computer Applications, 2005, 25: 166-171.
[9] HAN J, KAMBER M. Data mining: concepts and techniques [M]. San Francisco, United States: Morgan Kaufmann Publishers Inc, 2001: 231.
[10] LIU Jing-lian. Comparative analysis of Eclat algorithm and Eclat+ algorithm [J]. Journal of Suihua University, 2010, 30(2): 189-190.
[11] GOETHALS B. Frequent itemset mining dataset repository [EB/OL]. [2004-12-2]. http://fimi.ua.ac.be/data/
