提高Eclat算法效率的策略

doi:10.3785/j.issn.1008-973X.2013.02.005

2013, Vol. 47

Issue (2): 223-230 DOI: 10.3785/j.issn.1008-973X.2013.02.005

机械工程

提高Eclat算法效率的策略

冯培恩, 刘屿, 邱清盈, 李立新

浙江大学 CAD&CG国家重点实验室,浙江杭州 310027

Strategies of efficiency improvement for Eclat algorithm

FENG Pei-en, LIU Yu, QIU Qing-ying, LI Li-xin

State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou, 310027, China

全文: PDF HTML

摘要：

为了提高Eclat算法的效率,从剪枝、项集连接和交叉计数3方面对Eclat算法进行优化.将后缀相同的项集归为一个等价类,使剪枝更充分,剪枝时引入双层哈希表加快搜索候选项集子集的速度；提出项集集合划分链表,以减少项集连接过程中比较判断的环节；提出事务标识（Tid）失去阈值,以加快交叉计数的速度.在此基础上提出一种优化的Eclat_opt算法(ZAKI),把它与Eclat原算法以及其他2种Eclat改进算法Diffset (ZAKI), hEclat(熊忠阳）进行对比实验的结果表明,Eclat_opt算法的效率在稀疏数据集上最高,总体时间性能最好.

Abstract:

For the purpose of efficiency improvement, Eclat algorithm was optimized in three aspects-pruning, itemsets connection and intersection. Firstly, the equivalence classes were divided in the suffix-based way to make the best of pruning in which a double layer hash table was utilized to accelerate the search process of subsets of candidate itemsets. Secondly, a partition list of the set of itemsets was presented to eliminate the connection judgment of itemsets. Finally, a transaction id (Tid) lost threshold was introduced to speed up intersection. Based on the above three improvement strategies an Eclat_opt algorithm was proposed. The performance comparison between the Eclat_opt algorithm, the original Eclat algorithm (ZAKI) and two other improved Eclat algorithms Diffset(ZAKI), hEclat (XIONG Zhong-yang) showed that the efficiency of the Eclat_opt algorithm ranked the first among the four algorithms on sparse datasets, and its overall time performance was the best.

出版日期: 2013-02-01

TP 311

基金资助:

国家自然科学基金资助项目（51175455）；浙江省自然科学基金资助项目（Y1100257）.

通讯作者: 邱清盈,男,副教授. E-mail: medesign@zju.edu.cn

作者简介: 冯培恩（1943—）,男,教授,博导,主要从事工程设计理论与方法学,智能设计,机械设计知识挖掘等方面的研究.E-mail: fpe@zju.edu.cn

	服务
	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	RSS
	作者相关文章

引用本文:

冯培恩, 刘屿, 邱清盈, 李立新. 提高Eclat算法效率的策略[J]. J4, 2013, 47(2): 223-230.

FENG Pei-en, LIU Yu, QIU Qing-ying, LI Li-xin. Strategies of efficiency improvement for Eclat algorithm. J4, 2013, 47(2): 223-230.

链接本文:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2013.02.005 或 http://www.zjujournals.com/eng/CN/Y2013/V47/I2/223

［1］ AGRAWAL R, SRIKANT R. Fast Algorithms for mining association rules ［C］∥ Proceedings of 20th International Conference on Very Large Data Bases. Santiago, Chile: Morgankaufman, 1994: 487-499.
［2］ HAN J, PEI J, YIN Y. Mining frequent patterns without candidate generation ［C］∥ Proceedings of the 2000 ACM Data. Dallas, United States： ACM, 2000: 1-12.
［3］ FENG Pei-en, ZHANG Hui, QIU Qing-ying, et al. PCAR: an efficient approach for mining association rules ［C］∥ Proceedings of the ICNC-FSKD 2008 International Conference on Fussy Systems and Knowledge Discovery. Jinan： IEEE, 2008: 605-609.
［4］ ZAKI M J. Scalable algorithms for association mining［J］. IEEE Transactions on Knowledge and Data Engineering, 2000,12(3): 372-390.
［5］宋长新, 马克. 改进的Eclat数据挖掘算法的研究［J］. 微计算机信息, 2008, 24(8): 92-94
SONG Chang-xin, MA Ke. Research on the improved eclat data mining algorithm ［J］. Control & Automation, 2008, 24(8): 92-94.
［6］ ZAKI M J. Fast vertical mining using diffsets ［R］.Technical Report 01-1, Troy, New York: Rensselaer Polytechnic Institute. 2001.
［7］熊忠阳, 陈培恩, 张玉芳. 基于散列布尔矩阵的关联规则Eclat改进算法［J］. 计算机应用研究, 2010, 27(4): 1323-1325.
XIONG Zhong-yang, CHEN Pei-en, ZHANG Yu-fang. Improvement of Eclat algorithm for association rules based on hash Boolean matrix ［J］. Application Research of Computers, 2010, 27(4): 1323-1325.
［8］李敏, 李春平. 频繁模式挖掘算法分析和比较［J］. 计算机应用, 2005, 25: 166-171.
LI Min, LI Chun-ping. Analysis and Comparison of frequent patterns mining algorithms ［J］. Journal of Computer Applications, 2005, 25: 166-171.
［9］ HAN J, KAMBE M. Data mining: concepts and Techniques ［M］. San Francisco, United States: Morgan Kaufmann Publishers Inc, 2001: 231.
［10］刘井莲. Eclat与Eclat+算法的比较分析［J］. 绥化学院学报, 2010, 30(2): 189-190.
LIU Jing-lian. Comparative Analysis of Eclat Algorithm and Eclat+ Algorithm ［J］. Journal of Suihua University, 2010, 30(2): 189-190.
［11］ GOETHALS B. Frequent itemset mining dataset repository ［EB/OL］. ［2004-12-2］. http:∥fimi.ua.ac.be/data/

[1]	柯海丰,应晶. 基于R-ELM的实时车牌字符识别技术[J]. J4, 2014, 48(2): 0-0.
[2]	金苍宏,吴明晖,应晶. 一种基于上下文索引的文本匹配框架[J]. J4, 2013, 47(9): 1537-1546.
[3]	朱凡微, 吴明晖, 应晶. 面向大规模无结构数据的Web方面搜索方法[J]. J4, 2013, 47(6): 990-999.
[4]	刘颖, 陈岭, 陈根才, 赵江奇, 王敬昌. 基于历史点击数据的集合选择方法[J]. J4, 2013, 47(1): 23-28.
[5]	殷婷,肖敏,陈岭,赵江奇,王敬昌. 基于CQPM的OLAP查询日志挖掘及推荐[J]. J4, 2012, 46(11): 2052-2060.
[6]	肖敏, 陈岭, 夏海元, 陈根才. 基于数据仓库内在特征的OLAP关键词查询[J]. J4, 2012, 46(6): 974-979.
[7]	张丽平,李松,郝晓红,郝忠孝. Jrv粗糙Vague区域关系[J]. J4, 2012, 46(1): 105-111.
[8]	陈岭,许晓龙,杨清,陈根才. 基于三次样条插值的无线信号强度衰减模型[J]. J4, 2011, 45(9): 1521-1527.
[9]	吴明晖, 应晶. 业务过程建模及其形式化验证[J]. J4, 2011, 45(2): 280-287.
[10]	傅朝阳, 高济, 周尤明. 词法多重散列与包容语义相结合的服务查找[J]. J4, 2010, 44(12): 2274-2283.
[11]	杨清, 陈岭, 陈根才. 基于单加速度传感器的行走距离估计[J]. J4, 2010, 44(9): 1681-1686.
[12]	熊伟, 王晓暾. 基于质量功能展开的可信软件需求映射方法[J]. J4, 2010, 44(5): 881-886.
[13]	张引, 何浩, 赵丽娜, 张三元. 网构软件模型中的抽象状态机设计[J]. J4, 2010, 44(5): 923-929.
[14]	蒋涛, 应晶, 吴明晖, 等. 一种面向特征增量的软件产品线分析方法[J]. J4, 2009, 43(12): 2142-2148.
[15]	沈斌, 姚敏. 关联且项项正相关频繁模式挖掘[J]. J4, 2009, 43(12): 2171-2177.

Viewed

Full text

Abstract

Cited

Shared

Discussed