J4  2009, Vol. 43 Issue (12): 2171-2177    DOI: 10.3785/j.issn.1008-973X.2009.12.008
Automation Technology, Computer Technology
Mining associated and item-item correlated frequent patterns
SHEN Bin1,2, YAO Min2
(1. Ningbo Institute of Technology, Zhejiang University, Ningbo 315100, China; 2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)
Abstract:

Frequent pattern mining and existing correlated pattern mining cannot completely eliminate suspicious cross-support patterns or suspicious patterns that contain negatively correlated items. To address this, a new problem, mining associated and item-item positively correlated frequent patterns, is posed together with its solution. A new correlation interestingness measure, all-item-confidence, is presented, and its properties, including suitable upper and lower bounds and anti-monotonicity, are discussed. All-item-confidence is used to describe the item-item positive correlation of a pattern, which effectively filters out suspicious patterns containing negatively correlated items; all-confidence is used to describe the association of a pattern, which removes suspicious cross-support patterns. The relevant definitions are then given, and two mining algorithms, ItemCoMine_AP and ItemCoMine_CT, are proposed. Algorithm performance, the pruning effect of the measures, and the practical effect on a real retail dataset were tested. The experimental results show that both algorithms perform well, that all-confidence and all-item-confidence prune suspicious patterns markedly, and that the mined associated and item-item positively correlated patterns have good application value.
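The pruning idea described in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definition of all-confidence (support of the pattern divided by the largest single-item support in it). The abstract does not give the formula for all-item-confidence, so the pairwise check below uses minimum pairwise lift as a plainly labeled stand-in, not the authors' measure. The function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def all_confidence(transactions, itemset):
    """Support of the pattern divided by the largest single-item support in it."""
    max_item_sup = max(support(transactions, [i]) for i in itemset)
    return support(transactions, itemset) / max_item_sup if max_item_sup else 0.0


def min_pairwise_lift(transactions, itemset):
    """Smallest lift over all item pairs in the pattern.

    Stand-in for an item-item correlation check; this is NOT the
    paper's all-item-confidence, whose formula the abstract omits.
    """
    lifts = []
    for a, b in combinations(itemset, 2):
        denom = support(transactions, [a]) * support(transactions, [b])
        lifts.append(support(transactions, [a, b]) / denom if denom else 0.0)
    return min(lifts)


# Hypothetical toy data: "caviar" is rare, "bread" and "milk" are common.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]

common = all_confidence(transactions, ["bread", "milk"])
cross_support = all_confidence(transactions, ["bread", "milk", "caviar"])
pair_lift = min_pairwise_lift(transactions, ["bread", "milk"])
```

On this toy data the pattern mixing a rare item with two common ones scores an all-confidence of about 0.2, against about 0.8 for the pair of common items, so an all-confidence threshold between the two would discard the cross-support pattern. A pairwise lift below 1 would flag a pair as negatively correlated under the stand-in check.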

Published: 2010-01-16
CLC number: TP 311
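Funding:

Supported by the National Natural Science Foundation of China (60533040, 60525202, 10876036, 70871111); the Key Program of the Natural Science Foundation of Zhejiang Province (Z104267); and the Research Start-up Fund of Ningbo Institute of Technology, Zhejiang University.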

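Corresponding author: YAO Min, male, professor, doctoral supervisor. E-mail: myao@zju.edu.cn
About the author: SHEN Bin (1980-), male, born in Shangyu, Zhejiang, Ph.D.; research interests include decision analysis, data analysis, and data mining.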
Cite this article:

SHEN Bin, YAO Min. Mining associated and item-item correlated frequent patterns[J]. J4, 2009, 43(12): 2171-2177.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2009.12.008
http://www.zjujournals.com/eng/CN/Y2009/V43/I12/2171


