Mining associated and item-item correlated frequent patterns
SHEN Bin (1,2), YAO Min (2)
(1. Ningbo Institute of Technology, Zhejiang University, Ningbo 315100, China;
2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)

Abstract: Frequent pattern mining and existing correlated pattern mining cannot completely eliminate suspicious cross-support patterns or patterns that contain two negatively correlated items. To address this, the new problem of mining associated and item-item correlated frequent patterns is posed and a solution is proposed. A new correlation interest measure, all-item-confidence, is presented, and its properties, including suitable upper and lower bounds and anti-monotonicity, are discussed. All-item-confidence is used to describe a pattern's item-item correlation, so that patterns containing two negatively correlated items can be filtered out. All-confidence is used to describe a pattern's association, so that suspicious cross-support patterns can be eliminated. The relevant definitions are then given, and two mining algorithms, ItemCoMine_AP and ItemCoMine_CT, are presented. The performance of the two algorithms, the pruning capability of the measures, and their practical effect on a real retail dataset were tested. The experiments show that both algorithms perform well, that all-confidence and all-item-confidence prune suspicious patterns effectively, and that associated and item-item correlated frequent patterns have good application value.
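
The two filters described in the abstract can be sketched in a few lines of Python. The all-confidence function follows the commonly used definition: the support of a pattern divided by the largest single-item support within it. The abstract does not define all-item-confidence, so this sketch substitutes the minimum pairwise lift as a stand-in for the item-item check; that substitution is an assumption made for illustration and is not the paper's measure. The function names, thresholds, and toy transactions are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical toy retail data: each transaction is a set of items.
transactions = [
    {"cola", "juice", "tea", "coffee"},
    {"cola", "juice", "tea", "coffee"},
    {"cola", "juice", "tea", "coffee"},
    {"cola", "juice", "caviar"},
    {"cola", "juice"},
    {"cola", "juice"},
    {"cola", "juice"},
    {"cola", "juice"},
    {"cola"},
    {"juice"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def all_confidence(itemset, transactions):
    """Support of the pattern divided by its largest single-item support."""
    max_item_sup = max(support([i], transactions) for i in itemset)
    return support(itemset, transactions) / max_item_sup if max_item_sup else 0.0


def min_pairwise_lift(itemset, transactions):
    """Stand-in for the item-item check (NOT the paper's all-item-confidence).

    Returns the smallest lift over all item pairs; a value below 1 means
    at least one pair of items is negatively correlated.
    """
    lifts = []
    for a, b in combinations(sorted(itemset), 2):
        denom = support([a], transactions) * support([b], transactions)
        lifts.append(support([a, b], transactions) / denom if denom else 0.0)
    return min(lifts, default=float("inf"))


def keep(itemset, transactions, min_sup, min_all_conf):
    """Keep a pattern only if it is frequent, associated, and pairwise positive."""
    return (
        support(itemset, transactions) >= min_sup
        and all_confidence(itemset, transactions) >= min_all_conf
        and min_pairwise_lift(itemset, transactions) > 1.0
    )


if __name__ == "__main__":
    for pattern in ({"tea", "coffee"}, {"juice", "caviar"}, {"cola", "juice"}):
        print(sorted(pattern), keep(pattern, transactions, 0.1, 0.5))
```

On this toy data, {juice, caviar} has pairwise lift above 1 but an all-confidence of about 0.11, so the association filter rejects it as a cross-support pattern. {cola, juice} has a high all-confidence of about 0.89 but a pairwise lift just below 1, so only the item-item check rejects it. This is the kind of complementary pruning the abstract attributes to combining the two measures.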
Published: 16 January 2010
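Mining associated and item-item positively correlated frequent patterns
Frequent patterns and existing correlated patterns cannot completely remove suspicious cross-support patterns or suspicious patterns that contain negatively correlated commodity items. To address this, the new problem of mining associated and item-item positively correlated frequent patterns is posed, together with its solution. A novel correlation interest measure, all-item-confidence, is described, and its properties, such as suitable upper and lower bounds and anti-monotonicity, are explored. All-item-confidence is chosen to describe a pattern's item-item positive correlation, which effectively filters out suspicious patterns containing negatively correlated commodity items; all-confidence is used to describe a pattern's association, which removes suspicious cross-support patterns. The relevant definitions are then given, two mining algorithms, ItemCoMine_AP and ItemCoMine_CT, are proposed, and the algorithms' performance, the pruning effect of the measures, and the application effect on a real retail dataset are tested. The experimental results show that both algorithms perform well, that all-confidence and all-item-confidence have a clear pruning effect on suspicious patterns, and that the mined associated and item-item positively correlated patterns have good application value.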