J4  2009, Vol. 43 Issue (12): 2171-2177    DOI: 10.3785/j.issn.1008-973X.2009.12.008
    
Mining associated and item-item correlated frequent patterns
SHEN Bin (1,2), YAO Min (2)
(1. Ningbo Institute of Technology, Zhejiang University, Ningbo 315100, China;
2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)

Abstract  

Frequent pattern mining and existing correlated pattern mining cannot completely eliminate suspicious cross-support patterns or patterns that contain two negatively correlated items. To address this, the problem of mining associated and item-item correlated frequent patterns was posed, together with a solution. A new correlation interest measure, all-item-confidence, was presented, and its properties, including proper upper and lower bounds and anti-monotonicity, were discussed. All-item-confidence was used to describe a pattern's item-item correlation, so that patterns containing two negatively correlated items can be filtered out. Meanwhile, all-confidence was used to describe a pattern's association, so that suspicious cross-support patterns can be eliminated. The related definitions were then given, and two mining algorithms, ItemCoMine_AP and ItemCoMine_CT, were presented. The performance of the two algorithms, the pruning capability of the measures, and their practical effect on a real retail dataset were tested. The results show that both algorithms perform well, that all-confidence and all-item-confidence prune suspicious patterns effectively, and that associated and item-item correlated frequent patterns have good application value.
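The two filters described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. `all_confidence` follows the standard definition (pattern support divided by the largest single-item support). The abstract does not define all-item-confidence, so the sketch substitutes the minimum pairwise lift as a stand-in test for negatively correlated item pairs. The function names, the toy transactions, and the thresholds are all assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def all_confidence(itemset, transactions):
    """Standard all-confidence: sup(X) divided by the largest single-item support in X."""
    max_item_sup = max(support([i], transactions) for i in itemset)
    return support(itemset, transactions) / max_item_sup if max_item_sup else 0.0


def min_pairwise_lift(itemset, transactions):
    """Smallest lift over all item pairs in X.

    Assumption: this is a stand-in for the paper's all-item-confidence,
    whose definition is not given in the abstract. Lift < 1 marks a
    negatively correlated pair.
    """
    lifts = []
    for a, b in combinations(sorted(itemset), 2):
        denom = support([a], transactions) * support([b], transactions)
        lifts.append(support([a, b], transactions) / denom if denom else 0.0)
    return min(lifts, default=float("inf"))


def keep_pattern(itemset, transactions, min_sup, min_all_conf):
    """Keep X only if it is frequent, associated, and free of negatively correlated pairs."""
    return (
        support(itemset, transactions) >= min_sup
        and all_confidence(itemset, transactions) >= min_all_conf
        and min_pairwise_lift(itemset, transactions) > 1.0
    )


# Hypothetical toy retail data: milk and bread are near-universal, caviar is rare,
# and pen and ink are always bought together.
TRANSACTIONS = [
    frozenset({"milk", "bread", "pen", "ink"}),
    frozenset({"milk", "bread", "pen", "ink"}),
    frozenset({"milk", "bread", "pen", "ink"}),
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "caviar"}),
    frozenset({"milk"}),
    frozenset({"bread"}),
]

if __name__ == "__main__":
    for pattern in ({"pen", "ink"}, {"milk", "caviar"}, {"milk", "bread"}):
        print(
            sorted(pattern),
            round(all_confidence(pattern, TRANSACTIONS), 3),
            round(min_pairwise_lift(pattern, TRANSACTIONS), 3),
            keep_pattern(pattern, TRANSACTIONS, min_sup=0.1, min_all_conf=0.5),
        )
```

On this toy data, {milk, caviar} is a cross-support pattern: it is frequent at the chosen threshold and its lift exceeds 1, yet its all-confidence is low, so only the all-confidence filter removes it. {milk, bread} has high all-confidence but a lift slightly below 1, so only the pairwise test removes it. This is why the abstract applies both measures together.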



Published: 16 January 2010
CLC:  TP 311  
Cite this article:

SHEN Bin, YAO Min. Mining associated and item-item correlated frequent patterns. J4, 2009, 43(12): 2171-2177.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2009.12.008     OR     http://www.zjujournals.com/eng/Y2009/V43/I12/2171


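Mining associated and item-item positively correlated frequent patterns

Frequent patterns and existing correlated patterns cannot completely remove suspicious cross-support patterns or suspicious patterns that contain negatively correlated items. To address this, a new problem of mining associated and item-item positively correlated frequent patterns is proposed, together with its solution. A novel correlation interest measure, all-item-confidence, is described, and its properties, such as suitable upper and lower bounds and anti-monotonicity, are discussed. All-item-confidence is chosen to describe the item-item positive correlation of a pattern, which effectively filters out suspicious patterns containing negatively correlated items; all-confidence is used to describe the association of a pattern, which removes suspicious cross-support patterns. Related definitions are then given, and two mining algorithms, ItemCoMine_AP and ItemCoMine_CT, are proposed. Algorithm performance, the pruning effect of the measures, and the application effect on a real retail dataset are tested. Experimental results show that both algorithms perform well, that all-confidence and all-item-confidence have a clear pruning effect on suspicious patterns, and that the mined associated and item-item positively correlated patterns have good application value.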


