J4  2012, Vol. 46 Issue (11): 2052-2060    DOI: 10.3785/j.issn.1008-973X.2012.11.017
Computer Technology
CQPM based OLAP query log mining and recommendation
YIN Ting1, XIAO Min1, CHEN Ling1, ZHAO Jiang-qi2, WANG Jing-chang2
1.College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2.Zhejiang Hongcheng Computer Systems Company Limited, Hangzhou 310009, China
Abstract:

To help users work more efficiently, an online analytical processing (OLAP) query log mining and recommendation method based on the continuous query pattern mining (CQPM) algorithm is proposed. CQPM builds on the bi-directional extension (BIDE) algorithm for mining frequent closed sequential patterns and adds an interval constraint between queries to ensure that the mined query patterns are continuous. The method then uses an approximate query pattern matching (AQPM) algorithm, based on a query suffix tree, to predict the user's next effective query, and recommends the predictions to the user ranked by probability. The method was evaluated on the query logs of 8 OLAP analysts recorded on a Mondrian OLAP server. The results show that, compared with an improved PrefixSpan-based algorithm, CQPM removes a large number of redundant query patterns, and that, compared with basic prefix matching, AQPM improves recommendation accuracy.
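
The two ideas in the abstract (mining only continuous, closed query patterns, then predicting the next query from the user's most recent queries) can be illustrated with a small Python sketch. This is not the paper's CQPM or AQPM: it assumes a zero-gap constraint (queries must be adjacent), uses brute-force enumeration instead of BIDE, and replaces the query suffix tree and approximate matching with a simple fallback to shorter suffixes. The query labels `Q1` to `Q7`, the function names, and the `min_support` and `max_len` parameters are all hypothetical.

```python
from collections import Counter

def mine_contiguous_patterns(sessions, min_support=2, max_len=4):
    """Count, for each contiguous run of queries, how many sessions contain it."""
    support = Counter()
    for session in sessions:
        seen = set()  # count each pattern at most once per session
        for i in range(len(session)):
            for j in range(i + 1, min(i + max_len, len(session)) + 1):
                seen.add(tuple(session[i:j]))
        support.update(seen)
    return {p: s for p, s in support.items() if s >= min_support}

def closed_only(patterns):
    """Keep a pattern only if no longer frequent pattern contains it
    contiguously with the same support (closedness relative to `patterns`)."""
    def contains(longer, shorter):
        n = len(shorter)
        return any(longer[k:k + n] == shorter
                   for k in range(len(longer) - n + 1))
    return {
        p: s for p, s in patterns.items()
        if not any(len(q) > len(p) and t == s and contains(q, p)
                   for q, t in patterns.items())
    }

def recommend_next(patterns, recent, top_k=3):
    """Try the longest suffix of `recent` first, then shorter ones; rank the
    queries that follow the matched suffix by relative support."""
    for start in range(len(recent)):
        suffix = tuple(recent[start:])
        followers = Counter()
        for p, s in patterns.items():
            if len(p) == len(suffix) + 1 and p[:-1] == suffix:
                followers[p[-1]] += s
        if followers:
            total = sum(followers.values())
            return [(q, c / total) for q, c in followers.most_common(top_k)]
    return []

# Hypothetical query sessions; each label stands for one logged OLAP query.
sessions = [
    ["Q1", "Q2", "Q3", "Q4"],
    ["Q1", "Q2", "Q3", "Q4"],
    ["Q2", "Q3", "Q4"],
    ["Q1", "Q2", "Q3", "Q5"],
    ["Q2", "Q3", "Q5"],
]
frequent = mine_contiguous_patterns(sessions, min_support=2)
closed = closed_only(frequent)
print(len(frequent), len(closed))
print(recommend_next(frequent, ["Q7", "Q2", "Q3"]))
```

On this toy log the closed filter keeps 5 of the 13 frequent contiguous patterns, which mirrors, on a very small scale, the kind of redundancy reduction the abstract attributes to CQPM. The recommendation step ignores the unseen query `Q7`, matches the suffix `Q2, Q3`, and ranks `Q4` ahead of `Q5`.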

Published: 2012-12-11
CLC number: TP 311
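Funding:

Supported by the National "Core Electronic Devices, High-end Generic Chips and Basic Software" Major Science and Technology Project (2010ZX01042-002-003), the National Natural Science Foundation of China (60703040), the Major Project of the Science and Technology Plan of Zhejiang Province (2007C13019), the Major Science and Technology Special Project of Zhejiang Province (2011C13042), and the Major Science and Technology Innovation Special Project of Hangzhou (20112311A20).

Corresponding author: CHEN Ling, male, associate professor. E-mail: lingchen@cs.zju.edu.cn
About the author: YIN Ting (1987-), female, master's student; research interests: data warehousing, business intelligence, and data mining. E-mail: yintingeye@163.com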
Cite this article:

YIN Ting, XIAO Min, CHEN Ling, ZHAO Jiang-qi, WANG Jing-chang. CQPM based OLAP query log mining and recommendation. J4, 2012, 46(11): 2052-2060.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2012.11.017
http://www.zjujournals.com/eng/CN/Y2012/V46/I11/2052

