J4, 2012, Vol. 46, Issue 6: 974-979. DOI: 10.3785/j.issn.1008-973X.2012.06.003
Computer Technology
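Data warehouse native feature based OLAP querying with keywords
XIAO Min1,2, CHEN Ling1, XIA Hai-yuan3, CHEN Gen-cai1
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. PLA Unit 75733, Guangzhou 510800, China;
3. Public Security Department of Zhejiang Province, Hangzhou 310009, China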
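Abstract (Chinese version):

An on-line analytical processing (OLAP) keyword query method was proposed based on the characteristics of the dimensions and dimension attributes of the data warehouse multidimensional model. Combining the widely used keyword query approach with OLAP can greatly lower the barrier to using OLAP applications. The method builds a column-structured full-text index over the data warehouse dimensions, obtains hit sets partitioned by keyword from the user's query, and joins and ranks these hit sets to produce candidate results for the user. Based on an analysis showing that data warehouse users care more about summary data, and that the dimensions of the multidimensional model and their attributes are imbalanced, the method filters out irrelevant dimension attributes and repeated dimension column values, and adds a dimension-level weight coefficient to the traditional full-text ranking algorithm. Experiments on the FoodMart and AdventureWorks sample datasets provided with MS SQL Server compared and analyzed the influence of these factors. The results show that the hit rate of the top-ranked candidate was consistently better than that of the keyword-driven analytical processing (KDAP) method.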
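The indexing and joining steps described above can be illustrated with a small toy sketch. Everything in it is hypothetical: the in-memory `DIMENSIONS` tables and the functions `build_column_index`, `hit_groups`, and `candidates` are not from the paper, and a real system would use a full-text engine rather than a Python dict. The sketch indexes only the distinct values of each dimension column (one simple way to filter repeated column values), forms one hit group per keyword, and joins the groups by taking one hit from each, keeping combinations that constrain each column at most once (an assumption of this sketch, not a rule stated by the authors).

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy dimension tables (column -> row values), loosely FoodMart-like.
DIMENSIONS = {
    "store": {
        "store_country": ["USA", "USA", "Mexico"],
        "store_state": ["Washington", "Oregon", "Yucatan"],
    },
    "product": {
        "brand_name": ["Washington", "Washington", "Booker"],
        "product_category": ["Juice", "Juice", "Milk"],
    },
}


def build_column_index(dimensions):
    """Map each lower-cased term to the (dimension, column, value) cells containing it.

    Only distinct values of each column are indexed, so a value repeated across
    many rows produces a single hit instead of many.
    """
    index = defaultdict(set)
    for dim, columns in dimensions.items():
        for col, values in columns.items():
            for value in set(values):
                for term in value.lower().split():
                    index[term].add((dim, col, value))
    return index


def hit_groups(index, query):
    """One hit group per query keyword: every indexed cell matching that keyword."""
    return {kw: sorted(index.get(kw, set())) for kw in query.lower().split()}


def candidates(groups):
    """Join the hit groups: pick one hit per keyword and keep combinations in which
    no column is constrained twice (an assumption made for this sketch)."""
    if not groups or any(not hits for hits in groups.values()):
        return []  # some keyword matched nothing, so no candidate covers the query
    seen, result = set(), []
    for combo in product(*groups.values()):
        cells = frozenset(combo)  # two keywords may hit the same cell
        columns = [(dim, col) for dim, col, _ in cells]
        if len(columns) == len(set(columns)) and cells not in seen:
            seen.add(cells)
            result.append(sorted(cells))
    return result


index = build_column_index(DIMENSIONS)
for cand in candidates(hit_groups(index, "Washington juice")):
    print(cand)
```

For the query "Washington juice", the keyword "washington" is ambiguous (a brand and a store state), so the join yields two candidates; ordering them is left to a separate ranking step.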

Abstract:

 To help naive users work with on-line analytical processing (OLAP) tools, this work proposed an OLAP query method based on the native features of the dimensions and dimension attributes in the multidimensional model of a data warehouse, combining OLAP with search-engine-like keyword querying. The method first created a column-based full-text index over the dimension tables, then generated hit groups for the keywords provided by the user, constructed a candidate result set by joining these hit groups, and finally presented the ranked results to the user. Because users care much more about aggregated data, and because dimensions and their attributes are imbalanced, the method filtered out irrelevant attributes and repeated column values to reduce their negative effect on the ranking. A weighting coefficient, the dimensional level coefficient (DLC), was also introduced into the text-ranking algorithm. Experiments on the FoodMart and AdventureWorks sample datasets provided with Microsoft SQL Server examined how these factors influence the hit rate. The results indicated that, for the first candidate, the proposed method achieved a higher hit rate than the keyword-driven analytical processing (KDAP) method.
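The abstract names the dimensional level coefficient but does not give its formula, so the following is only a minimal illustration under stated assumptions. Hierarchy levels are numbered from 1 at the coarsest (most aggregated) level, the DLC is assumed to be `alpha ** (level - 1)` with a hypothetical `alpha = 0.5`, and each hit's text score is taken as given (for example from a Lucene-style scorer). `Hit`, `LEVELS`, `IRRELEVANT`, `dlc`, `score`, and `rank` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    dimension: str
    column: str
    value: str
    text_score: float  # full-text relevance of the match (assumed to come from the index)


# Hypothetical hierarchy levels, 1 = coarsest (most aggregated) level of a dimension,
# following FoodMart-style product and store hierarchies.
LEVELS = {
    ("store", "store_country"): 1,
    ("store", "store_state"): 2,
    ("store", "store_city"): 3,
    ("product", "product_family"): 1,
    ("product", "product_department"): 2,
    ("product", "product_category"): 3,
    ("product", "product_subcategory"): 4,
    ("product", "brand_name"): 5,
}

# Hypothetical descriptive attributes judged irrelevant to analysis.
IRRELEVANT = {("store", "store_phone"), ("store", "store_street_address")}


def dlc(hit, alpha=0.5):
    """Assumed dimensional level coefficient: alpha ** (level - 1), so hits on
    coarser levels weigh more. Columns outside a known hierarchy get 0."""
    level = LEVELS.get((hit.dimension, hit.column))
    return 0.0 if level is None else alpha ** (level - 1)


def score(candidate):
    """DLC-weighted sum of the text scores of a candidate's hits."""
    return sum(h.text_score * dlc(h) for h in candidate)


def rank(candidates):
    """Drop candidates that use irrelevant attributes, then sort by score (highest first)."""
    kept = [c for c in candidates
            if all((h.dimension, h.column) not in IRRELEVANT for h in c)]
    return sorted(kept, key=score, reverse=True)


juice = Hit("product", "product_category", "Juice", 1.0)
by_state = [Hit("store", "store_state", "Washington", 1.0), juice]
by_brand = [Hit("product", "brand_name", "Washington", 1.2), juice]
by_addr = [Hit("store", "store_street_address", "Washington Ave", 2.0), juice]

for cand in rank([by_brand, by_state, by_addr]):
    print([h.column for h in cand], round(score(cand), 3))
```

Here the store-state reading of "Washington" outranks the brand reading even though the brand hit has a higher raw text score, reflecting the assumption that users prefer summary-level data; the candidate that relies on a street address is filtered out before ranking.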

Publication date: 2012-07-24
CLC number: TP 311
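Funding:

Supported by the National Science and Technology Major Project on Core Electronic Devices, High-End Generic Chips and Basic Software (2010ZX01042002003), the National Natural Science Foundation of China (60703040), and a major project of the Zhejiang Provincial Science and Technology Plan (2007C13019).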

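Corresponding author: CHEN Ling, male, associate professor. E-mail: lingchen@cs.zju.edu.cn
About the author: XIAO Min (born 1978), male, Ph.D. candidate, whose research covers data warehousing, business intelligence, and data mining. E-mail: xiaomin1978@zju.edu.cn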

Cite this article:

XIAO Min, CHEN Ling, XIA Hai-yuan, CHEN Gen-cai. Data warehouse native feature based OLAP querying with keywords[J]. J4, 2012, 46(6): 974-979.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2012.06.003        http://www.zjujournals.com/eng/CN/Y2012/V46/I6/974

