J4  2012, Vol. 46 Issue (6): 974-979    DOI: 10.3785/j.issn.1008-973X.2012.06.003
    
Data warehouse native feature based OLAP querying with keywords
XIAO Min1,2, CHEN Ling1, XIA Hai-yuan3, CHEN Gen-cai1
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. PLA 75733, Guangzhou 510800, China; 3. Department of State Security, Zhejiang Province, Hangzhou 310009, China

Abstract  

To help novice users work with on-line analytical processing (OLAP) tools, this work proposes an OLAP keyword query method that exploits native features of the dimensions and dimension attributes in the multidimensional model of a data warehouse, combining OLAP with search-engine-style querying. The method first creates a column-based full-text index over the dimension tables, then generates hit groups for the keywords supplied by the user, constructs a candidate result set by joining these hit groups, and finally presents the ranked results to the user. Because data warehouse users care more about aggregated data, and because dimensions and their attributes are imbalanced, the method filters out irrelevant attributes and repeated column values to reduce their negative effect on the ranking. A weighting coefficient, the dimensional level coefficient (DLC), is also introduced into the text ranking algorithm. Experiments on the FoodMart and AdventureWorks sample data sets provided with Microsoft SQL Server examined how these factors influence the hit rate. The results indicate that the proposed method achieves a higher hit rate for the first candidate than the keyword-driven analytical processing (KDAP) method.
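The pipeline described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the toy dimension tables, the `DLC` weights, the token-overlap text score, and all function names are assumptions made for illustration. It indexes distinct column values per dimension attribute, skips attributes that have no assigned coefficient (standing in for the filtering of irrelevant attributes), builds one hit group per keyword, joins the hit groups, and ranks the candidates by a DLC-weighted score.

```python
from collections import defaultdict
from itertools import product

# Toy dimension tables: {dimension: {attribute: column values}}. Illustrative only.
DIMENSIONS = {
    "product": {
        "category": ["Drink", "Food", "Drink"],
        "brand": ["Golden", "Best Choice", "Washington"],
        "sku": ["P-001", "P-002", "P-003"],  # has no DLC entry, so it is filtered
    },
    "store": {
        "state": ["WA", "CA", "WA"],
        "city": ["Seattle", "Los Angeles", "Washington"],
    },
}

# Assumed dimensional level coefficients: coarser hierarchy levels weigh more.
DLC = {
    ("product", "category"): 1.0,
    ("product", "brand"): 0.6,
    ("store", "state"): 1.0,
    ("store", "city"): 0.7,
}


def build_column_index(dimensions):
    """Index each distinct column value of a weighted attribute by its tokens."""
    index = defaultdict(set)
    for dim, columns in dimensions.items():
        for attr, values in columns.items():
            if (dim, attr) not in DLC:  # treat unweighted attributes as irrelevant
                continue
            for value in set(values):  # repeated column values are indexed once
                for token in value.lower().split():
                    index[token].add((dim, attr, value))
    return index


def hit_groups(index, keywords):
    """Return one sorted list of (dimension, attribute, value) hits per keyword."""
    return [sorted(index.get(k.lower(), ())) for k in keywords]


def score_hit(hit, keyword):
    """Token-overlap text score scaled by the attribute's level coefficient."""
    dim, attr, value = hit
    tokens = value.lower().split()
    text_score = tokens.count(keyword.lower()) / len(tokens)
    return DLC[(dim, attr)] * text_score


def rank_candidates(index, keywords):
    """Join hit groups (one hit per keyword) and sort candidates by total score."""
    groups = hit_groups(index, keywords)
    ranked = []
    for combo in product(*groups):
        total = sum(score_hit(h, k) for h, k in zip(combo, keywords))
        ranked.append((total, combo))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


index = build_column_index(DIMENSIONS)
for total, combo in rank_candidates(index, ["washington", "drink"]):
    print(f"{total:.2f}", combo)
```

In this toy setup the keyword "washington" is ambiguous between a product brand and a store city, and the assumed coefficients decide which interpretation is ranked first. A real system would replace the token-overlap score with a full-text engine's relevance score and derive the coefficients from the dimension hierarchies.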



Published: 24 July 2012
CLC:  TP 311  
Cite this article:

XIAO Min, CHEN Ling, XIA Hai-yuan, CHEN Gen-cai. Data warehouse native feature based OLAP querying with keywords. J4, 2012, 46(6): 974-979.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2012.06.003     OR     http://www.zjujournals.com/eng/Y2012/V46/I6/974


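OLAP keyword querying based on intrinsic features of the data warehouse

This paper proposes an on-line analytical processing (OLAP) keyword query method based on the features of the dimensions and dimension attributes in the multidimensional model of a data warehouse. Combining the widely used keyword query approach with OLAP can greatly lower the barrier to using OLAP. The method builds a column-structured full-text index over the data warehouse dimensions, obtains hit sets partitioned by keyword from the user's query, joins and ranks the hit sets, and presents the resulting candidates to the user. Based on an analysis of data warehouse users' stronger interest in summary data and of the imbalance among the dimensions and attributes of the multidimensional model, the method filters out irrelevant dimension attributes and repeated dimension column values, and adds a dimensional level weight coefficient to the conventional full-text ranking algorithm. Experiments on the FoodMart and AdventureWorks sample data sets provided with MS SQL Server compare and analyze the influence of these factors. The results show that the hit rate of the first candidate result is consistently better than that of the keyword-driven analytical processing method.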

