J4  2013, Vol. 47 Issue (1): 23-28    DOI: 10.3785/j.issn.1008-973X.2013.01.004
    
Approach for collection selection based on click-through data
LIU Ying1, CHEN Ling1, CHEN Gen-cai1, ZHAO Jiang-qi2, WANG Jing-chang2
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. Zhejiang Hongcheng Computer Systems Company Limited, Hangzhou 310009, China

Abstract  

A collection selection approach based on click-through data (PCTD-CS) is proposed, motivated by the observation that collections contribute unequally to the final retrieval results. Click-through data from past queries are used to estimate the relevance of each collection to those queries. Similarity between queries is estimated with a mixed approach that combines term-based and result-based measures. Past queries similar to a new user query are then used to predict the relevance of each collection to that query. The M collections with the highest predicted relevance are selected for retrieval, and the number of documents each collection should return is determined for a request of the top k ranked results. The method was evaluated using Rm, P@n and MAP. Experimental results showed that PCTD-CS improved the precision and recall of search results and was better at selecting collections that contain more relevant documents.
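The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the actual formulas, so the Jaccard overlap, the mixing weight `alpha`, the click-share relevance estimate, the similarity-weighted sum, and the proportional allocation of the k documents are all assumptions, as are the function names and the dictionary layout of a query (`terms`, `results`, `clicks`).

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard overlap of two iterables, treated as sets (assumed measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def query_similarity(q1, q2, alpha=0.5):
    """Mix term overlap and result overlap; alpha is a hypothetical weight."""
    term_sim = jaccard(q1["terms"], q2["terms"])
    result_sim = jaccard(q1["results"], q2["results"])
    return alpha * term_sim + (1 - alpha) * result_sim

def click_relevance(clicked_collections):
    """Share of a past query's clicks that landed in each collection."""
    counts = Counter(clicked_collections)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def predict_relevance(new_query, past_queries, alpha=0.5):
    """Similarity-weighted sum of the past queries' collection relevance."""
    scores = Counter()
    for past in past_queries:
        sim = query_similarity(new_query, past, alpha)
        for coll, rel in click_relevance(past["clicks"]).items():
            scores[coll] += sim * rel
    return dict(scores)

def select_and_allocate(scores, m, k):
    """Pick the top-m collections and split k documents in proportion to score."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:m]
    total = sum(s for _, s in top)
    if total == 0:
        return {}
    quotas = {c: int(k * s / total) for c, s in top}
    # Hand out the documents lost to rounding down, best collections first.
    leftover = k - sum(quotas.values())
    for c, _ in top[:leftover]:
        quotas[c] += 1
    return quotas

# Hypothetical query log: each entry lists the collection of every clicked result.
past_queries = [
    {"terms": ["jaguar", "car"], "results": ["d1", "d2"], "clicks": ["A", "A", "B"]},
    {"terms": ["python", "snake"], "results": ["d8", "d9"], "clicks": ["C"]},
]
new_query = {"terms": ["jaguar", "speed"], "results": ["d1", "d3"]}

scores = predict_relevance(new_query, past_queries)
quotas = select_and_allocate(scores, m=2, k=10)
```

In this toy log only the first past query resembles the new one, so collections A and B are selected and the ten requested documents are split between them according to their click shares.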



Published: 01 January 2013
CLC:  TP 311  
Cite this article:

LIU Ying, CHEN Ling, CHEN Gen-cai, ZHAO Jiang-qi, WANG Jing-chang. Approach for collection selection based on click-through data. J4, 2013, 47(1): 23-28.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2013.01.004     OR     http://www.zjujournals.com/eng/Y2013/V47/I1/23


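Collection selection method based on historical click-through data

In distributed information retrieval, different collections contribute unequally to the final retrieval results. To address this, a collection selection method based on historical click-through data (PCTD-CS) is proposed. The method uses click-through data to estimate the relevance of each collection to past queries. Similarity between queries is estimated by combining a keyword-based method with a retrieval-result-based method. Similar past queries are used to estimate the relevance of each collection to a new query, the M most relevant collections are selected for retrieval, and the number of documents each collection should return is given for the case where the top k documents are required. Recall Rm, precision of the top n results P@n, and mean average precision MAP are used to validate the performance of the method. Experimental results show that PCTD-CS improves the recall and precision of retrieval results and locates collections containing many relevant documents more accurately.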
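For readers unfamiliar with the three evaluation measures named in the abstract, the sketch below shows one common way to compute them. P@n and MAP follow their standard definitions. The definition of Rm used here (relevant documents held by the m selected collections, divided by the relevant documents held by the best possible m collections) is an assumption borrowed from common practice in collection selection, since the abstract does not define it. All function names are hypothetical.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n results that are relevant."""
    if n <= 0:
        return 0.0
    return sum(1 for d in ranked[:n] if d in relevant) / n

def average_precision(ranked, relevant):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(runs):
    """MAP over (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def recall_at_m(selected, relevant_per_collection, m):
    """Assumed Rm: relevant docs in the m selected collections vs. the best m."""
    got = sum(relevant_per_collection.get(c, 0) for c in selected[:m])
    best = sum(sorted(relevant_per_collection.values(), reverse=True)[:m])
    return got / best if best else 0.0

# Hypothetical ranking and relevance judgments for a single query.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
p2 = precision_at_n(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
rm = recall_at_m(["A", "C"], {"A": 5, "B": 3, "C": 1}, 2)
```

Under this assumed definition, Rm equals 1 only when the selected collections hold as many relevant documents as the best possible choice of m collections.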

