Approach for collection selection based on click-through data
LIU Ying1, CHEN Ling1, CHEN Gen-cai1, ZHAO Jiang-qi2, WANG Jing-chang2
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. Zhejiang Hongcheng Computer Systems Company Limited, Hangzhou 310009, China
Abstract An approach to collection selection based on click-through data (PCTD-CS) was proposed, motivated by the observation that, in distributed information retrieval, collections contribute unequally to the final retrieval results. Click-through data were used to estimate the relevance of each collection to past queries. The similarity between queries was estimated with a mixed approach that combines term-based and results-based measures. Past queries similar to a new user query were then used to predict the relevance of each collection to that query. The M collections with the highest predicted relevance were selected for retrieval, and the number of documents each collection should return was determined for the case where the top k ranked results are required. Recall (Rm), precision of the top n results (P@n), and mean average precision (MAP) were used to evaluate the method. Experimental results showed that PCTD-CS improved the precision and recall of search results and was better at locating the collections that contain more relevant documents.
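The abstract gives the steps of the method but not its formulas, so the following is only a minimal sketch of how such a pipeline could be put together, not the authors' implementation. Everything specific in it is an assumption made for illustration: Jaccard overlap for both the term-based and results-based similarity, the mixing weight `alpha`, click share as the relevance of a collection to a past query, a similarity-weighted sum for prediction, and proportional allocation of the k documents. The dictionary keys `terms`, `results`, and `clicks` are hypothetical, and how a result set is obtained for a new query is left outside the sketch.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap of two iterables, 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query_similarity(q1, q2, alpha=0.5):
    """Mix term overlap and result overlap (assumed form, weight alpha)."""
    term_sim = jaccard(q1["terms"], q2["terms"])
    result_sim = jaccard(q1["results"], q2["results"])
    return alpha * term_sim + (1 - alpha) * result_sim


def relevance_from_clicks(clicks):
    """Share of a past query's clicks that landed in each collection.

    `clicks` holds one collection id per clicked document.
    """
    counts = defaultdict(int)
    for collection in clicks:
        counts[collection] += 1
    total = len(clicks)
    return {c: n / total for c, n in counts.items()} if total else {}


def predict_relevance(new_query, past_queries, alpha=0.5):
    """Similarity-weighted sum of click-based relevance over past queries."""
    scores = defaultdict(float)
    for past in past_queries:
        sim = query_similarity(new_query, past, alpha)
        if sim <= 0.0:
            continue  # dissimilar past queries contribute nothing
        for collection, rel in relevance_from_clicks(past["clicks"]).items():
            scores[collection] += sim * rel
    return dict(scores)


def select_collections(scores, m, k):
    """Pick the m best collections and split k documents among them.

    Quotas are proportional to the predicted relevance; the largest
    remainder rule makes them sum to exactly k.
    """
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:m]
    total = sum(score for _, score in top)
    if total == 0:
        return {}
    raw = {c: k * score / total for c, score in top}
    quotas = {c: int(r) for c, r in raw.items()}
    leftover = k - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda c: (-(raw[c] - quotas[c]), c))
    for c in by_remainder[:leftover]:
        quotas[c] += 1
    return quotas
```

With a past query whose clicks fell twice in collection `"A"` and once in `"B"`, a similar new query would be routed mainly to `"A"`, and `select_collections(scores, m=2, k=10)` would ask `"A"` for more documents than `"B"`.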
Published: 01 January 2013
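Collection selection method based on historical click-through data
To address the fact that, in distributed information retrieval, different collections contribute differently to the final retrieval results, a collection selection method based on historical click-through data (PCTD-CS) is proposed. The method uses click-through data to estimate the relevance of each collection to historical queries. The similarity between queries is estimated by combining a keyword-based method with a retrieval-results-based method. Similar queries among the historical queries are used to estimate the relevance of each collection to a new query, the M collections with the highest relevance are selected for retrieval, and the number of documents each collection should return is given for the case where the top k documents are to be obtained. Recall Rm, the precision of the top n retrieval results P@n, and mean average precision MAP are used to verify the performance of the collection selection method. Experimental results show that PCTD-CS improves the recall and precision of retrieval results and locates collections containing many relevant documents more accurately.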
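For readers unfamiliar with two of the evaluation measures named above, the sketch below computes P@n and MAP using their standard textbook definitions; it is not taken from the paper, and the function names are illustrative. It assumes each ranking contains no duplicate document ids. Rm is omitted because this page does not give its exact definition.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top n ranked documents that are relevant."""
    if n <= 0:
        return 0.0
    hits = sum(1 for doc in ranked[:n] if doc in relevant)
    return hits / n


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears.

    Relevant documents that are never retrieved count as zero precision.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Average of average_precision over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Here `ranked` is the merged result list for one query and `relevant` is the set of document ids judged relevant for it.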