Please wait a minute...
J4  2009, Vol. 43 Issue (8): 1361-1366    DOI: 10.3785/j.issn.1008-973X.
计算机科学技术     
搜索引擎中混合型分布式索引组织策略
陈伟,刘康苗,卜佳俊,陈纯,张利军
(浙江大学 计算机科学与技术学院, 浙江 杭州 310027)
Hybrid strategy to distributed index organization in search engine
 CHEN Wei, LIU Kang-Miao, BO Jia-Jun, CHEN Chun, ZHANG Li-jun
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
 全文: PDF(927 KB)   HTML
摘要:

针对搜索引擎中索引组织策略在查询性能和可扩展性等方面存在的问题,提出了一种混合型分布式索引组织策略(Loc-Glob)。该策略整合了局部和全局索引组织的基本思路,首先将搜索引擎系统的索引服务器从逻辑上分为若干个索引服务器池,索引数据先以局部(或全局)索引组织策略分配到索引服务器池上。然后,在索引服务器池的内部,索引继续以全局(或局部)索引组织的方式存储到各索引服务器上。混合型的索引组织策略较局部和全局索引组织策略具有更好的可扩展性。实验结果表明,该策略较全局索引组织策略在查询性能、负载均衡方面都有所提升,与局部索引组织策略的查询性能基本相当,并具备较高的负载均衡水平。

Abstract:

A hybrid index organization strategy named Loc-Glob was proposed to enhance the query performance and scalability in search engine. Loc-Glob integrates two well-studied index partitioning schemes, which are widely used in search engines. Firstly, index is partitioned according to local (or global) index organization strategy, taking cluster of some index servers as a single machine. Then, index distributed to certain cluster are further partitioned to index servers according to the global (or local) index organization strategy inside the cluster. Loc-Glob is more scalable than the traditional strategies to accommodate the explosively growing web pages. Experimental results indicate that the throughput of Loc-Glob outperforms the global index organization while it is very close to the local index organization, and Loc-Glob provides good load-balancing level.

出版日期: 2009-09-28
:  TP 274.2  
通讯作者: 刘康苗,男,博士后.     E-mail: lkm@zju.edu
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章  

引用本文:

陈伟, 刘康苗, 卜佳俊, 等. 搜索引擎中混合型分布式索引组织策略[J]. J4, 2009, 43(8): 1361-1366.

CHEN Wei, LIU Kang-Miao, BO Jia-Jun, DENG. Hybrid strategy to distributed index organization in search engine. J4, 2009, 43(8): 1361-1366.

链接本文:

http://www.zjujournals.com/eng/CN/ 10.3785/j.issn.1008-973X.        http://www.zjujournals.com/eng/CN/Y2009/V43/I8/1361

[1] RIBEIRO-NETO B, BARBOSA R. Query performance for tightly coupled distributed digital libraries [C]∥ Proceedings of 3rd ACM Conference on Digital Libraries. Pittsburgh: ACM, 1998: 182-190.
[2] MAC A, MCCANN J A, ROBERTSON S E. Parallel search using partitioned inverted files [C]∥ Proceedings of 7th International Symposium on String Processing and Information Retrieval. A Coruna: IEEE, 2000: 209-220.
[3] BARROSO L A, DEAN J, HOLZLE U. Web search for a planet: the google cluster architecture [J]. IEEE Micro, 2003, 23(2): 22-28.
[4] MELNIK S, RAGHAVAN S, YANG B, et al. Building a distributed full-text index for the web [C]∥ Proceedings of the 10th International Conference on World Wide Web. Hong Kong: ACM, 2001: 396-406.
[5] BUTTCHER S, CLARKE C L A, LUSHMAN B. Hybrid index maintenance for growing text collections [C]∥ Proceedings of the 29th ACM SIGIR Conference on Research and Development in Information Retrieval. Seattle: ACM, 2006: 356-363.
[6] LESTER N, MOFFAT A, ZOBEL J. Fast on-line index construction by geometric partitioning [C]∥ Proceedings of the 14th ACM International Conference on Information and Knowledge Management. Bremen: ACM, 2006: 776-783.
[7] JEONG B S, OMIECINSKI E. Inverted file partitioning schemes in multiple disk systems [J]. IEEE Transactions on Parallel and Distributed Systems, 1995, 6(2): 142-153.
[8] ZOBEL J, MOFFAT A. Inverted files for text search engines [J]. ACM Computing Surveys, 2006, 38(2): Article 6.
[9] BADUE C, RIBEIRO-NETO B, BAEZA-YATES R, et al. Distributed query processing using partitioned inverted files [C]∥ Proceedings of 8th International Symposium on String Processing and Information Retrieval. Manaus: IEEE, 2001: 10-20.
[10] TOMASIC A, GARCIA-MOLINA H. Performance of inverted indices in shared-nothing distributed text document information retrieval systems [C]∥ Proceedings of 2nd International Conference on Parallel and Distributed Information Systems. San Diego: IEEE, 1993: 8-17.
[11] LEMPEL R, MORAN S. Optimizing result prefetching in web search engines with segmented indices [J]. ACM Transactions on Internet Technology, 2004, 4(1): 31-59.
[12] BAEZA-YATES R, RIBEIRO-NETO B. Modern information retrieval [M]. Beijing: China Machine Press, 2004.
[13] MOFFAT A, WEBBER W, ZOBEL J, et al. A pipelined architecture for distributed text query evaluation [J]. Information Retrieval, 2006, 10(3): 205-231.
[14] MOFFAT A, WEBBER W, ZOBEL J. Load balancing for term-distributed parallel retrieval [C]∥ Proceedings of the 29th ACM SIGIR Conference on Research and Development in Information Retrieval. Seattle: ACM, 2006: 348-355.
[15] WILLIAMS H E, ZOBEL J, BAHLE D. Fast phrase querying with combined indexes [J]. ACM Transactions on Information Systems, 2004, 22(4): 573-594.
[16] KIM M S, WHANG K Y, LEE J G, et al. n-gram/2L: a space and time efficient two-level n-gram inverted index structure [C]∥ Proceedings of the 31st International Conference on Very Large Databases. Trondheim: ACM, 2005: 325-336.

No related articles found!