J4  2011, Vol. 45 Issue (1): 14-19    DOI: 10.3785/j.issn.1008-973X.2011.01.003
    
Object cache optimization strategy for real-time vertical search engine
ZHOU Jia-qing1, WU Yu1, JIANG Jin-hua1, CHEN Gang1, DONG Yi2
1.College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. Zhejiang Branch of Industrial and Commercial Bank of China, Hangzhou 310009, China

Abstract  

A new object cache optimization strategy for real-time vertical search engines was proposed to address challenges such as the volatility of popular objects and query-triggered data crawling. A popular-object prediction model was built on the relationships between objects and their properties in order to predict the trend in the distribution of popular objects. Since user queries and data changes follow Poisson processes, a procedure for maximizing data freshness was derived, together with an optimal strategy for allocating and dynamically balancing resources. Experimental results show that the increase in time complexity is relatively limited, while the average freshness and precision of user query results exceed those of the traditional fixed-rate cache strategy.
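The abstract does not give the freshness formula itself, so the following is only a minimal sketch of the kind of calculation involved, not the authors' procedure. It assumes the standard result for a cached copy whose source changes as a Poisson process with rate λ and which is re-fetched at a fixed rate f: the time-averaged probability that the copy is current is (1 - e^(-λ/f)) / (λ/f). The function names, the query-weighted average, and the toy numbers are all hypothetical and chosen for illustration.

```python
import math


def expected_freshness(change_rate, refresh_rate):
    """Time-averaged probability that a cached object is up to date.

    Assumes the source changes as a Poisson process (change_rate per unit
    time) and the cache re-fetches it at fixed intervals of 1/refresh_rate.
    """
    if refresh_rate <= 0:
        return 0.0  # never refreshed: treated as always stale
    if change_rate <= 0:
        return 1.0  # never changes: always fresh
    ratio = change_rate / refresh_rate
    return (1.0 - math.exp(-ratio)) / ratio


def query_weighted_freshness(objects, refresh_rates):
    """Average freshness seen by users, weighting each object by its query rate.

    objects: list of (query_rate, change_rate) pairs (hypothetical layout).
    refresh_rates: refresh rate assigned to each object, in the same order.
    """
    total_queries = sum(query_rate for query_rate, _ in objects)
    weighted = sum(
        query_rate * expected_freshness(change_rate, refresh_rate)
        for (query_rate, change_rate), refresh_rate in zip(objects, refresh_rates)
    )
    return weighted / total_queries


# Toy data: one hot object and one cold object that change equally often.
objects = [(9.0, 2.0), (1.0, 2.0)]

# Both allocations spend the same total refresh budget of 4 per unit time.
uniform = query_weighted_freshness(objects, [2.0, 2.0])  # fixed-rate baseline
skewed = query_weighted_freshness(objects, [3.0, 1.0])   # favors the hot object

print(f"uniform allocation: {uniform:.3f}")
print(f"skewed allocation:  {skewed:.3f}")
```

In this toy setting, shifting refresh budget toward the frequently queried object raises the query-weighted freshness above the fixed-rate baseline, which is the intuition behind balancing crawl resources against predicted popularity. How the paper actually derives the optimal allocation is not stated in the abstract.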



Published: 03 March 2011
CLC:  TP 311.13  
Cite this article:

ZHOU Jia-qing, WU Yu, JIANG Jin-hua, CHEN Gang, DONG Yi. Object cache optimization strategy for real-time vertical search engine. J4, 2011, 45(1): 14-19.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2011.01.003     OR     http://www.zjujournals.com/eng/Y2011/V45/I1/14


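Object cache optimization strategy for real-time vertical search engines

To address the problems that the popularity of search objects in real-time vertical search engines changes frequently and that data crawling is driven by queries, a new object cache optimization strategy for real-time vertical search engines is proposed. A popular-object prediction model is designed from the associations among objects and their attributes to predict how popular objects will change. Because user queries and object changes follow Poisson processes, a method for computing the maximum data freshness is derived, and the optimal strategy for resource allocation and dynamic balancing is given theoretically. Extensive comparative experiments confirm that, with only a small increase in overhead, the new cache optimization strategy clearly outperforms the traditional fixed-frequency cache strategy in both the average freshness and the accuracy of user query results.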

