J4  2011, Vol. 45 Issue (1): 14-19    DOI: 10.3785/j.issn.1008-973X.2011.01.003
Computer Technology, Telecommunications Technology
Object cache optimization strategy for real-time vertical search engine
ZHOU Jia-qing1, WU Yu1, JIANG Jin-hua1, CHEN Gang1, DONG Yi2
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. Zhejiang Branch of Industrial and Commercial Bank of China, Hangzhou 310009, China
Abstract:

To address the rapidly changing popularity of search objects and the query-driven nature of data crawling in real-time vertical search engines, a new object cache optimization strategy was proposed. A popular-object prediction model was designed based on the relationships between objects and their attributes in order to predict how the set of popular objects will change. Because user queries and object changes both follow Poisson processes, a method for maximizing data freshness was derived, which gives a theoretically optimal strategy for resource allocation and dynamic balancing. Extensive comparative experiments show that, at the cost of a small increase in overhead, the new strategy achieves clearly higher average freshness and precision of user query results than the traditional fixed-frequency cache strategy.
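As a rough illustration of the freshness notion used in the abstract, the sketch below computes the time-averaged freshness of a single cached object whose source changes as a Poisson process and which is refreshed at a fixed interval. This is the fixed-frequency baseline the abstract compares against, not the optimization strategy proposed in the paper. The function names and the sample rates are hypothetical and chosen only for illustration.

```python
import math


def expected_freshness(change_rate: float, refresh_interval: float) -> float:
    """Time-averaged probability that a cached copy is still up to date.

    Assumes the source object changes as a Poisson process with rate
    `change_rate` (changes per unit time) and the cache re-fetches it
    every `refresh_interval` time units. At time t after a refresh the
    copy is fresh with probability exp(-change_rate * t); averaging over
    one interval gives (1 - exp(-x)) / x with x = change_rate * interval.
    """
    if change_rate < 0 or refresh_interval <= 0:
        raise ValueError("change_rate must be >= 0 and refresh_interval > 0")
    x = change_rate * refresh_interval
    if x == 0:
        return 1.0  # an object that never changes is always fresh
    return -math.expm1(-x) / x  # numerically stable (1 - e^-x) / x


def average_freshness(change_rates, refresh_interval: float) -> float:
    """Unweighted mean freshness when every object shares one refresh interval."""
    rates = list(change_rates)
    return sum(expected_freshness(r, refresh_interval) for r in rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical change rates (changes per hour) for three cached objects.
    rates = [0.1, 1.0, 5.0]
    for interval in (0.5, 1.0, 2.0):
        print(interval, round(average_freshness(rates, interval), 3))
```

Under this model, freshness falls as either the change rate or the refresh interval grows, which is why a single fixed interval serves fast-changing popular objects poorly and motivates allocating refresh resources unevenly across objects.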

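Published: 2011-03-03
CLC number: TP 311.13
Funding:

Supported by the National Natural Science Foundation of China (60603044, 60803003) and the Major Science and Technology Research Project of the Zhejiang Provincial Science and Technology Program (2006c11108).

Corresponding author: CHEN Gang, male, professor, doctoral supervisor.     E-mail: cg@zju.edu.cn
About the author: ZHOU Jia-qing (1984-), male, born in Shaoxing, Zhejiang, master's student, working on vertical search engines. E-mail: jorsef@163.com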

Cite this article:

ZHOU Jia-qing, WU Yu, JIANG Jin-hua, CHEN Gang, DONG Yi. Object cache optimization strategy for real-time vertical search engine[J]. J4, 2011, 45(1): 14-19.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2011.01.003
http://www.zjujournals.com/eng/CN/Y2011/V45/I1/14

