Please wait a minute...
J4  2012, Vol. 46 Issue (3): 377-385    DOI: 10.3785/j.issn.1008-973X.2012.03.001
计算机技术     
CB-LSH:基于压缩位图的高性能LSH索引算法
吴羽,寿黎但,陈刚
浙江大学 计算机科学与技术学院,浙江 杭州 310027
CB-LSH: an efficient LSH indexing algorithm based on compressed bitmap
WU Yu, SHOU Li-dan, CHEN Gang
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
 全文: PDF  HTML
摘要:

由于传统局部敏感散列(LSH)算法的删除性能不足,阻碍了LSH算法在实际产品中的应用.提出一种基于压缩位图的改进方法,通过引入压缩位图改良传统LSH算法的桶中数据结构,以及使用标记清除策略进行算法流程优化,解决传统LSH索引实时删除性能差的问题.理论分析证明:基于压缩位图的LSH(CB-LSH)算法可以显著降低算法的空间复杂度和时间复杂度.实验结果支撑了理论分析的结论,相对于传统LSH算法,CB-LSH在降低内存消耗的同时,可显著提高索引删除、数据插入和数据查询的性能.在大型项目中的应用实践验证了在线实时更新的海量多媒体数据检索系统中,CB-LSH索引算法对于多媒体数据的高维索引是有效可行的,并显著提升了性能、降低了资源消耗.

Abstract:

An improvement method based on compressed bitmap was proposed to address the deficiency of deletion performance in traditional locality-sensitive hashing (LSH) algorithms. To improve the overall performance especially the delete performance of LSH algorithms, a novel Boolean algebra schema for LSH operations was designed and the vector structure in hash-buckets was replaced with compressed bitmaps. In addition, a mark-and-sweep strategy was utilized to perform a batch deletion and further improved the delete performance. Theoretical analysis showed the improvements of CB-LSH in both memory space requirements and time complexity of all the LSH operations. Thorough experiments showed that CB-LSH not only saved notable memory space, but also achieved significant improvements in delete, query and insert performances. Industrial practice also verified that CB-LSH is qualified for online updates and queries for content-based image retrieval in largescale multimedia databases, and CB-LSH significantly improves the overall performance and optimally utilizes the resources.

出版日期: 2012-03-01
:  TP 311.13  
基金资助:

国家自然科学基金资助项目(60803003, 60970124).

通讯作者: 陈刚,男,教授,博导.     E-mail: cg@zju.edu.cn
作者简介: 吴羽(1982-),男,博士生,从事垂直搜索引擎和数据库相关研究.E-mail: wuyu@zju.edu.cn
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章  

引用本文:

吴羽,寿黎但,陈刚. CB-LSH:基于压缩位图的高性能LSH索引算法[J]. J4, 2012, 46(3): 377-385.

WU Yu, SHOU Li-dan, CHEN Gang. CB-LSH: an efficient LSH indexing algorithm based on compressed bitmap. J4, 2012, 46(3): 377-385.

链接本文:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2012.03.001        http://www.zjujournals.com/eng/CN/Y2012/V46/I3/377

[1] PIOTR I, RAJEEV M. Approximate nearest neighbors: towards removing the curse of dimensionality [C]∥ Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. New York: ACM, 1998: 604-613.
[2] 程守远. 基于图像检索技术的领带花型检索的研究[D]. 上海: 东华大学, 2006.
[3] WU Y, SHOU LD, HU TL, et al. Query triggered crawling strategy: build a time sensitive vertical search engine [C]∥ Proceedings of the International Conference on Cyberworlds 2008. Washington: IEEE Computer Society, 2008: 422-427.
[4] GIONIS A, INDYK P, MOTWANI R. Similarity Search in High Dimensions via Hashing [C]∥ Proceedings of the 25th International Conference on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers Inc, 1999:518529.
[5] DATAR M, IMMORLICA N, INDYK P, et al. Localitysensitive hashing scheme based on pstable distributions [C]∥ Proceedings of the Twentieth Annual Symposium on Computational Geometry. New York: ACM, 2004: 235-262.
[6] PANIGPRAHY R. Entropy based nearest neighbor search in high dimensions [C]∥ Proceedings of the Seventeenth Annual ACMSIAM Symposium on Discrete Algorithm. New York: ACM, 2006: 1186-1195.
[7] DONG W, WANG Z, JOSEPHSON W, et al. Modeling LSH for performance tuning. [C]∥ Proceeding of the 17th ACM Conference on Information and Knowledge Management. New York: ACM, 2008: 669-678.
[8] LV Q, JOSEPHSON W, WANG Z, et al. MultiProbe LSH: Efficient indexing for highdimensional similarity search [C] ∥ Proceedings of the 33rd International Conference on Very Large Data Bases. San Francisco: VLDB Endowment, 2007: 950-961.
[9] KATAYAMA N, SATOH S. The SRtree: An index structure for highdimensional nearest neighbor queries [C]∥ Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. New York: ACM, 1997: 369-380.
[10] WEBER R, SCHEK H J, BLOTT. A quantitative analysis and performance study for similaritysearch methods in highdimensional spaces [C] ∥ Proceedings of the 24th International Conference on Very Large Data Bases. San Francisco: VLDB Endowment, 1998: 194-205.
[11] BAWA M, CONDIE T, GANESAN P. LSH forest: selftuning indexes for similarity search [C]∥ Proceedings of the 14th International Conference on World Wide Web. New York: ACM, 2005: 651-660.
[12] CHAN CY, YANNIS EL. Bitmap index design and evaluation [C] ∥ Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. New York: ACM, 1998: 355-366.
[13] WU K, OTOO E J, SHOSHANI A. A performance comparison of bitmap indexes [C]∥ Proceedings of the Tenth International Conference on Information and Knowledge Management. New York: ACM, 2001: 559-561.
[14] WU K, OTOO E J, SHOSHANI A. Compressing bitmap indexes for faster search operations[C]∥ Proceedings of the 14th International Conference on Scientific and Statistical Database Management. Washington: IEEE Computer Society, 2002: 99-108.
[15] GENNADY A, MOHAMED Z. Query processing and optimization in Oracle RDB [C]∥ Proceedings of the 23rd International Cconference on Very Large Data Bases. San Francisco: VLDB Endowment, 1996: 229-237.
[16] WU K, OTOO E J, SHOSHANI A. An efficient compression scheme for bitmap indices [J]. ACM Transactions on Database Systems, 2004, 16(1):1-7.
[17] WU K, OTOO EJ, SHOSHANI A. Optimizing bitmap indices with efficient compression [J]. ACM Transactions on Database Systems, 2006, 31(1):1-38.

[1] 郭立超, 苏宏业, 缑倩雯. 一种新的数据流频繁度变化趋势预测算法[J]. J4, 2012, 46(5): 858-865.
[2] 洪银杰, 陈刚, 陈珂. 基于分区索引的集合相似连接[J]. J4, 2012, 46(2): 286-293.
[3] 江锦华,吴羽,胡天磊,陈刚. 基于路径连接的XML复杂小枝模式查询处理[J]. J4, 2011, 45(1): 1-8.
[4] 周佳庆, 吴羽, 江锦华, 陈刚,董轶. 实时垂直搜索引擎对象缓存优化策略[J]. J4, 2011, 45(1): 14-19.