Please wait a minute...
J4  2012, Vol. 46 Issue (3): 377-385    DOI: 10.3785/j.issn.1008-973X.2012.03.001
    
CB-LSH: an efficient LSH indexing algorithm based on compressed bitmap
WU Yu, SHOU Li-dan, CHEN Gang
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Download:   PDF(0KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

An improvement method based on compressed bitmap was proposed to address the deficiency of deletion performance in traditional locality-sensitive hashing (LSH) algorithms. To improve the overall performance especially the delete performance of LSH algorithms, a novel Boolean algebra schema for LSH operations was designed and the vector structure in hash-buckets was replaced with compressed bitmaps. In addition, a mark-and-sweep strategy was utilized to perform a batch deletion and further improved the delete performance. Theoretical analysis showed the improvements of CB-LSH in both memory space requirements and time complexity of all the LSH operations. Thorough experiments showed that CB-LSH not only saved notable memory space, but also achieved significant improvements in delete, query and insert performances. Industrial practice also verified that CB-LSH is qualified for online updates and queries for content-based image retrieval in largescale multimedia databases, and CB-LSH significantly improves the overall performance and optimally utilizes the resources.



Published: 01 March 2012
CLC:  TP 311.13  
Cite this article:

WU Yu, SHOU Li-dan, CHEN Gang. CB-LSH: an efficient LSH indexing algorithm based on compressed bitmap. J4, 2012, 46(3): 377-385.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2012.03.001     OR     http://www.zjujournals.com/eng/Y2012/V46/I3/377


CB-LSH:基于压缩位图的高性能LSH索引算法

由于传统局部敏感散列(LSH)算法的删除性能不足,阻碍了LSH算法在实际产品中的应用.提出一种基于压缩位图的改进方法,通过引入压缩位图改良传统LSH算法的桶中数据结构,以及使用标记清除策略进行算法流程优化,解决传统LSH索引实时删除性能差的问题.理论分析证明:基于压缩位图的LSH(CB-LSH)算法可以显著降低算法的空间复杂度和时间复杂度.实验结果支撑了理论分析的结论,相对于传统LSH算法,CB-LSH在降低内存消耗的同时,可显著提高索引删除、数据插入和数据查询的性能.在大型项目中的应用实践验证了在线实时更新的海量多媒体数据检索系统中,CB-LSH索引算法对于多媒体数据的高维索引是有效可行的,并显著提升了性能、降低了资源消耗.

[1] PIOTR I, RAJEEV M. Approximate nearest neighbors: towards removing the curse of dimensionality [C]∥ Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing. New York: ACM, 1998: 604-613.
[2] 程守远. 基于图像检索技术的领带花型检索的研究[D]. 上海: 东华大学, 2006.
[3] WU Y, SHOU LD, HU TL, et al. Query triggered crawling strategy: build a time sensitive vertical search engine [C]∥ Proceedings of the International Conference on Cyberworlds 2008. Washington: IEEE Computer Society, 2008: 422-427.
[4] GIONIS A, INDYK P, MOTWANI R. Similarity Search in High Dimensions via Hashing [C]∥ Proceedings of the 25th International Conference on Very Large Data Bases. San Francisco: Morgan Kaufmann Publishers Inc, 1999:518529.
[5] DATAR M, IMMORLICA N, INDYK P, et al. Localitysensitive hashing scheme based on pstable distributions [C]∥ Proceedings of the Twentieth Annual Symposium on Computational Geometry. New York: ACM, 2004: 235-262.
[6] PANIGPRAHY R. Entropy based nearest neighbor search in high dimensions [C]∥ Proceedings of the Seventeenth Annual ACMSIAM Symposium on Discrete Algorithm. New York: ACM, 2006: 1186-1195.
[7] DONG W, WANG Z, JOSEPHSON W, et al. Modeling LSH for performance tuning. [C]∥ Proceeding of the 17th ACM Conference on Information and Knowledge Management. New York: ACM, 2008: 669-678.
[8] LV Q, JOSEPHSON W, WANG Z, et al. MultiProbe LSH: Efficient indexing for highdimensional similarity search [C] ∥ Proceedings of the 33rd International Conference on Very Large Data Bases. San Francisco: VLDB Endowment, 2007: 950-961.
[9] KATAYAMA N, SATOH S. The SRtree: An index structure for highdimensional nearest neighbor queries [C]∥ Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. New York: ACM, 1997: 369-380.
[10] WEBER R, SCHEK H J, BLOTT. A quantitative analysis and performance study for similaritysearch methods in highdimensional spaces [C] ∥ Proceedings of the 24th International Conference on Very Large Data Bases. San Francisco: VLDB Endowment, 1998: 194-205.
[11] BAWA M, CONDIE T, GANESAN P. LSH forest: selftuning indexes for similarity search [C]∥ Proceedings of the 14th International Conference on World Wide Web. New York: ACM, 2005: 651-660.
[12] CHAN CY, YANNIS EL. Bitmap index design and evaluation [C] ∥ Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. New York: ACM, 1998: 355-366.
[13] WU K, OTOO E J, SHOSHANI A. A performance comparison of bitmap indexes [C]∥ Proceedings of the Tenth International Conference on Information and Knowledge Management. New York: ACM, 2001: 559-561.
[14] WU K, OTOO E J, SHOSHANI A. Compressing bitmap indexes for faster search operations[C]∥ Proceedings of the 14th International Conference on Scientific and Statistical Database Management. Washington: IEEE Computer Society, 2002: 99-108.
[15] GENNADY A, MOHAMED Z. Query processing and optimization in Oracle RDB [C]∥ Proceedings of the 23rd International Cconference on Very Large Data Bases. San Francisco: VLDB Endowment, 1996: 229-237.
[16] WU K, OTOO E J, SHOSHANI A. An efficient compression scheme for bitmap indices [J]. ACM Transactions on Database Systems, 2004, 16(1):1-7.
[17] WU K, OTOO EJ, SHOSHANI A. Optimizing bitmap indices with efficient compression [J]. ACM Transactions on Database Systems, 2006, 31(1):1-38.

[1] GUO Li-chao, SU Hong-ye, GOU Qian-wen. A new algorithm for frequency tendency prediction over data streams[J]. J4, 2012, 46(5): 858-865.
[2] HONG Yin-jie, CHEN Gang, CHEN Ke. Set similarity join using partition index[J]. J4, 2012, 46(2): 286-293.
[3] JIANG Jin-hua, WU Yu, HU Tian-lei, CHEN Gang. Efficient processing of complex XML twig pattern queries
based on path-joins
[J]. J4, 2011, 45(1): 1-8.
[4] ZHOU Jia-qing, WU Yu, JIANG Jin-hua, CHEN Gang, DONG Yi. Object cache optimization strategy for real-time vertical search engine[J]. J4, 2011, 45(1): 14-19.