Please wait a minute...
J4  2010, Vol. 44 Issue (2): 224-231    DOI: 10.3785/j.issn.1008-973X.2010.02.003
    
Distributed Hash table-based distributed subspace clustering algorithm
QU Lin, ZHOU Fan, TIAN Xiang, CHEN Yao-wu
(Institute of Advanced Digital Technology and Instrument, Zhejiang University, Hangzhou 310027, China)
Download:   PDF(0KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

A distributed subspace clustering (DISCLUS) algorithm based on distributed Hash table(DHT) was proposed. Each node executed subspace clustering on its local data. Then the clustering results of nodes were combined to form the final clustering results of the distributed system. The dataset reducing and pruning schemes were proposed to optimize the communication between nodes according to the speciality of subspace clustering. A DHT-based distributed voting (DDV) algorithm was proposed to combine the clustering results of nodes. The algorithm used the topology of the underlying overlay to hierarchically collect the voting information. All the nodes in the system can be covered without redundancy. The theoretical and experimental results show that the clustering error and the communication cost of DISCLUS algorithm are scalable to the dataset, nodes and dimensionality.



Published: 09 March 2010
CLC:  TP 393  
Cite this article:

QU Lin, ZHOU Fan, TIAN Xiang, et al. Distributed Hash table-based distributed subspace clustering algorithm. J4, 2010, 44(2): 224-231.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2010.02.003     OR     http://www.zjujournals.com/eng/Y2010/V44/I2/224


基于分布式哈希表的分布式子空间聚类算法

提出一种基于分布式哈希表(DHT)的分布式子空间聚类(DISCLUS)算法,该算法对各结点存储的数据分别进行子空间聚类,对聚类结果进行合并,得到分布式系统的聚类结果.针对子空间聚类的特点,提出结果集缩减和结果集剪枝策略对结点间通讯进行优化.为实现结点聚类结果合并,提出分布式表决算法(DDV).该算法利用底层覆盖网的拓扑结构进行层次化表决信息收集,在动态网络环境中实现了对所有结点的无冗余覆盖.理论分析和实验表明,DISCLUS算法的聚类误差和通讯性能能够较好地适应系统数据集规模、网络规模和数据空间维度的增加.

[1] AGRAWAL R, GEHRKE J, GUNOPULOS D, et al. Automatic subspace clustering of high dimensional data for data mining applications[C]// Proceedings of the ACM SIGMODInternational Conference on Management of Data. Seattle, Washington: ACM, 1998: 94-105.
[2] PARSONS L, HAQUE E, LIU H, et al. Subspace clustering for high dimensional data: a review[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 90-105.
[3] PATRIKAINEN A, MEILA M. Comparing subspace clusterings[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(7): 902-916.
[4] 田敬, 代亚非. P2P持久存储研究[J]. 软件学报,2007,18(6): 1379-1399.
TIAN Jing, DAI Ya-fei. Study on durable peer-to-peer storage techniques[J]. Journal of Software, 2007, 18(6): 1379-1399.
[5] DATTA S, BHADURI K, GIANNELLA C, et al. Distributed data mining in peer-to-peer networks[J]. IEEE Internet Computing, 2006, 10(4): 18-26.
[6] BAWA M, GIONIS A, MOLINA H G, et al. The price of validity in dynamic networks[C]// Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data.Paris, France: ACM, 2004: 515-526.
[7] MEHYAR M, SPANOS D, PONGSAJAPAN J, et al. Asynchronous distributed averaging on communication networks[J]. IEEE/ACM Transactions on Networking, 2007, 15(3): 512-520.
[8] KEMPE D, DOBRA A, GEHRKE J, et al. Computing aggregate information using Gossip[C]// Proceedings of the 44th IEEE Symposium on Foundations of Computer Science.Cambridge, MA: IEEE, 2003: 482-491.
[9] DATTA S, GIANNELLA C, KARGUPTA H, et al. K-means clustering over peer-to-peer networks[C]// Proceedings of the 8th International Workshop on High Performance andDistributed Mining. Newport Beach, CA: [s. n.], 2005.
[10] WOLFF R, BHADURI K, KARGUPTA H, et al. Local L2 thresholding based data mining in peer-to-peer systems[C]// Proceedings of the SIAM Conference on Data Mining. Bethesda,MD: [s. n.], 2006: 430-441.
[11] WOLFF R, SCHUSTER A. Association rule mining in peer-to-peer systems[C]// Proceedings of the IEEE International Conference on Data Mining. Washington, DC: IEEE, 2003:363-370.
[12] LUO P, XIONG H, KEVIN L, et al. Distributed classification in peer-to-peer networks[C]// Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Jose, California: ACM, 2007: 968-976.
[13] MAYMOUNKOV P, MAZIERES D. Kademlia: a peer-to-peer information system based on the XOR metric[C]// Proceedings of the 1st International Workshop on Peer-to-PeerSystems. London, UK: Springer, 2002: 53-65.
[14] DUYGULU P, BARNARD K, FREITAS J F, et al. Object recognition as machine translation: learning a lexicon for a fixed image vocabulary[C]// Proceedings of the 7thEuropean Conference on Computer Vision. London, UK: Springer, 2002: 97-112.

[1] GUO Tong,LIN Feng. Bayesian network structure learning based on hybrid genetic
and fish swarm algorithm
[J]. J4, 2014, 48(1): 130-135.
[2] I De-jun,WANG Gang,YANG Can-jun,JIN Bo,CHEN Yan-hu. NTP/IEEE1588-based time synchronization system in seafloor observatory network[J]. J4, 2014, 48(1): 1-7.
[3] DU Rui-zhong, TIAN Jun-feng, ZHANG Huan-guo. Cloud service selection model based on trust and personality preferences[J]. J4, 2013, 47(1): 53-61.
[4] ZHANG Shuai, SUN Jian-ling, XU Bin, HUANG Chao, KAVS Aleksander J.. RBAC based access control model for services compositions
cross multiple enterprises
[J]. J4, 2012, 46(11): 2035-2043.
[5] Chen Sui-sheng,Lu Jian-gang,Lou Xiao-chun. Localization algorithm for wireless sensor networks
based on MDS-MAP and nonlinear filtering
[J]. J4, 2012, 46(5): 866-872.
[6] YANG Zhao-hui, LI Shan-ping, LIN Xin. Quality optimizing real-time scheduling for incremental context services[J]. J4, 2012, 46(1): 90-97.
[7] PAN Ju-long, LI Shan-ping, ZHANG Dao-yuan. Detecting suspicious node within one cluster in wireless sensor network
using game theoretic approach
[J]. J4, 2012, 46(1): 72-78.
[8] GAO Qing,LI Shan-ping,YANG Zhao-hui. Virtual force-field based energy efficient geo-routing in
wireless sensor network
[J]. J4, 2012, 46(1): 98-104.
[9] QIAN Jian-feng, YIN Jian-wei, DONG Jin-xiang. Load balancing algorithms of semantic publish/subscribe system
over structured P2P networks
[J]. J4, 2011, 45(10): 1710-1719.
[10] YANG Zhao-hui, LI Shan-ping, LIN Xin. Anonymity level adaptation algorithm to meet resource constraint
of K-anonymity service in LBS
[J]. J4, 2011, 45(7): 1154-1160.
[11] PAN Gang, LI Shi-jian, CHEN Yun-xing. ScudContext: large-scale environmental context services infrastructure
towards cyber-physical space integration
[J]. J4, 2011, 45(6): 991-998.
[12] CHE Jian-hua, HE Qin-ming, CHEN Jian-hai, WANG Bei. Software simulation-based fault injection tool of
virtual machine system
[J]. J4, 2011, 45(4): 614-620.
[13] ZHANG Li-ping, PAN Gang, ZHENG Neng-gan, YANG Guo-qing, LI Hong, ZHAO Min-de. Consistent bidirectional generation method and  development
platform based on SmartC models and codes
[J]. J4, 2011, 45(1): 20-29.
[14] LI Jian-ting, JIN Xin-yu, TANG Jun, ZHANG Yu. Target localization method based on wireless multimedia sensor network[J]. J4, 2011, 45(1): 45-49.
[15] SHU Ting, SUN Shou-qian, WANG Hai-ning, XU Wei-qiang. Adaptive generation algorithm for executable state identification
sequences in EFSM model
[J]. J4, 2010, 44(11): 2183-2187.