Please wait a minute...
J4  2010, Vol. 44 Issue (2): 224-231    DOI: 10.3785/j.issn.1008-973X.2010.02.003
计算机技术﹑电信技术     
基于分布式哈希表的分布式子空间聚类算法
曲琳, 周凡, 田翔, 陈耀武
(浙江大学 数字技术及仪器研究所, 浙江 杭州 310027)
Distributed Hash table-based distributed subspace clustering algorithm
QU Lin, ZHOU Fan, TIAN Xiang, CHEN Yao-wu
(Institute of Advanced Digital Technology and Instrument, Zhejiang University, Hangzhou 310027, China)
 全文: PDF  HTML
摘要:

提出一种基于分布式哈希表(DHT)的分布式子空间聚类(DISCLUS)算法,该算法对各结点存储的数据分别进行子空间聚类,对聚类结果进行合并,得到分布式系统的聚类结果.针对子空间聚类的特点,提出结果集缩减和结果集剪枝策略对结点间通讯进行优化.为实现结点聚类结果合并,提出分布式表决算法(DDV).该算法利用底层覆盖网的拓扑结构进行层次化表决信息收集,在动态网络环境中实现了对所有结点的无冗余覆盖.理论分析和实验表明,DISCLUS算法的聚类误差和通讯性能能够较好地适应系统数据集规模、网络规模和数据空间维度的增加.

Abstract:

A distributed subspace clustering (DISCLUS) algorithm based on distributed Hash table(DHT) was proposed. Each node executed subspace clustering on its local data. Then the clustering results of nodes were combined to form the final clustering results of the distributed system. The dataset reducing and pruning schemes were proposed to optimize the communication between nodes according to the speciality of subspace clustering. A DHT-based distributed voting (DDV) algorithm was proposed to combine the clustering results of nodes. The algorithm used the topology of the underlying overlay to hierarchically collect the voting information. All the nodes in the system can be covered without redundancy. The theoretical and experimental results show that the clustering error and the communication cost of DISCLUS algorithm are scalable to the dataset, nodes and dimensionality.

出版日期: 2010-03-09
:  TP 393  
基金资助:

国家“863”高技术研究发展计划资助项目(2003AA1Z2130);浙江省科技计划重大科技攻关资助项目(2005C11001-02).

通讯作者: 陈耀武,男,教授,博导.     E-mail: cyw@mail.bme.zju.edu.cn
作者简介: 曲琳(1979—),男,山东烟台人,博士生,从事嵌入式系统、对等网络、分布式数据挖掘的研究.
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章  

引用本文:

曲琳, 周凡, 田翔, 等. 基于分布式哈希表的分布式子空间聚类算法[J]. J4, 2010, 44(2): 224-231.

QU Lin, ZHOU Fan, TIAN Xiang, et al. Distributed Hash table-based distributed subspace clustering algorithm. J4, 2010, 44(2): 224-231.

链接本文:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2010.02.003        http://www.zjujournals.com/eng/CN/Y2010/V44/I2/224

[1] AGRAWAL R, GEHRKE J, GUNOPULOS D, et al. Automatic subspace clustering of high dimensional data for data mining applications[C]// Proceedings of the ACM SIGMODInternational Conference on Management of Data. Seattle, Washington: ACM, 1998: 94-105.
[2] PARSONS L, HAQUE E, LIU H, et al. Subspace clustering for high dimensional data: a review[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 90-105.
[3] PATRIKAINEN A, MEILA M. Comparing subspace clusterings[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(7): 902-916.
[4] 田敬, 代亚非. P2P持久存储研究[J]. 软件学报,2007,18(6): 1379-1399.
TIAN Jing, DAI Ya-fei. Study on durable peer-to-peer storage techniques[J]. Journal of Software, 2007, 18(6): 1379-1399.
[5] DATTA S, BHADURI K, GIANNELLA C, et al. Distributed data mining in peer-to-peer networks[J]. IEEE Internet Computing, 2006, 10(4): 18-26.
[6] BAWA M, GIONIS A, MOLINA H G, et al. The price of validity in dynamic networks[C]// Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data.Paris, France: ACM, 2004: 515-526.
[7] MEHYAR M, SPANOS D, PONGSAJAPAN J, et al. Asynchronous distributed averaging on communication networks[J]. IEEE/ACM Transactions on Networking, 2007, 15(3): 512-520.
[8] KEMPE D, DOBRA A, GEHRKE J, et al. Computing aggregate information using Gossip[C]// Proceedings of the 44th IEEE Symposium on Foundations of Computer Science.Cambridge, MA: IEEE, 2003: 482-491.
[9] DATTA S, GIANNELLA C, KARGUPTA H, et al. K-means clustering over peer-to-peer networks[C]// Proceedings of the 8th International Workshop on High Performance andDistributed Mining. Newport Beach, CA: [s. n.], 2005.
[10] WOLFF R, BHADURI K, KARGUPTA H, et al. Local L2 thresholding based data mining in peer-to-peer systems[C]// Proceedings of the SIAM Conference on Data Mining. Bethesda,MD: [s. n.], 2006: 430-441.
[11] WOLFF R, SCHUSTER A. Association rule mining in peer-to-peer systems[C]// Proceedings of the IEEE International Conference on Data Mining. Washington, DC: IEEE, 2003:363-370.
[12] LUO P, XIONG H, KEVIN L, et al. Distributed classification in peer-to-peer networks[C]// Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Jose, California: ACM, 2007: 968-976.
[13] MAYMOUNKOV P, MAZIERES D. Kademlia: a peer-to-peer information system based on the XOR metric[C]// Proceedings of the 1st International Workshop on Peer-to-PeerSystems. London, UK: Springer, 2002: 53-65.
[14] DUYGULU P, BARNARD K, FREITAS J F, et al. Object recognition as machine translation: learning a lexicon for a fixed image vocabulary[C]// Proceedings of the 7thEuropean Conference on Computer Vision. London, UK: Springer, 2002: 97-112.

[1] 郭童,林峰. 基于混合遗传鱼群算法的贝叶斯网络结构学习[J]. J4, 2014, 48(1): 130-135.
[2] 李德骏,汪港,杨灿军,金波,陈燕虎. 基于NTP和IEEE1588海底观测网时间同步系统[J]. J4, 2014, 48(1): 1-7.
[3] 杜瑞忠, 田俊峰, 张焕国. 基于信任和个性偏好的云服务选择模型[J]. J4, 2013, 47(1): 53-61.
[4] 张帅,孙建伶,徐斌,黄超,KAVS Aleksander J.. 基于RBAC的跨多企业服务组合访问控制模型[J]. J4, 2012, 46(11): 2035-2043.
[5] 陈岁生,卢建刚,楼晓春. 基于MDS-MAP和非线性滤波的WSN定位算法[J]. J4, 2012, 46(5): 866-872.
[6] 杨朝晖,李善平,林欣. 增量型上下文信息服务的质量优化实时调度[J]. J4, 2012, 46(1): 90-97.
[7] 高庆,李善平,杨朝晖. 基于虚拟场的能量高效传感器网络地理路由[J]. J4, 2012, 46(1): 98-104.
[8] 潘巨龙,李善平,张道远. 无线传感器网络簇内可疑节点的博弈检测方法[J]. J4, 2012, 46(1): 72-78.
[9] 钱剑锋, 尹建伟, 董金祥. 结构化P2P网络的语义发布/订阅系统
负载均衡算法
[J]. J4, 2011, 45(10): 1710-1719.
[10] 杨朝晖,李善平,林欣. LBS中面向K-匿名服务资源约束的匿名度调节算法[J]. J4, 2011, 45(7): 1154-1160.
[11] 潘纲, 李石坚, 陈云星. ScudContext:信息-物理空间融合的大规模
环境上下文服务
[J]. J4, 2011, 45(6): 991-998.
[12] 车建华, 何钦铭, 陈建海, 王备. 基于软件模拟的虚拟机系统故障插入工具[J]. J4, 2011, 45(4): 614-620.
[13] 李鉴庭,金心宇,唐军,张昱. 基于无线多媒体传感器网络的目标定位方法[J]. J4, 2011, 45(1): 45-49.
[14] 张莉苹,潘纲,郑能干,杨国青,李红,赵民德. SmartC模型与代码一致性双向生成方法及开发平台[J]. J4, 2011, 45(1): 20-29.
[15] 舒挺, 孙守迁,王海宁,徐伟强. ESIS序列自适应生成算法[J]. J4, 2010, 44(11): 2183-2187.