Journal of Zhejiang University (Engineering Science), 2018, Vol. 52, Issue (11): 2191-2200. DOI: 10.3785/j.issn.1008-973X.2018.11.018
Computer Technology
Semi-supervised constraint ensemble clustering by fast search and find of density peaks
LIU Ru-hui, HUANG Wei-ping, WANG Kai, LIU Chuang, LIANG Jun
College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
Abstract:

The clustering by fast search and find of density peaks (CFDP) algorithm, published in Science in 2014, has three weaknesses: when cluster centers are chosen automatically, true centers may be missed or wrong ones selected; the number of clusters must be judged subjectively in advance; and the algorithm is limited to certain application scenarios. To address these weaknesses, a semi-supervised constraint ensemble clustering by fast search and find of density peaks (SiCE-CFDP) algorithm was proposed, combining a semi-supervised approach with ensemble learning. SiCE-CFDP measures the density of each point by its relative density, analyzes the decision graph from several perspectives to select candidate cluster centers automatically, and finally determines the number of clusters by itself. When only a limited number of pairwise constraints are labeled, SiCE-CFDP uses ensemble learning to guide the expansion of this constraint information and thereby improve clustering performance. Simulation experiments were conducted on three synthetic datasets, four public datasets and one air-conditioning system simulation dataset. The results show that, with the same number of constraints, SiCE-CFDP achieves higher clustering accuracy on large-sample datasets than other well-known semi-supervised clustering algorithms.
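For orientation, the baseline that SiCE-CFDP builds on can be sketched compactly. The Python below is a minimal sketch of the original CFDP of Rodriguez and Laio (2014), not of SiCE-CFDP: it uses the cutoff-kernel local density ρ_i (the number of points closer than a cutoff d_c) rather than the relative density described above, computes δ_i as the distance to the nearest point of higher density, and takes as centers the points with the largest γ_i = ρ_i·δ_i. The function name `cfdp`, the fixed `n_clusters` argument and the tie-breaking rule are illustrative assumptions; the fixed count stands in for the decision-graph reading that SiCE-CFDP automates, and no constraint information is used.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def cfdp(points: Sequence[Point], d_c: float, n_clusters: int) -> Tuple[List[int], List[int]]:
    """Baseline CFDP (Rodriguez & Laio, 2014) with a cutoff kernel.

    Returns (labels, centers), where centers holds the indices of the chosen
    cluster centers. Picking a fixed number of centers by gamma = rho * delta
    is an illustrative assumption, not SiCE-CFDP's automatic selection.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Visit points from highest to lowest density; ties are broken by index,
    # so "higher density" means "earlier in this order".
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point of higher density.
    # nearest_higher[i] records that point for the assignment step.
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention delta is its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Decision-graph score gamma_i = rho_i * delta_i. The densest point is
    # always kept as a center so that every other point can inherit a label.
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: (-gamma[i], i))
    centers = [order[0]] + [i for i in ranked if i != order[0]][: n_clusters - 1]

    # Assignment: in decreasing density order, each non-center point takes the
    # label of its nearest higher-density neighbour (already labelled by then).
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

With two well-separated groups of five points each and d_c = 1.5, the sketch gives every point in a group the same label. The full pairwise distance matrix keeps the logic readable but costs O(n²) time and memory, so it only suits small inputs.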

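Received: 2017-06-18    Published: 2018-11-22
CLC number: TP181
Funding: National Natural Science Foundation of China (U1664264, U1509203)

Corresponding author: LIANG Jun, male, Ph.D., professor. orcid.org/0000-0003-1115-0824. E-mail: jliang@iipc.zju.edu.cn
First author: LIU Ru-hui (1993-), male, master's student, working on machine learning. orcid.org/0000-0002-5033-3693. E-mail: ruhuiliu@zju.edu.cn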

Cite this article:

LIU Ru-hui, HUANG Wei-ping, WANG Kai, LIU Chuang, LIANG Jun. Semi-supervised constraint ensemble clustering by fast search and find of density peaks[J]. Journal of Zhejiang University (Engineering Science), 2018, 52(11): 2191-2200.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2018.11.018        http://www.zjujournals.com/eng/CN/Y2018/V52/I11/2191

