JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)  2018, Vol. 52 Issue (11): 2191-2200    DOI: 10.3785/j.issn.1008-973X.2018.11.018
Computer Technology     
Semi-supervised constraint ensemble clustering by fast search and find of density peaks
LIU Ru-hui, HUANG Wei-ping, WANG Kai, LIU Chuang, LIANG Jun
College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China

Abstract  

To address the weaknesses of clustering by fast search and find of density peaks (CFDP), proposed in Science in 2014, namely unreliable selection of cluster centers, subjective judgment of the number of clusters, and limited applicability in some scenarios, a semi-supervised constraint ensemble clustering by fast search and find of density peaks (SiCE-CFDP) algorithm was proposed. SiCE-CFDP measures density as a relative density, analyzes the decision graph from several perspectives to extract candidate cluster centers, and finally determines the number of clusters automatically. When only limited constraint information is available, SiCE-CFDP uses ensemble learning to expand the constraint information and improve clustering performance. Experiments were conducted on three synthetic datasets, four public datasets, and one air-conditioning system simulation dataset. On large-scale datasets, SiCE-CFDP achieved higher clustering accuracy than other well-known semi-supervised clustering algorithms.
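For readers unfamiliar with the baseline, the following is a minimal sketch of the original CFDP procedure of Rodriguez and Laio, not of SiCE-CFDP itself. It assumes a cutoff-count density and Euclidean distance; the function names `density_peak_scores` and `cfdp_cluster`, the `cutoff` parameter, and the manually supplied `n_centers` are illustrative choices, not taken from the article. The sketch ranks points on the decision graph by the product of local density and distance to the nearest denser point, then lets every remaining point inherit the label of its nearest denser neighbor.

```python
import numpy as np

def density_peak_scores(points, cutoff):
    """Return (rho, delta, nearest_denser) for the baseline CFDP decision graph."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Local density: number of other points closer than the cutoff.
    rho = (dist < cutoff).sum(axis=1) - 1
    # Visit points from densest to sparsest; stable sort breaks ties by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j
    return rho, delta, nearest_denser

def cfdp_cluster(points, cutoff, n_centers):
    """Pick the n_centers points with the largest rho * delta, then propagate labels."""
    rho, delta, nearest_denser = density_peak_scores(points, cutoff)
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(len(rho), -1)
    labels[centers] = np.arange(n_centers)
    # Same density order as above, so each nearest denser neighbor is already labeled.
    for i in np.argsort(-rho, kind="stable"):
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, centers

# Two well-separated square blobs, each with one dense central point.
blob = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
data = blob + [(x + 10, y + 10) for x, y in blob]
labels, centers = cfdp_cluster(data, cutoff=1.0, n_centers=2)
```

In this sketch `n_centers` is supplied by hand, which corresponds to the subjective choice of the number of clusters that the abstract identifies as a weakness of the baseline.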



Received: 18 June 2017      Published: 22 November 2018
CLC:  TP181  
Cite this article:

LIU Ru-hui, HUANG Wei-ping, WANG Kai, LIU Chuang, LIANG Jun. Semi-supervised constraint ensemble clustering by fast search and find of density peaks. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2018, 52(11): 2191-2200.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2018.11.018     OR     http://www.zjujournals.com/eng/Y2018/V52/I11/2191


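Chinese title: 半监督约束集成的快速密度峰值聚类算法

Chinese abstract (translated): To overcome the defects of the clustering by fast search and find of density peaks (CFDP) algorithm, proposed in Science in 2014, namely that automatic selection picks wrong center points or misses true ones, that the number of clusters requires subjective prior judgment, and that the algorithm is limited to certain scenarios, a semi-supervised constraint ensemble CFDP (SiCE-CFDP) algorithm is proposed from a semi-supervised perspective, drawing on the idea of ensemble learning. SiCE-CFDP measures node density by relative density, analyzes the decision graph from multiple angles, automatically selects candidate center points, and finally determines the number of clusters automatically. Given only a limited number of labeled constraint relations, the algorithm uses ensemble learning to guide the expansion of the constraint information and improve clustering performance. For validation, simulation studies were carried out on three synthetic datasets, four public datasets, and one air-conditioning system dataset. The results show that, for the same number of constraints, SiCE-CFDP achieves higher clustering accuracy on large-sample data than other semi-supervised clustering algorithms.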

