Please wait a minute...
Front. Inform. Technol. Electron. Eng.  2012, Vol. 13 Issue (10): 761-768    DOI: 10.1631/jzus.C1200078
    
An accelerated K-means clustering algorithm using selection and erasure rules
Suiang-Shyan Lee, Ja-Chen Lin
Department of Computer Science, National Chiao Tung University, Taiwan 30050, Hsinchu
Download:   PDF(0KB)
Export: BibTeX | EndNote (RIS)      

Abstract  The K-means method is a well-known clustering algorithm with an extensive range of applications, such as biological classification, disease analysis, data mining, and image compression. However, the plain K-means method is not fast when the number of clusters or the number of data points becomes large. A modified K-means algorithm was presented by Fahim et al. (2006). The modified algorithm produced clusters whose mean square error was very similar to that of the plain K-means, but the execution time was shorter. In this study, we try to further increase its speed. There are two rules in our method: a selection rule, used to acquire a good candidate as the initial center to be checked, and an erasure rule, used to delete one or many unqualified centers each time a specified condition is satisfied. Our clustering results are identical to those of Fahim et al. (2006). However, our method further cuts computation time when the number of clusters increases. The mathematical reasoning used in our design is included.

Key wordsK-means clustering      Acceleration      Vector quantization      Selection      Erasure     
Received: 22 March 2012      Published: 01 October 2012
CLC:  TP301.6  
Cite this article:

Suiang-Shyan Lee, Ja-Chen Lin. An accelerated K-means clustering algorithm using selection and erasure rules. Front. Inform. Technol. Electron. Eng., 2012, 13(10): 761-768.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/jzus.C1200078     OR     http://www.zjujournals.com/xueshu/fitee/Y2012/V13/I10/761


An accelerated K-means clustering algorithm using selection and erasure rules

The K-means method is a well-known clustering algorithm with an extensive range of applications, such as biological classification, disease analysis, data mining, and image compression. However, the plain K-means method is not fast when the number of clusters or the number of data points becomes large. A modified K-means algorithm was presented by Fahim et al. (2006). The modified algorithm produced clusters whose mean square error was very similar to that of the plain K-means, but the execution time was shorter. In this study, we try to further increase its speed. There are two rules in our method: a selection rule, used to acquire a good candidate as the initial center to be checked, and an erasure rule, used to delete one or many unqualified centers each time a specified condition is satisfied. Our clustering results are identical to those of Fahim et al. (2006). However, our method further cuts computation time when the number of clusters increases. The mathematical reasoning used in our design is included.

关键词: K-means clustering,  Acceleration,  Vector quantization,  Selection,  Erasure 
[1] Ehsan Saeedi, Yinan Kong, Md. Selim Hossain. Side-channel attacks and learning-vector quantization[J]. Front. Inform. Technol. Electron. Eng., 2017, 18(4): 511-518.
[2] Yuan-ping Nie, Yi Han, Jiu-ming Huang, Bo Jiao, Ai-ping Li. Attention-based encoder-decoder model for answer selection in question answering[J]. Front. Inform. Technol. Electron. Eng., 2017, 18(4): 535-544.
[3] Hui Zhao, You-yu Tan, Gao-feng Pan, Yun-fei Chen. Ergodic secrecy capacity of MRC/SC in single-input multiple-output wiretap systems with imperfect channel state information[J]. Front. Inform. Technol. Electron. Eng., 2017, 18(4): 578-590.
[4] Shi-jin Ren, Yin Liang, Xiang-jun Zhao, Mao-yun Yang. A novel multimode process monitoring method integrating LDRSKM with Bayesian inference[J]. Front. Inform. Technol. Electron. Eng., 2015, 16(8): 617-633.
[5] Hong Shao, Shuang Chen, Jie-yi Zhao, Wen-cheng Cui, Tian-shu Yu. Face recognition based on subset selection via metric learning on manifold[J]. Front. Inform. Technol. Electron. Eng., 2015, 16(12): 1046-1058.
[6] Jie Zhou, Bi-cheng Li, Gang Chen. Automatically building large-scale named entity recognition corpora from Chinese Wikipedia[J]. Front. Inform. Technol. Electron. Eng., 2015, 16(11): 940-956.
[7] Rong Li, Xin Ding, Jun-hao Yu, Tian-yi Gao, Wen-ting Zheng, Rui Wang, Hu-jun Bao. Procedural generation and real-time rendering of a marine ecosystem[J]. Front. Inform. Technol. Electron. Eng., 2014, 15(7): 514-524.
[8] Ji-ming Li, Yun-tao Qian. Clustering-based hyperspectral band selection using sparse nonnegative matrix factorization[J]. Front. Inform. Technol. Electron. Eng., 2011, 12(7): 542-549.
[9] Pejman Mowlaee, Abolghasem Sayadian, Hamid Sheikhzadeh. Split vector quantization for sinusoidal amplitude and frequency[J]. Front. Inform. Technol. Electron. Eng., 2011, 12(2): 140-154.
[10] Hong-xia Pang, Wen-de Dong, Zhi-hai Xu, Hua-jun Feng, Qi Li, Yue-ting Chen. Novel linear search for support vector machine parameter selection[J]. Front. Inform. Technol. Electron. Eng., 2011, 12(11): 885-896.
[11] Rui Wang, Wei-feng Chen, Ming-hao Pan, Hu-jun Bao. Harmonic coordinates for real-time image cloning[J]. Front. Inform. Technol. Electron. Eng., 2010, 11(9): 690-698.
[12] Yong-yi Shou, Yi-lun Huang. Combinatorial auction algorithm for project portfolio selection and scheduling to maximize the net present value[J]. Front. Inform. Technol. Electron. Eng., 2010, 11(7): 562-574.
[13] Rui Yin, Yu Zhang, Guan-ding Yu, Zhao-yang Zhang, Jie-tao Zhang. Centralized and distributed resource allocation in OFDM based multi-relay system[J]. Front. Inform. Technol. Electron. Eng., 2010, 11(6): 450-464.
[14] Pejman MOWLAEE, Abolghasem SAYADIYAN, Hamid SHEIKHZADEH. Evaluating single-channel speech separation performance in transform-domain[J]. Front. Inform. Technol. Electron. Eng., 2010, 11(3): 160-174.
[15] Yi Liu, Jian-hua Zhang, Wei Xu, Ze-min Liu. Statistical assessment of selection-based dual-hop semi-blind amplify-and-forward cooperative networks[J]. Front. Inform. Technol. Electron. Eng., 2010, 11(10): 785-792.