Please wait a minute...
Front. Inform. Technol. Electron. Eng.  2018, Vol. 19 Issue (11): 1385-1396    DOI:
    
An anchor-based spectral clustering method
Qin ZHANG, Guo-qiang ZHONG , Jun-yu DONG
Department of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
Science and Information College, Qingdao Agricultural University, Qingdao 266109, China
Download:   PDF(0KB)
Export: BibTeX | EndNote (RIS)      

Abstract  Spectral clustering is one of the most popular and important clustering methods in pattern recognition,
machine learning, and data mining.  However, its high computational complexity limits it in applications involving
truly large-scale datasets. For a clustering problem with n samples, it needs to compute the eigenvectors of the graph
Laplacian with O(n
3
) time complexity.  To address this problem, we propose a novel method called anchor-based
spectral clustering (ASC) by employing anchor points of data.  Specifically, m (m  n) anchor points are selected
from  the dataset,  which  can basically  maintain  the intrinsic (manifold)  structure of  the original  data.   Then a
mapping matrix between the original data and the anchors is constructed.   More importantly, it is proved that
this data-anchor mapping matrix essentially preserves the clustering structure of the data.  Based on this mapping
matrix, it is easy to approximate the spectral embedding of the original data.  The proposed method scales linearly
relative to the size of the data but with low degradation of the clustering performance. The proposed method, ASC,
is compared to the classical spectral clustering and two state-of-the-art accelerating methods, i.e., power iteration
clustering and landmark-based spectral clustering,  on 10 real-world applications under three evaluation metrics.
Experimental results show that ASC is consistently faster than the classical spectral clustering with comparable
clustering performance, and at least comparable with or better than the state-of-the-art methods on both effectiveness
and efficiency.


Key wordsClustering      Spectral clustering      Graph Laplacian      Anchors     
Received: 17 April 2017      Published: 13 June 2019
Cite this article:

Qin ZHANG, Guo-qiang ZHONG , Jun-yu DONG. An anchor-based spectral clustering method. Front. Inform. Technol. Electron. Eng., 2018, 19(11): 1385-1396.

URL:

http://www.zjujournals.com/xueshu/fitee/     OR     http://www.zjujournals.com/xueshu/fitee/Y2018/V19/I11/1385


An anchor-based spectral clustering method

Spectral clustering is one of the most popular and important clustering methods in pattern recognition,
machine learning, and data mining.  However, its high computational complexity limits it in applications involving
truly large-scale datasets. For a clustering problem with n samples, it needs to compute the eigenvectors of the graph
Laplacian with O(n
3
) time complexity.  To address this problem, we propose a novel method called anchor-based
spectral clustering (ASC) by employing anchor points of data.  Specifically, m (m  n) anchor points are selected
from  the dataset,  which  can basically  maintain  the intrinsic (manifold)  structure of  the original  data.   Then a
mapping matrix between the original data and the anchors is constructed.   More importantly, it is proved that
this data-anchor mapping matrix essentially preserves the clustering structure of the data.  Based on this mapping
matrix, it is easy to approximate the spectral embedding of the original data.  The proposed method scales linearly
relative to the size of the data but with low degradation of the clustering performance. The proposed method, ASC,
is compared to the classical spectral clustering and two state-of-the-art accelerating methods, i.e., power iteration
clustering and landmark-based spectral clustering,  on 10 real-world applications under three evaluation metrics.
Experimental results show that ASC is consistently faster than the classical spectral clustering with comparable
clustering performance, and at least comparable with or better than the state-of-the-art methods on both effectiveness
and efficiency.

关键词: Clustering,  Spectral clustering,  Graph Laplacian,  Anchors 
[1] Rabia IRFAN , Sharifullah KHAN, Kashif RAJPOOT, Ali Mustafa QAMAR. TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data[J]. Front. Inform. Technol. Electron. Eng., 2018, 19(6): 763-782.
[2] A Ram CHOI, Sung Min KIM, Mee Young SUNG. Controlling the contact levels of details for fast and precise haptic collision detection[J]. Front. Inform. Technol. Electron. Eng., 2017, 18(8): 1117-1130.
[3] Ke-shi GE, Hua-you SU, Dong-sheng LI, Xi-cheng LU. Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit[J]. Front. Inform. Technol. Electron. Eng., 2017, 18(7): 915-927.
[4] Erfan Shaghaghi, Mohammad Reza Jabbarpour, Rafidah Md Noor, Hwasoo Yeo, Jason J. Jung. Adaptive green traffic signal controlling using vehicular communication[J]. Front. Inform. Technol. Electron. Eng., 2017, 18(3): 373-393.
[5] Guang-hui Song, Xiao-gang Jin, Gen-lang Chen, Yan Nie. Two-level hierarchical feature learning for image classification[J]. Front. Inform. Technol. Electron. Eng., 2016, 17(9): 897-906.
[6] Hui-zong Li, Xue-gang Hu, Yao-jin Lin, Wei He, Jian-han Pan. A social tag clustering method based on common co-occurrence group similarity[J]. Front. Inform. Technol. Electron. Eng., 2016, 17(2): 122-134.
[7] Shi-jin Ren, Yin Liang, Xiang-jun Zhao, Mao-yun Yang. A novel multimode process monitoring method integrating LDRSKM with Bayesian inference[J]. Front. Inform. Technol. Electron. Eng., 2015, 16(8): 617-633.
[8] Zhi-min Han, Zhi-yun Lin, Min-yue Fu, Zhi-yong Chen. Distributed coordination in multi-agent systems: a graph Laplacian perspective[J]. Front. Inform. Technol. Electron. Eng., 2015, 16(6): 429-448.
[9] Meng-ni Zhang, Can Wang, Jia-jun Bu, Zhi Yu, Yu Zhou, Chun Chen. A sampling method based on URL clustering for fast web accessibility evaluation[J]. Front. Inform. Technol. Electron. Eng., 2015, 16(6): 449-456.
[10] Xian Zang, Felipe P. Vista Iv, Kil To Chong. Fast global kernel fuzzy c-means clustering algorithm for consonant/vowel segmentation of speech signal[J]. Front. Inform. Technol. Electron. Eng., 2014, 15(7): 551-563.
[11] Yi Xie, Hui-min Yu, Roland Hu. Probabilistic hypergraph based hash codes for social image search[J]. Front. Inform. Technol. Electron. Eng., 2014, 15(7): 537-550.
[12] Xin-zheng Xu, Shi-fei Ding, Zhong-zhi Shi, Hong Zhu. Optimizing radial basis function neural network based on rough sets and affinity propagation clustering algorithm[J]. Front. Inform. Technol. Electron. Eng., 2012, 13(2): 131-138.
[13] Jing Fan, Hai-feng Ji, Xin-xin Guan, Ying Tang. A GPU-based multi-resolution algorithm for simulation of seed dispersal[J]. Front. Inform. Technol. Electron. Eng., 2012, 13(11): 816-827.
[14] Suiang-Shyan Lee, Ja-Chen Lin. An accelerated K-means clustering algorithm using selection and erasure rules[J]. Front. Inform. Technol. Electron. Eng., 2012, 13(10): 761-768.
[15] Zhen-gong Cai, Xiao-hu Yang, Xin-yu Wang, Aleksander J. Kavs. A fuzzy formal concept analysis based approach for business component identification[J]. Front. Inform. Technol. Electron. Eng., 2011, 12(9): 707-720.