|
|
An anchor-based spectral clustering method |
Qin ZHANG, Guo-qiang ZHONG , Jun-yu DONG |
Department of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
Science and Information College, Qingdao Agricultural University, Qingdao 266109, China |
|
|
Abstract Spectral clustering is one of the most popular and important clustering methods in pattern recognition,
machine learning, and data mining. However, its high computational complexity limits it in applications involving
truly large-scale datasets. For a clustering problem with n samples, it needs to compute the eigenvectors of the graph
Laplacian with O(n
3
) time complexity. To address this problem, we propose a novel method called anchor-based
spectral clustering (ASC) by employing anchor points of data. Specifically, m (m n) anchor points are selected
from the dataset, which can basically maintain the intrinsic (manifold) structure of the original data. Then a
mapping matrix between the original data and the anchors is constructed. More importantly, it is proved that
this data-anchor mapping matrix essentially preserves the clustering structure of the data. Based on this mapping
matrix, it is easy to approximate the spectral embedding of the original data. The proposed method scales linearly
relative to the size of the data but with low degradation of the clustering performance. The proposed method, ASC,
is compared to the classical spectral clustering and two state-of-the-art accelerating methods, i.e., power iteration
clustering and landmark-based spectral clustering, on 10 real-world applications under three evaluation metrics.
Experimental results show that ASC is consistently faster than the classical spectral clustering with comparable
clustering performance, and at least comparable with or better than the state-of-the-art methods on both effectiveness
and efficiency.
|
Received: 17 April 2017
Published: 13 June 2019
|
An anchor-based spectral clustering method
Spectral clustering is one of the most popular and important clustering methods in pattern recognition,
machine learning, and data mining. However, its high computational complexity limits it in applications involving
truly large-scale datasets. For a clustering problem with n samples, it needs to compute the eigenvectors of the graph
Laplacian with O(n
3
) time complexity. To address this problem, we propose a novel method called anchor-based
spectral clustering (ASC) by employing anchor points of data. Specifically, m (m n) anchor points are selected
from the dataset, which can basically maintain the intrinsic (manifold) structure of the original data. Then a
mapping matrix between the original data and the anchors is constructed. More importantly, it is proved that
this data-anchor mapping matrix essentially preserves the clustering structure of the data. Based on this mapping
matrix, it is easy to approximate the spectral embedding of the original data. The proposed method scales linearly
relative to the size of the data but with low degradation of the clustering performance. The proposed method, ASC,
is compared to the classical spectral clustering and two state-of-the-art accelerating methods, i.e., power iteration
clustering and landmark-based spectral clustering, on 10 real-world applications under three evaluation metrics.
Experimental results show that ASC is consistently faster than the classical spectral clustering with comparable
clustering performance, and at least comparable with or better than the state-of-the-art methods on both effectiveness
and efficiency.
关键词:
Clustering,
Spectral clustering,
Graph Laplacian,
Anchors
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|