Accelerated <i>k</i>-nearest neighbors algorithm based on principal component analysis for text categorization

doi:10.1631/jzus.C1200303

Front. Inform. Technol. Electron. Eng.

2013, Vol. 14

Issue (6): 407-416 DOI: 10.1631/jzus.C1200303

Accelerated k-nearest neighbors algorithm based on principal component analysis for text categorization

Min Du, Xing-shu Chen

School of Computer Science, Sichuan University, Chengdu 610065, China

Download:

PDF(0KB)
Export: BibTeX | EndNote (RIS)

Abstract Text categorization is a significant technique to manage the surging text data on the Internet. The k-nearest neighbors (kNN) algorithm is an effective, but not efficient, classification model for text categorization. In this paper, we propose an effective strategy to accelerate the standard kNN, based on a simple principle: usually, near points in space are also near when they are projected into a direction, which means that distant points in the projection direction are also distant in the original space. Using the proposed strategy, most of the irrelevant points can be removed when searching for the k-nearest neighbors of a query point, which greatly decreases the computation cost. Experimental results show that the proposed strategy greatly improves the time performance of the standard kNN, with little degradation in accuracy. Specifically, it is superior in applications that have large and high-dimensional datasets.

Key words： k-nearest neighbors (kNN) Text categorization Accelerating strategy Principal component analysis (PCA)

Received: 24 October 2012 Published: 04 June 2013

CLC:

TP391

	Service
	E-mail this article
	Add to my bookshelf
	Add to citation manager
	E-mail Alert
	RSS
	Articles by authors
	Min Du
	Xing-shu Chen

Cite this article:

Min Du, Xing-shu Chen. Accelerated k-nearest neighbors algorithm based on principal component analysis for text categorization. Front. Inform. Technol. Electron. Eng., 2013, 14(6): 407-416.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/jzus.C1200303 OR http://www.zjujournals.com/xueshu/fitee/Y2013/V14/I6/407

Accelerated k-nearest neighbors algorithm based on principal component analysis for text categorization

Text categorization is a significant technique to manage the surging text data on the Internet. The k-nearest neighbors (kNN) algorithm is an effective, but not efficient, classification model for text categorization. In this paper, we propose an effective strategy to accelerate the standard kNN, based on a simple principle: usually, near points in space are also near when they are projected into a direction, which means that distant points in the projection direction are also distant in the original space. Using the proposed strategy, most of the irrelevant points can be removed when searching for the k-nearest neighbors of a query point, which greatly decreases the computation cost. Experimental results show that the proposed strategy greatly improves the time performance of the standard kNN, with little degradation in accuracy. Specifically, it is superior in applications that have large and high-dimensional datasets.

关键词： k-nearest neighbors (kNN), Text categorization, Accelerating strategy, Principal component analysis (PCA)

[1]	Xiu-rui Geng, Lu-yan Ji, Kang Sun. Non-negative matrix factorization based unmixing for principal component transformed hyperspectral data[J]. Front. Inform. Technol. Electron. Eng., 2016, 17(5): 403-412.

[2]	Gurmanik Kaur, Ajat Shatru Arora, Vijender Kumar Jain. Using hybrid models to predict blood pressure reactivity to unsupported back based on anthropometric characteristics[J]. Front. Inform. Technol. Electron. Eng., 2015, 16(6): 474-485.

[3]	Ding-cheng Feng, Feng Chen, Wen-li Xu. Learning robust principal components from L1-norm maximization[J]. Front. Inform. Technol. Electron. Eng., 2012, 13(12): 901-908.

Viewed

Full text

Abstract

Cited

Shared

Discussed