Please wait a minute...
Front. Inform. Technol. Electron. Eng.  2013, Vol. 14 Issue (6): 407-416    DOI: 10.1631/jzus.C1200303
    
Accelerated k-nearest neighbors algorithm based on principal component analysis for text categorization
Min Du, Xing-shu Chen
School of Computer Science, Sichuan University, Chengdu 610065, China
Accelerated k-nearest neighbors algorithm based on principal component analysis for text categorization
Min Du, Xing-shu Chen
School of Computer Science, Sichuan University, Chengdu 610065, China
 全文: PDF 
摘要: Text categorization is a significant technique to manage the surging text data on the Internet. The k-nearest neighbors (kNN) algorithm is an effective, but not efficient, classification model for text categorization. In this paper, we propose an effective strategy to accelerate the standard kNN, based on a simple principle: usually, near points in space are also near when they are projected into a direction, which means that distant points in the projection direction are also distant in the original space. Using the proposed strategy, most of the irrelevant points can be removed when searching for the k-nearest neighbors of a query point, which greatly decreases the computation cost. Experimental results show that the proposed strategy greatly improves the time performance of the standard kNN, with little degradation in accuracy. Specifically, it is superior in applications that have large and high-dimensional datasets.
关键词: k-nearest neighbors (kNN)Text categorizationAccelerating strategyPrincipal component analysis (PCA)    
Abstract: Text categorization is a significant technique to manage the surging text data on the Internet. The k-nearest neighbors (kNN) algorithm is an effective, but not efficient, classification model for text categorization. In this paper, we propose an effective strategy to accelerate the standard kNN, based on a simple principle: usually, near points in space are also near when they are projected into a direction, which means that distant points in the projection direction are also distant in the original space. Using the proposed strategy, most of the irrelevant points can be removed when searching for the k-nearest neighbors of a query point, which greatly decreases the computation cost. Experimental results show that the proposed strategy greatly improves the time performance of the standard kNN, with little degradation in accuracy. Specifically, it is superior in applications that have large and high-dimensional datasets.
Key words: k-nearest neighbors (kNN)    Text categorization    Accelerating strategy    Principal component analysis (PCA)
收稿日期: 2012-10-24 出版日期: 2013-06-04
CLC:  TP391  
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章  
Min Du
Xing-shu Chen

引用本文:

Min Du, Xing-shu Chen. Accelerated k-nearest neighbors algorithm based on principal component analysis for text categorization. Front. Inform. Technol. Electron. Eng., 2013, 14(6): 407-416.

链接本文:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1200303        http://www.zjujournals.com/xueshu/fitee/CN/Y2013/V14/I6/407

[1] Ding-cheng Feng, Feng Chen, Wen-li Xu. Learning robust principal components from L1-norm maximization[J]. Front. Inform. Technol. Electron. Eng., 2012, 13(12): 901-908.