Front. Inform. Technol. Electron. Eng.  2011, Vol. 12 Issue (8): 615-628    DOI: 10.1631/jzus.C1000330
    
Clustering feature decision trees for semi-supervised classification from high-speed data streams
Wen-hua Xu1, Zheng Qin*,2, Yang Chang2
1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China 2 School of Software, Tsinghua University, Beijing 100084, China
Abstract: Most stream data classification algorithms use a supervised learning strategy that requires large amounts of labeled data. Such approaches are often impractical because labeled data are usually hard to obtain in practice. In this paper, we build a clustering feature decision tree model, CFDT, from data streams that contain unlabeled examples together with a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide the statistical summaries needed for incremental decision tree induction. Micro-clusters also serve as classifiers in the tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT scales well to data streams while achieving high classification accuracy at high speed.
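
To make the idea of a one-pass statistical summary concrete, the sketch below shows a micro-cluster kept as a clustering feature vector. It assumes the common BIRCH-style form (count, per-dimension linear sum, per-dimension squared sum), which may differ from the exact definition used in the paper. The names `MicroCluster`, `absorb`, and `threshold`, the fixed distance threshold, and the per-cluster label counts are all hypothetical choices made for illustration, not the CFDT algorithm itself.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class MicroCluster:
    """Assumed BIRCH-style clustering feature: count, linear sum, squared sum."""
    dim: int
    n: int = 0
    ls: List[float] = field(init=False)          # per-dimension linear sum
    ss: List[float] = field(init=False)          # per-dimension squared sum
    label_counts: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.ls = [0.0] * self.dim
        self.ss = [0.0] * self.dim

    def add(self, x: Sequence[float], label: Optional[str] = None) -> None:
        """Fold one example into the summary; the example itself is not stored."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        if label is not None:  # unlabeled examples update only the statistics
            self.label_counts[label] = self.label_counts.get(label, 0) + 1

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of absorbed points from the centroid."""
        var = sum(self.ss[i] / self.n - (self.ls[i] / self.n) ** 2
                  for i in range(self.dim))
        return math.sqrt(max(var, 0.0))

    def majority_label(self) -> Optional[str]:
        if not self.label_counts:
            return None
        return max(self.label_counts, key=self.label_counts.get)


def absorb(clusters: List[MicroCluster], x: Sequence[float],
           label: Optional[str] = None, threshold: float = 1.0) -> MicroCluster:
    """Add x to the nearest micro-cluster, or open a new one if none is close.

    The fixed distance threshold is a simplification chosen for this sketch.
    """
    best, best_d = None, float("inf")
    for c in clusters:
        d = math.dist(c.centroid(), x)
        if d < best_d:
            best, best_d = c, d
    if best is None or best_d > threshold:
        best = MicroCluster(dim=len(x))
        clusters.append(best)
    best.add(x, label)
    return best


# Usage: a labeled and an unlabeled example fall into the same micro-cluster.
clusters: List[MicroCluster] = []
absorb(clusters, [0.0, 0.0], "a")
absorb(clusters, [0.2, 0.0])          # unlabeled
absorb(clusters, [5.0, 5.0], "b")
```

Because each example only increments sums, the stream is read once and memory grows with the number of micro-clusters rather than the number of examples. A micro-cluster's majority label can then stand in as a simple classifier for nearby unlabeled points, which is one plausible reading of how micro-clusters act as classifiers in tree leaves.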
Key words: Clustering feature vector, Decision tree, Semi-supervised learning, Stream data classification, Very fast decision tree
Received: 2010-09-25    Published: 2011-08-03
CLC:  TP391  

Cite this article:

Wen-hua Xu, Zheng Qin, Yang Chang. Clustering feature decision trees for semi-supervised classification from high-speed data streams. Front. Inform. Technol. Electron. Eng., 2011, 12(8): 615-628.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1000330
http://www.zjujournals.com/xueshu/fitee/CN/Y2011/V12/I8/615
