Front. Inform. Technol. Electron. Eng.  2015, Vol. 16 Issue (12): 1059-1068    DOI: 10.1631/FITEE.1400398
    
An ensemble method for data stream classification in the presence of concept drift
Omid Abbaszadeh, Ali Amiri, Ali Reza Khanteymoori
Department of Computer Engineering, University of Zanjan, Zanjan 45371-38791, Iran
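Summary: Objective: Data stream management and processing is an active topic in computer science. Here, a 'data stream' refers to packages of data that are generated continuously and rapidly. Data streams are characterized by immense volume, high generation rate, limited processing time, and concept drift, and these features distinguish them from other standard forms of data. An important problem for data streams is the classification of incoming data. This paper proposes a novel ensemble classifier.
Innovation: Building on data stream classifiers, the method combines concept drift detection, removal of base classifiers, and a dynamic weighting mechanism.
Methods: (1) Two weighting functions are applied to the base classifiers under different data input conditions; (2) the Kappa coefficient is used to detect concept drift, which improves the accuracy of the algorithm; (3) a varying number of base classifiers is removed according to their quality; (4) a weighting mechanism is applied to the base classifiers at the decision stage, which improves the algorithm's adaptability to drift and the classifier's efficiency.
Conclusions: In tests on standard data sets, the proposed method achieves higher accuracy than existing ensemble classifiers and single classifiers; in some cases it also reduces running time and memory usage.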
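As a rough illustration of step (2), Cohen's kappa can be computed over a recent window of true labels and predictions and compared with a threshold. This is a minimal sketch, not the authors' algorithm: the function names `cohen_kappa` and `drift_suspected`, the windowing, and the threshold value of 0.5 are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Fraction of instances where the prediction matches the true label.
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        # Degenerate window: one class everywhere. Treating this as perfect
        # agreement is an assumption of this sketch.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def drift_suspected(y_true, y_pred, threshold=0.5):
    """Hypothetical rule: flag drift when kappa on the window falls below threshold."""
    return cohen_kappa(y_true, y_pred) < threshold
```

In use, `y_true` and `y_pred` would hold the labels and ensemble predictions for the most recent chunk of the stream; a low kappa means the predictions agree with the labels little better than chance.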
Abstract: Data stream management and processing is a recent area of interest in computer science. By 'data stream', we mean packages of data that are generated continuously and rapidly. Data streams are characterized by immense volume, high production rate, limited processing time, and concept drift; these features distinguish them from standard types of data. An important problem for data streams is the classification of incoming data. This paper proposes a novel ensemble classifier. The classifier applies two weighting functions to its base classifiers under different data input conditions. In addition, a new method is used to detect drift, which improves the precision of the algorithm. The proposed method also removes a varying number of base classifiers according to their quality. A weighting mechanism is applied to the base classifiers at the decision-making stage, which makes the ensemble more adaptable when drift occurs and leads to more efficient classifiers. The proposed method is tested on a set of standard data sets, and the results confirm higher accuracy than existing ensemble classifiers and single classifiers. In some cases the proposed classifier is also faster and needs less storage space.
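The decision-stage weighting and quality-based removal described in the abstract can be pictured with a generic weighted vote and a simple pruning rule. This is a sketch under stated assumptions rather than the paper's method: the names `weighted_vote` and `prune_members`, and the fixed `min_weight` cut-off, are hypothetical, and the abstract does not specify how the weights themselves are computed.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting base classifiers have the largest total weight."""
    if len(predictions) != len(weights) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def prune_members(weights, min_weight):
    """Return indices of base classifiers to keep (weight at least min_weight).

    The number removed varies with how many members fall below the cut-off.
    """
    return [i for i, w in enumerate(weights) if w >= min_weight]
```

For example, with predictions `["a", "b", "b"]` and weights `[0.9, 0.3, 0.4]`, label `"a"` wins because its single supporter outweighs the two supporters of `"b"` combined.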
Keywords: Data stream    Classification    Ensemble classifiers    Concept drift
Received: 2014-11-19    Published: 2015-12-07
CLC:  TP391  

Cite this article:

Omid Abbaszadeh, Ali Amiri, Ali Reza Khanteymoori. An ensemble method for data stream classification in the presence of concept drift. Front. Inform. Technol. Electron. Eng., 2015, 16(12): 1059-1068.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/FITEE.1400398        http://www.zjujournals.com/xueshu/fitee/CN/Y2015/V16/I12/1059
