Front. Inform. Technol. Electron. Eng.  2014, Vol. 15 Issue (2): 107-118    DOI: 10.1631/jzus.C1300167
    
Research on an active transfer learning algorithm using expert querying
Hao Shao, Feng Tao, Rui Xu
School of WTO Research & Education, Shanghai University of International Business and Economics, Shanghai 200336, China; School of Business, East China University of Science and Technology, Shanghai 200237, China; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
Transfer active learning by querying committee
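Summary: Objective: In classification learning, label information is often scarce, and labeling unlabeled data consumes considerable manpower and resources; meanwhile, large amounts of older data go underused, which wastes resources. A typical example is the diagnosis of a newly emerging disease such as H7N9 avian influenza. A long time passes between the first symptoms and a confirmed diagnosis, largely because only a handful of confirmed cases exist when a new disease appears and little is known about it. Without prior data, confirming the new disease is extremely difficult, so many patients are treated for ordinary influenza and miss the critical window for treatment. A large number of suspected cases therefore need to be diagnosed correctly and quickly to save lives. Relying on physicians to analyze every suspected case in detail would consume scarce medical resources and time and delay treatment for patients awaiting diagnosis. We note that hospitals keep large databases on other diseases. Exploring how to use such existing data (for example, databases on ordinary influenza or pneumonia) to help physicians diagnose unknown but similar diseases is therefore of considerable practical significance. This study uses transfer learning theory to extract information from old data and draws on an expert system to further improve accuracy, so that accurate results are obtained quickly while scarce resources are saved.
Innovation: An expert system and a hybrid model are used to further optimize the transfer learning method. When expert guidance is used, active learning theory can better supply the most valuable data. This study therefore designs an expert-system-assisted transfer algorithm and uses active learning to select unlabeled data for manual labeling, compensating for the weak performance of transfer learning algorithms when the initial data set is scarce.
Methods: A large amount of redundant data (source data) serves as the expert system. Thresholds are set during the iterations to eliminate experts and data sets that fail to meet the criteria, which greatly improves algorithm performance.
Conclusions: Combining active learning with transfer learning compensates for the transfer learning algorithm's heavy dependence on the quality of the initial data set, avoids negative transfer, and greatly improves algorithm performance.
Keywords: Transfer learning    Active learning    Classification    Data mining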
Abstract: In real applications of inductive learning for classification, labeled instances are often scarce, and having an oracle label them is often expensive and time-consuming. Active learning on a single task aims to select only informative unlabeled instances for querying, improving classification accuracy while decreasing the querying cost. However, an inevitable problem in active learning is that the informativeness measures for selecting queries are commonly based on initial hypotheses sampled from only a few labeled instances. In such circumstances, the initial hypotheses are not reliable and may deviate from the true distribution underlying the target task. Consequently, the informativeness measures may select irrelevant instances. A promising way to compensate for this problem is to borrow useful knowledge from other sources with abundant labeled information, which is called transfer learning. However, a significant challenge in transfer learning is how to measure the similarity between the source and the target tasks. One needs to be aware of different distributions or label assignments from unrelated source tasks; otherwise, they will lead to degraded performance during transfer. Also, how to design an effective strategy to avoid selecting irrelevant samples to query is still an open question. To tackle these issues, we propose a hybrid algorithm for active learning with the help of transfer learning, adopting a divergence measure to alleviate the negative transfer caused by distribution differences. To avoid querying irrelevant instances, we also present an adaptive strategy that can eliminate unnecessary instances in the input space and unnecessary models in the model space. Extensive experiments on both synthetic and real data sets show that the proposed algorithm queries fewer instances with higher accuracy and converges faster than state-of-the-art methods.
Key words: Active learning    Transfer learning    Classification
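The loop described in the abstract (query the instance a committee of source models disagrees on most, then eliminate models that do not fit the target task) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the divergence measure or the elimination rule, so the sketch substitutes vote entropy for the informativeness measure and an error-rate threshold on the queried target labels for the divergence test. All names (`vote_entropy`, `prune_committee`, `transfer_qbc`, `max_error`) and the toy 1-D task are assumptions made for illustration.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy (bits) of the committee's label votes for one instance."""
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in Counter(votes).values())


def select_query(committee, pool):
    """Index of the unlabeled instance with the highest committee disagreement."""
    return max(range(len(pool)),
               key=lambda i: vote_entropy([m(pool[i]) for m in committee]))


def prune_committee(committee, labeled, max_error):
    """Keep members whose error rate on queried target labels is <= max_error.

    Assumption: this error threshold stands in for the paper's divergence measure.
    """
    kept = [m for m in committee
            if sum(m(x) != y for x, y in labeled) / len(labeled) <= max_error]
    return kept or committee  # never return an empty committee


def transfer_qbc(committee, pool, oracle, budget, max_error=0.5):
    """Query up to `budget` labels, pruning poorly fitting source models each round."""
    committee, pool, labeled = list(committee), list(pool), []
    for _ in range(budget):
        if not pool or len(committee) < 2:
            break
        x = pool.pop(select_query(committee, pool))
        labeled.append((x, oracle(x)))
        committee = prune_committee(committee, labeled, max_error)
    return committee, labeled


def make_threshold(t):
    """Hypothetical source model: predicts 1 when x >= t, else 0."""
    return lambda x: int(x >= t)


# Toy target task: label is 1 when x >= 5. Three related source models and
# one unrelated model with inverted labels form the initial committee.
inverted = lambda x: int(x < 5)
sources = [make_threshold(4), make_threshold(5), make_threshold(6), inverted]
final_committee, queried = transfer_qbc(
    sources, pool=range(10), oracle=lambda x: int(x >= 5), budget=3)
print(len(final_committee), queried)
```

In this toy run the first query falls on the decision boundary, and the inverted model is pruned as soon as its prediction contradicts the oracle's label. With only one label available the threshold is crude (any model that errs on the first query is dropped), so a practical variant would likely wait for several labels before pruning.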
Received: 2013-06-20    Published: 2014-01-29
CLC:  TP3  

Cite this article:

Hao Shao, Feng Tao, Rui Xu. Transfer active learning by querying committee. Front. Inform. Technol. Electron. Eng., 2014, 15(2): 107-118.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1300167        http://www.zjujournals.com/xueshu/fitee/CN/Y2014/V15/I2/107
