Journal of Zhejiang University (Engineering Science)
Automation Technology, Telecommunication Technology
Small sample learning algorithm based on novel hybrid class labeling technique
LI Min-dan, SHEN Ye, ZHANG Dong-ping, YIN Hai-bing
Department of Signal and Information Processing, China Jiliang University, Hangzhou 310018, China
Abstract:

A small-sample learning algorithm based on a novel hybrid class labeling technique (HCLT) is proposed to address the small-sample learning problem that arises in computer-aided diagnosis (CAD) because labeled case samples are difficult to collect. HCLT labels the abundant unlabeled samples with three distinct class labeling schemes, based respectively on geometric distance, probability distribution, and semantic concept. Only the unlabeled samples that receive the same label from all three schemes are added to the training set, thereby enlarging it. To reduce the adverse effect of the remaining mislabeled samples on learning, a pseudo-label membership is defined for each added sample and introduced into fuzzy support vector machine (FSVM) learning, where the membership controls how much the sample contributes to the learning process. Classification experiments on UCI datasets show that the proposed algorithm handles the small-sample learning problem effectively. Compared with algorithms that use a single class labeling scheme, it produces significantly fewer mislabeled samples and achieves significantly better classification performance.
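The unanimous-labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three labelers below (nearest neighbor, diagonal Gaussian likelihood, nearest class centroid) are simple stand-ins for the geometric, probabilistic, and semantic schemes, and the function names, the toy data, and the fixed `pseudo_membership` value of 0.5 are all assumptions made for illustration. The abstract does not specify how the memberships are actually computed.

```python
import math
from collections import defaultdict


def group_by_class(labeled):
    """Collect feature vectors per class label."""
    groups = defaultdict(list)
    for features, label in labeled:
        groups[label].append(features)
    return groups


def nearest_neighbor_label(x, labeled):
    """Geometric stand-in: label of the closest labeled sample (Euclidean)."""
    return min(labeled, key=lambda sample: math.dist(x, sample[0]))[1]


def gaussian_label(x, labeled):
    """Probabilistic stand-in: class with the highest diagonal-Gaussian log-likelihood."""
    best_label, best_score = None, -math.inf
    for label, rows in group_by_class(labeled).items():
        score = 0.0
        for j, value in enumerate(x):
            column = [row[j] for row in rows]
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def centroid_label(x, labeled):
    """Hypothetical placeholder for the semantic scheme: nearest class centroid."""
    centroids = {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in group_by_class(labeled).items()
    }
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def hybrid_label(labeled, unlabeled, labelers, pseudo_membership=0.5):
    """Return (features, label, membership) triples.

    Original labeled samples keep membership 1.0. An unlabeled sample is
    added only if every labeler assigns it the same class; it then gets the
    assumed, fixed pseudo_membership.
    """
    training = [(x, y, 1.0) for x, y in labeled]
    for x in unlabeled:
        votes = {labeler(x, labeled) for labeler in labelers}
        if len(votes) == 1:  # unanimous agreement across all schemes
            training.append((x, votes.pop(), pseudo_membership))
    return training


# Toy data (assumed): two labeled samples per class, three unlabeled samples.
labeled = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((1.0, 1.0), 1), ((0.9, 1.2), 1)]
unlabeled = [(0.1, 0.0), (0.57, 0.55), (1.1, 1.0)]
labelers = [nearest_neighbor_label, gaussian_label, centroid_label]

training = hybrid_label(labeled, unlabeled, labelers)
memberships = [m for _, _, m in training]
```

In this toy run the sample near the class boundary is left out because the nearest-neighbor and centroid labelers disagree on it. One way to approximate the FSVM stage would be to pass `memberships` as per-sample weights to a standard SVM, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`; that substitution is an assumption of this sketch, not something stated in the abstract.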

Published: 2016-03-31
CLC number: TP 391
Funding:

Supported by the Natural Science Foundation of Zhejiang Province (LY13H180011, LY15F020021).

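Corresponding author: SHEN Ye, male, associate professor. ORCID: 0000-0001-9581-8734. E-mail: shenye1978@vip.sina.com
About the author: LI Min-dan (1976-), female, lecturer, engaged in research on machine learning techniques. ORCID: 0000-0002-3737-3164. E-mail: limindan@cjlu.edu.cn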

Cite this article:

LI Min-dan, SHEN Ye, ZHANG Dong-ping, YIN Hai-bing. Small sample learning algorithm based on novel hybrid class labeling technique [J]. Journal of Zhejiang University (Engineering Science), DOI: 10.3785/j.issn.1008-973X.2016.01.020.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2016.01.020
http://www.zjujournals.com/eng/CN/Y2016/V50/I1/137

