Automatic Technology, Telecommunication Technology
Small sample learning algorithm based on novel hybrid class labeling technique
LI Min-dan, SHEN Ye, ZHANG Dong-ping, YIN Hai-bing
Department of Signal and Information Processing, China Jiliang University, Hangzhou 310018, China
Abstract: A small-sample learning algorithm based on a novel hybrid class labeling technique (HCLT) is proposed to address the learning problem caused by underrepresented labeled training sets in computer-aided diagnosis (CAD). HCLT labels the abundant unlabeled samples with three diverse class labeling schemes, based respectively on geometric similarity, probabilistic distribution, and semantic concept. Only those unlabeled samples that receive the same label from all three schemes are added to the training set, which enlarges the labeled training set. To reduce the adverse effect of the remaining labeling mistakes on learning performance, a membership for each pseudo-labeled sample is introduced into a fuzzy support vector machine (FSVM), so that the contribution of each pseudo-labeled sample to the learning task is determined by its membership. Classification experiments on UCI datasets show that the proposed algorithm can deal with the small-sample learning problem. Compared with algorithms that adopt a single labeling scheme, it makes fewer labeling mistakes and achieves better classification performance.
Published: 31 March 2016
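Small-sample learning algorithm based on a novel hybrid class labeling technique
To address the small-sample learning problem that arises in computer-aided diagnosis (CAD) because labeled case samples are difficult to collect, a small-sample learning algorithm based on a novel hybrid class labeling technique (HCLT) is proposed. The algorithm labels the abundant unlabeled samples in three different ways, based on geometric distance, probability distribution, and semantic concept, and adds the samples with consistent labeling results to the sample set, thereby enlarging the training set. To reduce the adverse effect of mislabeled samples on the learning process, a pseudo-label membership is defined for each sample and introduced into fuzzy support vector machine (FSVM) learning, where the membership controls how much the sample contributes to learning. Experimental results on UCI datasets show that the algorithm is effective for the small-sample learning problem. Compared with single class labeling techniques, the algorithm produces significantly fewer mislabeled samples and significantly improves learning performance.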
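The two steps named in the abstract, keeping only unanimously labeled samples and then weighting them by a membership, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: a nearest-neighbour labeler stands in for the geometric scheme, a nearest-class-mean labeler for the probabilistic scheme, the hand-written `rule_labeler` is a hypothetical placeholder for the semantic scheme, and the `membership` formula is one plausible choice rather than the authors' definition.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def class_means(X, y):
    """Mean vector of the labeled samples of each class."""
    means = {}
    for c in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == c]
        means[c] = tuple(sum(col) / len(pts) for col in zip(*pts))
    return means


def nn_labeler(X, y):
    """Geometric stand-in (assumption): label of the closest labeled sample."""
    return lambda p: y[min(range(len(X)), key=lambda i: dist(p, X[i]))]


def centroid_labeler(means):
    """Distribution stand-in (assumption): class with the closest mean."""
    return lambda p: min(means, key=lambda c: dist(p, means[c]))


def rule_labeler(p):
    """Hypothetical placeholder for a semantic rule; not from the paper."""
    return 1 if p[0] > 0.5 else 0


def unanimous(unlabeled, labelers):
    """Keep only the samples on which every labeler gives the same label."""
    kept = []
    for p in unlabeled:
        votes = {f(p) for f in labelers}
        if len(votes) == 1:
            kept.append((p, votes.pop()))
    return kept


def membership(p, label, means):
    """Assumed membership in (0, 1]: larger when p lies nearer its own class mean."""
    d_own = dist(p, means[label])
    d_other = min(dist(p, m) for c, m in means.items() if c != label)
    return d_other / (d_own + d_other)


# Toy data (made up for the example).
X_lab = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
y_lab = [0, 0, 1, 1]
unlabeled = [(0.1, 0.2), (0.95, 0.9), (0.45, 0.7)]

means = class_means(X_lab, y_lab)
labelers = [nn_labeler(X_lab, y_lab), centroid_labeler(means), rule_labeler]
pseudo = unanimous(unlabeled, labelers)

# Enlarged training set: original samples get weight 1, pseudo-labeled
# samples get their membership as weight.
X_train = X_lab + [p for p, _ in pseudo]
y_train = y_lab + [lab for _, lab in pseudo]
weights = [1.0] * len(X_lab) + [membership(p, lab, means) for p, lab in pseudo]
```

In this toy run the third unlabeled point is discarded because the placeholder rule disagrees with the other two labelers. The resulting `weights` could then be passed as per-sample weights to a weighted SVM; for example, scikit-learn's `SVC.fit` accepts a `sample_weight` argument that rescales the penalty per sample, which is similar in spirit to an FSVM, though not necessarily identical to the formulation used by the authors.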