A small-sample learning algorithm based on a novel hybrid class labeling technique (HCLT) was proposed to address the learning problem caused by underrepresented labeled training sets in computer-aided diagnosis (CAD). HCLT labeled the abundant unlabeled samples with three diverse class labeling schemes, based respectively on geometric similarity, probabilistic distribution, and semantic concept. Only the unlabeled samples that received the same label from all three schemes were added to the training set, which enlarged the labeled training set. The memberships of the pseudo-labeled samples were introduced into a fuzzy support vector machine (FSVM) to reduce the adverse effect of the remaining labeling mistakes on learning performance; the contribution of each pseudo-labeled sample to the learning task was determined by its membership. Classification experiments on UCI datasets show that the proposed algorithm can deal with the small-sample learning problem. The algorithm makes fewer labeling mistakes and achieves better classification performance than the other algorithms that adopt a single labeling scheme.
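The unanimous-agreement step and the membership weighting described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `unanimous_pseudo_labels` and `build_weighted_training_set` are hypothetical, the three label lists stand in for the outputs of the geometric, probabilistic, and semantic schemes, and the membership values are taken as given inputs because the abstract does not state how they are computed.

```python
from typing import Hashable, List, Sequence, Tuple


def unanimous_pseudo_labels(
    scheme_a: Sequence[Hashable],
    scheme_b: Sequence[Hashable],
    scheme_c: Sequence[Hashable],
) -> List[Tuple[int, Hashable]]:
    """Return (sample index, label) pairs on which all three schemes agree."""
    if not (len(scheme_a) == len(scheme_b) == len(scheme_c)):
        raise ValueError("each scheme must label the same number of samples")
    return [
        (i, a)
        for i, (a, b, c) in enumerate(zip(scheme_a, scheme_b, scheme_c))
        if a == b == c
    ]


def build_weighted_training_set(labeled, pseudo, memberships):
    """Merge labeled and pseudo-labeled samples into (x, y, weight) triples.

    Assumption: truly labeled samples get weight 1.0, and each pseudo-labeled
    sample gets its membership in (0, 1] as its weight.
    """
    if len(pseudo) != len(memberships):
        raise ValueError("one membership is required per pseudo-labeled sample")
    if any(not (0.0 < m <= 1.0) for m in memberships):
        raise ValueError("memberships must lie in (0, 1]")
    merged = [(x, y, 1.0) for x, y in labeled]
    merged += [(x, y, m) for (x, y), m in zip(pseudo, memberships)]
    return merged


# Hypothetical labels produced by the three schemes for four unlabeled samples.
geometric = [1, 0, 1, 1]
probabilistic = [1, 0, 0, 1]
semantic = [1, 1, 0, 1]

# Only samples 0 and 3 receive the same label from all three schemes.
agreed = unanimous_pseudo_labels(geometric, probabilistic, semantic)

# Hypothetical feature vectors and memberships for the accepted samples.
unlabeled_x = [(0.9, 0.8), (0.1, 0.5), (0.4, 0.4), (0.7, 0.9)]
pseudo = [(unlabeled_x[i], label) for i, label in agreed]
training = build_weighted_training_set(
    labeled=[((1.0, 1.0), 1), ((0.0, 0.0), 0)],
    pseudo=pseudo,
    memberships=[0.9, 0.6],
)
```

The resulting per-sample weights would then be handed to a classifier that supports weighted training samples, which is the role the FSVM plays in the abstract.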
LI Min-dan, SHEN Ye, ZHANG Dong-ping, YIN Hai-bing. Small sample learning algorithm based on novel hybrid class labeling technique. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2016, 50(1): 137-143.