J4, 2013, Vol. 47, Issue 1: 1-7. DOI: 10.3785/j.issn.1008-973X.2013.01.001
Computer Technology, Telecommunication Technology
Learning algorithm with non-balanced data for computer-aided diagnosis of breast cancer
SHEN Ye (1, 2), LI Min-dan (2), XIA Shun-ren (1)
1. School of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China;
2. Department of Signal and Information Processing, China Jiliang University, Hangzhou 310018, China
Abstract:

When learning algorithms in computer-aided diagnosis (CAD) are trained on non-balanced data, the resulting classifier is biased: its classification error is small on the majority (large) class but large on the rare minority class. To address this non-balanced learning problem, a new undersampling method based on reverse k nearest neighbors was proposed. The method removes noisy and redundant samples from the majority class and keeps representative, reliable samples as effective samples. This balances the training set while addressing the loss of class information that undersampling usually causes. Simulation experiments on the Breast-cancer dataset from the UCI Machine Learning Repository show that the method is effective for non-balanced learning, and a further comparative evaluation shows that it significantly outperforms other algorithms of the same kind.
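The abstract describes the method only at a high level. The following is a minimal Python sketch of reverse-k-nearest-neighbor undersampling, not the authors' implementation. It assumes Euclidean distance, brute-force kNN search, and two hypothetical selection rules that are not taken from the paper: a majority sample is treated as noise when no sample counts it as a neighbor or when most of its reverse neighbors are minority samples; the remaining majority samples are ranked by their number of same-class reverse neighbors, and only the top-ranked ones are kept until the classes are balanced. The function names (`knn_indices`, `reverse_knn`, `rknn_undersample`) and the default `k=3` are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_indices(points: Sequence[Point], k: int) -> List[List[int]]:
    """For each point, return the indices of its k nearest other points (brute force)."""
    result: List[List[int]] = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Stable sort: ties keep ascending index order.
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def reverse_knn(neighbors: List[List[int]]) -> List[List[int]]:
    """RkNN(x): indices of all points y whose own kNN list contains x."""
    rev: List[List[int]] = [[] for _ in neighbors]
    for y, nbrs in enumerate(neighbors):
        for x in nbrs:
            rev[x].append(y)
    return rev


def rknn_undersample(points: Sequence[Point], labels: Sequence[int],
                     majority: int, k: int = 3) -> List[int]:
    """Hypothetical RkNN-based undersampling of the majority class.

    Returns the sorted indices of the balanced training set: every
    minority sample plus the selected majority samples.
    """
    rev = reverse_knn(knn_indices(points, k))
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    candidates: List[Tuple[int, int]] = []  # (representativeness score, index)
    for i, y in enumerate(labels):
        if y != majority:
            continue
        n_minor = sum(1 for j in rev[i] if labels[j] != majority)
        # Assumed noise rule: no point treats i as a neighbor (outlier),
        # or most of i's reverse neighbors belong to the minority class.
        if not rev[i] or 2 * n_minor > len(rev[i]):
            continue
        # Assumed representativeness score: number of same-class reverse neighbors.
        candidates.append((len(rev[i]) - n_minor, i))
    # Keep the highest-scoring majority samples, at most as many as there are
    # minority samples; lower-ranked ones are treated as redundant and dropped.
    candidates.sort(key=lambda t: (-t[0], t[1]))
    kept = [i for _, i in candidates[:len(minority_idx)]]
    return sorted(minority_idx + kept)


if __name__ == "__main__":
    # Toy data: a majority cluster near the origin, plus one majority point
    # (index 5) sitting inside the minority cluster near (10, 10).
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10.5, 10.5),
           (10, 10), (10, 11), (11, 10)]
    lab = [0, 0, 0, 0, 0, 0, 1, 1, 1]
    print(rknn_undersample(pts, lab, majority=0, k=3))  # [0, 1, 4, 6, 7, 8]
```

In the toy run, the majority point at index 5 is discarded as noise because all of its reverse neighbors are minority samples; of the five remaining majority points, the three with the most same-class reverse neighbors are kept, giving three samples per class. The brute-force search costs O(n² log n) time and only suits small illustrations; the actual selection criteria and thresholds used in the paper are not given in the abstract.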

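Published: 2013-01-01
CLC number: TP 391.7
Funding: Supported by the National Natural Science Foundation of China (60772092, 81101903).

Corresponding author: XIA Shun-ren, male, professor, doctoral supervisor. E-mail: srxia@zju.edu.cn
Author profile: SHEN Ye (1978-), male, Ph.D. candidate, research interests: content-based image retrieval and computer-aided diagnosis. E-mail: shenye1978@vip.sina.com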

Cite this article:

SHEN Ye, LI Min-dan, XIA Shun-ren. Learning algorithm with non-balanced data for computer-aided diagnosis of breast cancer[J]. J4, 2013, 47(1): 1-7.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2013.01.001
http://www.zjujournals.com/eng/CN/Y2013/V47/I1/1

[1] ELNAQA I, YANG Y, WERNICK M N, et al. A support vector machine approach for detection of microcalcifications [J]. IEEE Transaction on Medical Imaging, 2002, 21(12): 1552-1563.
[2] DHEEBA J, SELVI S T. Classification of malignant and benign microcalcification using SVM [C]∥Proceedings of ICETECT. Tamil Nadu: [s. n.], 2011: 686-690.
[3] RAHMAN M M, BHATTACHARYA P, DESAI B C. A framework for medical image retrieval using machine learning and statistical similarity matching techniques with relevance feedback [J]. IEEE Transactions on Information Technology in Biomedicine, 2007, 11(1): 58-69.
[4] 沈晔,夏顺仁,李敏丹. 基于内容的医学图像检索中的相关反馈技术[J].中国生物医学工程学报,2009, 28(1): 128-136.
SHEN Ye, XIA Shunren, LI Mindan. A survey on relevance feedback techniques in contentbased medical image retrieval [J]. Chinese Journal of Biomedical Engineering, 2009, 28(1): 128-136.
[5] WU K, YAP K H. Fuzzy SVM for contentbased image retrieval:a pseudolabel support vector machine framework [J]. IEEE Computational Intelligence Magazine, 2006, 1(2): 10-16.
[6] ZHOU Zhihua, CHEN Kejia, DAI Hongbin. Enhancing relevance feedback in image retrieval using unlabeled data [J]. ACM Transactions on Information Systems, 2006, 24(2): 219-244.
[7] ZHOU Zhihua, LI Ming. Tritraining: exploiting unlabeled data using three classifier [J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 117(11): 1529-1541.
[8] LI Ming, ZHOU Zhihua. Improve computeraided diagnosis with machine learning techniques using undiagnosed samples [J]. IEEE Transactions on Systems, Man, and CyberneticsPart A: Systems and Humans, 2007, 37(6): 1088-1098.
[9] LI Yufeng, ZHOU Zhihua. Towards making unlabeled data never hurt [C]∥Proceedings of ICML. USA: [s. n.], 2011: 1081-1088.
[10] ZHANG J, MANI I. KNN approach to unbalanced data distributions: a case study involving information extraction [C]∥ Proceeding of ICML’2003 Workshop on Learning from Imbalanced Data Sets. Washington DC : [s. n.], 2003.
[11] CHEN X W, GERLACH B, CASASEN T D. Pruning support vectors for imbalanced data classification [C]∥Proceedings of 18th International Joint Conference on Neural Networks. Montreal:[s. n.], 2005: 1883-1887.
[12] LIU Xuying, WU Jianxin, ZHOU Zhihua. Exploratory undersampling for classimbalance learning [J]. IEEE Transactions on Systems, Man and Cybernetics, 2009, 39(2): 539-550.
[13] MEASE D, WYNER A J, BUJA A. Boosted classification trees and class probability/quantile estimation [J]. Machine Learning Research, 2007, 8:409-439.
[14] CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: synthetic minority oversampling technique [J]. Journal of Artificial Intelligence Research, 2002, 16(1): 321-357.
[15] HE Haibo, BAI Yang, GARCIA E A, et al. ADASYN: adaptive synthetic sampling approach for imbalanced learning [C]∥Proceedings of IJCNN. Hong Kong: IEEE, 2008: 1322-1328.
[16] ZHOU Zhihua, LIU Xuying. Training costsensitive neural networks with methods addressing the class imbalance problem [J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(1): 63-77.
[17] MASNADISHIRAZI H, VASCONCELOS N. Costsensitive boosting [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(2): 294-309.
[18] HE Haibo, GARCIA E A. Learning from imbalanced data [J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263-1284.
[19] WEISS G M. Mining with rarity: a unifying framework [J]. Sigkdd Explorations, 2004, 6(1): 7-19.
[20] KUBAT M, MATWIN S. Addressing the curse of imbalanced training sets: one sided selection [C]∥Proceedings of the 14th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1997: 179-186.
[21] KORN F, MUTHUKRISHNAN S. Influence sets based on reverse neareast neighbor queries [C]∥Proceedings of the 2000 ACM SIGMOD International conference on Management of Data. New York: ACM, 2000: 201-212.
[22] TAO Yufei, YIU Manlung, MAMOULIS N. Reverse nearest neighbor search in metric spaces [J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(9): 1239-1252.
[23] MUHAMMAD A C, LIN Xuemin, ZHANG Wenjie, et al. Influence zone: efficiently processing reverse k nearest neighbors queries [C]∥Proceedings of ICDE. HANNOVER: [s. n.], 2011: 577-588.
[24] 杨风召,朱扬勇. 一种有效的量化交易数据相似性搜索方法[J].计算机研究与发展,2004,41(2):361-368.
YANG Fengzhao, ZHU Yangyong. An efficient method for similarity search on quantitative transaction data [J]. Journal of Computer Research and Development, 2004, 41(2): 361-368.