Learning algorithm with non-balanced data for computer-aided diagnosis of breast cancer

doi: 10.3785/j.issn.1008-973X.2013.01.001

2013, Vol. 47, Issue (1): 1-7

Computer Technology, Telecommunications Technology

SHEN Ye1,2, LI Min-dan2, XIA Shun-ren1

1. School of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou 310027, China;
2. Department of Signal and Information Processing, China Jiliang University, Hangzhou 310018, China

Abstract:

When a learning algorithm handles class-imbalanced data in computer-aided diagnosis (CAD), the classifier's predictions are biased: the classification error on the majority class is small, while the classification error on the rare (minority) class is large. To address this problem, a new under-sampling method based on reverse k nearest neighbors was proposed. Noisy and redundant samples are removed from the majority class, and representative, reliable samples are kept as effective samples, which balances the training set and addresses the loss of class information caused by under-sampling. Simulation results on the Breast-cancer dataset from the UCI Machine Learning Repository show that the method is effective for the imbalanced learning problem, and a further comparative evaluation shows that it significantly outperforms other algorithms of the same kind.
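The abstract does not state the exact selection rule, so the following is only a minimal sketch of what reverse k nearest neighbor under-sampling could look like, not the authors' algorithm. It assumes Euclidean distance, counts reverse neighbors within the majority class only, treats a majority sample with no reverse neighbors as noise, and keeps the samples with the most reverse neighbors as representatives. The names `reverse_knn_counts`, `rknn_undersample`, and `n_keep` are hypothetical.

```python
import numpy as np


def reverse_knn_counts(X, k):
    """Count, for each row i, how many other rows list i among their k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of samples")
    # Pairwise squared Euclidean distances (assumed metric).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    # Row i of knn holds the indices of the k nearest neighbors of sample i.
    knn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # Sample j's reverse-neighbor count is how often j appears in those lists.
    return np.bincount(knn.ravel(), minlength=n)


def rknn_undersample(X_major, n_keep, k=3):
    """Return sorted indices of at most n_keep majority samples to retain."""
    counts = reverse_knn_counts(X_major, k)
    # Assumption: zero reverse neighbors marks a sample as noise.
    candidates = np.flatnonzero(counts > 0)
    # Assumption: a higher reverse-neighbor count means more representative.
    ranked = candidates[np.argsort(-counts[candidates], kind="stable")]
    return np.sort(ranked[:n_keep])


if __name__ == "__main__":
    # Toy majority class: a tight cluster plus one far-away point.
    X_major = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]])
    print(rknn_undersample(X_major, n_keep=3, k=2))
```

In practice `n_keep` would typically be set to the minority-class size so that the two classes are balanced before a classifier such as an SVM is trained. The rule used here for discarding redundant samples is a simplification chosen for illustration.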

Key words: computer-aided diagnosis; class-imbalance learning; support vector machine; reverse k nearest neighbor; under-sampling

Published: 2013-03-05

CLC number: TP 391.7

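Funding:

National Natural Science Foundation of China (60772092, 81101903)

Corresponding author: XIA Shun-ren, male, professor, doctoral supervisor. E-mail: srxia@zju.edu.cn

About the author: SHEN Ye (1978-), male, doctoral candidate, working on content-based image retrieval and computer-aided diagnosis. E-mail: shenye1978@vip.sina.com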

Cite this article:

SHEN Ye, LI Min-dan, XIA Shun-ren. Learning algorithm with non-balanced data for computer-aided diagnosis of breast cancer. J4, 2013, 47(1): 1-7.

Link to this article:

http://www.zjujournals.com/xueshu/eng/CN/10.3785/j.issn.1008-973X.2013.01.001 or http://www.zjujournals.com/xueshu/eng/CN/Y2013/V47/I1/1

