Journal of Zhejiang University (Science Edition)  2020, Vol. 47 Issue (2): 191-195    DOI: 10.3785/j.issn.1008-9497.2020.02.009
Mathematics and Computer Science
Tumor purity estimation based on Gaussian mixture model
YAN Zhanzheng, LI Yushuang
School of Science, Yanshan University, Qinhuangdao 066004, Hebei Province, China
Abstract: In cancer genomics and epigenomics research, tumor tissues obtained in the clinic are mixtures of cancer cells and normal cells, and tumor impurity can severely affect subsequent data analyses. Based on DNA methylation microarray data, we propose a simple method, GmmPurify, for estimating tumor purity. First, using public normal samples, we apply a Gaussian mixture model to define an important statistic, the "information contribution value". Next, we select the DNA methylation sites with high information contribution values to form a set of differential methylation sites. Finally, we estimate tumor purity with a kernel density method. To verify the performance of GmmPurify, we apply it to nine tumor types from The Cancer Genome Atlas (TCGA); the resulting purity estimates are highly consistent with those of two state-of-the-art methods. The results show that, by relying on public normal samples, GmmPurify can provide satisfactory tumor purity estimates when normal samples matched to the tumor samples are unavailable.
Key words: DNA methylation    tumor purity    Gaussian mixture model    information contribution value    differential methylation site
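The final step named in the abstract, kernel density estimation of purity, can be illustrated with a minimal sketch. The abstract does not define the information contribution value or say how per-site signals are formed, so everything specific here is an assumption made for illustration: the hypothetical list `site_signals` stands in for one purity-like value in [0, 1] per selected differential methylation site, the purity is taken as the location of the density peak, and the names `gaussian_kde`, `kde_mode`, and the bandwidth of 0.05 are not from the paper.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function x -> Gaussian kernel density estimate built from values."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


def kde_mode(values, bandwidth=0.05, grid_size=1001):
    """Locate the density peak on an evenly spaced grid over [0, 1]."""
    density = gaussian_kde(values, bandwidth)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=density)


# Hypothetical per-site signals (assumption): most sites agree on a value
# near 0.6, while two outlying sites disagree.
site_signals = [0.58, 0.61, 0.60, 0.63, 0.59, 0.62, 0.60, 0.95, 0.12]
purity = kde_mode(site_signals)
```

Taking the density peak rather than the mean keeps the two outlying sites from pulling the estimate away from the value on which most sites agree. The Gaussian mixture screening step is not reproduced here because the abstract does not specify the statistic.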
Received: 2019-03-11    Published: 2020-03-25
CLC:  Q 332  
Funding: National Natural Science Foundation of China (61807029).
Corresponding author: ORCID: https://orcid.org/0000-0001-6387-6016, E-mail: yushuangli@ysu.edu.cn
About the author: YAN Zhanzheng (born 1992), ORCID: http://orcid.org/0000-0002-0895-1826, male, master's student, mainly engaged in research on tumor purity estimation based on DNA methylation, E-mail: yan.zhanzheng@foxmail.com.

Cite this article:

YAN Zhanzheng, LI Yushuang. Tumor purity estimation based on Gaussian mixture model[J]. Journal of Zhejiang University (Science Edition), 2020, 47(2): 191-195.

Link to this article:

https://www.zjujournals.com/sci/CN/10.3785/j.issn.1008-9497.2020.02.009
https://www.zjujournals.com/sci/CN/Y2020/V47/I2/191
