Abstract: In cancer genomics and epigenomics research, tumor tissues obtained from the clinic are mixtures of cancer and normal cells, and tumor impurity can severely affect subsequent data analyses. In this paper, we propose a simple method, GmmPurify, for estimating tumor purity from DNA methylation microarray data. First, we fit a Gaussian mixture model to common normal samples to derive a key statistic, the “information contribution value”. Then, we construct a set of differentially methylated sites with high information contribution values and estimate tumor purity from them using a kernel density method. To evaluate GmmPurify, we use it to compute the purities of nine tumor types from The Cancer Genome Atlas (TCGA); the resulting purity estimates are highly consistent with those of two state-of-the-art methods. The results show that GmmPurify can provide satisfactory tumor purity estimates even when no normal samples matched to the current tumor samples are available.
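The kernel density step can be illustrated with a minimal sketch. The abstract does not define the information contribution value or the exact estimator, so this is not GmmPurify itself. It assumes, hypothetically, that each selected site already yields a rough per-site purity value in [0, 1], and it takes the mode of a Gaussian kernel density over those values as the sample-level estimate. The function name `kde_mode`, the bandwidth rule, and the toy numbers are assumptions made for illustration only.

```python
import math
import statistics


def kde_mode(values, grid_size=501, lo=0.0, hi=1.0):
    """Return the grid point in [lo, hi] where a Gaussian KDE of `values` peaks.

    Hypothetical helper: `values` stands in for per-site purity values,
    which the abstract does not specify how to compute.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")

    # Normal-reference rule-of-thumb bandwidth; the floor avoids a zero
    # bandwidth when all values are identical.
    sd = statistics.stdev(values)
    h = max(1.06 * sd * n ** (-1 / 5), 1e-3)

    norm = n * h * math.sqrt(2 * math.pi)
    best_x, best_density = lo, -1.0
    for i in range(grid_size):
        x = lo + (hi - lo) * i / (grid_size - 1)
        # Sum of Gaussian kernels centred on each observed value.
        density = sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values) / norm
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# Toy per-site values (made up): most sites agree, two are outliers.
site_estimates = [0.62, 0.58, 0.61, 0.60, 0.59, 0.63, 0.15, 0.95]
purity = kde_mode(site_estimates)
print(f"estimated purity (KDE mode): {purity:.3f}")
```

Taking the density mode rather than the mean keeps a few aberrant sites from pulling the estimate away from the value on which most sites agree, which is one plausible reason to prefer a kernel density estimate in this setting.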