Journal of Zhejiang University (Agriculture and Life Sciences), 2014, Vol. 40, Issue 4: 463-472. DOI: 10.3785/j.issn.1008-9209.2013.08.064
Article
Analysis of microsatellite and single nucleotide polymorphism within transcriptomic database in Cymbidium ensifolium
Li Xiaobai*, Xiang Lin, Luo Jie, Qin Dehui, Sun Chongbo
(Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China)
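Summary: Using transcriptome data from Cymbidium ensifolium, we searched for microsatellites, also known as simple sequence repeats (SSRs), and single nucleotide polymorphisms (SNPs), and annotated the sequences containing them, in order to provide useful information for developing molecular markers in C. ensifolium. The C. ensifolium transcriptome was deeply sequenced using HiSeq technology and assembled with a de novo strategy, yielding 101 423 transcripts with a total length of 139 385 689 bp and an average length of 1 374 bp. In total, 17 793 SSRs and 16 676 SNPs were detected in this transcriptome database, with average densities of 1.28 SSRs/10 kb and 1.20 SNPs/10 kb, respectively. Apart from mononucleotide repeats, dinucleotide and trinucleotide repeats were the predominant SSR types, accounting for 20.46% and 21.98% of all SSRs, respectively. Among the SNPs, substitutions between C and T and between A and G were the predominant forms, accounting for 30.80% and 28.81% of all SNPs, respectively. Annotation of the sequences containing SSRs and SNPs showed that 1 748 SSR-containing and 1 932 SNP-containing sequences had Clusters of Orthologous Groups (COG) annotations; 4 994 and 4 819, respectively, had Gene Ontology (GO) annotations; and 2 107 and 2 188, respectively, had Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations. These sequences are involved in many important biological functions and metabolic pathways, suggesting that these potential markers may be associated with important biological functions. This information lays a foundation for the development and application of molecular markers in C. ensifolium.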
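The mean transcript length and both marker densities reported above follow directly from the totals (density = count ÷ total assembled length × 10 000 bp). The short Python check below recomputes them. It assumes, as the figures suggest but the text does not state outright, that the densities are normalized by the full assembled length of 139 385 689 bp; the helper name `per_10kb` is illustrative only.

```python
# Totals as reported in the summary.
TOTAL_BP = 139_385_689      # total length of the assembled transcripts
N_TRANSCRIPTS = 101_423     # number of assembled transcripts
N_SSR = 17_793              # SSRs detected
N_SNP = 16_676              # SNPs detected


def per_10kb(count: int, total_bp: int) -> float:
    """Features per 10 kb of sequence.

    Assumption: the denominator is the total assembled length.
    """
    return count / total_bp * 10_000


mean_length = TOTAL_BP / N_TRANSCRIPTS
ssr_density = per_10kb(N_SSR, TOTAL_BP)
snp_density = per_10kb(N_SNP, TOTAL_BP)

print(f"mean length: {mean_length:.0f} bp")
print(f"SSR density: {ssr_density:.2f} per 10 kb")
print(f"SNP density: {snp_density:.2f} per 10 kb")
```

Both rounded densities (1.28 and 1.20 per 10 kb) and the 1 374 bp mean agree with the published values, which supports the assumed denominator.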
Abstract: Cymbidium ensifolium, a species of the genus Cymbidium, has an elegant shape, a beautiful appearance and a fragrant aroma, which give it extremely high ornamental value. Owing to the lack of genomic resources for this species, the development and application of molecular markers remain limited. With the development of RNA-Seq technology, transcriptomic data are accumulating and have become a useful resource for discovering markers at low cost and with high efficiency.
   Here, the transcriptome in C. ensifoliumwas subjected to RNA-seq. Illumina sequencing was performed at Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China) according to the manufacturer‘s instructions (Illumina, San Diego, CA). Highquality reads were assembled de novo using Trinity with optimized Kmer length of 25. The program Msatcommander was used to analyze microsatellite (as called simple sequence repeat, SSR) frequencies. The minimum numbers of repeats for SSR detection were as follows: six repeats for di-SSRs; and four for tri-, tetra-, penta- and hexa-SSRs. Single nucleotide polymorphisms (SNPs) were detected and filtered using SAMtools and VarScan. The open reading frame (ORF) and untranslated region (UTR) within the isogene were identified using Trinity software. Isogenes containing SSR and SNP were annotated on the basis of BLAST similarity searches.
   All high-quality reads were assembled into 101 423isogenes, with total residues of 139 385 689.The isogenes averaged 1 374bp, ranging from 351 bp to 17 260 bp, and 70 583 isogenes, accounted for 69.60%, were about 600 bp. In total, 17 793SSRs and 16 676 SNPs were identified within transcriptomic database. The density of SSR and SNP was 1.28 SSRs/10 kb and 1.20 SSRs/10 kb, respectively. Among these SSRs, tri-SSR was the most types, followed by diSSR, except mono-SSR. Di-SSR and tri-SSR accounted for 20.46% and 21.98% in all SSRs, respectively. The location of SSR was also estimated. The estimated locations were obtained for 7 936 SSRs, but sequence information could not be determined for the remaining 6 586SSR regions as it extended over both estimated coding and non-coding regions. We found that most tri-SSRs and hexa-SSRs occurred more frequently in coding regions. In contrast, di-SSR, tetra-SSR, and penta-SSR, were more likely to appear in UTR rather than coding regions. Among these SNPs, C/T was the most common base substitution, followed by A/G. The two kinds of substitutions, C/T and A/G, accounted for 30.80% and 28.81% in all SNPs, respectively.
   In total, 13 768 isogenes contained SSRs and 7 519 contained SNPs. These isogenes were annotated against the Clusters of Orthologous Groups (COG), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Many of them were annotated as key genes associated with important biological functions. In total, 1 748 SSR-containing and 1 932 SNP-containing isogenes were assigned to 23 COG classifications; 4 994 SSR-containing and 4 819 SNP-containing isogenes were classified into 80 and 78 GO terms, respectively; and 2 107 SSR-containing and 2 188 SNP-containing isogenes were involved in 300 and 308 KEGG pathways, respectively.
    The numerous SSRs and SNPs identified in this study will contribute to marker development. The annotation of isogenes containing SSRs and SNPs will help in constructing genetic maps and in exploring associations between these markers and traits of interest. Such maps will in turn accelerate genomic and functional genomic research on C. ensifolium.
Published: 2014-07-20
CLC (Chinese Library Classification): Q75

Cite this article:

Li Xiaobai, Xiang Lin, Luo Jie, Qin Dehui, Sun Chongbo. Analysis of microsatellite and single nucleotide polymorphism within transcriptomic database in Cymbidium ensifolium. Journal of Zhejiang University (Agriculture and Life Sciences), 2014, 40(4): 463-472.

Link to this article:

http://www.zjujournals.com/agr/CN/10.3785/j.issn.1008-9209.2013.08.064        http://www.zjujournals.com/agr/CN/Y2014/V40/I4/463