|
Tools for quantitative trait locus mapping and genome-wide association study mapping: a review
Md. Mamun Monir, Zhu Jun
Journal of Zhejiang University (Agriculture and Life Sciences), 2014, 40(4): 379-386.
https://doi.org/10.3785/j.issn.1008-9209.2014.04.212
One of the key objectives of genomics studies is to understand the genetic architecture of complex traits and diseases. Quantitative trait locus (QTL) mapping and genome-wide association study (GWAS) mapping have been used to dissect the genetic architecture of complex traits and diseases, which assists genetic breeding and drug discovery. In this paper, we review QTL mapping and GWAS mapping methodologies and software for the analysis of complex traits or diseases. PLINK, TASSEL, SNPassoc, GenABEL and ProbABEL are the most popular software packages, providing many useful functions for GWAS mapping. PLINK is the most popular open-source whole-genome association analysis toolset and implements a number of functions for SNP data analysis. TASSEL is another popular package, which implements the Q+K composite approach for association mapping. SNPassoc, GenABEL and ProbABEL are popular open-source R packages used for association mapping. We briefly describe these popular packages for GWAS mapping and describe the functions implemented in the QTXNetwork software for GWAS mapping of quantitative traits with markers (QTLs), SNPs (QTSs), transcripts (QTTs), proteins (QTPs), and metabolites (QTMs). We also describe some other popular software for QTL mapping, such as Windows QTL Cartographer, QTL Express, Map Manager QTX, R/qtl and QTLNetwork.
|
|
Entire chloroplast genome sequence of tea (Camellia sinensis cv. Longjing 43): a molecular phylogenetic analysis
Ye Xiaoqian, Zhao Zhonghui, Zhu Quanwu, Wang Yingying, Lin Zhangxiang, Ye Chuyu, Fan Longjiang, Xu Hairong
Journal of Zhejiang University (Agriculture and Life Sciences), 2014, 40(4): 404-412.
https://doi.org/10.3785/j.issn.1008-9209.2014.05.051
Camellia sinensis cv. Longjing 43 is a domestic tea variety and an important economic crop in China. In this study, we developed a rapid method to obtain the chloroplast (cp) genome and sequenced the entire cp genome of C. sinensis cv. Longjing 43. The cp genome was 157 085 bp in length and contained a large single-copy (LSC, 86 642 bp) region, a small single-copy (SSC, 18 283 bp) region, and two inverted repeat (IR, 26 080 bp each) regions. With the cp genome of a Korean C. sinensis cultivar as a reference, 134 chloroplast genes were successfully annotated. There were 15 genes with non-synonymous mutations in the coding region and more than 100 polymorphic sites in the non-coding region, which could serve as DNA markers for distinguishing different C. sinensis varieties. We also investigated the relationships among 12 C. sinensis varieties in China based on several cp genomic regions that contain many variant sites. The results showed that these varieties were divided into two groups, with Lingyunbaimaocha in one group and the other 11 in the other. Among the other 11 varieties, Longjingchangye, Longjingyuanye, Longjingguazi, and Zhongcha 102 were more closely related and formed one cluster with a 100% support rate, demonstrating the reliability of using cp genome sequences to investigate genetic relationships.
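The region sizes quoted in the abstract above can be checked against the reported total with a few lines of Python. This is only an arithmetic sketch: it assumes the usual quadripartite layout (one LSC, one SSC, two IR copies), and the function name is made up for illustration.

```python
# Region sizes (bp) as reported in the abstract above.
LSC = 86_642
SSC = 18_283
IR = 26_080            # size of each of the two inverted repeats
REPORTED_TOTAL = 157_085

def quadripartite_length(lsc, ssc, ir):
    """Total length of a cp genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir

total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)
```

The computed sum equals the 157 085 bp stated in the abstract, so the four region sizes are internally consistent.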
|
|
Impacts of cigarette smoking on epistasis and gender-specific effects of FEV1/FVC ratio in human
Xu Changwei, Zhu Jun
Journal of Zhejiang University (Agriculture and Life Sciences), 2014, 40(4): 413-420.
https://doi.org/10.3785/j.issn.1008-9209.2014.04.214
The ratio of FEV1 (forced expiratory volume in one second) to FVC (forced vital capacity) is an index for measuring pulmonary obstruction and one of the most significant predictors of chronic obstructive pulmonary disease (COPD), which is a heritable multi-factorial disease. We present a genome-wide association study (GWAS) to map the genetic architecture of this trait and to investigate the networks between external factors (smoking and gender) and genetic factors. Using a mixed linear model and a conditional model, we conducted a GWAS in a cohort of COPD patients from the U.S. National Heart, Lung and Blood Institute. Among 561 467 single nucleotide polymorphisms, we found 12 significant quantitative trait SNPs (QTSs) when fitting the full model. For each of them, we describe the mechanisms and the relationship between pulmonary function and the genes detected. STIM2 and MRE11A (PEW-value < 1×10⁻⁵) showed unambiguous evidence of association with COPD. APOL3 (PEW-value < 1×10⁻⁵) had different effects in the two genders, and previous studies have also implicated its association with smoking behavior. Variation in the genes MRE11A and DNAJC15 was related to lung adenocarcinoma, which is a serious complication of COPD. The significant epistasis effects of these genes suggest the possibility of multiple functional polymorphisms. These associations offer mechanistic insight into the regulation of pulmonary function and the networks between genetic and environmental factors, and they indicate potential interventions for COPD and many other respiratory diseases.
|
|
Genome-wide association studies for identifying genetic architecture of SGRQ in smoking population
Hao Xinying, Zhu Jun
Journal of Zhejiang University (Agriculture and Life Sciences), 2014, 40(4): 431-439.
https://doi.org/10.3785/j.issn.1008-9209.2014.04.271
The dualism of genetic predisposition and gender influences has long been a hot topic in the development of chronic obstructive pulmonary disease (COPD), and smoking is considered a primary risk factor for this lung disease. This paper aimed to detect susceptibility genes for COPD using data downloaded from dbGaP. A linear mixed model was employed for association mapping of QTSs (quantitative trait single-nucleotide polymorphisms) because of its effectiveness in unbiased estimation of random effects with unbalanced data and in controlling population stratification. The primary focus of the study is to identify genetic risk factors that determine susceptibility to COPD and COPD-related phenotypes, with the goal of providing insight into clinically relevant COPD subtypes. By comparing the conditional model, which excludes the cofactor smoking, with the full model, we can detect related QTSs and reveal gene effects on COPD that are caused or suppressed by smoking. As a result, significant genes with high heritability were identified: the effects of TNS1 and DGKH were not caused by smoking, those of MACROD2 and CNIH3 were due to smoking, and those of LINC00426, METTL4 and GSDMC were suppressed by smoking.
|
|
Molecular characterization and agronomic trait analysis of rice Os09g24220 gene insertion mutants
Yuan Bing, Cui Hairui, Fu Haowei, Jiang Meng, Li Ruiqing, Zhao Haijun, Shu Qingyao
Journal of Zhejiang University (Agriculture and Life Sciences), 2014, 40(4): 456-462.
https://doi.org/10.3785/j.issn.1008-9209.2013.09.103
Mismatch repair (MMR) is a major pathway in the DNA repair system and is critical for maintaining genome stability and DNA replication fidelity, as it is responsible for recognizing and repairing erroneous insertions, deletions and base mismatches that arise during DNA replication and genetic recombination, as well as during the repair of some forms of DNA damage. The major components of the MMR system in Escherichia coli include MutS, MutH, and MutL. In eukaryotes, homologues of MutS and MutL have been found, but not of MutH. Based on homology, several MutS homologues have also been identified and cloned from Arabidopsis thaliana. Among these homologues, MSH2 forms complexes with MSH6, MSH3 or MSH7, yielding the MutSα, MutSβ and MutSγ heterodimers, respectively, which recognize different types of mutations. Mutation or disruption of an MMR gene causes significantly increased frequencies of point mutations and microsatellite instability, and can thus enhance genetic diversity for plant breeding. Rice is an important food crop and a prominent molecular model species for monocotyledonous plants. Some MMR genes have been annotated in the Rice Annotation Project Database, one of which is Os09g24220, a homologue of Msh6 (At4g02070) in the MMR system of A. thaliana. However, no information is available on the function of the Os09g24220 gene or its mutator phenotype in rice. We reasoned that, if disruption of the Os09g24220 gene can enhance genetic diversity, offspring exhibiting mutations in agronomic traits could be generated using this approach. Herein, we conducted molecular characterization of Tos17 insertion mutants of the Os09g24220 gene and analyzed their agronomic traits, to provide a foundation for functional studies of the Os09g24220 gene and for exploitation of the mutants in rice mutation breeding.
Three insertion mutants, NF9010, NF7784 and ND6011, with the Tos17 insertion in the 1st exon, the 8th exon and the 3′-UTR (untranslated region), respectively, were characterized by triple-primer PCR and RT-PCR. The Tos17 insertion in the Os09g24220 gene was confirmed, and all three insertion mutants were homozygous. In both NF9010 and NF7784 mutant seedlings, partial Os09g24220 mRNA transcripts were detected with the primer sets that amplified the region upstream of the insertion site, but none were detectable with the primer sets downstream of or across the insertion site, indicating that NF9010 and NF7784 lack full-length, functional Os09g24220 mRNA despite the existence of truncated transcripts. In the ND6011 mutant, however, Os09g24220 transcripts were detected with all primer sets that amplified the upstream, internal and downstream fragments of the whole functional region, but no transcript was detected with the primer set across the insertion site, suggesting that ND6011 has functional mRNA but not full-length mRNA. Agronomic traits of the insertion mutants, such as plant height, number of productive panicles, panicle length, seed-setting rate and mass of 1 000 grains, were measured and compared with those of their wild type, Nipponbare. The results indicated that at least one trait changed significantly in each mutant. The mutant ND6011 showed only a significantly (P<0.05) decreased seed-setting rate, whereas both NF7784 and NF9010 displayed significant (P<0.01) reductions in ear length and seed-setting rate; moreover, the plant height of NF9010 was also significantly lower (P<0.05). In conclusion, some agronomic traits of the Tos17 insertion mutants of the Os09g24220 gene changed significantly in rice plants, showing the mutator phenotype. Furthermore, different insertion mutants displayed different mutator phenotypes, which are related to the effects of the different Tos17 insertion sites on Os09g24220 gene function.
These findings provide foundations for investigating the function of the Os09g24220 gene in DNA repair and exploiting such mutants in induced mutation breeding in rice.
|
|
Analysis of microsatellite and single nucleotide polymorphism within transcriptomic database in Cymbidium ensifolium
Li Xiaobai, Xiang Lin, Luo Jie, Qin Dehui, Sun Chongbo
Journal of Zhejiang University (Agriculture and Life Sciences), 2014, 40(4): 463-472.
https://doi.org/10.3785/j.issn.1008-9209.2013.08.064
Cymbidium ensifolium is a species of the genus Cymbidium, with an elegant shape, beautiful appearance and fragrant aroma. Because of these features, the species has extremely high ornamental value. Owing to the lack of genomic resources, the development and application of molecular markers is still limited. With the development of RNA-Seq technology, transcriptomic data are gradually accumulating and have become a useful resource for exploring markers at low cost and with high efficiency. Here, the transcriptome of C. ensifolium was subjected to RNA-Seq. Illumina sequencing was performed at Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China) according to the manufacturer's instructions (Illumina, San Diego, CA). High-quality reads were assembled de novo using Trinity with an optimized k-mer length of 25. The program Msatcommander was used to analyze microsatellite (also called simple sequence repeat, SSR) frequencies. The minimum numbers of repeats for SSR detection were as follows: six repeats for di-SSRs, and four for tri-, tetra-, penta- and hexa-SSRs. Single nucleotide polymorphisms (SNPs) were detected and filtered using SAMtools and VarScan. The open reading frame (ORF) and untranslated region (UTR) within each isogene were identified using Trinity software. Isogenes containing SSRs and SNPs were annotated on the basis of BLAST similarity searches. All high-quality reads were assembled into 101 423 isogenes, with total residues of 139 385 689. The isogenes averaged 1 374 bp, ranging from 351 bp to 17 260 bp, and 70 583 isogenes, accounting for 69.60%, were about 600 bp. In total, 17 793 SSRs and 16 676 SNPs were identified within the transcriptomic database. The densities of SSRs and SNPs were 1.28 SSRs/10 kb and 1.20 SNPs/10 kb, respectively. Excluding mono-SSRs, tri-SSRs were the most abundant type, followed by di-SSRs. Di-SSRs and tri-SSRs accounted for 20.46% and 21.98% of all SSRs, respectively. The location of the SSRs was also estimated.
The estimated locations were obtained for 7 936 SSRs, but sequence information could not be determined for the remaining 6 586 SSR regions, as they extended over both estimated coding and non-coding regions. We found that tri-SSRs and hexa-SSRs occurred more frequently in coding regions. In contrast, di-SSRs, tetra-SSRs, and penta-SSRs were more likely to appear in UTRs than in coding regions. Among the SNPs, C/T was the most common base substitution, followed by A/G. These two kinds of substitutions, C/T and A/G, accounted for 30.80% and 28.81% of all SNPs, respectively. The numbers of isogenes containing SSRs and SNPs were 13 768 and 7 519, respectively. These isogenes were annotated with the Clusters of Orthologous Groups (COG), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of them were annotated as crucial genes associated with important biological functions. There were 1 748 SSR and 1 932 SNP isogenes assigned to 23 COG classifications, respectively. There were 4 994 SSR and 4 819 SNP isogenes classified into 80 and 78 GO terms, respectively. There were 2 107 SSR and 2 188 SNP isogenes involved in 300 and 308 KEGG pathways, respectively. The numerous SSRs and SNPs identified in this study will contribute to marker development. The annotation of isogenes containing SSRs and SNPs will help in constructing genetic maps and exploring the associations between these markers and traits of interest. The map will in turn accelerate research on the genomics and functional genomics of C. ensifolium.
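The SSR detection thresholds and marker densities in the abstract above can be illustrated with a small regex-based sketch. This is not how Msatcommander works internally (the abstract does not describe its algorithm); it is a simplified stand-in that finds only perfect repeats, and the names `find_ssrs` and `density_per_10kb` are hypothetical.

```python
import re

# Minimum repeat counts quoted in the abstract:
# six for di-SSRs, four for tri-, tetra-, penta- and hexa-SSRs.
MIN_REPEATS = {2: 6, 3: 4, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AAA", "ATAT"),
            # so mono-repeats and shorter-unit SSRs are not reported under a longer class.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)

def density_per_10kb(count, total_bases):
    """Markers per 10 kb of assembled sequence."""
    return count / total_bases * 10_000

# Toy sequence (made up): one (AT)6 and one (GCA)4 repeat.
example = "GG" + "AT" * 6 + "CC" + "GCA" * 4 + "TT"
print(find_ssrs(example))

# Counts and total residues taken from the abstract.
print(f"{density_per_10kb(17_793, 139_385_689):.2f} SSRs/10 kb")
print(f"{density_per_10kb(16_676, 139_385_689):.2f} SNPs/10 kb")
```

With the counts from the abstract, the two density calls give 1.28 and 1.20 per 10 kb, matching the reported values. A production tool would also handle ambiguous bases, compound repeats and flanking-sequence extraction, which this sketch leaves out.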
|
|