Abstract: Genome-wide association study (GWAS) is an effective approach for investigating the genetic variation underlying complex traits and diseases; its core idea is to test for associations between molecular variants and target phenotypes. In recent years, advances in high-throughput sequencing and high-resolution metabolite detection, together with progress in bioinformatics and statistical methods, have laid the foundation for fine mapping of the causal variants of complex traits. Building on the basic GWAS framework, this review introduces the applications and research progress of GWAS at the genome, metabolome, and transcriptome levels, including haplotype-based GWAS, metabolome-based GWAS, and gene expression-based GWAS, and discusses prospects for the future development of GWAS.
BU Li-Na, ZHAO Yi-Qiang. Research Progress of Genome-wide Association Study and Its Extension Methods[J]. Journal of Agricultural Biotechnology (农业生物技术学报), 2019, 27(1): 150-158.