The Annals of Applied Statistics

Generalized genetic association study with samples of related individuals

Zeny Feng, William W. L. Wong, Xin Gao, and Flavio Schenkel

Full-text: Open access

Abstract

A genetic association study is an essential step in discovering genetic factors associated with a complex trait of interest. In this paper we present a novel generalized quasi-likelihood score (GQLS) test that is suitable for studies with either a quantitative or a binary trait. We use a logistic regression model to link the phenotypic value of the trait to the distribution of allelic frequencies. In our model, the allele frequencies are treated as the response and the trait as a covariate, which allows us to leave the distribution of the trait values unspecified. Simulation studies indicate that our method is generally more powerful than the family-based association test (FBAT) and controls the type I error at the desired levels. We apply our method to data on Holstein cattle for an estimated breeding value phenotype, and to data from the Collaborative Study on the Genetics of Alcoholism for alcohol dependence. A good proportion of the significant SNPs and regions are consistent with previous reports in the literature, and the results also reveal new significant SNPs and regions associated with the complex traits of interest.
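
The abstract describes the model only in words. As a rough illustration, and not the authors' exact GQLS statistic, the sketch below computes a generic quasi-likelihood score statistic for a biallelic marker in which the allele proportion is the response and the trait is the covariate. Everything in it is an assumption made for illustration: the names `score_statistic`, `y`, `x` and `phi`, the coding of `y` as half the number of copies of one allele, the use of twice the kinship matrix (with inbreeding on the diagonal) as the correlation structure, and the null variance of the form (1/2) p (1 - p) Φ.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def score_statistic(y, x, phi):
    """Score statistic for 'no trait effect on allele frequency'.

    y   : half the number of copies of the reference allele (0, 0.5 or 1)
    x   : trait values (quantitative or 0/1), used as the covariate
    phi : symmetric relatedness matrix (assumed: 2 * kinship coefficients)
    """
    ones = [1.0] * len(y)
    phi_inv_1 = solve(phi, ones)
    phi_inv_x = solve(phi, x)
    # Generalized least squares estimate of the allele frequency under the null.
    p_hat = dot(phi_inv_1, y) / dot(phi_inv_1, ones)
    resid = [yi - p_hat for yi in y]
    # Squared score: x' Phi^{-1} (y - p_hat * 1).
    numerator = dot(phi_inv_x, resid) ** 2
    # Variance of the score after accounting for the estimated frequency.
    x_var = dot(phi_inv_x, x) - dot(phi_inv_x, ones) ** 2 / dot(phi_inv_1, ones)
    sigma2 = 0.5 * p_hat * (1.0 - p_hat)
    return numerator / (sigma2 * x_var)


def p_value(stat):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical toy data: two unrelated full-sib pairs, no inbreeding.
phi_sibs = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
y_toy = [0.0, 0.5, 1.0, 0.5]
x_toy = [1.0, 2.0, 3.0, 4.0]
stat_toy = score_statistic(y_toy, x_toy, phi_sibs)
print(stat_toy, p_value(stat_toy))
```

Because the trait enters only as a covariate, the sketch never needs a distribution for `x`; the same function accepts a continuous trait or a 0/1 disease indicator. Only the correlation of genotypes among relatives, through `phi`, has to be specified.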

Article information

Source
Ann. Appl. Stat. Volume 5, Number 3 (2011), 2109-2130.

Dates
First available in Project Euclid: 13 October 2011

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1318514297

Digital Object Identifier
doi:10.1214/11-AOAS465

Mathematical Reviews number (MathSciNet)
MR2884933

Zentralblatt MATH identifier
1228.62140

Keywords
Genetic association test; kinship-inbreeding coefficient; logistic regression; quasi-likelihood

Citation

Feng, Zeny; Wong, William W. L.; Gao, Xin; Schenkel, Flavio. Generalized genetic association study with samples of related individuals. Ann. Appl. Stat. 5 (2011), no. 3, 2109–2130. doi:10.1214/11-AOAS465. https://projecteuclid.org/euclid.aoas/1318514297.


Supplemental materials

  • Supplementary material: Mathematical justifications and additional results. The supplementary materials of the paper are organized as follows. Appendix A provides the theoretical justification of the variance–covariance matrix Σ_0. Appendix B derives the explicit form of the W_G statistic for a biallelic marker in a single pedigree study design. Appendix C derives the expression of the W_G statistic for a multi-allelic marker in a single pedigree study design. Appendix D summarizes, in tables, additional simulation results and the results of the COGA data analysis.