Statistical Science

Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases

Sebastian Zöllner and Tanya M. Teslovich


Copy number variants (CNVs) account for more polymorphic base pairs in the human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass genes as well as noncoding DNA, making these polymorphisms good candidates for functional variation. Consequently, most modern genome-wide association studies test CNVs along with SNPs, after inferring copy number status from the data generated by high-throughput genotyping platforms.

Here we give an overview of CNV genomics in humans, highlighting patterns that inform methods for identifying CNVs. We describe how genotyping signals are used to identify CNVs and provide an overview of existing statistical models and methods used to infer location and carrier status from such data, especially the most commonly used methods exploring hybridization intensity. We compare the power of such methods with the alternative method of using tag SNPs to identify CNV carriers. Because such methods are powerful only when applied to common CNVs, we describe two alternative approaches that can be informative for identifying rare CNVs contributing to disease risk. We focus particularly on methods for identifying de novo CNVs and show that such methods can be more powerful than case-control designs. Finally, we present some recommendations for identifying CNVs contributing to common complex disorders.

Article information

Statist. Sci., Volume 24, Number 4 (2009), 530–546.

First available in Project Euclid: 20 April 2010

Keywords: Copy number variation, genome-wide association study, SNP, hidden Markov model, linkage disequilibrium


Zöllner, Sebastian; Teslovich, Tanya M. Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases. Statist. Sci. 24 (2009), no. 4, 530–546. doi:10.1214/09-STS304.

