Annals of Applied Statistics

Identifying heterogeneous transgenerational DNA methylation sites via clustering in beta regression

Shengtong Han, Hongmei Zhang, Gabrielle A. Lockett, Nandini Mukherjee, John W. Holloway, and Wilfried Karmaus

Full-text: Open access


This paper explores transgenerational DNA methylation patterns (DNA methylation transmitted from one generation to the next) via a clustering approach. Beta regression is employed to model the transmission pattern from parents to their offspring at the population level. To this end, we propose an expectation-maximization algorithm for parameter estimation, along with the BIC for determining the number of clusters. Applying our method to DNA methylation data comprising 4063 CpG sites from 41 mother–father–infant triads, we identified a set of CpG sites at which DNA methylation transmission is dominated by fathers, whereas at a large number of CpG sites DNA methylation is mainly transmitted maternally to the offspring.
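As a rough illustration of the kind of computation the abstract describes, the Python sketch below evaluates a beta-regression log-likelihood for one CpG site, the E-step cluster responsibilities for that site, and the BIC used to compare cluster counts. It is not the authors' implementation. The mean-precision form Beta(mu*phi, (1-mu)*phi) is the standard beta-regression parameterization; the logit link, the three-coefficient linear predictor (intercept, mother, father), every function name, and the toy numbers are assumptions made purely for illustration.

```python
import math


def inv_logit(eta):
    """Map a linear predictor to a mean in (0, 1). The logit link is an assumption."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_logpdf(y, mu, phi):
    """Log-density of Beta(mu*phi, (1-mu)*phi), the mean-precision form."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def site_loglik(infant, mother, father, coef, phi):
    """Sum the log-density over the triads observed at one CpG site.

    coef = (intercept, maternal slope, paternal slope); hypothetical layout.
    """
    b0, bm, bf = coef
    total = 0.0
    for y, m, f in zip(infant, mother, father):
        mu = inv_logit(b0 + bm * m + bf * f)
        total += beta_logpdf(y, mu, phi)
    return total


def responsibilities(infant, mother, father, weights, coefs, phis):
    """E-step: posterior probability that the site belongs to each cluster."""
    logs = [math.log(w) + site_loglik(infant, mother, father, c, p)
            for w, c, p in zip(weights, coefs, phis)]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]


def bic(loglik, n_params, n_obs):
    """Schwarz criterion: smaller values indicate a preferred model."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Toy data (made up): infant methylation tracks the mother, not the father.
infant = [0.2, 0.5, 0.8]
mother = [0.2, 0.5, 0.8]
father = [0.8, 0.5, 0.2]
coefs = [(-2.0, 4.0, 0.0),   # hypothetical maternally dominated cluster
         (-2.0, 0.0, 4.0)]   # hypothetical paternally dominated cluster
resp = responsibilities(infant, mother, father, [0.5, 0.5], coefs, [50.0, 50.0])
print(resp)
```

In a full EM fit, the M-step would re-estimate the mixing weights, coefficients, and precisions from these responsibilities, and the fit would be repeated for several candidate cluster counts, keeping the count with the smallest BIC.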

Article information

Ann. Appl. Stat., Volume 9, Number 4 (2015), 2052-2072.

Received: October 2014
Revised: August 2015
First available in Project Euclid: 28 January 2016


Keywords: DNA methylation transmission; EM; clustering; beta regression


Han, Shengtong; Zhang, Hongmei; Lockett, Gabrielle A.; Mukherjee, Nandini; Holloway, John W.; Karmaus, Wilfried. Identifying heterogeneous transgenerational DNA methylation sites via clustering in beta regression. Ann. Appl. Stat. 9 (2015), no. 4, 2052–2072. doi:10.1214/15-AOAS865.


