Brazilian Journal of Probability and Statistics

A semiparametric Bayesian model for comparing DNA copy numbers

Luis Nieto-Barajas, Yuan Ji, and Veerabhadran Baladandayuthapani


Abstract

We propose a two-step method for the analysis of copy number data. We first define partitions of genome aberrations and then, conditional on these partitions, introduce a semiparametric Bayesian model for the analysis of multiple samples from patients with different subtypes of a disease. Although the biological interest lies in identifying regions of differential copy number across disease subtypes, our model also includes sample-specific random effects that account for copy number alterations that differ between samples within the same disease subtype. We model the subtype-specific and sample-specific effects using a random effects mixture model. The subtype main effects are characterized by a mixture distribution whose components are assigned Dirichlet process priors. The performance of the proposed model is examined using simulated data as well as a breast cancer genomic data set.
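To make the structure of the second step concrete, the following is a minimal generative sketch of a random effects mixture model of this general kind, assuming the partition into regions is already fixed. It is not the authors' model: the spike-and-slab form of the mixture (a point mass at zero for "no aberration" plus a slab drawn from a Dirichlet process), the Gaussian base measure, the truncated stick-breaking construction of the Dirichlet process (Sethuraman, 1994), the per-region sample effects, and every function name and parameter value below are illustrative assumptions.

```python
import random

def stick_breaking(alpha, base_draw, n_atoms, rng):
    """Truncated stick-breaking draw of G ~ DP(alpha, G0).

    Returns (weights, atoms). The last stick takes all remaining mass,
    so the weights sum to 1. base_draw(rng) samples one atom from G0.
    """
    weights, atoms = [], []
    remaining = 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
        atoms.append(base_draw(rng))
    weights.append(remaining)
    atoms.append(base_draw(rng))
    return weights, atoms

def simulate(n_subtypes=2, n_samples=5, n_regions=20, p_spike=0.6,
             alpha=1.0, tau=1.0, sigma_u=0.2, sigma_e=0.1,
             n_atoms=50, seed=0):
    """Hypothetical generative model: y[k][i][j] = mu[k][j] + u + e.

    mu[k][j]: subtype main effect for region j (spike at 0 or DP slab).
    u: sample-specific random effect, N(0, sigma_u^2) per sample and region.
    e: measurement noise, N(0, sigma_e^2).
    """
    rng = random.Random(seed)
    # One DP draw shared across subtypes (assumption), so different
    # subtypes can share the same aberration level in a region.
    weights, atoms = stick_breaking(
        alpha, lambda r: r.gauss(0.0, tau), n_atoms, rng)
    mu, y = {}, {}
    for k in range(n_subtypes):
        # Spike (no aberration) with probability p_spike, else a DP atom.
        mu[k] = [0.0 if rng.random() < p_spike
                 else rng.choices(atoms, weights=weights)[0]
                 for _ in range(n_regions)]
        y[k] = [[m + rng.gauss(0.0, sigma_u) + rng.gauss(0.0, sigma_e)
                 for m in mu[k]]
                for _ in range(n_samples)]
    return mu, y

def differential_regions(mu, k1, k2):
    """Regions whose (true, simulated) main effects differ between subtypes."""
    return [j for j, (a, b) in enumerate(zip(mu[k1], mu[k2])) if a != b]

if __name__ == "__main__":
    mu, y = simulate(seed=42)
    print("differential regions:", differential_regions(mu, 0, 1))
```

Because a Dirichlet process draw is discrete, non-zero main effects tend to repeat exactly across regions and subtypes, which is what lets such a model cluster regions and treat "equal effects" as an event with positive probability. In a real analysis the main effects would be unknown and inferred by MCMC rather than read off the simulation.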

Article information

Source
Braz. J. Probab. Stat. Volume 30, Number 3 (2016), 345-365.

Dates
Received: January 2014
Accepted: February 2015
First available in Project Euclid: 29 July 2016

Permanent link to this document
http://projecteuclid.org/euclid.bjps/1469807216

Digital Object Identifier
doi:10.1214/15-BJPS283

Mathematical Reviews number (MathSciNet)
MR3531688

Zentralblatt MATH identifier
06633263

Keywords
Bayesian nonparametrics; bivariate spike and slab prior; circular binary segmentation; comparative genomic hybridization; Dirichlet process; mixture model; random effects

Citation

Nieto-Barajas, Luis; Ji, Yuan; Baladandayuthapani, Veerabhadran. A semiparametric Bayesian model for comparing DNA copy numbers. Braz. J. Probab. Stat. 30 (2016), no. 3, 345–365. doi:10.1214/15-BJPS283. http://projecteuclid.org/euclid.bjps/1469807216.
