The Annals of Applied Statistics

Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences

Zhou Fan and Lester Mackey

Full-text: Open access

Abstract

Copy number variations in cancer cells and volatility fluctuations in stock prices are commonly manifested as changepoints occurring at the same positions across related data sequences. We introduce a Bayesian modeling framework, BASIC, that employs a changepoint prior to capture the co-occurrence tendency in data of this type. We design efficient algorithms to sample from and maximize over the BASIC changepoint posterior and develop a Monte Carlo expectation-maximization procedure to select prior hyperparameters in an empirical Bayes fashion. We use the resulting BASIC framework to analyze DNA copy number variations in the NCI-60 cancer cell lines and to identify important events that affected the price volatility of S&P 500 stocks from 2000 to 2009.
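
As a rough illustration of the co-occurrence tendency described above, the following sketch simulates several sequences in which each position carries its own change rate, shared by all sequences. It is a minimal toy model and not the BASIC specification from the paper: the function name `simulate_shared_changepoints`, the three-point mixture of rates, and the Gaussian piecewise-constant means are all assumptions chosen for illustration.

```python
import random


def simulate_shared_changepoints(n_seq, length, rate_choices, rate_weights, seed=0):
    """Simulate piecewise-constant Gaussian sequences with co-occurring changepoints.

    Illustrative assumption: each position t gets a rate q_t drawn from a small
    discrete mixture, and every sequence changes at t independently with
    probability q_t. A large q_t therefore yields a changepoint shared by
    many sequences at the same position.
    """
    rng = random.Random(seed)
    # Per-position change rates, shared across all sequences.
    rates = rng.choices(rate_choices, weights=rate_weights, k=length)
    changes = [[False] * length for _ in range(n_seq)]
    data = [[0.0] * length for _ in range(n_seq)]
    for j in range(n_seq):
        mean = rng.gauss(0.0, 3.0)  # initial segment mean (hypothetical scale)
        for t in range(length):
            # No changepoint is allowed at the very first position.
            if t > 0 and rng.random() < rates[t]:
                changes[j][t] = True
                mean = rng.gauss(0.0, 3.0)  # draw a fresh segment mean
            data[j][t] = rng.gauss(mean, 1.0)  # unit-variance observation noise
    return rates, changes, data


rates, changes, data = simulate_shared_changepoints(
    n_seq=20,
    length=200,
    rate_choices=[0.0, 0.05, 0.8],   # no change, sporadic change, shared change
    rate_weights=[0.9, 0.08, 0.02],
)
# Number of sequences that change at each position.
counts = [sum(changes[j][t] for j in range(20)) for t in range(200)]
```

Positions whose rate is 0.8 tend to show a changepoint in most of the 20 sequences, while positions with rate 0.0 show none. Recovering such shared positions from the data alone is the kind of inference problem the abstract describes.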

Article information

Source
Ann. Appl. Stat. Volume 11, Number 4 (2017), 2200-2221.

Dates
Received: July 2016
Revised: April 2017
First available in Project Euclid: 28 December 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1514430283

Digital Object Identifier
doi:10.1214/17-AOAS1075

Keywords
Changepoint detection; empirical Bayes; Markov chain Monte Carlo; copy number variation; stock price volatility

Citation

Fan, Zhou; Mackey, Lester. Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences. Ann. Appl. Stat. 11 (2017), no. 4, 2200–2221. doi:10.1214/17-AOAS1075. https://projecteuclid.org/euclid.aoas/1514430283


References

  • Adams, R. P. and MacKay, D. J. (2007). Bayesian online changepoint detection. Technical report. Available at arXiv:0710.3742 [stat.ML].
  • Akhoondi, S. et al. (2007). FBXW7/hCDC4 is a general tumor suppressor in human cancer. Cancer Res. 67 9006–9012.
  • Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 269–342.
  • Bardwell, L. and Fearnhead, P. (2017). Bayesian detection of abnormal segments in multiple time series. Bayesian Anal. 12 193–218.
  • Barry, D. and Hartigan, J. A. (1993). A Bayesian analysis for change point problems. J. Amer. Statist. Assoc. 88 309–319.
  • Basseville, M. and Nikiforov, I. V. (1993). Detection of Abrupt Changes: Theory and Application. Prentice Hall, Englewood Cliffs, NJ.
  • Chen, J. and Gupta, A. K. (2012). Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance, 2nd ed. Birkhäuser/Springer, New York.
  • Chernoff, H. and Zacks, S. (1964). Estimating the current mean of a normal distribution which is subjected to changes in time. Ann. Math. Stat. 35 999–1018.
  • Chib, S. (1998). Estimation and comparison of multiple change-point models. J. Econometrics 86 221–241.
  • Dang, C. V. (2012). MYC on the path to cancer. Cell 149 22–35.
  • Dobigeon, N., Tourneret, J.-Y. and Davy, M. (2007). Joint segmentation of piecewise constant autoregressive processes by using a hierarchical model and a Bayesian sampling approach. IEEE Trans. Signal Process. 55 1251–1263.
  • Fan, Z. and Mackey, L. (2017). Supplement to “Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences.” DOI:10.1214/17-AOAS1075SUPP.
  • Fan, Z., Dror, R. O., Mildorf, T. J., Piana, S. and Shaw, D. E. (2015). Identifying localized changes in large systems: Change-point detection for biomolecular simulations. Proc. Natl. Acad. Sci. USA 112 7454–7459.
  • Fearnhead, P. (2006). Exact and efficient Bayesian inference for multiple changepoint problems. Stat. Comput. 16 203–213.
  • Fearnhead, P. and Liu, Z. (2007). On-line inference for multiple changepoint problems. J. R. Stat. Soc. Ser. B. Stat. Methodol. 69 589–605.
  • Harlé, F., Chatelain, F., Gouy-Pailler, C. and Achard, S. (2016). Bayesian model for multiple change-points detection in multivariate time series. IEEE Trans. Signal Process. 64 4351–4362.
  • Healy, J. D. (1987). A note on multivariate CUSUM procedures. Technometrics 29 409–412.
  • Hsu, D.-A. (1977). Tests for variance shift at an unknown time point. J. R. Stat. Soc. Ser. C. Appl. Stat. 26 279–284.
  • Hughes, A. E. et al. (2006). A common CFH haplotype, with deletion of CFHR1 and CFHR3, is associated with lower risk of age-related macular degeneration. Nat. Genet. 38 1173–1177.
  • Jackson, B. et al. (2005). An algorithm for optimal partitioning of data on an interval. IEEE Signal Process. Lett. 12 105–108.
  • Jeng, X. J., Cai, T. T. and Li, H. (2013). Simultaneous discovery of rare and common segment variants. Biometrika 100 157–172.
  • Kamb, A. et al. (1994). A cell cycle regulator potentially involved in genesis of many tumor types. Science 264 436–439.
  • Killick, R., Fearnhead, P. and Eckley, I. A. (2012). Optimal detection of changepoints with a linear computational cost. J. Amer. Statist. Assoc. 107 1590–1598.
  • Lai, W. R., Johnson, M. D., Kucherlapati, R. and Park, P. J. (2005). Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21 3763–3770.
  • Lindorff-Larsen, K., Piana, S., Dror, R. O. and Shaw, D. E. (2011). How fast-folding proteins fold. Science 334 517–520.
  • Long, J. et al. (2013). A common deletion in the APOBEC3 genes and breast cancer risk. J. Natl. Cancer Inst. 105 573–579.
  • Louhimo, R., Lepikhova, T., Monni, O. and Hautaniemi, S. (2012). Comparative analysis of algorithms for integration of copy number and expression data. Nat. Methods 9 351–355.
  • Menges, C. W., Altomare, D. A. and Testa, J. R. (2009). FAS-associated factor 1 (FAF1): Diverse functions and implications for oncogenesis. Cell Cycle 8 2528–2534.
  • Nobori, T. (1994). Deletions of the cyclin-dependent kinase-4 inhibitor gene in multiple human cancers. Trends in Genetics 10 228.
  • Nowak, G., Hastie, T., Pollack, J. R. and Tibshirani, R. (2011). A fused lasso latent feature model for analyzing multi-sample aCGH data. Biostatistics 12 776–791.
  • Olshen, A. B., Venkatraman, E., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5 557–572.
  • Picard, F., Lebarbier, E., Hoebeke, M., Rigaill, G., Thiam, B. and Robin, S. (2011). Joint segmentation, calling, and normalization of multiple CGH profiles. Biostatistics 12 413–428.
  • Pollack, J. R. and Brown, P. O. (1999). Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet. 23 41–46.
  • Robbins, H. (1956). An empirical Bayes approach to statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, Vol. I 157–163. Univ. California Press, Berkeley.
  • Shah, S. P., Lam, W. L., Ng, R. T. and Murphy, K. P. (2007). Modeling recurrent DNA copy number alterations in array CGH data. Bioinformatics 23 i450–i458.
  • Siegmund, D., Yakir, B. and Zhang, N. R. (2011). Detecting simultaneous variant intervals in aligned sequences. Ann. Appl. Stat. 5 645–668.
  • Srivastava, M. S. and Worsley, K. J. (1986). Likelihood ratio tests for a change in the multivariate normal mean. J. Amer. Statist. Assoc. 81 199–204.
  • Stephens, D. A. (1994). Bayesian retrospective multiple-changepoint identification. J. R. Stat. Soc. Ser. C. Appl. Stat. 43 159–178.
  • Tada, M. et al. (2010). Prognostic significance of genetic alterations detected by high-density single nucleotide polymorphism array in gastric cancer. Cancer Science 101 1261–1269.
  • Theurillat, J.-P. et al. (2011). URI is an oncogene amplified in ovarian cancer cells and is required for their survival. Cancer Cell 19 317–332.
  • Trautmann, K. et al. (2006). Chromosomal instability in microsatellite-unstable and stable colon cancer. Clin. Cancer Res. 12 6379–6385.
  • Varma, S., Pommier, Y., Sunshine, M., Weinstein, J. N. and Reinhold, W. C. (2014). High resolution copy number variation data in the NCI-60 cancer cell lines from whole genome microarrays accessible through CellMiner. PLoS ONE 9 e92047.
  • Wei, G. C. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J. Amer. Statist. Assoc. 85 699–704.
  • Xuan, D. et al. (2013). APOBEC3 deletion polymorphism is associated with breast cancer risk among women of European ancestry. Carcinogenesis 34 2240–2243.
  • Yao, Y.-C. (1984). Estimation of a noisy discrete-time step function: Bayes and empirical Bayes approaches. Ann. Statist. 12 1434–1447.
  • Zhang, N. R. and Siegmund, D. O. (2012). Model selection for high-dimensional, multi-sequence change-point problems. Statist. Sinica 22 1507–1538.
  • Zhang, N. R., Siegmund, D. O., Ji, H. and Li, J. Z. (2010). Detecting simultaneous changepoints in multiple sequences. Biometrika 97 631–645.
  • Zhou, X., Yang, C., Wan, X., Zhao, H. and Yu, W. (2013). Multisample aCGH data analysis via total variation and spectral regularization. IEEE/ACM Trans. Comput. Biol. Bioinform. 10 230–235.

Supplemental materials

  • Supplementary Appendices. The Supplementary Appendices [Fan and Mackey (2017)] contain the following additional materials referenced in the main text: descriptions of common likelihood models and associated priors, details of the inference procedures, a comparison of the MCMC sampler with naïve Gibbs sampling, and additional details of the copy number analysis for the NCI-60 cell lines.