The Annals of Applied Statistics

Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences

Zhou Fan and Lester Mackey

Full-text: Open access


Copy number variations in cancer cells and volatility fluctuations in stock prices are commonly manifested as changepoints occurring at the same positions across related data sequences. We introduce a Bayesian modeling framework, BASIC, that employs a changepoint prior to capture the co-occurrence tendency in data of this type. We design efficient algorithms to sample from and maximize over the BASIC changepoint posterior and develop a Monte Carlo expectation-maximization procedure to select prior hyperparameters in an empirical Bayes fashion. We use the resulting BASIC framework to analyze DNA copy number variations in the NCI-60 cancer cell lines and to identify important events that affected the price volatility of S&P 500 stocks from 2000 to 2009.
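The co-occurrence tendency described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the BASIC implementation. It assumes a hierarchical prior in which each position t has its own changepoint probability q_t, drawn from a discrete distribution and shared by all sequences, and it assumes Gaussian segments with shifting means. The function name `simulate_shared_changepoints`, its parameters, and the numeric values are hypothetical choices made for illustration.

```python
import random

def simulate_shared_changepoints(n_seq, length, support, weights,
                                 noise_sd=1.0, seed=0):
    """Simulate sequences whose mean shifts tend to occur at shared positions."""
    rng = random.Random(seed)
    # Assumed prior: each position gets a changepoint probability q_t
    # drawn from a discrete distribution; all sequences share q_t.
    q = [rng.choices(support, weights=weights)[0] for _ in range(length)]
    changes = []  # changes[j][t] is True if sequence j changes at position t
    data = []     # data[j][t] is the observation of sequence j at position t
    for _ in range(n_seq):
        z = [False] * length
        x = []
        mean = rng.gauss(0.0, 3.0)  # mean of the first segment
        for t in range(length):
            # Given q_t, each sequence changes independently with probability q_t.
            if t > 0 and rng.random() < q[t]:
                z[t] = True
                mean = rng.gauss(0.0, 3.0)  # draw a new segment mean
            x.append(rng.gauss(mean, noise_sd))
        changes.append(z)
        data.append(x)
    return q, changes, data

# Most positions have q_t = 0; a few have a high q_t shared across sequences.
q, changes, data = simulate_shared_changepoints(
    n_seq=20, length=200, support=[0.0, 0.05, 0.8], weights=[0.95, 0.03, 0.02])

# Number of sequences with a changepoint at each position.
counts = [sum(z[t] for z in changes) for t in range(200)]
```

In this sketch, positions with a large q_t produce changepoints in many sequences at once, while positions with q_t = 0 produce none, which is the kind of structure a co-occurrence prior is meant to capture.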

Article information

Ann. Appl. Stat. Volume 11, Number 4 (2017), 2200-2221.

Received: July 2016
Revised: April 2017
First available in Project Euclid: 28 December 2017

Keywords: changepoint detection, empirical Bayes, Markov chain Monte Carlo, copy number variation, stock price volatility


Fan, Zhou; Mackey, Lester. Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences. Ann. Appl. Stat. 11 (2017), no. 4, 2200–2221. doi:10.1214/17-AOAS1075.


  • Adams, R. P. and MacKay, D. J. (2007). Bayesian online changepoint detection. Technical report. Available at arXiv:0710.3742 [stat.ML].
  • Akhoondi, S. et al. (2007). FBXW7/hCDC4 is a general tumor suppressor in human cancer. Cancer Res. 67 9006–9012.
  • Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 269–342.
  • Bardwell, L. and Fearnhead, P. (2017). Bayesian detection of abnormal segments in multiple time series. Bayesian Anal. 12 193–218.
  • Barry, D. and Hartigan, J. A. (1993). A Bayesian analysis for change point problems. J. Amer. Statist. Assoc. 88 309–319.
  • Basseville, M. and Nikiforov, I. V. (1993). Detection of Abrupt Changes: Theory and Application. Prentice Hall, Englewood Cliffs, NJ.
  • Chen, J. and Gupta, A. K. (2012). Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance, 2nd ed. Birkhäuser/Springer, New York.
  • Chernoff, H. and Zacks, S. (1964). Estimating the current mean of a normal distribution which is subjected to changes in time. Ann. Math. Stat. 35 999–1018.
  • Chib, S. (1998). Estimation and comparison of multiple change-point models. J. Econometrics 86 221–241.
  • Dang, C. V. (2012). MYC on the path to cancer. Cell 149 22–35.
  • Dobigeon, N., Tourneret, J.-Y. and Davy, M. (2007). Joint segmentation of piecewise constant autoregressive processes by using a hierarchical model and a Bayesian sampling approach. IEEE Trans. Signal Process. 55 1251–1263.
  • Fan, Z. and Mackey, L. (2017). Supplement to “Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences.” DOI:10.1214/17-AOAS1075SUPP.
  • Fan, Z., Dror, R. O., Mildorf, T. J., Piana, S. and Shaw, D. E. (2015). Identifying localized changes in large systems: Change-point detection for biomolecular simulations. Proc. Natl. Acad. Sci. USA 112 7454–7459.
  • Fearnhead, P. (2006). Exact and efficient Bayesian inference for multiple changepoint problems. Stat. Comput. 16 203–213.
  • Fearnhead, P. and Liu, Z. (2007). On-line inference for multiple changepoint problems. J. R. Stat. Soc. Ser. B. Stat. Methodol. 69 589–605.
  • Harlé, F., Chatelain, F., Gouy-Pailler, C. and Achard, S. (2016). Bayesian model for multiple change-points detection in multivariate time series. IEEE Trans. Signal Process. 64 4351–4362.
  • Healy, J. D. (1987). A note on multivariate CUSUM procedures. Technometrics 29 409–412.
  • Hsu, D.-A. (1977). Tests for variance shift at an unknown time point. J. R. Stat. Soc. Ser. C. Appl. Stat. 26 279–284.
  • Hughes, A. E. et al. (2006). A common CFH haplotype, with deletion of CFHR1 and CFHR3, is associated with lower risk of age-related macular degeneration. Nat. Genet. 38 1173–1177.
  • Jackson, B. et al. (2005). An algorithm for optimal partitioning of data on an interval. IEEE Signal Process. Lett. 12 105–108.
  • Jeng, X. J., Cai, T. T. and Li, H. (2013). Simultaneous discovery of rare and common segment variants. Biometrika 100 157–172.
  • Kamb, A. et al. (1994). A cell cycle regulator potentially involved in genesis of many tumor types. Science 264 436–439.
  • Killick, R., Fearnhead, P. and Eckley, I. A. (2012). Optimal detection of changepoints with a linear computational cost. J. Amer. Statist. Assoc. 107 1590–1598.
  • Lai, W. R., Johnson, M. D., Kucherlapati, R. and Park, P. J. (2005). Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21 3763–3770.
  • Lindorff-Larsen, K., Piana, S., Dror, R. O. and Shaw, D. E. (2011). How fast-folding proteins fold. Science 334 517–520.
  • Long, J. et al. (2013). A common deletion in the APOBEC3 genes and breast cancer risk. J. Natl. Cancer Inst. 105 573–579.
  • Louhimo, R., Lepikhova, T., Monni, O. and Hautaniemi, S. (2012). Comparative analysis of algorithms for integration of copy number and expression data. Nat. Methods 9 351–355.
  • Menges, C. W., Altomare, D. A. and Testa, J. R. (2009). FAS-associated factor 1 (FAF1): Diverse functions and implications for oncogenesis. Cell Cycle 8 2528–2534.
  • Nobori, T. (1994). Deletions of the cyclin-dependent kinase-4 inhibitor gene in multiple human cancers. Trends in Genetics 10 228.
  • Nowak, G., Hastie, T., Pollack, J. R. and Tibshirani, R. (2011). A fused lasso latent feature model for analyzing multi-sample aCGH data. Biostatistics 12 776–791.
  • Olshen, A. B., Venkatraman, E., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5 557–572.
  • Picard, F., Lebarbier, E., Hoebeke, M., Rigaill, G., Thiam, B. and Robin, S. (2011). Joint segmentation, calling, and normalization of multiple CGH profiles. Biostatistics 12 413–428.
  • Pollack, J. R. and Brown, P. O. (1999). Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet. 23 41–46.
  • Robbins, H. (1956). An empirical Bayes approach to statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, Vol. I 157–163. Univ. California Press, Berkeley.
  • Shah, S. P., Lam, W. L., Ng, R. T. and Murphy, K. P. (2007). Modeling recurrent DNA copy number alterations in array CGH data. Bioinformatics 23 i450–i458.
  • Siegmund, D., Yakir, B. and Zhang, N. R. (2011). Detecting simultaneous variant intervals in aligned sequences. Ann. Appl. Stat. 5 645–668.
  • Srivastava, M. S. and Worsley, K. J. (1986). Likelihood ratio tests for a change in the multivariate normal mean. J. Amer. Statist. Assoc. 81 199–204.
  • Stephens, D. A. (1994). Bayesian retrospective multiple-changepoint identification. J. R. Stat. Soc. Ser. C. Appl. Stat. 43 159–178.
  • Tada, M. et al. (2010). Prognostic significance of genetic alterations detected by high-density single nucleotide polymorphism array in gastric cancer. Cancer Science 101 1261–1269.
  • Theurillat, J.-P. et al. (2011). URI is an oncogene amplified in ovarian cancer cells and is required for their survival. Cancer Cell 19 317–332.
  • Trautmann, K. et al. (2006). Chromosomal instability in microsatellite-unstable and stable colon cancer. Clin. Cancer Res. 12 6379–6385.
  • Varma, S., Pommier, Y., Sunshine, M., Weinstein, J. N. and Reinhold, W. C. (2014). High resolution copy number variation data in the NCI-60 cancer cell lines from whole genome microarrays accessible through CellMiner. PLoS ONE 9 e92047.
  • Wei, G. C. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J. Amer. Statist. Assoc. 85 699–704.
  • Xuan, D. et al. (2013). APOBEC3 deletion polymorphism is associated with breast cancer risk among women of European ancestry. Carcinogenesis 34 2240–2243.
  • Yao, Y.-C. (1984). Estimation of a noisy discrete-time step function: Bayes and empirical Bayes approaches. Ann. Statist. 12 1434–1447.
  • Zhang, N. R. and Siegmund, D. O. (2012). Model selection for high-dimensional, multi-sequence change-point problems. Statist. Sinica 22 1507–1538.
  • Zhang, N. R., Siegmund, D. O., Ji, H. and Li, J. Z. (2010). Detecting simultaneous changepoints in multiple sequences. Biometrika 97 631–645.
  • Zhou, X., Yang, C., Wan, X., Zhao, H. and Yu, W. (2013). Multisample aCGH data analysis via total variation and spectral regularization. IEEE/ACM Trans. Comput. Biol. Bioinform. 10 230–235.

Supplemental materials

  • Supplementary Appendices. The Supplementary Appendices [Fan and Mackey (2017)] contain the following additional materials, as referenced in the main text: Description of common likelihood models and associated priors, details of inference procedures, comparison of MCMC sampler with naïve Gibbs sampling, and additional details of copy number analysis for the NCI-60 cell lines.