Annals of Statistics

Segmentation and estimation of change-point models: False positive control and confidence regions

Xiao Fang, Jian Li, and David Siegmund

Abstract

To segment a sequence of independent random variables at an unknown number of change-points, we introduce new procedures that are based on thresholding the likelihood ratio statistic, and give approximations for the probability of a false positive error when there are no change-points. We also study confidence regions based on the likelihood ratio statistic for the change-points and joint confidence regions for the change-points and the parameter values. Applications to segment array CGH data are discussed.

Article information

Source
Ann. Statist., Volume 48, Number 3 (2020), 1615-1647.

Dates
Received: March 2018
Revised: February 2019
First available in Project Euclid: 17 July 2020

Permanent link to this document
https://projecteuclid.org/euclid.aos/1594972832

Digital Object Identifier
doi:10.1214/19-AOS1861

Mathematical Reviews number (MathSciNet)
MR4124337

Zentralblatt MATH identifier
07241605

Subjects
Primary: 62G05: Estimation; 62G15: Tolerance and confidence regions

Keywords
Array CGH analysis; change-points; confidence regions; exponential families; likelihood ratio statistics

Citation

Fang, Xiao; Li, Jian; Siegmund, David. Segmentation and estimation of change-point models: False positive control and confidence regions. Ann. Statist. 48 (2020), no. 3, 1615–1647. doi:10.1214/19-AOS1861. https://projecteuclid.org/euclid.aos/1594972832


Supplemental materials

  • Supplement to “Segmentation and estimation of change-point models: False positive control and confidence regions.” This supplement contains proofs of Theorems 2.1 and 3.1.