Electronic Journal of Statistics
- Electron. J. Statist.
- Volume 10, Number 1 (2016), 918-959.
FDR-control in multiscale change-point segmentation
Housen Li, Axel Munk, and Hannes Sieling
Full-text: Open access
Abstract
Fast multiple change-point segmentation methods, which additionally provide faithful statistical statements on the number, locations and sizes of the segments, have recently received great attention. In this paper, we propose a multiscale segmentation method, FDRSeg, which controls the false discovery rate (FDR) in the sense that the number of false jumps is bounded linearly by the number of true jumps. In this way, it adapts the detection power to the number of true jumps. We prove a non-asymptotic upper bound for its FDR in a Gaussian setting, which allows to calibrate the only parameter of FDRSeg properly. Moreover, we show that FDRSeg estimates change-point locations, as well as the signal, in a uniform sense at optimal minimax convergence rates up to a log-factor. The latter is w.r.t. $L^{p}$-risk, $p\ge 1$, over classes of step functions with bounded jump sizes and either bounded, or even increasing, number of change-points. FDRSeg can be efficiently computed by an accelerated dynamic program; its computational complexity is shown to be linear in the number of observations when there are many change-points. The performance of the proposed method is examined by comparisons with some state of the art methods on both simulated and real datasets. An R-package is available online.
Article information
Source
Electron. J. Statist., Volume 10, Number 1 (2016), 918-959.
Dates
Received: October 2015
First available in Project Euclid: 8 April 2016
Permanent link to this document
https://projecteuclid.org/euclid.ejs/1460141648
Digital Object Identifier
doi:10.1214/16-EJS1131
Mathematical Reviews number (MathSciNet)
MR3486421
Zentralblatt MATH identifier
1338.62117
Subjects
Primary: 62G08: Nonparametric regression 62G10: Hypothesis testing 62G20: Asymptotic properties
Secondary: 90C39: Dynamic programming [See also 49L20]
Keywords
Multiscale inference change-point regression false discovery rate deviation bound dynamic programming minimax lower bound honest inference array CGH data ion channel recordings
Citation
Li, Housen; Munk, Axel; Sieling, Hannes. FDR-control in multiscale change-point segmentation. Electron. J. Statist. 10 (2016), no. 1, 918--959. doi:10.1214/16-EJS1131. https://projecteuclid.org/euclid.ejs/1460141648
References
- [1] Auger, I. E. and Lawrence, C. E. (1989). Algorithms for the optimal identification of segment neighborhoods., Bull. Math. Biol. 51 39–54.
- [2] Bai, J. and Perron, P. (1998). Estimating and testing linear models with multiple structural changes., Econometrica 66 47–78.
- [3] Bellman, R. (1957)., Dynamic Programming. Princeton University Press, Princeton, NJ, USA.Mathematical Reviews (MathSciNet): MR90477
- [4] Bellman, R. E. and Dreyfus, S. E. (1962)., Applied Dynamic Programming. Princeton, NJ: Princeton University Press.Mathematical Reviews (MathSciNet): MR140369
- [5] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing., J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 289–300.Mathematical Reviews (MathSciNet): MR1325392
- [6] Benjamini, Y. and Yekutieli, D. (2001). The Control of the False Discovery Rate in Multiple Testing under Dependency., Ann. Statist. 29 1165–88.Mathematical Reviews (MathSciNet): MR1869245
Digital Object Identifier: doi:10.1214/aos/1013699998
Project Euclid: euclid.aos/1013699998 - [7] Birgé, L. and Massart, P. (2006). Minimal penalties for Gaussian model selection., Probab. Theory Related Fields 138 33–73.Mathematical Reviews (MathSciNet): MR2288064
Digital Object Identifier: doi:10.1007/s00440-006-0011-8 - [8] Blythe, D. A. J., von Bunau, P., Meinecke, F. C. and Muller, K. (2012). Feature Extraction for Change-Point Detection Using Stationary Subspace Analysis., IEEE Trans. Neur. Netwrks Learn. Syst. 23 631–643.
- [9] Boysen, L., Kempe, A., Liebscher, V., Munk, A. and Wittich, O. (2009). Consistencies and rates of convergence of jump-penalized least squares estimators., Ann. Statist. 37 157–183.Mathematical Reviews (MathSciNet): MR2488348
Digital Object Identifier: doi:10.1214/07-AOS558
Project Euclid: euclid.aos/1232115931 - [10] Braun, J. V., Braun, R. K. and Mueller, H. G. (2000). Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation., Biometrika 87 301–314.
- [11] Carlstein, E., Müller, H.-G. and Siegmund, D. (1994)., Change-point problems. Institute of Mathematical Statistics Lecture Notes—Monograph Series, 23. Institute of Mathematical Statistics, Hayward, CA. Papers from the AMS-IMS-SIAM Summer Research Conference held at Mt. Holyoke College, South Hadley, MA, July 11–16, 1992.Mathematical Reviews (MathSciNet): MR1477909
- [12] Chan, H. P. and Walther, G. (2013). Detection with the scan and the average likelihood ratio., Statist. Sinica 23 409–428.Mathematical Reviews (MathSciNet): MR3076173
- [13] Chen, Y., Shah, R. D. and Samworth, R. J. (2014). Discussion of “Multiscale change-point inference”., J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 544–546.
- [14] Cheng, D. and Schwartzman, A. (2015). Multiple testing of local extrema for detection of change points., arXiv:1504.06384.
- [15] Chung, S.-H., Andersen, O. S. and Krishnamurthy, V. (2007)., Biological Membrane Ion Channels. Springer.
- [16] Cormen, T. H., Leiserson, C. E., Rivest, R. L. and Stein, C. (2009)., Introduction to Algorithms, third ed. MIT press.Mathematical Reviews (MathSciNet): MR2572804
- [17] Csörgö, M. and Horváth, L. (1997)., Limit Theorems in Change-point Analysis. John Wiley & Sons Ltd., Chichester.Mathematical Reviews (MathSciNet): MR2743035
- [18] Davies, L., Höhenrieder, C. and Krämer, W. (2012). Recursive computation of piecewise constant volatilities., Comput. Stat. Data Anal. 56 3623–3631.Mathematical Reviews (MathSciNet): MR2943916
- [19] Davies, P. L. and Kovac, A. (2001). Local extremes, runs, strings and multiresolution., Ann. Statist. 29 1–65. With discussion and rejoinder by the authors.Mathematical Reviews (MathSciNet): MR1833958
Digital Object Identifier: doi:10.1214/aos/996986501
Project Euclid: euclid.aos/996986501 - [20] Dette, H., Munk, A. and Wagner, T. (1998). Estimating the variance in nonparametric regression—what is a reasonable choice?, J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 751–764.
- [21] Diskin, S. J., Li, M., Hou, C., Yang, S., Glessner, J., Hakonarson, H., Bucan, M., Maris, J. M. and Wang, K. (2008). Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms., Nucleic Acids Res. 36 e126.
- [22] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage., Biometrika 81 425–455.
- [23] Du, C., Kao, C.-L. M. and Kou, S. C. (2015). Stepwise Signal Extraction via Marginal Likelihood., J. Amer. Statist. Assoc. in press.Mathematical Reviews (MathSciNet): MR3494662
Digital Object Identifier: doi:10.1080/01621459.2015.1006365 - [24] Dümbgen, L. and Kovac, A. (2009). Extensions of smoothing via taut strings., Electron. J. Stat. 3 41–75.Mathematical Reviews (MathSciNet): MR2471586
Digital Object Identifier: doi:10.1214/08-EJS216
Project Euclid: euclid.ejs/1233176790 - [25] Dümbgen, L., Piterbarg, V. I. and Zholud, D. (2006). On the limit distribution of multiscale test statistics for nonparametric curve estimation., Math. Methods Statist. 15 20–25.Mathematical Reviews (MathSciNet): MR2225428
- [26] Dümbgen, L. and Spokoiny, V. G. (2001). Multiscale testing of qualitative hypotheses., Ann. Statist. 29 124–152.Mathematical Reviews (MathSciNet): MR1833961
Digital Object Identifier: doi:10.1214/aos/996986504
Project Euclid: euclid.aos/996986504 - [27] Dümbgen, L. and Walther, G. (2008). Multiscale inference about a density., Ann. Statist. 36 1758–1785.Mathematical Reviews (MathSciNet): MR2435455
Digital Object Identifier: doi:10.1214/07-AOS521
Project Euclid: euclid.aos/1216237299 - [28] Efron, B. (2008). Microarrays, empirical Bayes and the two-groups model., Statist. Sci., with discussion and rejoinder by the authors 23 1–22.Mathematical Reviews (MathSciNet): MR2431866
Digital Object Identifier: doi:10.1214/07-STS236
Project Euclid: euclid.ss/1215441276 - [29] Efron, B. and Zhang, N. R. (2011). False discovery rates and copy number variation., Biometrika 98 251–271.
- [30] Frick, K., Munk, A. and Sieling, H. (2014). Multiscale Change-point Inference., J. R. Stat. Soc. Ser. B. Stat. Methodol., with discussion and rejoinder by the authors 76 495–580.
- [31] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization., Ann. Appl. Statist. 1 302–332.Mathematical Reviews (MathSciNet): MR2415737
Digital Object Identifier: doi:10.1214/07-AOAS131
Project Euclid: euclid.aoas/1196438020 - [32] Friedrich, F., Kempe, A., Liebscher, V. and Winkler, G. (2008). Complexity penalized M-estimation: fast computation., J. Comput. Graph. Statist. 17 201–224.Mathematical Reviews (MathSciNet): MR2424802
Digital Object Identifier: doi:10.1198/106186008X285591 - [33] Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection., Ann. Statist. 42 2243–2281.Mathematical Reviews (MathSciNet): MR3269979
Digital Object Identifier: doi:10.1214/14-AOS1245
Project Euclid: euclid.aos/1413810727 - [34] Guo, W. and Sarkar, S. (2013). Adaptive Controls of FWER and FDR Under Block Dependence., Unpublished manuscript. https://web.njit.edu/~wguo/Guo%20&%20Sarkar%202012.pdf.
- [35] Hall, P., Kay, J. W. and Titterinton, D. M. (1990). Asymptotically optimal difference-based estimation of variance in nonparametric regression., Biometrika 77 521–528.
- [36] Hao, N., Niu, Y. and Zhang, H. (2013). Multiple change-point detection via a screening and ranking algorithm., Statist. Sinica 23 1553–1572.Mathematical Reviews (MathSciNet): MR3222810
- [37] Harchaoui, Z. and Lévy-Leduc, C. (2008). Catching change-points with lasso., Adv. in Neur. Inform. Processing Syst. 20 161–168.
- [38] Harchaoui, Z. and Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty., J. Amer. Statist. Assoc. 105 1480–1493.Mathematical Reviews (MathSciNet): MR2796565
Digital Object Identifier: doi:10.1198/jasa.2010.tm09181 - [39] Hille, B. (2001)., Ion Channels of Excitable Membranes. Sinauer Sunderland, MA.
- [40] Hotz, T., Schütte, O. M., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C. and Munk, A. (2013). Idealizing ion channel recordings by jump segmentation and statistical multiresolution analysis., IEEE Trans. Nanobiosci. 12 376–386.
- [41] Inclán, C. and Tiao, G. C. (1994). Use of Cumulative Sums of Squares for Retrospective Detection of Changes of Variance., J. Amer. Statist. Assoc. 89 913–923.Mathematical Reviews (MathSciNet): MR1294735
- [42] Jackson, B., Sargle, J. D., Barnes, D., Arabhi, S., Alt, A., Gioumousis, P., Gwin, E., Sangtrakulcharoen, P., Tan, L. and Tsai, T. T. (2005). An algorithm for optimal partitioning of data on an interval., IEEE Signal Process. Lett. 12 105–108.
- [43] Jeng, X. J., Cai, T. T. and Li, H. (2010). Optimal Sparse Segment Identification With Application in Copy Number Variation Analysis., J. Amer. Statist. Assoc. 105 1156–1166.Mathematical Reviews (MathSciNet): MR2752611
Digital Object Identifier: doi:10.1198/jasa.2010.tm10083 - [44] Kass, R. S. et al. (2005). The channelopathies: novel insights into molecular and genetic mechanisms of human disease., J. Clin. Invest. 115 1986–1989.
- [45] Killick, R., Fearnhead, P. and Eckley, I. A. (2012). Optimal Detection of Changepoints With a Linear Computational Cost., J. Amer. Statist. Assoc. 107 1590–1598.Mathematical Reviews (MathSciNet): MR3036418
Digital Object Identifier: doi:10.1080/01621459.2012.737745 - [46] Lavielle, M. and Teyssière, G. (2007). Adaptive detection of multiple change-points in asset price volatility. In, Long memory in economics 129–156. Springer, Berlin.Mathematical Reviews (MathSciNet): MR2265058
Digital Object Identifier: doi:10.1007/978-3-540-34625-8_5 - [47] Niu, Y. S. and Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variations., Ann. Appl. Statist. 6 1306–1326.Mathematical Reviews (MathSciNet): MR3012531
Digital Object Identifier: doi:10.1214/12-AOAS539
Project Euclid: euclid.aoas/1346418584 - [48] Olshen, A. B., Venkatraman, E. S., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data., Biostatistics 5 557–572.
- [49] Pinkel, D., Segraves, R., Sudar, D., Clark, S., Poole, I., Kowbel, D., Collins, C., Kuo, W.-L., Chen, C., Zhai, Y. et al. (1998). High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays., Nat. Genet. 20 207–211.
- [50] Rice, J. (1984). Bandwidth choice for nonparametric regression., Ann. Statist. 12 1215–1230.Mathematical Reviews (MathSciNet): MR760684
Digital Object Identifier: doi:10.1214/aos/1176346788
Project Euclid: euclid.aos/1176346788 - [51] Rivera, C. and Walther, G. (2013). Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics., Scand. J. Stat. 40 752–769.
- [52] Rosenberg, A. and Hirschberg, J. (2007). V-measure: a conditional entropy-based external cluster evaluation measures., Proc. Conf. Empirical Methods Natural Lang. Process. June 410–420.
- [53] Scott, A. J. and Knott, M. (1974). A Cluster Analysis Method for Grouping Means in the Analysis of Variance., Biometrics 30 pp. 507–512.
- [54] Siegmund, D. (2013). Change-points: from sequential detection to biology and back., Sequent. Anal. 32 2–14.Mathematical Reviews (MathSciNet): MR3023983
- [55] Siegmund, D. and Yakir, B. (2000). Tail probabilities for the null distribution of scanning statistics., Bernoulli 6 191–213.Mathematical Reviews (MathSciNet): MR1748719
Digital Object Identifier: doi:10.2307/3318574
Project Euclid: euclid.bj/1081788026 - [56] Siegmund, D. O., Zhang, N. R. and Yakir, B. (2011). False discovery rate for scanning statistics., Biometrika 98 979–985.
- [57] Snijders, A. M., Nowak, N., Segraves, R., Blackwood, S., Brown, N., Conroy, J., Hamilton, G., Hindle, A. K., Huey, B., Kimura, K. et al. (2001). Assembly of microarrays for genome-wide measurement of DNA copy number., Nat. Genet. 29 263–264.
- [58] Spokoiny, V. (2009). Multiscale local change point detection with applications to value-at-risk., Ann. Statist. 37 1405–1436.Mathematical Reviews (MathSciNet): MR2509078
Digital Object Identifier: doi:10.1214/08-AOS612
Project Euclid: euclid.aos/1239369026 - [59] Storath, M., Weinmann, A. and Demaret, L. (2014). Jump-sparse and sparse recovery using Potts functionals., IEEE Trans. Signal Process. 62 3654–3666.Mathematical Reviews (MathSciNet): MR3245573
Digital Object Identifier: doi:10.1109/TSP.2014.2329263 - [60] Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused lasso., Biostatistics 9 18–29.
- [61] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso., J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 91–108.Mathematical Reviews (MathSciNet): MR2136641
Digital Object Identifier: doi:10.1111/j.1467-9868.2005.00490.x - [62] Tsybakov, A. (2009)., Introduction to Nonparametric Estimation. Springer-Verlag New York.Mathematical Reviews (MathSciNet): MR2724359
- [63] van der Vaart, A. W. and Wellner, J. A. (1996)., Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York. With applications to statistics.Mathematical Reviews (MathSciNet): MR1385671
- [64] VanDongen, A. (1996). A new algorithm for idealizing single ion channel data containing multiple unknown conductance levels., Biophys. J. 70 1303.
- [65] Venkatraman, E. S. and Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data., Bioinformatics 23 657–663.
- [66] Vitale, R. A. (2000). Some comparisons for Gaussian processes., Proc. Amer. Math. Soc. 128 3043–3046.Mathematical Reviews (MathSciNet): MR1664383
Digital Object Identifier: doi:10.1090/S0002-9939-00-05367-3 - [67] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics., Ann. Statist. 38 1010–1033.Mathematical Reviews (MathSciNet): MR2604703
Digital Object Identifier: doi:10.1214/09-AOS732
Project Euclid: euclid.aos/1266586621 - [68] Wu, W. B. (2007). Strong invariance principles for dependent random variables., Ann. Probab. 35 2294–2320.Mathematical Reviews (MathSciNet): MR2353389
Digital Object Identifier: doi:10.1214/009117907000000060
Project Euclid: euclid.aop/1191860422 - [69] Wu, W. B. and Zhao, Z. (2007). Inference of trends in time series., J. R. Stat. Soc. Ser. B Stat. Methodol. 69 391–410.Mathematical Reviews (MathSciNet): MR2323759
Digital Object Identifier: doi:10.1111/j.1467-9868.2007.00594.x - [70] Zhang, N. R. and Siegmund, D. O. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data., Biometrics 63 22–32.Mathematical Reviews (MathSciNet): MR2345571
Digital Object Identifier: doi:10.1111/j.1541-0420.2006.00662.x - [71] Zhang, N. R. and Siegmund, D. O. (2012). Model selection for high-dimensional, multi-sequence change-point problems., Statist. Sinica 22 1507–1538.Mathematical Reviews (MathSciNet): MR3027097
The Institute of Mathematical Statistics and the Bernoulli Society

- You have access to this content.
- You have partial access to this content.
- You do not have access to this content.
More like this
- Multiscale change-point segmentation: beyond step functions
Li, Housen, Guo, Qinghai, and Munk, Axel, Electronic Journal of Statistics, 2019 - FDR control with adaptive procedures and FDR
monotonicity
Zeisel, Amit, Zuk, Or, and Domany, Eytan, Annals of Applied Statistics, 2011 - Estimation of a function with discontinuities via local polynomial
fit with an adaptive window choice
Spokoiny, V. G., Annals of Statistics, 1998
- Multiscale change-point segmentation: beyond step functions
Li, Housen, Guo, Qinghai, and Munk, Axel, Electronic Journal of Statistics, 2019 - FDR control with adaptive procedures and FDR
monotonicity
Zeisel, Amit, Zuk, Or, and Domany, Eytan, Annals of Applied Statistics, 2011 - Estimation of a function with discontinuities via local polynomial
fit with an adaptive window choice
Spokoiny, V. G., Annals of Statistics, 1998 - A new multiple testing method in the dependent case
Cohen, Arthur, Sackrowitz, Harold B., and Xu, Minya, Annals of Statistics, 2009 - Multiple testing of local extrema for detection of change points
Cheng, Dan, He, Zhibing, and Schwartzman, Armin, Electronic Journal of Statistics, 2020 - Finite sample bounds for expected number of false rejections under martingale dependence with applications to FDR
Benditkis, Julia and Janssen, Arnold, Electronic Journal of Statistics, 2017 - Tail-greedy bottom-up data decompositions and fast multiple change-point detection
Fryzlewicz, Piotr, Annals of Statistics, 2018 - Online rules for control of false discovery rate and false discovery exceedance
Javanmard, Adel and Montanari, Andrea, Annals of Statistics, 2018 - Wild binary segmentation for multiple change-point detection
Fryzlewicz, Piotr, Annals of Statistics, 2014 - Some optimality properties of FDR controlling rules under sparsity
Frommlet, Florian and Bogdan, Małgorzata, Electronic Journal of Statistics, 2013
