The Annals of Statistics
- Ann. Statist.
- Volume 46, Number 6B (2018), 3217-3245.
Approximate $\ell_{0}$-penalized estimation of piecewise-constant signals on graphs
Zhou Fan and Leying Guan
Abstract
We study recovery of piecewise-constant signals on graphs by the estimator minimizing an $\ell_{0}$-edge-penalized objective. Although exact minimization of this objective may be computationally intractable, we show that the same statistical risk guarantees are achieved by the $\alpha$-expansion algorithm, which computes an approximate minimizer in polynomial time. We establish that for graphs with small average vertex degree, these guarantees are minimax rate-optimal over classes of edge-sparse signals. For spatially inhomogeneous graphs, we propose minimization of an edge-weighted objective where each edge is weighted by its effective resistance or another measure of its contribution to the graph’s connectivity. We establish minimax optimality of the resulting estimators over corresponding edge-weighted sparsity classes. We show theoretically that these risk guarantees are not always achieved by the estimator minimizing the $\ell_{1}$/total-variation relaxation, and empirically that the $\ell_{0}$-based estimates are more accurate in high signal-to-noise settings.
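To make the objective in the abstract concrete, the following is a minimal sketch of one common form of an $\ell_{0}$-edge-penalized criterion: a squared-error fit term plus a penalty counting the edges across which the candidate signal changes value. The abstract does not state the formula, so the 1/2 normalization, the function name `l0_edge_objective`, the optional `weights` argument (standing in for edge weights such as effective resistances), and the toy path graph are all assumptions made for illustration. The sketch only evaluates the objective for given candidates; it does not implement $\alpha$-expansion.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Edge = Tuple[int, int]


def l0_edge_objective(
    y: Sequence[float],
    theta: Sequence[float],
    edges: List[Edge],
    lam: float,
    weights: Optional[Dict[Edge, float]] = None,
) -> float:
    """Evaluate 0.5 * ||y - theta||^2 + lam * sum of w_ij over edges with theta_i != theta_j.

    Assumed form of the objective; with weights=None every edge has weight 1,
    which gives the unweighted edge-count penalty.
    """
    fit = 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, theta))
    penalty = 0.0
    for (i, j) in edges:
        if theta[i] != theta[j]:
            penalty += 1.0 if weights is None else weights[(i, j)]
    return fit + lam * penalty


# Toy example (hypothetical data): a path graph 0 - 1 - 2 - 3 with one jump.
edges = [(0, 1), (1, 2), (2, 3)]
y = [0.1, -0.1, 2.1, 1.9]
lam = 0.5

two_piece = [0.0, 0.0, 2.0, 2.0]      # one change, across edge (1, 2)
constant = [1.0, 1.0, 1.0, 1.0]       # no changes, poor fit
interpolating = list(y)               # perfect fit, changes on every edge

obj_two = l0_edge_objective(y, two_piece, edges, lam)
obj_const = l0_edge_objective(y, constant, edges, lam)
obj_interp = l0_edge_objective(y, interpolating, edges, lam)

print(obj_two, obj_const, obj_interp)
```

In this toy setting the two-piece candidate attains the smallest value of the three, because it pays the penalty for a single edge while keeping the fit term small. Searching over all piecewise-constant candidates is the part that may be intractable in general and that the abstract addresses with an approximation algorithm.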
Article information
Source
Ann. Statist., Volume 46, Number 6B (2018), 3217-3245.
Dates
Received: April 2017
Revised: September 2017
First available in Project Euclid: 11 September 2018
Permanent link to this document
https://projecteuclid.org/euclid.aos/1536631272
Digital Object Identifier
doi:10.1214/17-AOS1656
Mathematical Reviews number (MathSciNet)
MR3852650
Zentralblatt MATH identifier
06965686
Subjects
Primary: 62G05: Estimation
Keywords
Approximation algorithm; graph cut; effective resistance; total-variation denoising
Citation
Fan, Zhou; Guan, Leying. Approximate $\ell_{0}$-penalized estimation of piecewise-constant signals on graphs. Ann. Statist. 46 (2018), no. 6B, 3217-3245. doi:10.1214/17-AOS1656. https://projecteuclid.org/euclid.aos/1536631272
Supplemental materials
- Supplementary Appendices. The supplementary appendices contain proofs of theoretical results. Digital Object Identifier: doi:10.1214/17-AOS1656SUPP
