Annales de l'Institut Henri Poincaré, Probabilités et Statistiques

Variational multiscale nonparametric regression: Smooth functions

Markus Grasmair, Housen Li, and Axel Munk

Abstract

For the problem of nonparametric regression of smooth functions, we reconsider and analyze a constrained variational approach, which we call the MultIscale Nemirovski–Dantzig (MIND) estimator. This can be viewed as a multiscale extension of the Dantzig selector (Ann. Statist. 35 (2007) 2313–2351) based on early ideas of Nemirovski (J. Comput. System Sci. 23 (1986) 1–11). MIND minimizes a homogeneous Sobolev norm under the constraint that the multiresolution norm of the residual is bounded by a universal threshold. The main contribution of this paper is the derivation of convergence rates of MIND with respect to $L^{q}$-loss, $1\le q\le\infty $, both almost surely and in expectation. To this end, we introduce the method of approximate source conditions. For a one-dimensional signal, these can be translated into approximation properties of B-splines. A remarkable consequence is that MIND attains almost minimax optimal rates simultaneously for a large range of Sobolev and Besov classes, which provides a certain degree of adaptation. Complementary to the asymptotic analysis, we examine the finite sample performance of MIND by numerical simulations. A MATLAB package is available online.
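The multiresolution norm constraining the residual is, roughly, the maximum over a system of intervals $I$ of the normalized partial sums $|\sum_{i\in I} r_i|/\sqrt{|I|}$ (cf. Dümbgen and Spokoiny [36]). The following Python sketch computes this statistic over all discrete intervals; it is an illustrative simplification only, and does not reproduce the paper's exact interval system or the universal threshold $\gamma_n$:

```python
import numpy as np

def multiresolution_norm(r):
    """Max over all intervals I of |sum_{i in I} r_i| / sqrt(|I|).

    Simplified O(n^2) version using all discrete intervals; the paper
    may restrict to a sparser (e.g. dyadic) system of intervals.
    """
    n = len(r)
    # Prefix sums: s[j] - s[i] is the sum of r over the interval [i, j).
    s = np.concatenate(([0.0], np.cumsum(r)))
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            val = abs(s[j] - s[i]) / np.sqrt(j - i)
            best = max(best, val)
    return best
```

MIND then minimizes a homogeneous Sobolev norm of the fit subject to `multiresolution_norm(residual)` not exceeding a universal threshold.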

Résumé

For the problem of nonparametric regression of smooth functions, we revisit and analyze a constrained variational approach, which we call the multiscale Nemirovski–Dantzig (MIND) estimator. It can be viewed as a multiscale extension of the Dantzig selector (Ann. Statist. 35 (2007) 2313–2351), building on earlier ideas of Nemirovski (J. Comput. System Sci. 23 (1986) 1–11). MIND minimizes a homogeneous Sobolev norm under the constraint that the multiresolution norm of the residual is bounded by a universal threshold. The main contribution of this article is the derivation of convergence rates for MIND in $L^{q}$-norm, $1\leq q\leq \infty$, both almost surely and in expectation. To this end, we introduce the method of approximate source conditions. For a one-dimensional signal, these can be expressed in terms of approximation properties of B-splines. A remarkable consequence is that MIND attains almost minimax optimal rates simultaneously over a wide range of Sobolev and Besov classes, which demonstrates a certain adaptivity. In addition to the asymptotic analysis, we study the finite-sample performance of MIND through numerical simulations. A MATLAB package is available online.

Article information

Source
Ann. Inst. H. Poincaré Probab. Statist., Volume 54, Number 2 (2018), 1058-1097.

Dates
Received: 4 December 2015
Revised: 27 February 2017
Accepted: 20 March 2017
First available in Project Euclid: 25 April 2018

Permanent link to this document
https://projecteuclid.org/euclid.aihp/1524643240

Digital Object Identifier
doi:10.1214/17-AIHP832

Mathematical Reviews number (MathSciNet)
MR3795077

Zentralblatt MATH identifier
06897979

Subjects
Primary: 62G08 (Nonparametric regression); 62G20 (Asymptotic properties)
Secondary: 90C25 (Convex programming)

Keywords
Nonparametric regression; Adaptation; Convergence rates; Minimax optimality; Multiresolution norm; Approximate source conditions

Citation

Grasmair, Markus; Li, Housen; Munk, Axel. Variational multiscale nonparametric regression: Smooth functions. Ann. Inst. H. Poincaré Probab. Statist. 54 (2018), no. 2, 1058–1097. doi:10.1214/17-AIHP832. https://projecteuclid.org/euclid.aihp/1524643240



References

  • [1] R. A. Adams and J. J. F. Fournier. Sobolev Spaces, 2nd edition. Pure and Applied Mathematics (Amsterdam) 140. Elsevier/Academic Press, Amsterdam, 2003.
  • [2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 (1) (2009) 183–202.
  • [3] A. S. Besicovitch. A general form of the covering principle and relative differentiation of additive functions. Math. Proc. Cambridge Philos. Soc. 41 (1945) 103–110.
  • [4] A. S. Besicovitch. A general form of the covering principle and relative differentiation of additive functions. II. Math. Proc. Cambridge Philos. Soc. 42 (1946) 1–10.
  • [5] P. J. Bickel, Y. Ritov and A. B. Tsybakov. Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 (4) (2009) 1705–1732.
  • [6] N. Bissantz, T. Hohage, A. Munk and F. Ruymgaart. Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J. Numer. Anal. 45 (6) (2007) 2610–2636.
  • [7] J. P. Boyle and R. L. Dykstra. A method for finding projections onto the intersection of convex sets in Hilbert spaces. In Advances in Order Restricted Statistical Inference 28–47. Iowa City, Iowa, 1985. Lecture Notes in Statist. 37. Springer, Berlin, 1986.
  • [8] S. C. Brenner and L. R. Scott. The Mathematical Theory of Finite Element Methods, 3rd edition. Texts in Applied Mathematics 15. Springer, New York, 2008.
  • [9] T. Cai. Adaptive wavelet estimation: A block thresholding and oracle inequality approach. Ann. Statist. 27 (3) (1999) 898–924.
  • [10] T. Cai. On block thresholding in wavelet regression: Adaptivity, block size, and threshold level. Statist. Sinica 12 (4) (2002) 1241–1273.
  • [11] T. Cai, L. Wang and G. Xu. Stable recovery of sparse signals and an oracle inequality. IEEE Trans. Inform. Theory 56 (7) (2010) 3516–3522.
  • [12] T. Cai and H. Zhou. A data-driven block thresholding approach to wavelet estimation. Ann. Statist. 37 (2) (2009) 569–595.
  • [13] E. J. Candès and F. Guo. New multiscale transforms, minimum total variation synthesis: Applications to edge-preserving image reconstruction. Signal Process. 82 (2002) 1519–1543.
  • [14] E. J. Candès and T. Tao. The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 (6) (2007) 2313–2351.
  • [15] L. Cavalier, G. K. Golubev, D. Picard and A. B. Tsybakov. Oracle inequalities for inverse problems. Ann. Statist. 30 (3) (2002) 843–874. Dedicated to the memory of Lucien Le Cam.
  • [16] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40 (1) (2011) 120–145.
  • [17] H. P. Chan and G. Walther. Detection with the scan and the average likelihood ratio. Statist. Sinica 23 (1) (2013) 409–428.
  • [18] T. F. Chan and J. Shen. Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2005.
  • [19] C. Chesneau, J. Fadili and J.-L. Starck. Stein block thresholding for wavelet-based image deconvolution. Electron. J. Stat. 4 (2010) 415–435.
  • [20] A. Cohen, M. Hoffmann and M. Reiß. Adaptive wavelet Galerkin methods for linear inverse problems. SIAM J. Numer. Anal. 42 (4) (2004) 1479–1501. (electronic).
  • [21] I. Daubechies. Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics 61. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992.
  • [22] P. L. Davies and A. Kovac. Local extremes, runs, strings and multiresolution. Ann. Statist. 29 (1) (2001) 1–65.
  • [23] P. L. Davies, A. Kovac and M. Meise. Nonparametric regression, confidence regions and regularization. Ann. Statist. 37 (2009) 2597–2625.
  • [24] P. L. Davies and M. Meise. Approximating data with weighted smoothing splines. J. Nonparametr. Stat. 20 (3) (2008) 207–228.
  • [25] C. de Boor. On the (bi)infinite case of Shadrin’s theorem concerning the $L_{\infty}$-boundedness of the $L_{2}$-spline projector. Proc. Steklov Inst. Math. 277 (2012) 73–78.
  • [26] W. Deng and W. Yin. On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. (2015). In press.
  • [27] H. Dette, A. Munk and T. Wagner. Estimating the variance in nonparametric regression-what is a reasonable choice? J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 (4) (1998) 751–764.
  • [28] Y. Dong, M. Hintermüller and M. M. Rincon-Camacho. Automated regularization parameter selection in multi-scale total variation models for image restoration. J. Math. Imaging Vision 40 (1) (2011) 82–104.
  • [29] D. L. Donoho. De-noising by soft-thresholding. IEEE Trans. Inform. Theory 41 (3) (1995) 613–627.
  • [30] D. L. Donoho. Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Appl. Comput. Harmon. Anal. 2 (2) (1995) 101–126.
  • [31] D. L. Donoho, M. Elad and V. N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 (1) (2006) 6–18.
  • [32] D. L. Donoho and I. M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 (3) (1994) 425–455.
  • [33] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian and D. Picard. Wavelet shrinkage: Asymptopia? J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 (2) (1995) 301–369.
  • [34] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian and D. Picard. Universal near minimaxity of wavelet shrinkage. In Festschrift for Lucien Le Cam 183–218. D. Pollard and G. Yang (Eds). Springer, New York, 1996.
  • [35] L. Dümbgen and A. Kovac. Extensions of smoothing via taut strings. Electron. J. Stat. 3 (2009) 41–75.
  • [36] L. Dümbgen and V. G. Spokoiny. Multiscale testing of qualitative hypotheses. Ann. Statist. 29 (1) (2001) 124–152.
  • [37] N. Dyn, F. J. Narcowich and J. D. Ward. Variational principles and Sobolev-type estimates for generalized interpolation on a Riemannian manifold. Constr. Approx. 15 (2) (1999) 175–208.
  • [38] P. P. B. Eggermont and V. N. LaRiccia. Maximum Penalized Likelihood Estimation. Volume II: Regression. Springer Series in Statistics. Springer, Dordrecht, 2009.
  • [39] H. W. Engl, M. Hanke and A. Neubauer. Regularization of Inverse Problems. Mathematics and Its Applications. 375. Kluwer Academic Publishers Group, Dordrecht, 1996.
  • [40] L. C. Evans. Partial Differential Equations, 2nd edition. Graduate Studies in Mathematics 19. American Mathematical Society, Providence, RI, 2010.
  • [41] J. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications. Monographs on Statistics and Applied Probability. 66. Chapman & Hall, London, 1996.
  • [42] J. Flemming. Solution smoothness of ill-posed equations in Hilbert spaces: Four concepts and their cross connections. Appl. Anal. 91 (5) (2012) 1029–1044.
  • [43] J. Flemming and B. Hofmann. A new approach to source conditions in regularization with general residual term. Numer. Funct. Anal. Optim. 31 (2) (2010) 254–284.
  • [44] K. Frick, P. Marnitz and A. Munk. Statistical multiresolution Dantzig estimation in imaging: Fundamental concepts and algorithmic framework. Electron. J. Stat. 6 (2012) 231–268.
  • [45] K. Frick, P. Marnitz and A. Munk. Statistical multiresolution estimation for variational imaging: With an application in Poisson-biophotonics. J. Math. Imaging Vision 46 (3) (2013) 370–387.
  • [46] K. Frick, A. Munk and H. Sieling. Multiscale change point inference. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 (3) (2014) 495–580.
  • [47] J. Glaz and N. Balakrishnan (Eds). Scan Statistics and Applications. Statistics for Industry and Technology. Birkhäuser, Boston, 1999.
  • [48] A. Goldenshluger and A. Nemirovski. On spatially adaptive estimation of nonparametric regression. Math. Methods Statist. 6 (2) (1997) 135–170.
  • [49] M. v. Golitschek. On the $L_{\infty}$-norm of the orthogonal projector onto splines. A short proof of A. Shadrin’s theorem. J. Approx. Theory 181 (2014) 30–42.
  • [50] P. Green and B. Silverman. Nonparametric Regression and Generalized Linear Models. A Roughness Penalty Approach. Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1994.
  • [51] C. W. Groetsch. The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Pitman, Boston, 1984.
  • [52] L. Györfi, M. Kohler, A. Krzyżak and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer, New York, 2002.
  • [53] P. Hall, J. W. Kay and D. M. Titterington. Asymptotically optimal difference-based estimation of variance in nonparametric regression. Biometrika 77 (1990) 521–528.
  • [54] P. Hall, S. Penev, G. Kerkyacharian and D. Picard. Numerical performance of block thresholded wavelet estimators. Stat. Comput. 7 (1997) 115–124.
  • [55] M. Haltmeier and A. Munk. Extreme value analysis of empirical frame coefficients and implications for denoising by soft-thresholding. Appl. Comput. Harmon. Anal. (2013). In press.
  • [56] W. Härdle, G. Kerkyacharian, D. Picard and A. Tsybakov. Wavelets, Approximation, and Statistical Applications. Lecture Notes in Statistics 129. Springer-Verlag, New York, 1998.
  • [57] T. Hein. Convergence rates for regularization of ill-posed problems in Banach spaces by approximate source conditions. Inverse Probl. 24 (4) (2008) 045007.
  • [58] M. Hoffmann and M. Reiss. Nonlinear estimation for linear inverse problems with error in the operator. Ann. Statist. 36 (1) (2008) 310–336.
  • [59] B. Hofmann. Approximate source conditions in Tikhonov-Phillips regularization and consequences for inverse problems with multiplication operators. Math. Methods Appl. Sci. 29 (3) (2006) 351–371.
  • [60] B. Hofmann, B. Kaltenbacher, C. Pöschl and O. Scherzer. A convergence rates result for Tikhonov regularization in Banach spaces with non-smooth operators. Inverse Probl. 23 (3) (2007) 987–1010.
  • [61] B. Hofmann and M. Yamamoto. Convergence rates for Tikhonov regularization based on range inclusions. Inverse Probl. 21 (3) (2005) 805–820.
  • [62] V. K. Ivanov, V. V. Vasin and V. P. Tanana. Theory of Linear Ill-Posed Problems and Its Applications, 36, 2nd edition. Walter de Gruyter, Berlin, 2002.
  • [63] Z. Kabluchko. Extremes of the standardized Gaussian noise. Stochastic Process. Appl. 121 (3) (2011) 515–533.
  • [64] A. Korostelev and O. Korosteleva. Mathematical Statistics: Asymptotic Minimax Theory. Graduate Studies in Mathematics 119. American Mathematical Society, Providence, RI, 2011.
  • [65] A. Kovac and M. Meise. Minimizing total variation under multiresolution constraints. Technical report, University of Bristol, 2006.
  • [66] R. Kress. Numerical Analysis. Graduate Texts in Mathematics 181. Springer-Verlag, New York, 1998.
  • [67] O. V. Lepski, E. Mammen and V. G. Spokoiny. Optimal spatial adaptation to inhomogeneous smoothness: An approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25 (3) (1997) 929–947.
  • [68] O. V. Lepskiĭ. A problem of adaptive estimation in Gaussian white noise. Teor. Veroyatn. Primen. 35 (3) (1990) 459–470.
  • [69] E. Mammen and S. van de Geer. Locally adaptive regression splines. Ann. Statist. 25 (1) (1997) 387–413.
  • [70] A. Munk, N. Bissantz, T. Wagner and G. Freitag. On difference-based variance estimation in nonparametric regression when the covariate is high dimensional. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 (1) (2005) 19–41.
  • [71] E. A. Nadaraya. On estimating regression. Theory Probab. Appl. 9 (1) (1964) 141–142.
  • [72] F. J. Narcowich, R. Schaback and J. D. Ward. Approximations in Sobolev spaces by kernel expansions. J. Approx. Theory 114 (1) (2002) 70–83.
  • [73] F. J. Narcowich, J. D. Ward and H. Wendland. Refined error estimates for radial basis function interpolation. Constr. Approx. 19 (4) (2003) 541–564.
  • [74] A. Nemirovski. Nonparametric estimation of smooth regression functions. Izv. Akad. Nauk SSSR Tekhn. Kibernet. 3 (1985) 50–60 (in Russian); English translation in J. Comput. System Sci. 23 (1986) 1–11.
  • [75] A. Nemirovski. Topics in non-parametric statistics. In Lectures on Probability Theory and Statistics 85–277. Saint-Flour, 1998. Lecture Notes in Math. 1738. Springer, Berlin, 2000.
  • [76] Y. Nesterov, A. Nemirovskii and Y. Ye. Interior-Point Polynomial Algorithms in Convex Programming, 13. SIAM, Philadelphia, 1994.
  • [77] J. Rice. Bandwidth choice for nonparametric regression. Ann. Statist. 12 (4) (1984) 1215–1230.
  • [78] C. Rivera and G. Walther. Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics. Scand. J. Stat. 40 (2013) 752–769.
  • [79] K. Scherer and A. Shadrin. New upper bound for the $B$-spline basis condition number. II. A proof of de Boor’s $2^{k}$-conjecture. J. Approx. Theory 99 (2) (1999) 217–229.
  • [80] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier and F. Lenzen. Variational Methods in Imaging, 167. Springer, New York, 2009.
  • [81] L. L. Schumaker. Spline Functions: Basic Theory, 3rd edition. Cambridge University Press, Cambridge, 2007.
  • [82] A. Y. Shadrin. The $L_{\infty}$-norm of the $L_{2}$-spline projector is bounded independently of the knot sequence: A proof of de Boor’s conjecture. Acta Math. 187 (1) (2001) 59–137.
  • [83] J. Sharpnack and E. Arias-Castro. Exact asymptotics for the scan statistic and fast alternatives, 2014. Available at arXiv:1409.7127.
  • [84] D. Siegmund and B. Yakir. Tail probabilities for the null distribution of scanning statistics. Bernoulli 6 (2) (2000) 191–213.
  • [85] V. Spokoiny. Variance estimation for high-dimensional regression models. J. Multivariate Anal. 82 (1) (2002) 111–133.
  • [86] C. J. Stone. An asymptotically optimal window selection rule for kernel density estimates. Ann. Statist. 12 (4) (1984) 1285–1297.
  • [87] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 (1) (1996) 267–288.
  • [88] H. Triebel. Theory of Function Spaces. Modern Birkhäuser Classics. Birkhäuser Verlag, Basel, 1983.
  • [89] H. Triebel. Theory of Function Spaces. II. Monographs in Mathematics 84. Birkhäuser Verlag, Basel, 1992.
  • [90] H. Triebel. Interpolation Theory, Function Spaces, Differential Operators, 2nd edition. Johann Ambrosius Barth, Heidelberg, 1995.
  • [91] A. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, 2009.
  • [92] S. A. van de Geer. Regression Analysis and Empirical Processes. CWI Tract 45. Stichting Mathematisch Centrum, Centrum voor Wiskunde en Informatica, Amsterdam, 1988.
  • [93] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer-Verlag, New York, 1996.
  • [94] G. Wahba. Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 14 (4) (1977) 651–667.
  • [95] G. Wahba. Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics 59. SIAM, Philadelphia, 1990.
  • [96] G. Walther. Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 (2) (2010) 1010–1033.
  • [97] W. P. Ziemer. Weakly Differentiable Functions. Sobolev Spaces and Functions of Bounded Variation. Graduate Texts in Mathematics 120. Springer, Berlin etc., 1989.