• Bernoulli
  • Volume 24, Number 4A (2018), 2776-2810.

A new approach to estimator selection

O.V. Lepski


In the framework of an abstract statistical model, we discuss how to use the solution of one estimation problem (Problem A) in order to construct an estimator in another, completely different, Problem B. By a solution of Problem A we mean a data-driven selection from a given family of estimators $\mathbf{A}(\mathfrak{H})=\{\widehat{A}_{\mathfrak{h}},\mathfrak{h}\in\mathfrak{H}\}$, together with a so-called oracle inequality established for the selected estimator. If $\hat{\mathfrak{h}}\in\mathfrak{H}$ is the selected parameter and $\mathbf{B}(\mathfrak{H})=\{\widehat{B}_{\mathfrak{h}},\mathfrak{h}\in\mathfrak{H}\}$ is a collection of estimators built in Problem B, we suggest using the estimator $\widehat{B}_{\hat{\mathfrak{h}}}$. We present a very general selection rule leading to the selector $\hat{\mathfrak{h}}$ and find conditions under which the estimator $\widehat{B}_{\hat{\mathfrak{h}}}$ is reasonable. Our approach is illustrated by several examples related to adaptive estimation.
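
The plug-in idea in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: Problem A is taken to be kernel density estimation, Problem B is estimation of the density derivative, the parameter set is a finite bandwidth grid, and the selector is ordinary least-squares cross-validation standing in for the article's selection rule. The names `A_hat`, `B_hat`, `lscv`, `select_h` and `plug_in_B` are hypothetical.

```python
import math
import random


def normal_pdf(z, s):
    """Density of N(0, s^2) evaluated at z."""
    return math.exp(-0.5 * (z / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def A_hat(x, data, h):
    """Problem A family (assumed): Gaussian kernel density estimator with bandwidth h."""
    return sum(normal_pdf(x - xi, h) for xi in data) / len(data)


def B_hat(x, data, h):
    """Problem B family (assumed): kernel estimator of the density derivative.

    It is the derivative of A_hat in x, since
    d/dx normal_pdf(x - xi, h) = -(x - xi) / h^2 * normal_pdf(x - xi, h).
    """
    return sum(-(x - xi) / (h * h) * normal_pdf(x - xi, h) for xi in data) / len(data)


def lscv(data, h):
    """Least-squares cross-validation criterion for A_hat (stand-in selection rule).

    First term: integral of A_hat^2, in closed form for the Gaussian kernel.
    Second term: average of the leave-one-out estimates at the observations.
    """
    n = len(data)
    squared_norm = sum(
        normal_pdf(a - b, math.sqrt(2.0) * h) for a in data for b in data
    ) / (n * n)
    leave_one_out = sum(
        normal_pdf(a - b, h)
        for i, a in enumerate(data)
        for j, b in enumerate(data)
        if i != j
    ) / (n * (n - 1))
    return squared_norm - 2.0 * leave_one_out


def select_h(data, grid):
    """Data-driven selector h_hat: minimizer of the criterion over the grid."""
    return min(grid, key=lambda h: lscv(data, h))


def plug_in_B(x, data, grid):
    """Select h_hat using Problem A only, then reuse it in the Problem B family."""
    h_hat = select_h(data, grid)
    return h_hat, B_hat(x, data, h_hat)


# Small demonstration on simulated standard normal data (illustrative only).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
bandwidth_grid = [0.1, 0.2, 0.4, 0.8, 1.6]
h_hat, derivative_at_one = plug_in_B(1.0, sample, bandwidth_grid)
print("selected bandwidth:", h_hat)
print("plug-in derivative estimate at x = 1:", derivative_at_one)
```

The point of the sketch is only the structure of `plug_in_B`: the selector never looks at the Problem B family. Whether a bandwidth tuned for the density is also a sensible choice for its derivative is exactly the kind of question the article's conditions are meant to address; the sketch does not check them.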

Article information

Bernoulli, Volume 24, Number 4A (2018), 2776-2810.

Received: October 2016
Revised: March 2017
First available in Project Euclid: 26 March 2018

Keywords: adaptive estimation; density model; generalized deconvolution model; oracle approach; upper function


Lepski, O.V. A new approach to estimator selection. Bernoulli 24 (2018), no. 4A, 2776--2810. doi:10.3150/17-BEJ945.

