The Annals of Statistics

Adaptive estimation over anisotropic functional classes via oracle approach

Oleg Lepski

We address the problem of adaptive minimax estimation in white Gaussian noise models under $\mathbb{L}_{p}$-loss, $1\leq p\leq\infty$, on anisotropic Nikol’skii classes. We present an estimation procedure based on a new data-driven selection scheme from the family of kernel estimators with varying bandwidths. For the proposed estimator we establish a so-called $\mathbb{L}_{p}$-norm oracle inequality and use it to derive minimax adaptive results. We prove the existence of rate-adaptive estimators and fully characterize the behavior of the minimax risk for different relationships between the regularity parameters and the norm indexes in the definitions of the functional class and of the risk. In particular, some new asymptotics of the minimax risk are discovered, including necessary and sufficient conditions for the existence of a uniformly consistent estimator. We also provide a detailed overview of existing methods and results and formulate open problems in adaptive minimax estimation.

Article information

Ann. Statist., Volume 43, Number 3 (2015), 1178-1242.

Received: April 2014
Revised: November 2014
First available in Project Euclid: 15 May 2015

Primary: 62G05: Estimation; 62G20: Asymptotic properties

White Gaussian noise model; oracle inequality; adaptive estimation; kernel estimators with varying bandwidths; $\mathbb{L}_{p}$-risk


Lepski, Oleg. Adaptive estimation over anisotropic functional classes via oracle approach. Ann. Statist. 43 (2015), no. 3, 1178–1242. doi:10.1214/14-AOS1306.

