## The Annals of Statistics

### Random rates in anisotropic regression (with a discussion and a rejoinder by the authors)

#### Abstract

In the context of minimax theory, we propose a new kind of risk, normalized by a random variable, measurable with respect to the data. We present a notion of optimality and a method to construct optimal procedures accordingly. We apply this general setup to the problem of selecting significant variables in Gaussian white noise. In particular, we show that our method essentially improves the accuracy of estimation, in the sense of giving explicit improved confidence sets in $L_2$-norm. Links to adaptive estimation are discussed.
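To make the abstract's central definition concrete, here is a hedged sketch of the general form such a data-normalized risk takes. It follows the random-normalizing-factor framework of Lepski (1999) [29]. The symbols $\Sigma$ (the functional class), $\hat f_\varepsilon$ (the estimator), $\rho_\varepsilon$ (the random normalizing factor) and the power $q \ge 1$ are assumed notation rather than taken from this page, and the paper's exact definition may differ:

$$\mathcal{R}_\varepsilon(\hat f_\varepsilon,\rho_\varepsilon) = \sup_{f\in\Sigma}\mathbb{E}_f\Big[\rho_\varepsilon^{-q}\,\|\hat f_\varepsilon - f\|_2^{\,q}\Big], \qquad \rho_\varepsilon = \rho_\varepsilon(Y) > 0 \ \text{measurable with respect to the data } Y.$$

If this risk is bounded by a constant $C$, Markov's inequality turns the bound into a confidence statement. For every $t > 0$ and every $f \in \Sigma$,

$$\mathbb{P}_f\big(\|\hat f_\varepsilon - f\|_2 \ge t\,\rho_\varepsilon\big) = \mathbb{P}_f\big(\rho_\varepsilon^{-q}\|\hat f_\varepsilon - f\|_2^{\,q} \ge t^{q}\big) \le C\,t^{-q}.$$

The $L_2$-ball centred at $\hat f_\varepsilon$ with data-driven radius $t\rho_\varepsilon$ therefore covers $f$ with probability at least $1 - C t^{-q}$, uniformly over $\Sigma$. Because $\rho_\varepsilon$ is random, the radius can shrink on samples where the data reveal additional structure, such as only a few significant variables.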

#### Article information

Source
Ann. Statist., Volume 30, Number 2 (2002), 325-396.

Dates
First available in Project Euclid: 14 May 2002

https://projecteuclid.org/euclid.aos/1021379858

Digital Object Identifier
doi:10.1214/aos/1021379858

Mathematical Reviews number (MathSciNet)
MR1902892

#### Citation

Hoffmann, M.; Lepski, O. Random rates in anisotropic regression (with a discussion and a rejoinder by the authors). Ann. Statist. 30 (2002), no. 2, 325--396. doi:10.1214/aos/1021379858. https://projecteuclid.org/euclid.aos/1021379858

#### References

• [1] AKAIKE, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat. Control 19 716-723.
• [2] BARRON, A., BIRGÉ, L. and MASSART, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301-413.
• [3] BREIMAN, L. and FREEDMAN, D. (1983). How many variables should be entered in a regression model? J. Amer. Statist. Assoc. 78 131-136.
• [4] BROWN, L. D. and LOW, M. (1996). Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist. 24 2384-2398.
• [5] CHERNOFF, H. (1956). Large sample theory: parametric case. Ann. Math. Statist. 27 1-22.
• [6] CSISZÁR, I. and KÖRNER, J. (1981). Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic, New York.
• [7] DELYON, B. and JUDITSKY, A. (1996). On minimax wavelet estimators. Appl. Comput. Harmon. Anal. 3 215-228.
• [8] DONOHO, D. L. and JOHNSTONE, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425-455.
• [9] DONOHO, D. L., JOHNSTONE, I. M., KERKYACHARIAN, G. and PICARD, D. (1995). Wavelet shrinkage: Asymptopia? J. Roy. Statist. Soc. Ser. B 57 301-369.
• [10] DYCHAKOV, A. G. (1971). On a search model of false coins. In Colloquia Mathematica Societatis Janos Bolyai: Topics in Information Theory. North-Holland, Amsterdam.
• [11] EFROMOVICH, S. YU. (1985). Nonparametric estimation of a density of unknown smoothness. Theory Probab. Appl. 30 524-534.
• [12] EFROMOVICH, S. YU. and PINSKER, M. S. (1984). Adaptive algorithms for nonparametric filtering. Automat. Remote Control 11 54-60.
• [13] FREIDLINA, V. L. (1975). On one problem of screening experimental design. Theory Probab. Appl. 20 100-114.
• [14] GOLDENSHLUGER, A. and NEMIROVSKI, A. (1997). On spatially adaptive estimation of nonparametric regression. Math. Methods Statist. 6 135-170.
• [15] GOLUBEV, G. K. (1990). Quasilinear estimates of a signal in $L_2$. Problems Inform. Transmission 26 15-20.
• [16] GRAMA, I. and NUSSBAUM, M. (1998). Asymptotic equivalence for nonparametric generalized linear models. Probab. Theory Related Fields 111 167-214.
• [17] HÄRDLE, W. and MARRON, J. S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist. 13 1465-1481.
• [18] HALL, P. (1991). Edgeworth expansions for nonparametric density estimators, with applications. Statistics 22 215-232.
• [19] HALL, P. (1992). Effect of bias estimation on coverage accuracy of bootstrap confidence intervals for a probability density. Ann. Statist. 20 675-694.
• [20] HALL, P., KERKYACHARIAN, G. and PICARD, D. (1998). Block threshold rules for curve estimation using kernel and wavelet methods. Ann. Statist. 26 922-942.
• [21] HALL, P., KERKYACHARIAN, G. and PICARD, D. (1999). A note on the wavelet oracle. Statist. Probab. Lett. 43 415-420.
• [22] HALL, P., KERKYACHARIAN, G. and PICARD, D. (1999). On the minimax optimality of block thresholding wavelet estimators. Statist. Sinica 9 33-49.
• [23] IBRAGIMOV, I. A. and KHASMINSKI, R. Z. (1981). Statistical Estimation: Asymptotic Theory. Springer, New York.
• [24] IBRAGIMOV, I. A. and KHASMINSKI, R. Z. (1984). More on estimation of the density of a distribution. J. Soviet Math. 25 1155-1165.
• [25] LEPSKI, O. V. (1990). A problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35 454-466.
• [26] LEPSKI, O. V. (1991). Asymptotic minimax adaptive estimation. 1. Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36 682-697.
• [27] LEPSKI, O. V. (1992). Asymptotic minimax adaptive estimation. 2. Statistical models without optimal adaptation. Adaptive estimators. Theory Probab. Appl. 37 433-448.
• [28] LEPSKI, O. V. (1992). On problems of adaptive estimation in white Gaussian noise. In Advances in Soviet Mathematics (R. Z. Khasminskii, ed.) 12 87-106. Amer. Math. Soc., Providence, RI.
• [29] LEPSKI, O. V. (1999). How to improve the accuracy of estimation. Math. Methods Statist. 8 441-486.
• [30] LEPSKI, O. V. and SPOKOINY, V. G. (1995). Local adaptation to inhomogeneous smoothness: resolution level. Math. Methods Statist. 4 239-258.
• [31] LEPSKI, O. V. and SPOKOINY, V. G. (1997). Optimal pointwise adaptive methods in nonparametric estimation. Ann. Statist. 25 2512-2546.
• [32] LEPSKI, O. V., MAMMEN, E. and SPOKOINY, V. G. (1997). Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25 929-947.
• [33] LOW, M. G. (1997). On nonparametric confidence intervals. Ann. Statist. 25 2547-2554.
• [34] MALYUTOV, M. B. and TSITOVICH, I. I. (1996). On sequential search for significant variables of unknown function. In Proceedings of 6th Lukacs Symposium 155-178. VSP, Utrecht.
• [35] MESHALKIN, P. S. (1970). To the justification of random balance method. Industrial Laboratory 36.
• [36] NEUMANN, M. (1995). Automatic bandwidth choice and confidence intervals in nonparametric regression. Ann. Statist. 23 1937-1959.
• [37] NEUMANN, M. and VON SACHS, R. (1995). Wavelet thresholding in anisotropic function classes and application to the adaptive estimation of evolutionary spectra. Preprint, WIAS, Berlin.
• [38] NIKOLSKII, S. M. (1975). Approximation of Functions of Several Variables and Imbedding Theorems. Springer, Berlin.
• [39] NUSSBAUM, M. (1983). Optimal filtration of a function of many variables in white Gaussian noise. Problems Inform. Transmission 19 23-29.
• [40] NUSSBAUM, M. (1986). On nonparametric estimation of a regression function, being smooth on a domain in $\mathbb{R}^k$. Theory Probab. Appl. 31 118-125.
• [41] NUSSBAUM, M. (1996). Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist. 24 2399-2430.
• [42] PATEL, M. S., ed. (1987). Experiments in Factor Screening. Comm. Statist. Theory Methods 16(10). (Special issue.)
• [43] PETROV, V. V. (1975). Sums of Independent Random Variables. Springer, Berlin.
• [44] PICARD, D. and TRIBOULEY, K. (2000). Adaptive confidence intervals for pointwise curve estimation. Ann. Statist. 28 298-335.
• [45] POLYAK, B. T. and TSYBAKOV, A. B. (1990). Asymptotic optimality of the $C_p$ test in the projection estimation of a regression. Theory Probab. Appl. 35 293-306.
• [46] RÉNYI, A. (1965). On the theory of random search. Bull. Amer. Math. Soc. 71 809-828.
• [47] SCHWARZ, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461-464.
• [48] STONE, C. J. (1986). The dimensionality reduction principle for generalized additive models. Ann. Statist. 14 590-606.
• TOPOLOGIE, PROBABILITÉS CENTRE DE MATHÉMATIQUES ET D'INFORMATIQUE CNRS-UMR 6632 UNIVERSITÉ DE PROVENCE 39, RUE JOLIOT CURIE, 13453 MARSEILLE CEDEX 13 FRANCE E-MAIL: lepski@cmi.univ-mrs.fr
• Lepski (1999). In Section 5 of Lepski [3], the author gives two concrete applications of his ideas after having listed in Section 4 a number of potentially interesting problems to be solved, concluding that "the treatment of minimax risk with RNFs for multidimensional models is the subject of a series of forthcoming papers." The paper by Marc Hoffmann and Oleg Lepski (hereafter H&L) to be discussed is one of these and, in view of the numerous connections between both papers, my discussion will deal with the two of them simultaneously.
• [1] BIRGÉ, L. and ROZENHOLC, Y. (2002). How many bins should be put in a regular histogram? Technical report, Univ. Paris VI.
• [2] CASTELLAN, G. (1999). Modified Akaike's criterion for histogram density estimation. Technical Report, Univ. Paris-Sud, Orsay.
• [3] LEPSKI, O. (1999). How to improve the accuracy of estimation. Math. Methods Statist. 8 441-486.
• , is = 2 log 1 s-1 m/(2m+1). Notations A (is), (is) and (is) have the same definitions as those in HL, but with our new T (is), (, is), (), and (0) (is). We will show that Theorem 1 in HL holds in our case with our definitions (ignoring the constants). To do that, we first prove Lemma 1. This is a different lemma since all the quantities involved are defined differently now. Therefore we provide a detailed proof:
• [1] BROWN, L. D. and LOW, M. G. (1996). Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist. 24 2384-2398.
• [2] LIN, Y. (2000). Tensor product space ANOVA models. Ann. Statist. 28 734-755.
• [3] NUSSBAUM, M. (1996). Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist. 24 2399-2430.
• [4] STONE, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation (with discussion). Ann. Statist. 22 118-184.
• [5] WAHBA, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.
• [6] WAHBA, G., WANG, Y., GU, C., KLEIN, R. and KLEIN, B. (1995). Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. Ann. Statist. 23 1865-1895.
• PHILADELPHIA, PENNSYLVANIA 19104-6302 E-MAIL: lbrown@wharton.upenn.edu
• [1] BROWN, L. D., LOW, M. G. and ZHAO, L. H. (1997). Superefficiency in nonparametric function estimation. Ann. Statist. 25 2607-2625.
• [2] EFROMOVICH, S. (1999). Nonparametric Curve Estimation: Methods, Theory and Applications. Springer, New York.
• [3] LEPSKI, O. V. (1990). A problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35 454-466.
• [4] LEPSKI, O. V. (1992). Asymptotic minimax adaptive estimation. 2. Statistical models without optimal adaptation. Adaptive estimators. Theory Probab. Appl. 37 433-448.
• ALBUQUERQUE, NEW MEXICO 87131-1141 E-MAIL: efrom@math.unm.edu
• [1] FARAWAY, J. (1990). Bootstrap selection of bandwidth and confidence bands for nonparametric regression. J. Statist. Comput. Simulation 37 37-44.
• [2] HALL, P. (1991). Edgeworth expansions for nonparametric density estimators, with applications. Statistics 22 215-232.
• [3] HALL, P. (1992). Effect of bias estimation on coverage accuracy of bootstrap confidence intervals for a probability density. Ann. Statist. 20 675-694.
• [4] KERKYACHARIAN, G. and PICARD, D. (2000). Minimax or maxisets? Technical report, Univ. Paris VI and Paris VII.
• [5] LOW, M. G. (1997). On nonparametric confidence intervals. Ann. Statist. 25 2547-2554.
• [6] NEUMANN, M. (1995). Automatic bandwidth choice and confidence intervals in nonparametric regression. Ann. Statist. 23 1937-1959.
• [7] PICARD, D. and TRIBOULEY, K. (2000). Adaptive confidence interval for pointwise curve estimation. Ann. Statist. 28 298-335.
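• $\hat\theta_{\varepsilon,k} = \lambda_k Y_k,\quad \lambda_k = \Big(1 - \frac{\varepsilon^2 |B_m|}{\sum_{k \in B_m} Y_k^2}\Big)_+,\quad k \in B_m,\ m \in \{1,\dots,J\}^d,$ (2)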
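Display (2) appears to describe a blockwise James–Stein rule in a Gaussian sequence model $Y_k = \theta_k + \varepsilon\xi_k$. Each observation is multiplied by a shrinkage factor shared by every coordinate of its block $B_m$. A minimal Python sketch follows. It assumes the factor has the plain form $(1 - \varepsilon^2|B_m| / \sum_{k\in B_m} Y_k^2)_+$, although the extracted text may have lost a constant. The function name `block_james_stein` and its list-of-blocks input are hypothetical.

```python
from typing import List, Sequence


def block_james_stein(
    y: Sequence[float], blocks: Sequence[Sequence[int]], eps: float
) -> List[float]:
    """Blockwise James-Stein shrinkage (hypothetical helper, assumed form).

    y      -- observations Y_k of a Gaussian sequence model
    blocks -- disjoint index sets B_m covering range(len(y))
    eps    -- noise level epsilon
    """
    theta_hat = [0.0] * len(y)
    for block in blocks:
        # Energy of the observations in this block: sum_{k in B_m} Y_k^2.
        energy = sum(y[k] ** 2 for k in block)
        if energy > 0.0:
            # Common shrinkage factor (1 - eps^2 |B_m| / energy)_+.
            lam = max(0.0, 1.0 - eps ** 2 * len(block) / energy)
        else:
            lam = 0.0  # an all-zero block stays at zero
        for k in block:
            theta_hat[k] = lam * y[k]
    return theta_hat


# Usage: a strong block is kept (mildly shrunk), a noise-level block is set to zero.
y = [3.0, 4.0, 0.1, -0.1]
blocks = [[0, 1], [2, 3]]
print(block_james_stein(y, blocks, eps=1.0))
```

In the usage line, the first block has energy $25$, so its factor is $1 - 2/25 = 0.92$. The second block's energy falls far below $\varepsilon^2|B_m| = 2$, so its factor is clipped to zero. Sharing one factor per block is what lets such rules adapt to the signal's energy distribution without a per-coordinate threshold.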
• [1] CAVALIER, L., GOLUBEV, YU., PICARD, D. and TSYBAKOV, A. B. (2000). Oracle inequalities for inverse problems. Ann. Statist. To appear.
• [2] CAVALIER, L. and TSYBAKOV, A. B. (2000). Sharp adaptation for inverse problems with random noise. Probab. Theory Related Fields. To appear. Available at www.proba.jussieu.fr.
• [3] CAVALIER, L. and TSYBAKOV, A. B. (2001). Penalized blockwise Stein's method, monotone oracles and sharp adaptive estimation. Math. Methods Statist. 10 247-282.
• [4] GOLDENSHLUGER, A. and TSYBAKOV, A. (2001). Adaptive prediction and estimation in linear regression with infinitely many parameters. Ann. Statist. 29 1601-1619.
• [5] JOHNSTONE, I. M. (1998). Function estimation in Gaussian noise: sequence models. (Draft of a monograph; available at http://www-stat.stanford.edu/.)
• [6] KERKYACHARIAN, G., LEPSKI, O. and PICARD, D. (2001). Nonlinear estimation in anisotropic multi-index denoising. Probab. Theory Related Fields. 121 137-170.
• [7] LEPSKI, O. V. (1990). A problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35 454-466.
• [8] LEPSKI, O. V. (1999). How to improve the accuracy of estimation. Math. Methods Statist. 8 441-486.
• [9] LEPSKI, O. V. and LEVIT, B. YA. (1999). Adaptive nonparametric estimation of smooth multivariate functions. Math. Methods Statist. 8 344-370.
• [10] LI, K.-C. (1989). Honest confidence regions for nonparametric regression. Ann. Statist. 17 1001-1008.
• [11] LOW, M. G. (1997). On nonparametric confidence intervals. Ann. Statist. 25 2547-2554.
• [12] POLZEHL, J. and SPOKOINY, V. G. (1999). Image denoising: pointwise adaptive approach. Preprint, Weierstrass Inst., Berlin.
• [13] TSYBAKOV, A. B. (2001a). Introduction à l'estimation non-paramétrique. Unpublished manuscript (book, submitted for publication).
• [14] TSYBAKOV, A. B. (2001b). Optimal aggregation of classifiers in statistical learning. Preprint 682, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris VI and Paris VII. (Available at http://www.proba.jussieu.fr/mathdoc/preprints/index.html#2001.)
• [1] BARRON, A., BIRGÉ, L. and MASSART, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301-413.
• [2] DÜMBGEN, L. and SPOKOINY, V. G. (2001). Multiscale testing of qualitative hypotheses. Ann. Statist. 29 124-152.
• [3] LEPSKI, O. V. (1999). How to improve the accuracy of estimation. Math. Methods Statist. 8 441-486.
• [4] PICARD, D. and TRIBOULEY, K. (2000). Adaptive confidence intervals for pointwise curve estimation. Ann. Statist. 28 298-335.
• [1] BARRON, A., BIRGÉ, L. and MASSART, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301-413.
• [2] INGSTER, Y. I. (1993). Asymptotically minimax hypothesis testing for nonparametric alternatives, I-III. Math. Methods Statist. 2 85-114, 171-189, 249-268.
• [3] IOUDITSKY, A. and LEPSKI, O. (2001). Evaluation of the accuracy of nonparametric estimators. Math. Methods Statist. To appear.
• [4] LEPSKI, O. V. (1991). Asymptotic minimax adaptive estimation. 1. Upper bounds. Theory Probab. Appl. 36 682-697.
• [5] KERKYACHARIAN, G., LEPSKI, O. and PICARD, D. (2001). Nonlinear estimation in anisotropic multi-index denoising. Probab. Theory Related Fields 121 137-170.
• [6] NEUMANN, M. H. and VON SACHS, R. (1997). Wavelet thresholding in anisotropic function classes and application to adaptive estimation of evolutionary spectra. Ann. Statist. 25 38-76.