Source: Ann. Statist., Volume 36, Number 1
We present a greedy method for simultaneously performing local bandwidth selection and variable selection in nonparametric regression. The method starts with a local linear estimator with large bandwidths, and incrementally decreases the bandwidth of variables for which the gradient of the estimator with respect to bandwidth is large. The method—called rodeo (regularization of derivative expectation operator)—conducts a sequence of hypothesis tests to threshold derivatives, and is easy to implement. Under certain assumptions on the regression function and sampling density, it is shown that the rodeo applied to local linear smoothing avoids the curse of dimensionality, achieving near optimal minimax rates of convergence in the number of relevant variables, as if these variables were isolated in advance.
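The abstract's greedy scheme can be illustrated in a few lines. The sketch below is a simplified, hedged version of the idea only: it uses a Nadaraya–Watson (local constant) estimator rather than the paper's local linear smoother, estimates the derivative of the fit with respect to each bandwidth numerically instead of analytically, and replaces the paper's hypothesis-test thresholds with a single user-supplied constant `tau`. All function and parameter names here are hypothetical, not from the paper.

```python
import numpy as np

def rodeo_sketch(X, y, x0, h0=1.0, beta=0.9, tau=0.1, n_iter=50):
    """Greedy local bandwidth selection at a point x0 (illustrative sketch).

    Starts every coordinate at a large bandwidth h0 and repeatedly shrinks,
    by the factor beta, the bandwidth of any variable whose fitted value is
    still sensitive to that bandwidth (|numerical derivative| > tau).
    Variables whose derivative is small are frozen, mimicking the rodeo's
    thresholding of derivatives; tau stands in for the paper's test-based
    threshold and is a hypothetical tuning constant here.
    """
    n, d = X.shape
    h = np.full(d, float(h0))
    active = set(range(d))

    def estimate(bw):
        # Nadaraya-Watson fit at x0 with a product Gaussian kernel.
        w = np.exp(-0.5 * np.sum(((X - x0) / bw) ** 2, axis=1))
        return np.sum(w * y) / np.sum(w)

    for _ in range(n_iter):
        if not active:
            break
        for j in list(active):
            # Forward-difference derivative of the fit w.r.t. bandwidth h_j.
            eps = 1e-4 * h[j]
            hp = h.copy()
            hp[j] += eps
            Z_j = (estimate(hp) - estimate(h)) / eps
            if abs(Z_j) > tau:
                h[j] *= beta          # still sensitive: keep shrinking h_j
            else:
                active.discard(j)     # derivative below threshold: freeze h_j
    return h, estimate(h)
```

Because irrelevant variables are frozen at the initial large bandwidth, the final fit effectively averages over them, which is the mechanism behind the dimension-reduction behavior described in the abstract.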
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
Bühlmann, P. and Yu, B. (2006). Sparse boosting. J. Mach. Learn. Res. 7 1001–1024.
Donoho, D. (2004). For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 797–829.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499.
Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 87 998–1004.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1–141.
Fu, W. and Knight, K. (2000). Asymptotics for lasso type estimators. Ann. Statist. 28 1356–1378.
George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statist. Sinica 7 339–373.
Girosi, F. (1997). An equivalence between sparse approximation and support vector machines. Neural Comput. 10 1455–1480.
Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, Berlin.
Hastie, T. and Loader, C. (1993). Local regression: Automatic kernel carpentry. Statist. Sci. 8 120–129.
Hastie, T., Tibshirani, R. and Friedman, J. H. (2001). The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Springer, Berlin.
Hristache, M., Juditsky, A., Polzehl, J. and Spokoiny, V. (2001). Structure adaptive approach for dimension reduction. Ann. Statist. 29 1537–1566.
Kerkyacharian, G., Lepski, O. and Picard, D. (2001). Nonlinear estimation in anisotropic multi-index denoising. Probab. Theory Related Fields 121 137–170.
Lawrence, N. D., Seeger, M. and Herbrich, R. (2003). Fast sparse Gaussian process methods: The informative vector machine. In Advances in Neural Information Processing Systems 15 625–632.
Lepski, O. V., Mammen, E. and Spokoiny, V. G. (1997). Optimal spatial adaptation to inhomogeneous smoothness: An approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25 929–947.
Li, L., Cook, R. D. and Nachtsheim, C. (2005). Model-free variable selection. J. Roy. Statist. Soc. Ser. B 67 285–299.
Rice, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist. 12 1215–1230.
Ruppert, D. (1997). Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. J. Amer. Statist. Assoc. 92 1049–1062.
Ruppert, D. and Wand, M. P. (1994). Multivariate locally weighted least squares regression. Ann. Statist. 22 1346–1370.
Samarov, A., Spokoiny, V. and Vial, C. (2005). Component identification and estimation in nonlinear high-dimensional regression models by structural adaptation. J. Amer. Statist. Assoc. 100 429–445.
Smola, A. and Bartlett, P. (2001). Sparse greedy Gaussian process regression. In Advances in Neural Information Processing Systems 13 619–625.
Stone, C. J., Hansen, M. H., Kooperberg, C. and Truong, Y. K. (1997). Polynomial splines and their tensor products in extended linear modeling (with discussion). Ann. Statist. 25 1371–1470.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
Tipping, M. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1 211–244.
Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50 2231–2241.
Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals. IEEE Trans. Inform. Theory 51 1030–1051.
Turlach, B. (2004). Discussion of “Least angle regression” by Efron, Hastie, Johnstone and Tibshirani. Ann. Statist. 32 494–499.
Zhang, H., Wahba, G., Lin, Y., Voelker, M., Ferris, M., Klein, R. and Klein, B. (2004). Variable selection and model building via likelihood basis pursuit. J. Amer. Statist. Assoc. 99 659–672.