The Annals of Statistics

Rodeo: Sparse, greedy nonparametric regression

John Lafferty and Larry Wasserman

Full-text: Open access


We present a greedy method for simultaneously performing local bandwidth selection and variable selection in nonparametric regression. The method starts with a local linear estimator with large bandwidths, and incrementally decreases the bandwidth of variables for which the gradient of the estimator with respect to bandwidth is large. The method—called rodeo (regularization of derivative expectation operator)—conducts a sequence of hypothesis tests to threshold derivatives, and is easy to implement. Under certain assumptions on the regression function and sampling density, it is shown that the rodeo applied to local linear smoothing avoids the curse of dimensionality, achieving near optimal minimax rates of convergence in the number of relevant variables, as if these variables were isolated in advance.
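The greedy bandwidth-shrinking scheme the abstract describes can be sketched in code. The sketch below is an illustration of the rodeo idea only, not the authors' implementation: it uses a Nadaraya-Watson smoother with a product Gaussian kernel in place of local linear smoothing, a finite-difference approximation to the derivative of the fit with respect to each bandwidth, and a Gaussian-tail threshold of the form sigma * ||G|| * sqrt(2 log(nd)). The function name `rodeo_point`, the shrink factor `beta`, and the stopping bandwidth `hmin` are illustrative assumptions.

```python
import numpy as np

def nw_weights(X, x, h):
    """Normalized Nadaraya-Watson weights at query point x,
    using a product Gaussian kernel with per-variable bandwidths h."""
    Z = (X - x) / h                              # (n, d) scaled differences
    w = np.exp(-0.5 * np.sum(Z ** 2, axis=1))    # product Gaussian kernel
    return w / (w.sum() + 1e-300)                # guard against underflow

def rodeo_point(X, y, x, sigma, h0=1.0, beta=0.9, hmin=1e-2):
    """Greedy local bandwidth selection at x (a sketch of the rodeo idea).

    Every bandwidth starts large; h_j keeps shrinking while the derivative
    of the fitted value with respect to h_j is significantly nonzero,
    so only bandwidths of relevant variables become small."""
    n, d = X.shape
    h = np.full(d, h0)
    active = np.ones(d, dtype=bool)
    eps = 1e-4
    while active.any():
        for j in np.where(active)[0]:
            # Finite-difference derivative of the smoother weights in h_j;
            # the estimator is linear in y, so Z_j = G @ y and
            # sd(Z_j) = sigma * ||G||.
            h_plus = h.copy()
            h_plus[j] += eps
            G = (nw_weights(X, x, h_plus) - nw_weights(X, x, h)) / eps
            Zj = G @ y                           # estimated d m_hat / d h_j
            lam = sigma * np.sqrt(G @ G) * np.sqrt(2.0 * np.log(n * d))
            if abs(Zj) > lam and h[j] * beta > hmin:
                h[j] *= beta                     # significant: keep shrinking
            else:
                active[j] = False                # insignificant: freeze h_j
    return nw_weights(X, x, h) @ y, h
```

In typical runs the bandwidths of irrelevant variables stay near `h0`, so the smoother effectively averages over those coordinates, while relevant variables receive small bandwidths, mimicking the behavior the abstract attributes to the rodeo.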

Article information

Ann. Statist., Volume 36, Number 1 (2008), 28-63.

First available in Project Euclid: 1 February 2008


Primary: 62G08: Nonparametric regression
Secondary: 62G20: Asymptotic properties

Keywords: nonparametric regression; sparsity; local linear smoothing; bandwidth estimation; variable selection; minimax rates of convergence


Lafferty, John; Wasserman, Larry. Rodeo: Sparse, greedy nonparametric regression. Ann. Statist. 36 (2008), no. 1, 28--63. doi:10.1214/009053607000000811.


