Electronic Journal of Statistics

Forest Garrote

Nicolai Meinshausen



Variable selection for high-dimensional linear models has received a lot of attention lately, mostly in the context of ℓ1-regularization. Part of the attraction is the variable selection effect: parsimonious models are obtained, which are well suited to interpretation. In terms of predictive power, however, these regularized linear models are often slightly inferior to machine learning procedures like tree ensembles. Tree ensembles, on the other hand, usually lack a formal way of performing variable selection and are difficult to visualize. A Garrote-style convex penalty for tree ensembles, in particular Random Forests, is proposed. The penalty selects functional groups of nodes in the trees; these can be as simple as monotone functions of individual predictor variables. This yields a parsimonious function fit, which lends itself easily to visualization and interpretation. Predictive power is maintained at least at the level of the original tree ensemble. A key feature of the method is that, once a tree ensemble is fitted, no further tuning parameter needs to be selected. The empirical performance is demonstrated on a wide array of datasets.
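The core idea of the abstract can be sketched in code, under simplifying assumptions: fit a Random Forest, treat the ensemble's base functions as columns of a design matrix, and re-weight them with a nonnegative Lasso (a Garrote-style convex penalty) so that many weights shrink to exactly zero. The paper operates on functional groups of tree *nodes*; the sketch below works at the coarser level of whole trees purely for brevity, and all names and parameter values are illustrative, not the author's implementation.

```python
# Hypothetical simplification of the forest-garrote idea: re-weight the
# base functions of a fitted Random Forest with a nonnegative Lasso.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Step 1: fit the tree ensemble as usual.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Step 2: build the base-function matrix, one column per tree.
# (The paper uses node-level functions instead of whole-tree predictions.)
B = np.column_stack([tree.predict(X) for tree in forest.estimators_])

# Step 3: Garrote-style re-weighting. The L1 penalty combined with the
# positivity constraint drives many weights to exactly zero, yielding a
# sparse, more interpretable sub-ensemble.
garrote = Lasso(alpha=0.1, positive=True, fit_intercept=True).fit(B, y)

selected = np.flatnonzero(garrote.coef_)
print(f"{len(selected)} of {B.shape[1]} base functions kept")
```

The positivity constraint is what distinguishes the nonnegative Garrote from a plain Lasso: each base function may only be shrunk or dropped, never flipped in sign, which preserves the interpretation of the original ensemble fit.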

Article information

Electron. J. Statist., Volume 3 (2009), 1288-1304.

First available in Project Euclid: 4 December 2009


Primary: 62G08: Nonparametric regression

Keywords: Nonnegative Garrote; Random Forests; sparsity; tree ensembles


Meinshausen, Nicolai. Forest Garrote. Electron. J. Statist. 3 (2009), 1288--1304. doi:10.1214/09-EJS434. https://projecteuclid.org/euclid.ejs/1259944247


