The Annals of Applied Statistics

Node harvest

Nicolai Meinshausen


Abstract

When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy. To give a classical example, classification and regression trees are easy to understand and interpret. Tree ensembles like Random Forests usually provide more accurate predictions. Yet tree ensembles are also more difficult to analyze than single trees and are often criticized, perhaps unfairly, as ‘black box’ predictors.

Node harvest tries to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. The resulting estimators are very sparse and interpretable, and predictive accuracy is highly competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few thousand nodes is generated randomly. If a new observation falls into just a single node, its prediction is the mean response of all training observations within this node, identical to a tree-like prediction. A new observation typically falls into several nodes, however, and its prediction is then the weighted average of the mean responses across all these nodes. The only task of node harvest is to ‘pick’ the right nodes from the initial large ensemble by choosing node weights, which in the proposed algorithm amounts to a quadratic programming problem with linear inequality constraints. The solution is sparse in the sense that only very few nodes receive a nonzero weight, even though this sparsity is not explicitly enforced. Perhaps surprisingly, it is not necessary to select a tuning parameter for optimal predictive accuracy. Node harvest can handle mixed data and missing values, and it is shown to be simple to interpret and competitive in predictive accuracy on a variety of data sets.
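To make the prediction rule described above concrete, the following is a minimal sketch, not the author's implementation: each node is treated as an axis-aligned rectangle carrying the mean response of the training observations it contains and a nonnegative weight (assumed here to have already been chosen by the quadratic program, which is not shown). A new observation's prediction is the weight-weighted average of the means of all nodes containing it; if it falls into a single node, this reduces to an ordinary tree-like prediction. The `Node` class, function name and toy values are illustrative assumptions, not part of the paper.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Node:
    """An axis-aligned rectangle with a mean response and a weight.

    lower/upper are per-feature bounds (use -inf/inf for unconstrained
    dimensions); `mean` is the mean response of the training observations
    in the node; `weight` is the node weight (assumed already fitted).
    """
    lower: np.ndarray
    upper: np.ndarray
    mean: float
    weight: float

    def contains(self, x: np.ndarray) -> bool:
        return bool(np.all((x >= self.lower) & (x <= self.upper)))


def node_harvest_predict(x: np.ndarray, nodes: list[Node]) -> float:
    """Weighted average of the mean responses of all nodes containing x."""
    hits = [n for n in nodes if n.contains(x)]
    weights = np.array([n.weight for n in hits])
    means = np.array([n.mean for n in hits])
    # If x falls into a single node, this reduces to that node's mean,
    # i.e. a tree-like prediction.
    return float(np.dot(weights, means) / weights.sum())


# Toy example: two overlapping nodes in one dimension (values made up).
nodes = [
    Node(np.array([-np.inf]), np.array([2.0]), mean=1.0, weight=0.6),
    Node(np.array([1.0]), np.array([np.inf]), mean=3.0, weight=0.4),
]
print(node_harvest_predict(np.array([1.5]), nodes))  # in both nodes -> 1.8
print(node_harvest_predict(np.array([0.0]), nodes))  # in first node only -> 1.0
```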

Article information

Source
Ann. Appl. Stat., Volume 4, Number 4 (2010), 2049-2072.

Dates
First available in Project Euclid: 4 January 2011

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1294167809

Digital Object Identifier
doi:10.1214/10-AOAS367

Mathematical Reviews number (MathSciNet)
MR2829946

Zentralblatt MATH identifier
1220.62084

Keywords
Trees; tree ensembles; machine learning; Random Forests; sparsity; quadratic programming

Citation

Meinshausen, Nicolai. Node harvest. Ann. Appl. Stat. 4 (2010), no. 4, 2049--2072. doi:10.1214/10-AOAS367. https://projecteuclid.org/euclid.aoas/1294167809


