The Annals of Applied Statistics

Predictive learning via rule ensembles

Jerome H. Friedman and Bogdan E. Popescu



General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables. These rule ensembles are shown to produce predictive accuracy comparable to the best methods. However, their principal advantage lies in interpretation. Because of its simple form, each rule is easy to understand, as is its influence on individual predictions, selected subsets of predictions, or globally over the entire space of joint input variable values. Similarly, the degree of relevance of the respective input variables can be assessed globally, locally in different regions of the input space, or at individual prediction points. Techniques are presented for automatically identifying those variables that are involved in interactions with other variables, the strength and degree of those interactions, as well as the identities of the other variables with which they interact. Graphical representations are used to visualize both main and interaction effects.
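To make the model form described above concrete, the following minimal sketch shows a rule as a conjunction of simple conditions on individual input variables, and a prediction as a linear combination of rule outputs. The variable names (`age`, `income`), thresholds, intercept, and coefficients are hypothetical and chosen only for illustration. The sketch does not cover how rules are derived from data or how coefficients are fitted.

```python
# Illustrative sketch only: the rules, thresholds, and coefficients below are
# made-up assumptions, not values or procedures taken from the paper.
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Condition = Callable[[Row], bool]
Rule = Callable[[Row], int]


def make_rule(conditions: List[Condition]) -> Rule:
    """Return a rule that outputs 1 when every condition holds, else 0."""
    def rule(x: Row) -> int:
        return int(all(cond(x) for cond in conditions))
    return rule


# Two hypothetical rules over the hypothetical inputs "age" and "income".
rule_1 = make_rule([lambda x: x["age"] < 30, lambda x: x["income"] >= 50.0])
rule_2 = make_rule([lambda x: x["age"] >= 30])

# Hypothetical intercept and per-rule coefficients.
INTERCEPT = 1.0
WEIGHTED_RULES: List[Tuple[float, Rule]] = [
    (2.5, rule_1),
    (-0.5, rule_2),
]


def predict(x: Row) -> float:
    """Linear combination of rule outputs: a_0 + sum_k a_k * r_k(x)."""
    return INTERCEPT + sum(a * r(x) for a, r in WEIGHTED_RULES)


def rule_contributions(x: Row) -> List[float]:
    """Contribution a_k * r_k(x) of each rule to the prediction at x."""
    return [a * r(x) for a, r in WEIGHTED_RULES]
```

Because each rule is either on or off for a given observation, `rule_contributions` shows directly which rules influence an individual prediction, which is the interpretive property the abstract emphasizes.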

Article information

Ann. Appl. Stat. Volume 2, Number 3 (2008), 916-954.

First available in Project Euclid: 13 October 2008

Keywords: Regression, classification, learning ensembles, rules, interaction effects, variable importance, machine learning, data mining


Friedman, Jerome H.; Popescu, Bogdan E. Predictive learning via rule ensembles. Ann. Appl. Stat. 2 (2008), no. 3, 916–954. doi:10.1214/07-AOAS148.


