Open Access
Variable importance in binary regression trees and forests
Hemant Ishwaran
Electron. J. Statist. 1: 519-537 (2007). DOI: 10.1214/07-EJS039

Abstract

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory extends naturally from single trees to ensembles of trees and applies to methods such as random forests. This is useful because, although importance values from random forests are used to screen variables (for example, to filter high-throughput genomic data in bioinformatics), very little theory exists about their properties.

Citation


Hemant Ishwaran. "Variable importance in binary regression trees and forests." Electron. J. Statist. 1: 519-537, 2007. https://doi.org/10.1214/07-EJS039

Information

Published: 2007
First available in Project Euclid: 15 November 2007

zbMATH: 1320.62158
MathSciNet: MR2357716
Digital Object Identifier: 10.1214/07-EJS039

Keywords: CART, maximal subtree, random forests

Rights: Copyright © 2007 The Institute of Mathematical Statistics and the Bernoulli Society
