Electronic Journal of Statistics

Variable importance in binary regression trees and forests

Hemant Ishwaran


Abstract

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory extends naturally from single trees to ensembles of trees and applies to methods such as random forests. This matters because, although importance values from random forests are used to screen variables (for example, to filter high-throughput genomic data in bioinformatics), very little theory exists about their properties.
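The abstract does not spell out how VIMP is computed. As a rough illustration only, the sketch below uses the widely used permutation-style importance for tree predictors (Breiman's approach, not necessarily this paper's own definition). Importance for variable v is taken as the increase in mean squared error after the values of x_v are shuffled across cases. The names `toy_tree`, `mse` and `permutation_vimp` are hypothetical, and `toy_tree` stands in for a fitted binary regression tree that only splits on the first variable.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def toy_tree(x: Sequence[float]) -> float:
    """Hypothetical fitted binary regression tree; it only ever splits on x[0]."""
    if x[0] <= 0.5:
        return 0.0 if x[0] <= 0.25 else 1.0
    return 3.0


def mse(predict: Callable[[Sequence[float]], float],
        X: List[Row], y: List[float]) -> float:
    """Mean squared prediction error of `predict` over the cases (X, y)."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_vimp(predict: Callable[[Sequence[float]], float],
                     X: List[Row], y: List[float],
                     n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Permutation importance: average increase in MSE when column v is shuffled."""
    rng = random.Random(seed)
    base = mse(predict, X, y)
    vimp = []
    for v in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            # Shuffle column v only, which breaks its link with the response.
            col = [row[v] for row in X]
            rng.shuffle(col)
            X_perm = [row[:v] + [c] + row[v + 1:] for row, c in zip(X, col)]
            total += mse(predict, X_perm, y) - base
        vimp.append(total / n_repeats)
    return vimp


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
    y = [toy_tree(row) + data_rng.gauss(0.0, 0.1) for row in X]
    print(permutation_vimp(toy_tree, X, y))
```

Because `toy_tree` never splits on the second variable, shuffling that column leaves every prediction unchanged and its importance is exactly zero. Shuffling the first variable scrambles the tree's predictions and raises the error, so it receives a positive importance.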

Article information

Source
Electron. J. Statist. Volume 1 (2007), 519-537.

Dates
First available: 15 November 2007

Permanent link to this document
http://projecteuclid.org/euclid.ejs/1195157166

Digital Object Identifier
doi:10.1214/07-EJS039

Mathematical Reviews number (MathSciNet)
MR2357716

Zentralblatt MATH identifier
05274627

Citation

Ishwaran, Hemant. Variable importance in binary regression trees and forests. Electronic Journal of Statistics 1 (2007), 519-537. doi:10.1214/07-EJS039. http://projecteuclid.org/euclid.ejs/1195157166.

