Open Access
April 2007
Confidence sets for split points in decision trees
Moulinath Banerjee, Ian W. McKeague
Ann. Statist. 35(2): 543-574 (April 2007). DOI: 10.1214/009053606000001415

Abstract

We investigate the problem of finding confidence sets for split points in decision trees (CART). Our main results establish the asymptotic distribution of the least squares estimators and some associated residual sum of squares statistics in a binary decision tree approximation to a smooth regression curve. Cube-root asymptotics with nonnormal limit distributions are involved. We study various confidence sets for the split point, one calibrated using the subsampling bootstrap, and others calibrated using plug-in estimates of some nuisance parameters. The performance of the confidence sets is assessed in a simulation study. A motivation for developing such confidence sets comes from the problem of phosphorus pollution in the Everglades. Ecologists have suggested that split points provide a phosphorus threshold at which biological imbalance occurs, and the lower endpoint of the confidence set may be interpreted as a level that is protective of the ecosystem. This is illustrated using data from a Duke University Wetlands Center phosphorus dosing study in the Everglades.
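
As an illustration of the setting described in the abstract, the following is a minimal sketch of a least squares split point estimate for a one-split tree, together with a generic subsampling interval that uses the cube-root rate. It is not the authors' procedure: the function names `ls_split_point` and `subsampling_interval`, the subsample size `b`, the number of subsamples `n_sub`, and the equal-tailed quantile construction are all assumptions made for this example, and the paper's own calibration may differ.

```python
import numpy as np

def ls_split_point(x, y):
    """Least squares split point of a one-split tree (hypothetical helper).

    Returns the observed x value d that minimizes the residual sum of
    squares when y is fitted by one mean on {x <= d} and another on {x > d}.
    """
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    csum = np.cumsum(ys)
    csum2 = np.cumsum(ys ** 2)
    k = np.arange(1, n)  # left-node sizes 1, ..., n-1
    # RSS of each node = sum of squares minus (sum)^2 / node size
    left_rss = csum2[:-1] - csum[:-1] ** 2 / k
    right_sum = csum[-1] - csum[:-1]
    right_rss = (csum2[-1] - csum2[:-1]) - right_sum ** 2 / (n - k)
    best = np.argmin(left_rss + right_rss)
    return xs[best]

def subsampling_interval(x, y, b, n_sub=500, alpha=0.05, seed=0):
    """Generic subsampling interval for the split point (illustrative only).

    Assumes the estimator converges at the cube-root rate, so subsample
    roots are scaled by b**(1/3) and the interval by n**(-1/3).
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    d_hat = ls_split_point(x, y)
    roots = np.empty(n_sub)
    for i in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)  # without replacement
        roots[i] = b ** (1 / 3) * (ls_split_point(x[idx], y[idx]) - d_hat)
    lo_q, hi_q = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    scale = n ** (-1 / 3)
    return d_hat, (d_hat - hi_q * scale, d_hat - lo_q * scale)
```

In a threshold application of the kind the abstract mentions, the lower endpoint of the returned interval would be the quantity of interest. The choice of `b` matters in practice and is left open here.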

Citation

Moulinath Banerjee, Ian W. McKeague. "Confidence sets for split points in decision trees." Ann. Statist. 35 (2): 543-574, April 2007. https://doi.org/10.1214/009053606000001415

Information

Published: April 2007
First available in Project Euclid: 5 July 2007

zbMATH: 1117.62037
MathSciNet: MR2336859
Digital Object Identifier: 10.1214/009053606000001415

Subjects:
Primary: 62E20, 62G08, 62G20

Keywords: CART, change-point estimation, cube-root asymptotics, empirical processes, logistic regression, nonparametric regression, Poisson regression, split point

Rights: Copyright © 2007 Institute of Mathematical Statistics
