Open Access
Analyzing bagging
Peter Bühlmann, Bin Yu
Ann. Statist. 30(4): 927-961 (August 2002). DOI: 10.1214/aos/1031689014

Abstract

Bagging is one of the most effective computationally intensive procedures to improve on unstable estimators or classifiers, and is especially useful for high-dimensional data problems. Here we formalize the notion of instability and derive theoretical results to analyze the variance reduction effect of bagging (or variants thereof), mainly in hard decision problems, which include estimation after testing in regression and decision trees for regression functions and classifiers. Hard decisions create instability, and bagging is shown to smooth such hard decisions, yielding smaller variance and mean squared error. With theoretical explanations, we motivate subagging based on subsampling as an alternative aggregation scheme. It is computationally cheaper but still shows approximately the same accuracy as bagging. Moreover, our theory reveals first-order improvements, in line with simulation studies.
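The smoothing effect described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function names (`hard_decision`, `bagged`, `subagged`, `compare_variances`), the Gaussian data, the threshold at the true mean, and the subsample size of n/2 are all assumptions chosen for illustration. The hard decision is an indicator of whether the sample mean falls below a threshold; bagging averages it over bootstrap resamples, and subagging averages it over subsamples drawn without replacement.

```python
import random
import statistics


def hard_decision(sample, threshold=0.0):
    """Unstable estimator: indicator that the sample mean is at most `threshold`."""
    return 1.0 if statistics.fmean(sample) <= threshold else 0.0


def bagged(sample, estimator, n_resamples=100, rng=None):
    """Average `estimator` over bootstrap resamples (size n, with replacement)."""
    if rng is None:
        rng = random.Random(0)
    n = len(sample)
    values = [estimator(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.fmean(values)


def subagged(sample, estimator, subsample_size, n_resamples=100, rng=None):
    """Average `estimator` over subsamples drawn without replacement."""
    if rng is None:
        rng = random.Random(0)
    values = [
        estimator(rng.sample(sample, subsample_size)) for _ in range(n_resamples)
    ]
    return statistics.fmean(values)


def compare_variances(n=50, n_trials=200, seed=1):
    """Monte Carlo variance of the plain, bagged and subagged estimators.

    Assumption for illustration: standard normal data with the threshold at
    the true mean, which is where the hard decision is most unstable.
    """
    rng = random.Random(seed)
    results = {"plain": [], "bagged": [], "subagged": []}
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        results["plain"].append(hard_decision(sample))
        results["bagged"].append(bagged(sample, hard_decision, rng=rng))
        results["subagged"].append(
            subagged(sample, hard_decision, n // 2, rng=rng)
        )
    return {name: statistics.pvariance(vals) for name, vals in results.items()}
```

Calling `print(compare_variances())` reports the three Monte Carlo variances. In this toy setting the aggregated versions should come out noticeably less variable than the plain indicator, since averaging replaces a 0/1 jump with a smooth function of the sample mean.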

In particular, we obtain an asymptotic limiting distribution at the cube-root rate for the split point when fitting piecewise constant functions. Denoting sample size by n, it follows that in a cylindric neighborhood of diameter $n^{-1/3}$ of the theoretically optimal split point, the variance and mean squared error reduction of subagging can be characterized analytically. Because of the slow rate, our reasoning also provides an explanation on the global scale for the whole covariate space in a decision tree with finitely many splits.
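To make the split-point setting concrete, the sketch below fits a piecewise constant function with a single split (a stump) by least squares and then subags its prediction at a query point. It is an illustrative assumption rather than the procedure analyzed in the paper: the names `fit_stump`, `predict_stump` and `subagged_stump_predict`, the midpoint convention for the split, and the number of subsamples are all hypothetical choices.

```python
import random


def fit_stump(xs, ys):
    """Least-squares stump: return (split, left_mean, right_mean).

    Minimizing the residual sum of squares over split positions is
    equivalent to maximizing left_sum**2 / n_left + right_sum**2 / n_right.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no split is possible between tied covariate values
        right_sum = total - left_sum
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if best is None or gain > best[0]:
            # Assumed convention: place the split midway between neighbors.
            split = 0.5 * (pairs[i - 1][0] + pairs[i][0])
            best = (gain, split, left_sum / i, right_sum / (n - i))
    if best is None:
        raise ValueError("need at least two distinct covariate values")
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    """Evaluate the fitted piecewise constant function at x."""
    split, left_mean, right_mean = stump
    return left_mean if x <= split else right_mean


def subagged_stump_predict(xs, ys, x0, subsample_size, n_resamples=100, rng=None):
    """Average stump predictions at x0 over subsamples without replacement."""
    if rng is None:
        rng = random.Random(0)
    n = len(xs)
    total = 0.0
    for _ in range(n_resamples):
        idx = rng.sample(range(n), subsample_size)
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        total += predict_stump(stump, x0)
    return total / n_resamples
```

A single stump jumps from one level to the other at its estimated split point, so its prediction near the optimal split is a hard decision. Averaging over subsamples, each with a slightly different estimated split, turns that jump into a gradual transition, which is the mechanism the variance and mean squared error results refer to.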

Citation

Peter Bühlmann, Bin Yu. "Analyzing bagging." Ann. Statist. 30(4): 927-961, August 2002. https://doi.org/10.1214/aos/1031689014

Information

Published: August 2002
First available in Project Euclid: 10 September 2002

zbMATH: 1029.62037
MathSciNet: MR1926165
Digital Object Identifier: 10.1214/aos/1031689014

Subjects:
Primary: 62G08
Secondary: 62G09, 62H30, 68T10

Keywords: bootstrap, classification, decision tree, MARS, model selection, multiple predictions, nonparametric regression

Rights: Copyright © 2002 Institute of Mathematical Statistics
