Bernoulli

Nonparametric bootstrap prediction

Tadayoshi Fushiki, Fumiyasu Komaki, and Kazuyuki Aihara

Source: Bernoulli Volume 11, Number 2 (2005), 293-307.

Abstract

Ensemble learning has recently been intensively studied in the field of machine learning. 'Bagging' is a method of ensemble learning and uses bootstrap data to construct various predictors. The required prediction is then obtained by averaging the predictors. Harris proposed using this technique with the parametric bootstrap predictive distribution to construct predictive distributions, and showed that the parametric bootstrap predictive distribution gives asymptotically better prediction than a plug-in distribution with the maximum likelihood estimator. In this paper, we investigate nonparametric bootstrap predictive distributions. The nonparametric bootstrap predictive distribution is precisely that obtained by applying bagging to the statistical prediction problem. We show that the nonparametric bootstrap predictive distribution gives predictions asymptotically as good as the parametric bootstrap predictive distribution.
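As a rough illustration of the idea described in the abstract, the sketch below bags a plug-in density: it resamples the observed data with replacement, refits the maximum likelihood estimator on each resample, and averages the resulting plug-in densities. The normal model, the Monte Carlo approximation with a finite number of resamples, and all function names (`normal_mle`, `plug_in_density`, `bootstrap_predictive_density`) are assumptions made for this example only; they are not taken from the paper.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def normal_mle(sample):
    """Maximum likelihood estimates (mean, variance) under a normal model."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    return mean, var


def plug_in_density(data):
    """Plug-in predictive density: the model density at the MLE."""
    mean, var = normal_mle(data)
    return lambda y: normal_pdf(y, mean, var)


def bootstrap_predictive_density(data, n_resamples=500, seed=0):
    """Monte Carlo sketch of a nonparametric bootstrap predictive density.

    Each resample is drawn with replacement from the observed data
    (the empirical distribution), the MLE is refitted on it, and the
    plug-in densities are averaged, i.e. bagging applied to prediction.
    """
    rng = random.Random(seed)
    n = len(data)
    fits = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in range(n)]
        mean, var = normal_mle(resample)
        if var > 0.0:  # assumption: skip degenerate resamples with zero variance
            fits.append((mean, var))

    def density(y):
        return sum(normal_pdf(y, m, v) for m, v in fits) / len(fits)

    return density


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.0, 1.0) for _ in range(20)]  # synthetic data
    plug_in = plug_in_density(data)
    bagged = bootstrap_predictive_density(data)
    for y in (-2.0, 0.0, 2.0):
        print(y, plug_in(y), bagged(y))
```

Because the bagged density is a mixture of fitted normal densities, it is typically more spread out than the single plug-in density. The sketch makes no attempt to reproduce the asymptotic comparison in terms of Kullback-Leibler divergence that the paper establishes.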

Keywords: asymptotic theory; bagging; bootstrap predictive distribution; information geometry; Kullback-Leibler divergence

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.bj/1116340296
Mathematical Reviews number (MathSciNet): MR2132728
Zentralblatt MATH identifier: 1063.62062
Digital Object Identifier: doi:10.3150/bj/1116340296

References

[1] Aitchison, J. (1975) Goodness of predictive fit. Biometrika, 62, 547-554.
Mathematical Reviews (MathSciNet): MR391353
Zentralblatt MATH: 0339.62018
Digital Object Identifier: doi:10.1093/biomet/62.3.547
[2] Amari, S. (1985) Differential-Geometrical Methods in Statistics. New York: Springer-Verlag.
Mathematical Reviews (MathSciNet): MR788689
[3] Amari, S. and Nagaoka, H. (2000) Methods of Information Geometry. New York: AMS and Oxford University Press.
Mathematical Reviews (MathSciNet): MR1800071
[4] Breiman, L. (1996) Bagging predictors. Machine Learning, 24, 123-140.
Mathematical Reviews (MathSciNet): MR1425957
Zentralblatt MATH: 0867.62055
Digital Object Identifier: doi:10.1007/BF00058655
[5] Freund, Y. and Schapire, R. (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci., 55, 119-139.
Mathematical Reviews (MathSciNet): MR1473055
Zentralblatt MATH: 0880.68103
Digital Object Identifier: doi:10.1006/jcss.1997.1504
[6] Fushiki, T., Komaki, F. and Aihara, K. (2004) On parametric bootstrapping and Bayesian prediction. Scand. J. Statist., 31, 403-416.
Mathematical Reviews (MathSciNet): MR2087833
Digital Object Identifier: doi:10.1111/j.1467-9469.2004.02_127.x
[7] Harris, I.R. (1989) Predictive fit for natural exponential families. Biometrika, 76, 675-684.
Mathematical Reviews (MathSciNet): MR1041412
Zentralblatt MATH: 0679.62021
Digital Object Identifier: doi:10.1093/biomet/76.4.675
[8] Hartigan, J.A. (1998) The maximum likelihood prior. Ann. Statist., 26, 2083-2103.
Mathematical Reviews (MathSciNet): MR1700222
Zentralblatt MATH: 0927.62023
Digital Object Identifier: doi:10.1214/aos/1024691462
Project Euclid: euclid.aos/1024691462
[9] Komaki, F. (1996) On asymptotic properties of predictive distributions. Biometrika, 83, 299-313.
Mathematical Reviews (MathSciNet): MR1439785
Zentralblatt MATH: 0864.62007
Digital Object Identifier: doi:10.1093/biomet/83.2.299
[10] McCullagh, P. (1987) Tensor Methods in Statistics. London: Chapman & Hall.
Mathematical Reviews (MathSciNet): MR907286
Zentralblatt MATH: 0732.62003
[11] Vidoni, P. (1995) A simple predictive density based on the p*-formula. Biometrika, 82, 855-863.
Mathematical Reviews (MathSciNet): MR1380820
Zentralblatt MATH: 0878.62017

2010 © Bernoulli Society for Mathematical Statistics and Probability