Bayesian Analysis

A Bayes Interpretation of Stacking for $\mathcal{M}$-Complete and $\mathcal{M}$-Open Settings

Tri Le and Bertrand Clarke

Abstract

In ${\mathcal{M}}$-open problems where no true model can be conceptualized, it is common to back off from modeling and merely seek good prediction. Even in ${\mathcal{M}}$-complete problems, taking a predictive approach can be very useful. Stacking is a model averaging procedure that gives a composite predictor by combining individual predictors from a list of models using weights that optimize a cross-validation criterion. We show that the stacking weights also asymptotically minimize a posterior expected loss. Hence we formally provide a Bayesian justification for cross-validation. Often the weights are constrained to be positive and sum to one. For greater generality, we omit the positivity constraint and relax the ‘sum to one’ constraint.
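The stacking construction described above can be sketched in a few lines. Each candidate model produces cross-validated (here, leave-one-out) predictions, and the stacking weights minimize the resulting squared-error cross-validation criterion; with the positivity and sum-to-one constraints dropped, as in the relaxed setting above, the minimizer is an ordinary least-squares fit of the responses on the matrix of cross-validated predictions. The model list (polynomials) and the data here are illustrative assumptions, not the paper's examples.

```python
import numpy as np

# Illustrative data (not from the paper): noisy sine curve.
rng = np.random.default_rng(0)
n = 60
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

def poly_loo_predictions(x, y, degree):
    """Leave-one-out predictions from a polynomial model of the given degree."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coeffs = np.polyfit(x[mask], y[mask], degree)
        preds[i] = np.polyval(coeffs, x[i])
    return preds

# Candidate models: polynomials of degree 1, 3, and 5.
Z = np.column_stack([poly_loo_predictions(x, y, d) for d in (1, 3, 5)])

# Unconstrained stacking weights: least squares of y on the CV predictions
# minimizes the squared-error cross-validation criterion over all weights.
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
stacked = Z @ w
print("stacking weights:", w)
print("CV error of stacked predictor:", np.mean((y - stacked) ** 2))
```

Because any single model corresponds to a unit weight vector, the unconstrained stacked predictor's cross-validation error can never exceed that of any individual component.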

A key question is ‘What predictors should be in the average?’ We first verify that the stacking error depends only on the span of the models. Then we propose using bootstrap samples from the data to generate empirical basis elements that can be used to form models. We use this in two computed examples to give stacking predictors that are (i) data driven, (ii) optimal with respect to the number of component predictors, and (iii) optimal with respect to the weight each predictor gets.
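The bootstrap idea above can be sketched as follows: draw bootstrap samples from the data, fit a simple smoother to each to obtain data-driven component predictors, and then combine them with stacking weights. The choice of a Nadaraya–Watson kernel smoother, the bandwidth, and the number of basis elements are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Illustrative data (not from the paper): noisy sine curve.
rng = np.random.default_rng(1)
n = 80
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

def nw_smoother(x_train, y_train, bandwidth=0.1):
    """Nadaraya-Watson kernel regression fit on (x_train, y_train)."""
    def predict(x_new):
        x_new = np.atleast_1d(x_new)
        d = (x_new[:, None] - x_train[None, :]) / bandwidth
        k = np.exp(-0.5 * d ** 2)  # Gaussian kernel weights
        return (k @ y_train) / k.sum(axis=1)
    return predict

# Each bootstrap sample yields one empirical basis predictor.
n_basis = 5
basis = []
for _ in range(n_basis):
    idx = rng.integers(0, n, size=n)  # resample the data with replacement
    basis.append(nw_smoother(x[idx], y[idx]))

# Evaluate the basis predictors at the original design points and
# fit unconstrained stacking weights by least squares, as before.
Z = np.column_stack([f(x) for f in basis])
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
print("weights:", w, "stacked MSE:", np.mean((y - Z @ w) ** 2))
```

In this sketch, selecting the number of bootstrap basis elements and their weights plays the role of the optimality criteria (ii) and (iii) above.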

Article information

Source
Bayesian Anal. Volume 12, Number 3 (2017), 807-829.

Dates
First available in Project Euclid: 7 September 2016

Permanent link to this document
http://projecteuclid.org/euclid.ba/1473276261

Digital Object Identifier
doi:10.1214/16-BA1023

Keywords
stacking; cross-validation; Bayes action; prediction; problem classes; optimization constraints

Rights
Creative Commons Attribution 4.0 International License.

Citation

Le, Tri; Clarke, Bertrand. A Bayes Interpretation of Stacking for $\mathcal{M}$-Complete and $\mathcal{M}$-Open Settings. Bayesian Anal. 12 (2017), no. 3, 807–829. doi:10.1214/16-BA1023. http://projecteuclid.org/euclid.ba/1473276261.



References

  • Bernardo, J. and Smith, A. (2000). Bayesian Theory. Chichester: John Wiley & Sons.
  • Breiman, L. (1996). “Stacked regressions.” Machine Learning, 24: 49–64.
  • Clarke, B. (2003). “Bayes model averaging and stacking when model approximation error cannot be ignored.” Journal of Machine Learning Research, 4: 683–712.
  • Clyde, M. (2012). “Bayesian perspectives on combining models.” Slides from presentation at ISBA Kyoto, 648–649.
  • Clyde, M. and Iversen, E. (2013). “Bayesian model averaging in the $\mathcal{M}$-open framework.” In Damien, P., Dellaportas, P., Polson, N., and Stephens, D. (eds.), Bayesian Theory and Applications, 484–498. Oxford: Oxford University Press.
  • Franz, T., Wang, T., Avery, W., Finkenbiner, C., and Brocca, L. (2015). “Combined analysis of soil moisture measurements from roving and fixed cosmic ray neutron probes for multiscale real-time monitoring.” Geophysical Research Letters, 42: 3389–3396.
  • Le, T. and Clarke, B. (2016). “Using the Bayesian Shtarkov solution for predictions.” Computational Statistics and Data Analysis, 104: 183–196.
  • Le, T. and Clarke, B. (2016). “Supplementary Appendices of “A Bayes interpretation of stacking for $\mathcal{M}$-complete and $\mathcal{M}$-open settings”.” Bayesian Analysis.
  • Minka, T. P. (2002). “Bayesian model averaging is not model combination.” http://research.microsoft.com/en-us/um/people/minka/papers/minka-bma-isnt-mc.pdf.
  • Nadaraya, E. A. (1964). “On estimating regression.” Theory of Probability and Its Applications, 9: 141–142.
  • Ozay, M. and Vural, F. T. Y. (2012). “A new fuzzy stacked generalization technique and analysis of its performance.” arXiv:1204.0171.
  • Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Massachusetts: The MIT Press.
  • Rokach, L. (2010). “Ensemble-based classifiers.” Artificial Intelligence Review, 33: 1–39.
  • Shao, J. (1997). “An asymptotic theory for linear model selection.” Statistica Sinica, 7: 221–264.
  • Sill, J., Takacs, G., Mackey, L., and Lin, D. (2009). “Feature-Weighted Linear Stacking.” arXiv:0911.0460.
  • Smyth, P. and Wolpert, D. (1999). “Linearly combining density estimators via stacking.” Machine Learning Journal, 36: 59–83.
  • Stone, M. (1977). “Asymptotics for and against cross-validation.” Biometrika, 64: 29–38.
  • Ting, K. M. and Witten, I. (1999). “Issues in stacked generalization.” Journal of Artificial Intelligence Research, 10: 271–289.
  • van de Geer, S. (2014). “On the uniform convergence of empirical norms and inner products, with application to causal inference.” Electronic Journal of Statistics, 8: 543–574.
  • Walker, S. G. and Gutierrez-Pena, E. (1999). “Robustifying Bayesian procedures.” Bayesian Statistics, 6: 685–710.
  • Watson, G. S. (1964). “Smooth regression analysis.” Sankhyā: The Indian Journal of Statistics, Series A, 26: 359–372.
  • Wolpert, D. (1992). “Stacked generalization.” Neural Networks, 5: 241–259.
  • Wolpert, D. and Macready, W. (1999). “An efficient method to estimate bagging generalization error.” Machine Learning Journal, 35: 41–55.

Supplemental materials

  • Supplementary Material: Supplementary Appendices of “A Bayes interpretation of stacking for M-complete and M-open settings”.