Bayesian Analysis

Multivariate Stochastic Process Models for Correlated Responses of Mixed Type

Tony Pourmohamad and Herbert K. H. Lee

Full-text: Open access

Abstract

We propose a new model for correlated outputs of mixed type, such as continuous and binary outputs, with a particular focus on joint regression and classification, motivated by an application in constrained optimization for computer simulation modeling. Our framework is based upon multivariate stochastic processes, extending Gaussian process methodology for modeling of continuous multivariate spatial outputs by adding a latent process structure that allows for joint modeling of a variety of types of correlated outputs. In addition, we implement fully Bayesian inference using particle learning, which allows us to conduct fast sequential inference. We demonstrate the effectiveness of our proposed methods on both synthetic examples and a real-world hydrology computer experiment optimization problem where it is helpful to model the black-box objective function as correlated with satisfaction of the constraint.

Article information

Source
Bayesian Anal., Volume 11, Number 3 (2016), 797-820.

Dates
First available in Project Euclid: 8 October 2015

Permanent link to this document
https://projecteuclid.org/euclid.ba/1444308210

Digital Object Identifier
doi:10.1214/15-BA976

Mathematical Reviews number (MathSciNet)
MR3498046

Zentralblatt MATH identifier
1359.62402

Keywords
Gaussian process; particle learning; Bayesian statistics; constrained optimization; computer simulation experiment

Citation

Pourmohamad, Tony; Lee, Herbert K. H. Multivariate Stochastic Process Models for Correlated Responses of Mixed Type. Bayesian Anal. 11 (2016), no. 3, 797–820. doi:10.1214/15-BA976. https://projecteuclid.org/euclid.ba/1444308210