Open Access
December 2019
Predicting paleoclimate from compositional data using multivariate Gaussian process inverse prediction
John R. Tipton, Mevin B. Hooten, Connor Nolan, Robert K. Booth, Jason McLachlan
Ann. Appl. Stat. 13(4): 2363-2388 (December 2019). DOI: 10.1214/19-AOAS1281

Abstract

Multivariate compositional count data arise in many applications including ecology, microbiology, genetics and paleoclimate. A frequent question in the analysis of multivariate compositional count data is what underlying values of a covariate(s) give rise to the observed composition. Learning the relationship between covariates and the compositional count allows for inverse prediction of unobserved covariates given compositional count observations. Gaussian processes provide a flexible framework for modeling functional responses with respect to a covariate without assuming a functional form. Many scientific disciplines use Gaussian process approximations to improve prediction and make inference on latent processes and parameters. When prediction is desired on unobserved covariates given realizations of the response variable, this is called inverse prediction. Because inverse prediction is often mathematically and computationally challenging, predicting unobserved covariates often requires fitting models that are different from the hypothesized generative model. We present a novel computational framework that allows for efficient inverse prediction using a Gaussian process approximation to generative models. Our framework enables scientific learning about how the latent processes co-vary with respect to covariates while simultaneously providing predictions of missing covariates. The proposed framework is capable of efficiently exploring the high dimensional, multi-modal latent spaces that arise in the inverse problem. To demonstrate flexibility, we apply our method in a generalized linear model framework to predict latent climate states given multivariate count data. Based on cross-validation, our model has predictive skill competitive with current methods while simultaneously providing formal, statistical inference on the underlying community dynamics of the biological system previously not available.
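
To make the idea of inverse prediction concrete, the following is a minimal sketch and not the authors' method. It assumes the latent response of each taxon to a single covariate is already known and has a fixed unimodal shape (the hypothetical `centers` and `widths` below), whereas the paper learns these functions with a Gaussian process approximation. Given one vector of counts, it evaluates a multinomial likelihood on a grid of candidate covariate values and normalizes under a uniform prior. All names and numbers are illustrative assumptions.

```python
import numpy as np

def log_composition_probs(x, centers, widths):
    """Log of softmax-normalized latent responses at covariate values x.

    The Gaussian-shaped curves are an assumed stand-in for latent
    functions that a fitted model would supply.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    latent = -0.5 * ((x[:, None] - centers[None, :]) / widths[None, :]) ** 2
    shifted = latent - latent.max(axis=1, keepdims=True)
    # Log-softmax across taxa, computed stably.
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def inverse_predict(counts, grid, centers, widths):
    """Posterior weights over grid values of the unobserved covariate.

    Uses a multinomial likelihood and a uniform prior on the grid; the
    multinomial coefficient is constant in x and is omitted.
    """
    log_lik = log_composition_probs(grid, centers, widths) @ counts
    weights = np.exp(log_lik - log_lik.max())
    return weights / weights.sum()

# Hypothetical response curves for four taxa (assumed, not from the paper).
centers = np.array([-1.5, -0.5, 0.5, 1.5])
widths = np.array([0.8, 0.6, 0.6, 0.8])

# Simulate one compositional count observation at a known covariate value.
x_true = 0.3
rng = np.random.default_rng(0)
probs_true = np.exp(log_composition_probs(x_true, centers, widths))[0]
counts = rng.multinomial(2000, probs_true)

# Inverse prediction: recover the covariate from the counts alone.
grid = np.linspace(-2.0, 2.0, 401)
posterior = inverse_predict(counts, grid, centers, widths)
posterior_mean = float(np.sum(grid * posterior))
print("true covariate:", x_true, "posterior mean:", posterior_mean)
```

A grid works here only because the covariate is one-dimensional and the response curves are treated as known. When the curves must be estimated jointly with the missing covariates, the posterior becomes high dimensional and multi-modal, which is the setting the abstract describes.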

Citation


John R. Tipton, Mevin B. Hooten, Connor Nolan, Robert K. Booth, Jason McLachlan. "Predicting paleoclimate from compositional data using multivariate Gaussian process inverse prediction." Ann. Appl. Stat. 13 (4) 2363-2388, December 2019. https://doi.org/10.1214/19-AOAS1281

Information

Received: 1 March 2019; Published: December 2019
First available in Project Euclid: 28 November 2019

zbMATH: 07160943
MathSciNet: MR4037434
Digital Object Identifier: 10.1214/19-AOAS1281

Keywords: Bayesian hierarchical models, ecological functional response model, model comparison, predictive validation

Rights: Copyright © 2019 Institute of Mathematical Statistics
