Bayesian Analysis

Additive Multivariate Gaussian Processes for Joint Species Distribution Modeling with Heterogeneous Data

Jarno Vanhatalo, Marcelo Hartmann, and Lari Veneranta

Full-text: Open access


Species distribution models (SDMs) are a key tool in ecology, conservation and the management of natural resources. Two key components of state-of-the-art SDMs are the description of how species distributions respond along environmental covariates and a spatial random effect that captures deviations from the distribution patterns explained by those covariates. Joint species distribution models (JSDMs) additionally include interspecific correlations, which have been shown to improve their descriptive and predictive performance compared to single species models. However, current JSDMs are restricted to the hierarchical generalized linear modeling framework. Their limitation is that parametric models have trouble explaining changes in abundance due, for example, to highly non-linear physical tolerance limits, which is particularly important when predicting species distributions in new areas or under scenarios of environmental change. Semi-parametric response functions, on the other hand, have been shown to improve the predictive performance of single species SDMs in these tasks.
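The two model components described above can be written in a generic hierarchical form. The notation here is ours and is only a sketch of the setting, not the paper's exact specification: $y_{ij}$ is the observation of species $j$ at site $i$ with covariates $\mathbf{x}_i = (x_{i1}, \dots, x_{iD})$ and location $\mathbf{s}_i$, $g$ is a link function, and $p_j$ is a species-specific observation model (for example, counts for one species and presence/absence for another).

```latex
\begin{aligned}
y_{ij} &\sim p_j\!\left(y_{ij} \mid g^{-1}(\eta_{ij}), \theta_j\right), \\
\eta_{ij} &= f_j(\mathbf{x}_i) + \phi_j(\mathbf{s}_i)
  && \text{(covariate response + spatial random effect)}, \\
f_j(\mathbf{x}) &= \beta_{0j} + \mathbf{x}^{\top}\boldsymbol{\beta}_j
  && \text{(parametric, GLM-type response)}, \\
f_j(\mathbf{x}) &= \textstyle\sum_{d=1}^{D} f_{jd}(x_d), \quad f_{jd} \sim \mathcal{GP}
  && \text{(semi-parametric, additive response)}.
\end{aligned}
```

In a JSDM the species-specific terms are allowed to be correlated across $j$. In the GLM form the response to each covariate can only change through the coefficients $\boldsymbol{\beta}_j$, so the predictor stays linear in $\mathbf{x}$ (on the link scale) and cannot bend to follow a non-linear tolerance limit, whereas a GP term $f_{jd}$ can.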

Here, we propose JSDMs where the responses to environmental covariates are modeled with additive multivariate Gaussian processes coded as linear models of coregionalization. These allow inference for a wide range of functional forms and for interspecific correlations between the responses. We also propose an efficient approach for inference with the Laplace approximation and a parameterization of the interspecific covariance matrices on the Euclidean space. We demonstrate the benefits of our model with two small-scale examples and one real-world case study. We use cross-validation to compare the proposed model to analogous semi-parametric single species models and to parametric single and joint species models in interpolation and extrapolation tasks. The proposed model outperforms the alternative models in all cases. We also show that the proposed model can be seen as an extension of the current state-of-the-art JSDMs to semi-parametric models.
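The following is a minimal sketch, in plain Python, of how an additive linear model of coregionalization (LMC) can build the joint prior covariance of $J$ species' latent response functions. The assumptions are ours, not taken from the paper: each covariate $d$ contributes one additive term; each term mixes $J$ independent latent GPs with unit-magnitude squared exponential kernels and their own lengthscales; and the mixing matrix $A_d$ is obtained from an unconstrained real vector through a log-Cholesky map, which is one common way to parameterize a covariance matrix on Euclidean space but not necessarily the transformation used in the paper. All function names are hypothetical, and the Laplace approximation for the posterior is not shown.

```python
import math


def se_kernel(a, b, lengthscale):
    """Unit-magnitude squared exponential kernel on one scalar covariate."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def log_cholesky_factor(theta, n):
    """Map a real vector of length n(n+1)/2 to a lower-triangular L with a
    positive diagonal, so that Sigma = L L^T is symmetric positive definite.
    (Assumed log-Cholesky parameterization, used here for illustration.)"""
    if len(theta) != n * (n + 1) // 2:
        raise ValueError("theta must have length n(n+1)/2")
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            # exp() keeps the diagonal positive; off-diagonal entries are free
            L[i][j] = math.exp(theta[k]) if i == j else theta[k]
            k += 1
    return L


def outer_cov(L):
    """Return Sigma = L L^T."""
    n = len(L)
    return [[sum(L[i][m] * L[j][m] for m in range(n)) for j in range(n)]
            for i in range(n)]


def additive_lmc_cov(X, thetas, lengthscales, n_species):
    """Covariance of the stacked latent values f_j(x_i), index j * N + i.

    X: N rows of D covariate values; thetas[d]: unconstrained vector for the
    mixing matrix of covariate d; lengthscales[d][q]: lengthscale of latent
    GP q in term d. Entry (j*N+i, j2*N+i2) equals
    sum_d sum_q A_d[j][q] * A_d[j2][q] * k_dq(X[i][d], X[i2][d]).
    """
    N, J = len(X), n_species
    K = [[0.0] * (J * N) for _ in range(J * N)]
    for d, theta in enumerate(thetas):
        A = log_cholesky_factor(theta, J)  # J x J mixing matrix for term d
        for q in range(J):
            for i in range(N):
                for i2 in range(N):
                    kq = se_kernel(X[i][d], X[i2][d], lengthscales[d][q])
                    for j in range(J):
                        for j2 in range(J):
                            K[j * N + i][j2 * N + i2] += A[j][q] * A[j2][q] * kq
    return K


# Toy usage with made-up inputs: 3 sites, 2 covariates, 2 species.
X = [[0.0, 5.0], [1.0, 4.0], [2.5, 3.0]]
thetas = [[0.0, 0.5, -0.3], [0.2, -0.8, 0.1]]  # one vector per covariate
lengthscales = [[1.0, 2.0], [0.5, 1.5]]
K = additive_lmc_cov(X, thetas, lengthscales, n_species=2)
```

Because $A_d$ is lower triangular, the interspecific covariance of term $d$ at coinciding inputs is $\Sigma_d = A_d A_d^{\top}$, so optimizing or sampling over `thetas` never leaves the space of valid covariance matrices. One side effect of this particular choice is that species order matters when the latent lengthscales differ, since species $j$ loads only on latent processes $1, \dots, j$.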

Article information

Bayesian Anal., Volume 15, Number 2 (2020), 415-447.

First available in Project Euclid: 3 June 2019

Primary: 60G15 (Gaussian processes); 60K35 (Interacting random processes; statistical mechanics type models; percolation theory) [See also 82B43, 82C43]
Secondary: 62P12 (Applications to environmental and related topics)

Keywords: linear model of coregionalization; hierarchical model; heterogeneous data; spatial prediction; model comparison; Laplace approximation; covariance transformation

Creative Commons Attribution 4.0 International License.


Vanhatalo, Jarno; Hartmann, Marcelo; Veneranta, Lari. Additive Multivariate Gaussian Processes for Joint Species Distribution Modeling with Heterogeneous Data. Bayesian Anal. 15 (2020), no. 2, 415--447. doi:10.1214/19-BA1158.

Supplemental materials

  • Supplementary Material: Additive multivariate Gaussian process for joint species distribution modeling with heterogeneous data. The supplementary material contains additional mathematical formulations of the proposed methodology, together with additional figures and tables for the case study analysis.