Bayesian Analysis

Joint Species Distribution Modeling: Dimension Reduction Using Dirichlet Processes

Daniel Taylor-Rodríguez, Kimberly Kaufeld, Erin M. Schliep, James S. Clark, and Alan E. Gelfand



Species distribution models are used to evaluate the variables that affect the distribution and abundance of species and to predict biodiversity. Historically, such models have been fitted to each species independently. While independent models can provide useful information regarding distribution and abundance, they ignore the fact that, after accounting for environmental covariates, residual interspecies dependence persists. When individual models are stacked, misleading behavior may arise; in particular, stacked individual models often imply too many species per location.

Recently developed joint species distribution models apply to presence–absence data, continuous or discrete abundance, abundance with large numbers of zeros, and discrete, ordinal, and compositional data. Here, we address the challenge of joint modeling for a large number of species. To appreciate the challenge in its simplest form, consider a presence/absence (binary) response for, say, S species: we then have an S-way contingency table with 2^S cell probabilities. Even if S is as small as 100, this is an enormous table, infeasible to work with without some structure to reduce dimension.
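To make the arithmetic concrete, the short Python sketch below counts the cells of the S-way binary table and, for contrast, the number of pairwise correlations that a latent Gaussian formulation (such as a multivariate probit) would carry instead. The function names are illustrative and do not come from the paper; S = 112 simply mirrors the number of tree species in the FIA application.

```python
# Illustrative arithmetic only: how large is the joint model for S binary species?

def n_cells(num_species: int) -> int:
    """Cells in an S-way contingency table of presence/absence outcomes: 2**S."""
    return 2 ** num_species


def n_free_cell_probs(num_species: int) -> int:
    """Cell probabilities sum to one, so one of the 2**S is redundant."""
    return n_cells(num_species) - 1


def n_correlations(num_species: int) -> int:
    """Off-diagonal entries of an S x S correlation matrix: S(S-1)/2."""
    return num_species * (num_species - 1) // 2


for s in (10, 100, 112):
    print(f"S = {s:>3}: {n_cells(s):.3e} cells vs {n_correlations(s)} pairwise correlations")
```

Even the correlation-matrix route grows quadratically in S (6,216 correlations for 112 species), which suggests why additional structure on the covariance remains useful when S is large.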

We develop a computationally feasible approach to accommodate a large number of species (say, on the order of 10^3) that allows us to: 1) assess the dependence structure across species; 2) identify clusters of species that have similar dependence patterns; and 3) jointly predict species distributions. To do so, we build hierarchical models capturing dependence between species at the first or “data” stage rather than at a second or “mean” stage. We employ the Dirichlet process for clustering in a novel way to reduce dimension in the joint covariance structure. This last step makes computation tractable.
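One way to picture how Dirichlet process clustering can shrink a joint covariance is sketched below in Python. This is a minimal illustration under stated assumptions, not the authors' exact specification: we assume a low-rank-plus-diagonal residual covariance, Sigma = A A^T + sigma^2 I, and let each species' row of the loading matrix A be drawn from a truncated stick-breaking approximation to a Dirichlet process, so that species in the same cluster share a row. All function and parameter names (`stick_breaking_weights`, `clustered_covariance`, `rank`, `n_atoms`, `alpha`, `sigma2`) are hypothetical.

```python
import numpy as np


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights approximating a DP with concentration alpha."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # truncation: the last stick absorbs the leftover mass
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * leftover  # nonnegative and sums to one


def clustered_covariance(n_species, rank, n_atoms, alpha, sigma2, rng):
    """Hypothetical low-rank-plus-diagonal covariance with DP-clustered loading rows.

    Each species draws a cluster label from the stick-breaking weights; all
    species with the same label share one row of the S x rank loading matrix A.
    Returns Sigma = A A^T + sigma2 * I and the cluster labels.
    """
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = rng.normal(size=(n_atoms, rank))               # candidate loading rows
    labels = rng.choice(n_atoms, size=n_species, p=weights)
    A = atoms[labels]                                      # shared rows within a cluster
    sigma = A @ A.T + sigma2 * np.eye(n_species)
    return sigma, labels


rng = np.random.default_rng(0)
sigma, labels = clustered_covariance(n_species=112, rank=5, n_atoms=20,
                                     alpha=1.0, sigma2=1.0, rng=rng)
print(sigma.shape, "occupied clusters:", len(np.unique(labels)))
```

Species assigned to the same cluster have identical loading rows and therefore identical covariances with every other species. In this configuration the covariance is described by at most n_atoms * rank = 100 loadings plus sigma^2, rather than the S(S+1)/2 = 6,328 free entries of an unstructured 112 x 112 covariance. The values rank = 5 and n_atoms = 20 are arbitrary illustrative choices.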

We use Forest Inventory and Analysis (FIA) data from the eastern United States to demonstrate our method. The data consist of presence–absence measurements for 112 tree species observed east of the Mississippi River. As a proof of concept for our dimension reduction approach, we also include simulations using continuous and binary data.
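Dependence at the “data” stage for presence–absence responses can be pictured with a latent Gaussian (multivariate probit) layer: each site has a continuous latent vector whose residuals are correlated across species, and a species is recorded as present when its latent value is positive. The Python sketch below is a hedged illustration of that standard construction, not the paper's simulation design; the covariates, coefficients, and correlation matrix are made up. Because the scale of the latent variable is not identified from binary data alone, the sketch uses a correlation matrix (unit variances). For continuous responses one would work with the latent values directly.

```python
import numpy as np


def simulate_presence_absence(X, beta, corr, rng):
    """Simulate presence/absence through a latent multivariate probit layer.

    X    : (n_sites, p) covariate matrix
    beta : (p, S) species-specific regression coefficients
    corr : (S, S) residual correlation matrix (species dependence at the data stage)
    Returns the latent matrix V (n_sites, S) and binary Y with Y = 1 where V > 0.
    """
    n_sites, n_species = X.shape[0], corr.shape[0]
    mean = X @ beta                                    # covariate ("mean") effects
    eps = rng.multivariate_normal(np.zeros(n_species), corr, size=n_sites)
    latent = mean + eps                                # correlated latent responses
    return latent, (latent > 0).astype(int)


rng = np.random.default_rng(42)
n_sites = 2000
X = np.column_stack([np.ones(n_sites), rng.normal(size=n_sites)])  # intercept + one covariate
beta = np.array([[0.0, 0.5, -0.5],    # intercepts for 3 hypothetical species
                 [1.0, -1.0, 0.0]])   # covariate slopes
corr = np.array([[1.0, 0.8, 0.0],     # species 0 and 1 share residual dependence
                 [0.8, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
latent, Y = simulate_presence_absence(X, beta, corr, rng)
print("prevalence per species:", Y.mean(axis=0))
```

Here species 0 and 1 respond in opposite directions to the covariate yet remain positively associated after the covariate is accounted for, which is the kind of residual interspecies dependence that independent, stacked models ignore.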

Article information

Bayesian Anal., Volume 12, Number 4 (2017), 939-967.

First available in Project Euclid: 2 November 2016


Keywords: abundance, hierarchical model, latent variables, Markov chain Monte Carlo, presence–absence

Creative Commons Attribution 4.0 International License.


Taylor-Rodríguez, Daniel; Kaufeld, Kimberly; Schliep, Erin M.; Clark, James S.; Gelfand, Alan E. Joint Species Distribution Modeling: Dimension Reduction Using Dirichlet Processes. Bayesian Anal. 12 (2017), no. 4, 939–967. doi:10.1214/16-BA1031.


