## Bayesian Analysis

- Bayesian Anal.
- Volume 12, Number 4 (2017), 939-967.

### Joint Species Distribution Modeling: Dimension Reduction Using Dirichlet Processes

Daniel Taylor-Rodríguez, Kimberly Kaufeld, Erin M. Schliep, James S. Clark, and Alan E. Gelfand

#### Abstract

Species distribution models are used to evaluate the variables that affect the distribution and abundance of species and to predict biodiversity. Historically, such models have been fitted to each species independently. While independent models can provide useful information regarding distribution and abundance, they ignore the fact that, after accounting for environmental covariates, residual interspecies dependence persists. When individual models are stacked, misleading behavior may arise. In particular, individual models often imply too many species per location.

Recently developed joint species distribution models have application to presence–absence, continuous or discrete abundance, abundance with large numbers of zeros, and discrete, ordinal, and compositional data. Here, we deal with the challenge of joint modeling for a large number of species. To appreciate the challenge in the simplest way, with just a presence–absence (binary) response and, say, $S$ species, we have an $S$-way contingency table with $2^{S}$ cell probabilities. Even if $S$ is as small as $100$, this is an enormous table, infeasible to work with without some structure to reduce dimension.
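To make the size of that table concrete, the short sketch below counts the cells of a binary $S$-way contingency table for a few values of $S$. The function name `table_cells` is hypothetical and used only for illustration.

```python
def table_cells(num_species: int) -> int:
    """Number of cells in a binary contingency table over num_species species."""
    return 2 ** num_species


# Each added species doubles the number of cell probabilities.
for S in (10, 50, 100):
    print(f"S = {S:3d}: {table_cells(S):.3e} cells")
```

Python integers are exact at any size, so `table_cells(100)` returns the full 31-digit count rather than a floating-point approximation.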

We develop a computationally feasible approach to accommodate a large number of species (say order ${10}^{3}$) that allows us to: 1) assess the dependence structure across species; 2) identify clusters of species that have similar dependence patterns; and 3) jointly predict species distributions. To do so, we build hierarchical models capturing dependence between species at the first or “data” stage rather than at a second or “mean” stage. We employ the Dirichlet process for clustering in a novel way to reduce dimension in the joint covariance structure. This last step makes computation tractable.
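As a rough illustration of how clustering can shrink a joint covariance structure, the sketch below draws a partition of species from a Chinese restaurant process (the partition induced by a Dirichlet process) and lets all species in a cluster share one row of a loading matrix. This is only one possible construction under stated assumptions: the abstract does not specify the model, and the low-rank-plus-diagonal form, the names `crp_labels`, `Z`, `A`, and the values of `S`, `r`, `alpha`, and `sigma2` are all hypothetical.

```python
import numpy as np


def crp_labels(num_species, alpha, rng):
    """Draw cluster labels from a Chinese restaurant process with concentration alpha."""
    labels = [0]   # the first species opens the first cluster
    counts = [1]   # current cluster sizes
    for _ in range(1, num_species):
        # Join an existing cluster with probability proportional to its size,
        # or open a new cluster with probability proportional to alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return np.array(labels)


rng = np.random.default_rng(0)
S, r, alpha, sigma2 = 1000, 5, 2.0, 1.0   # assumed sizes, for illustration only

labels = crp_labels(S, alpha, rng)
K = int(labels.max()) + 1                 # number of occupied clusters

Z = rng.normal(size=(K, r))               # one loading row per cluster
A = Z[labels]                             # S x r loadings; cluster members share a row
Sigma = A @ A.T + sigma2 * np.eye(S)      # assumed low-rank-plus-diagonal covariance

print(f"{K * r} free loadings instead of {S * r}")
```

Under these assumptions the covariance among 1000 species is governed by the `K` distinct rows of `Z` rather than by `S` separate rows, and species with the same label have identical dependence patterns with every other species.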

We use Forest Inventory and Analysis (FIA) data from the eastern United States to demonstrate our method. The data set consists of presence–absence measurements for 112 tree species observed east of the Mississippi. As a proof of concept for our dimension reduction approach, we also include simulations using continuous and binary data.

#### Article information

**Source**

Bayesian Anal. Volume 12, Number 4 (2017), 939-967.

**Dates**

First available in Project Euclid: 2 November 2016

**Permanent link to this document**

https://projecteuclid.org/euclid.ba/1478073617

**Digital Object Identifier**

doi:10.1214/16-BA1031

**Keywords**

abundance; hierarchical model; latent variables; Markov chain Monte Carlo; presence–absence

**Rights**

Creative Commons Attribution 4.0 International License.

#### Citation

Taylor-Rodríguez, Daniel; Kaufeld, Kimberly; Schliep, Erin M.; Clark, James S.; Gelfand, Alan E. Joint Species Distribution Modeling: Dimension Reduction Using Dirichlet Processes. Bayesian Anal. 12 (2017), no. 4, 939-967. doi:10.1214/16-BA1031. https://projecteuclid.org/euclid.ba/1478073617

#### Supplemental materials

- Appendices: Joint species distribution modeling: dimension reduction using Dirichlet processes. Digital Object Identifier: doi:10.1214/16-BA1031SUPP