Understanding spatial patterns of species diversity and the distributions of individual species is a consuming problem in biogeography and conservation. The Cape Floristic Region (CFR) of South Africa is a global hotspot of diversity and endemism, and the Protea Atlas Project, with some 60,000 site records across the region, provides an extraordinarily rich data set to analyze biodiversity patterns. Analysis for the region is developed at the spatial scale of one-minute grid cells (~37,000 cells total for the region). We report on results for 40 species of the flowering plant family Proteaceae (of about 330 in the CFR) for a defined subregion.
Using a Bayesian framework, we develop a two-stage, spatially explicit, hierarchical logistic regression. Stage one models the suitability or potential presence for each species at each cell, given species attributes along with grid cell (site-level) climate, precipitation, topography and geology data using species-level coefficients, and a spatial random effect. The second level of the hierarchy models, for each species, observed presence/absence at a sampling site through a conditional specification of the probability of presence at an arbitrary location in the grid cell given that the location is suitable. Because the atlas data are not evenly distributed across the landscape, grid cells contain variable numbers of sampling localities. Indeed, some grid cells are entirely unsampled; others have been transformed by human intervention (agriculture, urbanization) such that none of the species is present, though some may have the potential to be present in the absence of disturbance. Thus the modeling takes the sampling intensity at each site into account by assuming that the total number of times that a particular species was observed within a site follows a binomial distribution.
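To make the two-stage structure concrete, the following is a minimal sketch of the likelihood contribution of a single grid cell for a single species. It is a simplified illustration, not the paper's exact specification: the function names, the scalar spatial effect `rho`, and the conditional presence probability `q` are all hypothetical, and the adjustment for transformed (disturbed) land is omitted.

```python
import math

def expit(z):
    """Inverse logit: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def suitability(x, beta, rho):
    """Stage one (assumed form): logistic regression for potential presence.

    x    -- environmental covariates for the grid cell
    beta -- species-level coefficients (hypothetical values)
    rho  -- spatial random effect for the cell (hypothetical scalar)
    """
    linear = sum(xj * bj for xj, bj in zip(x, beta)) + rho
    return expit(linear)

def cell_loglik(y, n, x, beta, rho, q):
    """Stage two (assumed form): binomial log-likelihood for one cell.

    y -- number of sampling sites in the cell where the species was observed
    n -- number of sampling sites in the cell (sampling intensity)
    q -- assumed probability of presence at a location given suitability
    """
    if n == 0:
        # An unsampled cell carries no observations, so it adds nothing.
        return 0.0
    p = q * suitability(x, beta, rho)
    log_choose = math.log(math.comb(n, y))
    return log_choose + y * math.log(p) + (n - y) * math.log(1.0 - p)
```

Because the number of sampling sites `n` enters the binomial directly, heavily sampled cells inform the fit more than sparsely sampled ones, and unsampled cells are informed only through the covariates and the spatial effect.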
In fact, a range of models can be examined incorporating different first and second stage specifications. This necessitates model comparison in a misaligned multilevel setting. All models are fitted using MCMC methods. A "best" model is selected. Parameter summaries offer considerable insight. In addition, results are mapped as the model-estimated potential presence for each species across the domain. This probability surface provides an alternative to customary empirical "range of occupancy" displays. Summing these surfaces over species yields the predicted species richness over the region. Summaries of the posterior for each environmental coefficient show which variables are most important in explaining species presence. Other biodiversity measures emerge as model unknowns. A considerable range of inference is available. We illustrate with only a portion of the analyses we have conducted, noting that these initial results describe biogeographical patterns over the modeled region remarkably well.
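The richness calculation can be sketched as follows, assuming the per-species potential-presence probabilities are already available as one list per species with one entry per grid cell. The function name and data layout are hypothetical. By linearity of expectation, the expected number of species in a cell is the sum of the per-species probabilities, whatever the dependence between species.

```python
def predicted_richness(prob_by_species):
    """Sum potential-presence probabilities over species, cell by cell.

    prob_by_species -- list of per-species lists, each holding one
                       probability per grid cell (same cell order throughout)
    Returns a list with the expected species count for each cell.
    """
    if not prob_by_species:
        return []
    n_cells = len(prob_by_species[0])
    if any(len(probs) != n_cells for probs in prob_by_species):
        raise ValueError("every species needs one probability per cell")
    # zip(*...) regroups the values so each tuple holds one cell's probabilities.
    return [sum(cell_probs) for cell_probs in zip(*prob_by_species)]

# Two species over two cells (illustrative numbers only).
richness = predicted_richness([[0.5, 0.0], [0.25, 1.0]])
```

In an MCMC fit, the same sum could be taken within each posterior draw, which would give a posterior distribution for richness in every cell and not only a point estimate.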
"Explaining species distribution patterns through hierarchical modeling." Bayesian Anal. 1 (1) 41 - 92, March 2006. https://doi.org/10.1214/06-BA102