The Annals of Applied Statistics

Modeling large scale species abundance with latent spatial processes

Avishek Chakraborty, Alan E. Gelfand, Adam M. Wilson, Andrew M. Latimer, and John A. Silander, Jr.


Modeling species abundance patterns using local environmental features is an important, current problem in ecology. The Cape Floristic Region (CFR) in South Africa is a global hot spot of diversity and endemism, and provides a rich class of species abundance data for such modeling. Here, we propose a multi-stage Bayesian hierarchical model for explaining species abundance over this region. Our model is specified at the areal level, where the CFR is divided into roughly 37,000 one-minute grid cells; species abundance is observed at some locations within some cells. The abundance values are ordinally categorized. Environmental and soil-type factors, likely to influence the abundance pattern, are included in the model. We formulate the empirical abundance pattern as a degraded version of the potential pattern, with the degradation effect accomplished in two stages. First, we adjust for land-use transformation, and then we adjust for measurement error, hence misclassification error, to yield the observed abundance classifications. An important point in this analysis is that only 28% of the grid cells have been sampled and that, for sampled grid cells, the number of sampled locations ranges from one to more than one hundred. Still, we are able to develop potential and transformed abundance surfaces over the entire region.
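The two-stage degradation can be made concrete with a small simulation. The sketch below is illustrative only and rests on hypothetical choices: four ordinal classes cut from a latent value at fixed cut points (`CUTS`), a land-use stage that shifts the latent value by log(1 - fraction transformed), and a misclassification stage that records an adjacent class with probability 1 - `p_correct`. None of these functional forms, names, or values are taken from the paper; they only show how a potential class can be degraded first into a transformed class and then into an observed one.

```python
import bisect
import math
import random

# Hypothetical cut points on the latent scale separating ordinal classes 0..3
# (e.g. absent, low, medium, high); illustrative values, not fitted ones.
CUTS = [0.0, 1.0, 2.0]


def to_class(z, cuts=CUTS):
    """Ordinal class k such that cuts[k-1] < z <= cuts[k] (k = 0 at or below the first cut)."""
    return bisect.bisect_left(cuts, z)


def transform(z_potential, frac_transformed):
    """Stage 1 (assumed form): land-use transformation lowers the latent value by
    log(1 - frac), i.e. scales exp(z) by the untransformed fraction; a fully
    transformed cell drops to -inf and hence to class 0."""
    if frac_transformed >= 1.0:
        return -math.inf
    return z_potential + math.log(1.0 - frac_transformed)


def misclassify(true_class, p_correct, n_classes, rng):
    """Stage 2 (assumed form): record the true class with probability p_correct,
    otherwise an adjacent class chosen uniformly among those that exist."""
    if rng.random() < p_correct:
        return true_class
    neighbors = [c for c in (true_class - 1, true_class + 1) if 0 <= c < n_classes]
    return rng.choice(neighbors)


def observe(z_potential, frac_transformed, p_correct, rng):
    """Potential -> transformed -> observed classification for one sampled location."""
    potential = to_class(z_potential)
    transformed = to_class(transform(z_potential, frac_transformed))
    observed = misclassify(transformed, p_correct, len(CUTS) + 1, rng)
    return potential, transformed, observed


# Usage: a location whose cell is half transformed, recorded with 90% accuracy.
rng = random.Random(1)
print(observe(1.5, 0.5, 0.9, rng))
```

Inference runs this chain in reverse: the observed classes constrain the transformed latent values, which in turn constrain the potential surface that the spatial regression describes.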

In the hierarchical framework, categorical abundance classifications are induced by continuous latent surfaces. The degradation model above is built on the latent scale. On this scale, an areal-level spatial regression model is used to model the dependence of species abundance on the environmental factors. To capture the anticipated similarity in abundance pattern among neighboring regions, spatial random effects with a conditionally autoregressive (CAR) prior are specified. Model fitting is through familiar Markov chain Monte Carlo methods. Although models with CAR priors can usually be fitted efficiently, even with large data sets, run times became very long with our model and the large number of cells. So a novel parallelized computing strategy was developed to expedite fitting. The model was run for six different species. With categorical data, displaying the resulting abundance patterns is a challenge, and we offer several different views. The patterns are of importance on their own, comparatively across the region and across species, with implications for species competition and, more generally, for planning and conservation.
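A CAR prior is easiest to work with through its full conditionals. With precision matrix tau(D - rho W), where W is the 0/1 adjacency matrix of the grid cells and D is diagonal with the neighbor counts d_i, each phi_i given all the others is normal with mean rho times the average of its neighbors' values and variance 1/(tau d_i). The sketch below samples from that prior with single-site Gibbs updates. The abstract does not describe the authors' parallelization strategy, so the sketch swaps in a standard, different idea purely for illustration: a greedy graph coloring, under which cells of the same color share no edge and can be updated simultaneously (a chromatic Gibbs sampler). The function names, the 2 x 2 grid, and the values of rho and tau are hypothetical, and a full model would also add the latent-data likelihood to each conditional.

```python
import math
import random


def greedy_coloring(neighbors):
    """Give each cell the smallest color not already used by one of its neighbors."""
    colors = {}
    for cell in sorted(neighbors):
        used = {colors[n] for n in neighbors[cell] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[cell] = color
    return colors


def car_conditional(cell, phi, neighbors, rho, tau):
    """Mean and variance of phi[cell] given all other cells, under a proper CAR
    prior with precision tau * (D - rho * W); assumes the cell has >= 1 neighbor."""
    d = len(neighbors[cell])
    mean = rho * sum(phi[n] for n in neighbors[cell]) / d
    var = 1.0 / (tau * d)
    return mean, var


def chromatic_sweep(phi, neighbors, rho, tau, colors, rng):
    """One Gibbs sweep over the CAR prior, one color class at a time.

    No two cells of the same color are neighbors, so within a class the draws are
    conditionally independent and could be sent to parallel workers; here they
    are computed serially and written back together."""
    for color in sorted(set(colors.values())):
        block = [cell for cell, c in colors.items() if c == color]
        draws = {}
        for cell in block:
            mean, var = car_conditional(cell, phi, neighbors, rho, tau)
            draws[cell] = rng.gauss(mean, math.sqrt(var))
        phi.update(draws)
    return phi


# Usage on a hypothetical 2 x 2 grid with rook (edge-sharing) neighbors.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
colors = greedy_coloring(neighbors)  # {0: 0, 1: 1, 2: 1, 3: 0}
phi = {cell: 0.0 for cell in neighbors}
rng = random.Random(42)
for _ in range(100):
    chromatic_sweep(phi, neighbors, rho=0.9, tau=1.0, colors=colors, rng=rng)
```

A rook-adjacency grid is bipartite, so two colors (a checkerboard) suffice and each sweep splits into two fully parallel half-sweeps; denser neighbor definitions need more colors.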

Article information

Ann. Appl. Stat. Volume 4, Number 3 (2010), 1403-1429.

First available in Project Euclid: 18 October 2010

Digital Object Identifier: doi:10.1214/10-AOAS335

Keywords: Conditional autoregressive prior; latent variables; misalignment; ordinal categorical data; parallel computing


Chakraborty, Avishek; Gelfand, Alan E.; Wilson, Adam M.; Latimer, Andrew M.; Silander, Jr., John A. Modeling large scale species abundance with latent spatial processes. Ann. Appl. Stat. 4 (2010), no. 3, 1403–1429. doi:10.1214/10-AOAS335.
