The Annals of Applied Statistics

Hierarchical spatial models for predicting tree species assemblages across large domains

Andrew O. Finley, Sudipto Banerjee, and Ronald E. McRoberts

Full-text: Open access


Spatially explicit data layers of tree species assemblages, referred to as forest types or forest type groups, are a key component in large-scale assessments of forest sustainability, biodiversity, timber biomass, carbon sinks and forest health monitoring. This paper explores the utility of coupling georeferenced national forest inventory (NFI) data with readily available and spatially complete environmental predictor variables through spatially-varying multinomial logistic regression models to predict forest type groups across large forested landscapes. These models exploit underlying spatial associations within the NFI plot array and the spatially-varying impact of predictor variables to improve the accuracy of forest type group predictions. The richness of these models incurs onerous computational burdens, and we discuss dimension-reducing spatial processes that retain the richness in modeling. We illustrate using NFI data from Michigan, USA, where we provide a comprehensive analysis of this large study area and demonstrate improved prediction with associated measures of uncertainty.
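
The two ideas named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the exponential correlation function, the decay value, the knot layout, the array shapes and all function names are assumptions made for the example, and the knot-level effects are drawn at random instead of being estimated by MCMC. The first function interpolates coefficient adjustments defined at a small set of knots to arbitrary locations (the dimension-reducing step), and the second turns location-specific coefficients into forest type group probabilities through a baseline-category multinomial logit.

```python
import numpy as np


def exp_corr(a, b, phi):
    """Exponential correlation between two sets of 2-D locations (assumed form)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-phi * d)


def predictive_process(w_knots, knots, locs, phi):
    """Interpolate knot-level effects to locs as c(s)^T C*^{-1} w*.

    w_knots: (m, q) effects at m knots; returns an (n, q) array.
    The process variance cancels in this product, so correlations suffice.
    """
    c_star = exp_corr(knots, knots, phi)   # (m, m) knot-to-knot
    c_cross = exp_corr(locs, knots, phi)   # (n, m) location-to-knot
    return c_cross @ np.linalg.solve(c_star, w_knots)


def class_probabilities(x, beta, w):
    """Baseline-category multinomial logit with location-specific coefficients.

    x: (n, p) predictors; beta: (K-1, p) global coefficients;
    w: (n, K-1, p) spatial adjustments. Returns (n, K) probabilities,
    with the last column as the baseline category.
    """
    coef = beta[None, :, :] + w                  # (n, K-1, p)
    eta = np.einsum("np,nkp->nk", x, coef)       # linear predictors
    eta = np.hstack([eta, np.zeros((x.shape[0], 1))])
    eta = eta - eta.max(axis=1, keepdims=True)   # numerical stability
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)


# Toy example: 3 categories, 2 predictors, 5 knots, 8 prediction locations.
rng = np.random.default_rng(0)
n_cat, n_pred, n_knots, n_locs = 3, 2, 5, 8
knots = rng.uniform(0.0, 1.0, size=(n_knots, 2))
locs = rng.uniform(0.0, 1.0, size=(n_locs, 2))
x = rng.normal(size=(n_locs, n_pred))
beta = rng.normal(size=(n_cat - 1, n_pred))

# Illustrative knot effects only; a fitted model would supply posterior draws.
w_knots = rng.normal(scale=0.5, size=(n_knots, (n_cat - 1) * n_pred))
w = predictive_process(w_knots, knots, locs, phi=3.0)
probs = class_probabilities(x, beta, w.reshape(n_locs, n_cat - 1, n_pred))
print(probs.round(3))
```

The computational saving comes from solving only an m-by-m system at the knots, where m is much smaller than the number of inventory plots.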

Article information

Ann. Appl. Stat. Volume 3, Number 3 (2009), 1052-1079.

First available in Project Euclid: 5 October 2009

Digital Object Identifier: doi:10.1214/09-AOAS250

Keywords: Bayesian inference; species assemblages; logistic regression; spatially-varying coefficients; Markov chain Monte Carlo; spatial predictive process


Finley, Andrew O.; Banerjee, Sudipto; McRoberts, Ronald E. Hierarchical spatial models for predicting tree species assemblages across large domains. Ann. Appl. Stat. 3 (2009), no. 3, 1052–1079. doi:10.1214/09-AOAS250.


