The Annals of Applied Statistics

Hierarchical spatial models for predicting tree species assemblages across large domains

Andrew O. Finley, Sudipto Banerjee, and Ronald E. McRoberts
Source: Ann. Appl. Stat. Volume 3, Number 3 (2009), 1052-1079.

Abstract

Spatially explicit data layers of tree species assemblages, referred to as forest types or forest type groups, are a key component in large-scale assessments of forest sustainability, biodiversity, timber biomass, carbon sinks and forest health monitoring. This paper explores the utility of coupling georeferenced national forest inventory (NFI) data with readily available and spatially complete environmental predictor variables through spatially varying multinomial logistic regression models to predict forest type groups across large forested landscapes. These models exploit underlying spatial associations within the NFI plot array and the spatially varying impact of predictor variables to improve the accuracy of forest type group predictions. The richness of these models incurs onerous computational burdens, and we discuss dimension-reducing spatial processes that retain the richness in modeling. We illustrate using NFI data from Michigan, USA, where we provide a comprehensive analysis of this large study area and demonstrate improved prediction with associated measures of uncertainty.

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1254773278
Digital Object Identifier: doi:10.1214/09-AOAS250
Zentralblatt MATH identifier: 05758451
Mathematical Reviews number (MathSciNet): MR2750386


2012 © Institute of Mathematical Statistics
