The Annals of Applied Statistics

Revisiting Guerry’s data: Introducing spatial constraints in multivariate analysis

Stéphane Dray and Thibaut Jombart

Full-text: Open access

Abstract

Standard multivariate analysis methods aim to identify and summarize the main structures in large data sets containing the description of a number of observations by several variables. In many cases, spatial information is also available for each observation, so that a map can be associated with the multivariate data set. Two main objectives are relevant in the analysis of spatial multivariate data: summarizing covariation structures and identifying spatial patterns. In practice, achieving both goals simultaneously is a statistical challenge, and a range of methods have been developed that offer trade-offs between these two objectives. In an applied context, this methodological question has been and remains a major issue in community ecology, where species assemblages (i.e., covariation between species abundances) are often driven by spatial processes (and thus exhibit spatial patterns).

In this paper we review a variety of methods developed in community ecology to investigate multivariate spatial patterns. We present different ways of incorporating spatial constraints in multivariate analysis and illustrate these approaches using the famous data set on moral statistics in France published by André-Michel Guerry in 1833. We discuss and compare the properties of these approaches from both a practical and a theoretical viewpoint.

Article information

Source
Ann. Appl. Stat. Volume 5, Number 4 (2011), 2278-2299.

Dates
First available in Project Euclid: 20 December 2011

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1324399595

Digital Object Identifier
doi:10.1214/10-AOAS356

Mathematical Reviews number (MathSciNet)
MR2907115

Zentralblatt MATH identifier
1234.62092

Keywords
Autocorrelation; duality diagram; multivariate analysis; spatial weighting matrix

Citation

Dray, Stéphane; Jombart, Thibaut. Revisiting Guerry’s data: Introducing spatial constraints in multivariate analysis. Ann. Appl. Stat. 5 (2011), no. 4, 2278–2299. doi:10.1214/10-AOAS356. https://projecteuclid.org/euclid.aoas/1324399595.


References

  • Anselin, L. (1995). Local indicators of spatial association. Geographical Analysis 27 93–115.
  • Anselin, L. (1996). The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In Spatial Analytical Perspectives on GIS (M. M. Fischer, H. J. Scholten and D. Unwin, eds.) 111–125. Taylor and Francis, London.
  • Anselin, L., Syabri, I. and Smirnov, O. (2002). Visualizing multivariate spatial correlation with dynamically linked windows. In New Tools for Spatial Data Analysis: Proceedings of a Workshop (L. Anselin and S. Rey, eds.). CSISS, Santa Barbara, CA.
  • Benali, H. and Escofier, B. (1990). Analyse factorielle lissée et analyse factorielle des différences locales. Rev. Statist. Appl. 38 55–76.
  • Bivand, R. (2008). Implementing representations of space in economic geography. Journal of Regional Science 48 1–27.
  • Blanchet, F. G., Legendre, P. and Borcard, D. (2008). Forward selection of explanatory variables. Ecology 89 2623–2632.
  • Borcard, D., Legendre, P. and Drapeau, P. (1992). Partialling out the spatial component of ecological variation. Ecology 73 1045–1055.
  • Chatterjee, S. and Hadi, A. S. (1986). Influential observations, high leverage points, and outliers in linear regression. Statist. Sci. 1 379–393.
  • Cliff, A. D. and Ord, J. K. (1973). Spatial Autocorrelation. Pion, London.
  • de Jong, P., Sprenger, C. and van Veen, F. (1984). On extreme values of Moran’s I and Geary’s c. Geographical Analysis 16 17–24.
  • Dolédec, S. and Chessel, D. (1987). Rythmes saisonniers et composantes stationnelles en milieu aquatique I—Description d’un plan d’observations complet par projection de variables. Acta Oecologica—Oecologia Generalis 8 403–426.
  • Dolédec, S. and Chessel, D. (1994). Co-inertia analysis: An alternative method for studying species-environment relationships. Freshwater Biology 31 277–294.
  • Dray, S. and Jombart, T. (2010). Supplement to “Revisiting Guerry’s data: Introducing spatial constraints in multivariate analysis.” DOI:10.1214/10-AOAS356SUPP.
  • Dray, S., Chessel, D. and Thioulouse, J. (2003a). Co-inertia analysis and the linking of ecological data tables. Ecology 84 3078–3089.
  • Dray, S., Chessel, D. and Thioulouse, J. (2003b). Procrustean co-inertia analysis for the linking of multivariate data sets. Ecoscience 10 110–119.
  • Dray, S. and Dufour, A. B. (2007). The ade4 package: Implementing the duality diagram for ecologists. J. Statist. Soft. 22 1–20.
  • Dray, S., Legendre, P. and Peres-Neto, P. R. (2006). Spatial modeling: A comprehensive framework for principal coordinate analysis of neighbor matrices (PCNM). Ecological Modelling 196 483–493.
  • Dray, S., Pettorelli, N. and Chessel, D. (2003). Multivariate analysis of incomplete mapped data. Transactions in GIS 7 411–422.
  • Dray, S., Saïd, S. and Débias, F. (2008). Spatial ordination of vegetation data using a generalization of Wartenberg’s multivariate spatial correlation. Journal of Vegetation Science 19 45–56.
  • Dykes, J. and Brunsdon, C. (2007). Geographically weighted visualization: Interactive graphics for scale-varying exploratory analysis. IEEE Transactions on Visualization and Computer Graphics 13 1161–1168.
  • Escoufier, Y. (1987). The duality diagram: A means of better practical applications. In Developments in Numerical Ecology (P. Legendre and L. Legendre, eds.) 14 139–156. Springer, Berlin.
  • Fall, A., Fortin, M. J., Manseau, M. and O’Brien, D. (2007). Spatial graphs: Principles and applications for habitat connectivity. Ecosystems 10 448–461.
  • Friendly, M. (2007). A.-M. Guerry’s moral statistics of France: Challenges for multivariable spatial analysis. Statist. Sci. 22 368–399.
  • Gabriel, K. R. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika 58 453–467.
  • Geary, R. C. (1954). The contiguity ratio and statistical mapping. The Incorporated Statistician 5 115–145.
  • Getis, A. and Aldstadt, J. (2004). Constructing the spatial weights matrix using a local statistic. Geographical Analysis 36 90–104.
  • Getis, A. and Griffith, D. A. (2002). Comparative spatial filtering in regression analysis. Geographical Analysis 34 130–140.
  • Goodall, D. W. (1954). Objective methods for the classification of vegetation III. An essay on the use of factor analysis. Australian Journal of Botany 2 304–324.
  • Greenacre, M. J. (1984). Theory and Applications of Correspondence Analysis. Academic Press, London.
  • Griffith, D. A. (1996). Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data. Canadian Geographer 40 351–367.
  • Griffith, D. A. (2000). A linear regression solution to the spatial autocorrelation problem. Journal of Geographical Systems 2 141–156.
  • Griffith, D. A. (2002). A spatial filtering specification for the auto-Poisson model. Statist. Probab. Lett. 58 245–251.
  • Griffith, D. A. (2003). Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization. Springer, Berlin.
  • Griffith, D. A. (2004). A spatial filtering specification for the autologistic model. Environment and Planning A 36 1791–1811.
  • Guerry, A. M. (1833). Essai sur la Statistique Morale de la France. Crochard, Paris.
  • Haining, R. (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge Univ. Press.
  • Holmes, S. (2006). Multivariate analysis: The French way. In Festschrift for David Freedman (D. Nolan and T. Speed, eds.). IMS, Beachwood, OH.
  • Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24 417–441.
  • Jaromczyk, J. W. and Toussaint, G. T. (1992). Relative neighborhood graphs and their relatives. Proceedings of the IEEE 80 1502–1517.
  • Jombart, T., Dray, S. and Dufour, A. B. (2009). Finding essential scales of spatial variation in ecological data: A multivariate approach. Ecography 32 161–168.
  • Le Foll, Y. (1982). Pondération des distances en analyse factorielle. Statistique et Analyse des données 7 13–31.
  • Lebart, L. (1969). Analyse statistique de la contiguïté. Publication de l’Institut de Statistiques de l’Université de Paris 28 81–112.
  • Legendre, P. (1993). Spatial autocorrelation: Trouble or new paradigm? Ecology 74 1659–1673.
  • Legendre, P. and Legendre, L. (1998). Numerical Ecology, 2nd ed. Elsevier, Amsterdam.
  • Moran, P. A. P. (1948). The interpretation of statistical maps. J. Roy. Statist. Soc. Ser. B 10 243–251.
  • Méot, A., Chessel, D. and Sabatier, R. (1993). Opérateurs de voisinage et analyse des données spatio-temporelles. In Biométrie et environnement (J. D. Lebreton and B. Asselain, eds.) 45–72. Masson, Paris.
  • Norcliffe, G. B. (1969). On the use and limitations of trend surface models. Canadian Geographer 13 338–348.
  • Peres-Neto, P. R. and Jackson, D. A. (2001). How well do multivariate data sets match? The advantages of a Procrustean superimposition approach over the Mantel test. Oecologia 129 169–178.
  • Rao, C. R. (1964). The use and interpretation of principal component analysis in applied research. Sankhyā Ser. A 26 329–359.
  • Student (1914). The elimination of spurious correlation due to position in time or space. Biometrika 10 179–180.
  • ter Braak, C. J. F. (1986). Canonical correspondence analysis: A new eigenvector technique for multivariate direct gradient analysis. Ecology 67 1167–1179.
  • Tiefelsdorf, M., Griffith, D. A. and Boots, B. (1999). A variance-stabilizing coding scheme for spatial link matrices. Environment and Planning A 31 165–180.
  • Tiefelsdorf, M. and Griffith, D. A. (2007). Semi-parametric filtering of spatial autocorrelation: The eigenvector approach. Environment and Planning A 39 1193–1221.
  • Torre, F. and Chessel, D. (1995). Co-structure de deux tableaux totalement appariés. Revue de Statistique Appliquée 43 109–121.
  • Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
  • van den Wollenberg, A. L. (1977). Redundancy analysis, an alternative for canonical analysis. Psychometrika 42 207–219.
  • Wartenberg, D. (1985). Multivariate spatial correlation: A method for exploratory geographical analysis. Geographical Analysis 17 263–283.

Supplemental materials

  • Supplementary material: Implementation in R. This website hosts an R package (Guerry) containing Guerry’s data set (maps and data). The package also contains a tutorial (vignette) showing how to reproduce the analyses and graphics presented in this paper, mainly using the package ade4 [Dray and Dufour (2007)]. The package Guerry is also available on CRAN and can be installed using the install.packages("Guerry") command in an R session.