Statistical Science

The Gifi system of descriptive multivariate analysis

George Michailidis and Jan de Leeuw

Abstract

The Gifi system of analyzing categorical data through nonlinear varieties of classical multivariate analysis techniques is reviewed. The system is characterized by the optimal scaling of categorical variables, which is implemented through alternating least squares algorithms. The main technique of homogeneity analysis is presented, along with its extensions and generalizations leading to nonmetric principal components analysis and canonical correlation analysis. Several examples are used to illustrate the methods. A brief account of stability issues and areas of application of the techniques is also given.
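
As a rough illustration of the alternating least squares idea mentioned in the abstract, the following is a minimal sketch of homogeneity analysis and not the authors' implementation. It assumes the usual homogeneity loss, the average over variables of the squared distance between the object scores X and the category quantifications G_j Y_j, with the normalization X'X = nI and centered columns. The function names `indicator`, `normalize` and `homals` are hypothetical, and the sketch assumes each variable is coded with integers 0..k-1 and that every code occurs at least once.

```python
import numpy as np

def indicator(codes):
    """Build an n x k indicator (dummy) matrix from integer codes 0..k-1."""
    codes = np.asarray(codes)
    G = np.zeros((codes.size, codes.max() + 1))
    G[np.arange(codes.size), codes] = 1.0
    return G

def normalize(X):
    """Center the columns, then orthonormalize so that X'X = n I."""
    X = X - X.mean(axis=0)
    Q, _ = np.linalg.qr(X)
    return np.sqrt(X.shape[0]) * Q

def homals(data, p=2, n_iter=200, tol=1e-10, seed=0):
    """Alternate between category quantifications Y_j and object scores X."""
    rng = np.random.default_rng(seed)
    Gs = [indicator(col) for col in np.asarray(data).T]  # one G_j per variable
    n, m = Gs[0].shape[0], len(Gs)
    X = normalize(rng.standard_normal((n, p)))           # random start
    prev = np.inf
    for _ in range(n_iter):
        # Step 1: each category quantification is the centroid of the
        # scores of the objects that fall in that category.
        Ys = [(G.T @ X) / G.sum(axis=0)[:, None] for G in Gs]
        loss = sum(np.sum((X - G @ Y) ** 2) for G, Y in zip(Gs, Ys)) / m
        if prev - loss < tol:
            break
        prev = loss
        # Step 2: object scores are the average of their category
        # quantifications, re-centered and re-orthonormalized.
        X = normalize(sum(G @ Y for G, Y in zip(Gs, Ys)) / m)
    return X, Ys, loss

# Two identical variables with three categories each: perfectly homogeneous.
data = [[0, 0], [0, 0], [1, 1], [1, 1], [2, 2], [2, 2]]
X, Ys, loss = homals(data, p=2)
```

In this toy example the two variables agree exactly, so objects in the same category should receive the same score and the loss should be numerically zero. Real data would give a positive loss, and the QR step here is only one of several ways to impose the normalization.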

Article information

Source
Statist. Sci., Volume 13, Number 4 (1998), 307-336.

Dates
First available in Project Euclid: 9 August 2002

Permanent link to this document
https://projecteuclid.org/euclid.ss/1028905828

Digital Object Identifier
doi:10.1214/ss/1028905828

Mathematical Reviews number (MathSciNet)
MR1705265

Zentralblatt MATH identifier
1059.62551

Subjects
Primary: 62-01: Instructional exposition (textbooks, tutorial papers, etc.)
Secondary: 62H99: None of the above, but in this section

Keywords
Optimal scaling; alternating least squares; multivariate techniques; loss functions; stability

Citation

Michailidis, George; de Leeuw, Jan. The Gifi system of descriptive multivariate analysis. Statist. Sci. 13 (1998), no. 4, 307-336. doi:10.1214/ss/1028905828. https://projecteuclid.org/euclid.ss/1028905828

