The Gifi system of analyzing categorical data through nonlinear varieties of classical multivariate analysis techniques is reviewed. The system is characterized by the optimal scaling of categorical variables, which is implemented through alternating least squares algorithms. The main technique of homogeneity analysis is presented, along with its extensions and generalizations leading to nonmetric principal components analysis and canonical correlation analysis. Several examples are used to illustrate the methods. A brief account of stability issues and areas of application of the techniques is also given.
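The alternating least squares idea behind homogeneity analysis can be sketched in a few lines: alternate between computing category quantifications given object scores and averaging the scaled variables back into object scores, renormalizing at each step. The code below is a minimal illustrative sketch of this scheme (the data, dimensionality, and iteration count are assumptions for illustration, not taken from the review):

```python
import numpy as np

def indicator(col):
    """Dummy-code one categorical variable into a 0/1 indicator matrix."""
    cats = sorted(set(col))
    G = np.zeros((len(col), len(cats)))
    for i, c in enumerate(col):
        G[i, cats.index(c)] = 1.0
    return G

def homals(data, ndim=2, iters=200, seed=0):
    """Minimal homogeneity analysis via alternating least squares (sketch)."""
    n, m = data.shape
    Gs = [indicator(data[:, j]) for j in range(m)]
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, ndim))  # object scores, random start
    for _ in range(iters):
        # optimal scaling: category quantifications given object scores
        Ys = [np.linalg.solve(G.T @ G, G.T @ X) for G in Gs]
        # object scores: average of the optimally scaled variables
        X = sum(G @ Y for G, Y in zip(Gs, Ys)) / m
        # normalization: center and rescale so that X'X = n * I
        X -= X.mean(axis=0)
        Q, _ = np.linalg.qr(X)
        X = Q * np.sqrt(n)
    return X, Ys

# illustrative toy data: 5 objects, 2 categorical variables
data = np.array([[0, 1], [0, 1], [1, 0], [1, 0], [2, 0]])
X, Ys = homals(data, ndim=1)
```

Each half-step solves an ordinary least squares problem in closed form, which is what makes the alternating scheme cheap; the normalization step prevents the trivial all-zero solution.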
Say that a regression method is "unstable" at a data set if a small change in the data can cause a relatively large change in the fitted plane. A well-known example is the instability of least squares regression (LS) near (multi)collinear data sets. It is known that least absolute deviation (LAD) and least median of squares (LMS) linear regression can exhibit instability at data sets that are far from collinear. Clear-cut instability occurs at a "singularity": a data set at which arbitrarily small changes can substantially change the fit. For example, the collinear data sets are the singularities of LS. One way to measure the extent of instability of a regression method is to measure the size of its "singular set" (the set of its singularities). The dimension of the singular set is a tractable measure of its size that can be estimated without distributional assumptions or asymptotics.
By applying a general theorem on the dimension of singular sets, we find that the singular sets of LAD and LMS are at least as large as that of LS and often much larger. Thus, prima facie, LAD and LMS are frequently unstable. This casts doubt on the trustworthiness of LAD and LMS as exploratory regression tools.
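The instability of LS near a collinearity singularity is easy to demonstrate numerically. In the sketch below (simulated data, not from the paper), two designs differing only by perturbations of order 1e-6 in the second predictor yield fitted planes with drastically different coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x1 = rng.normal(size=n)
y = x1 + rng.normal(scale=0.1, size=n)

def ls_coef(x2):
    """Ordinary least squares fit of y on (1, x1, x2)."""
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# x2 is (almost) an exact copy of x1: a nearly collinear data set.
# Two tiny, independent perturbations give two "nearby" data sets.
b1 = ls_coef(x1 + 1e-6 * rng.normal(size=n))
b2 = ls_coef(x1 + 1e-6 * rng.normal(size=n))

# A tiny change in the data produces a large change in the fitted plane.
print(np.abs(b1 - b2).max())
```

The exactly collinear data set (x2 equal to x1) is a singularity of LS in the paper's sense: the fit is not even unique there, and nearby data sets inherit the instability.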
The $2\times2$ table is used as a vehicle for discussing different approaches to statistical inference. Several of these approaches (both classical and Bayesian) are compared, and difficulties with them are highlighted. More frequent use of one-sided tests is advocated. Given independent samples from two binomial distributions, and taking independent Jeffreys priors, we note that the posterior probability that the proportion of successes in the first population is larger than in the second can be estimated from the standard (uncorrected) chi-square significance level. An exact formula for this probability is derived. However, we argue that usually it will be more appropriate to use dependent priors, and we suggest a particular "standard prior" for the $2\times2$ table. For small numbers of observations this is more conservative than Fisher's exact test, but it is less conservative for larger sample sizes. Several examples are given.
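With independent Jeffreys priors, the posterior for each binomial proportion is a Beta distribution, so the posterior probability that the first proportion exceeds the second can be approximated by simple Monte Carlo. The sketch below illustrates this for independent Jeffreys Beta(1/2, 1/2) priors; the counts are invented for illustration and the approach does not implement the paper's exact formula or its dependent "standard prior":

```python
import numpy as np

def posterior_prob_p1_gt_p2(s1, n1, s2, n2, draws=200_000, seed=0):
    """Monte Carlo estimate of P(p1 > p2 | data) under independent
    Jeffreys Beta(1/2, 1/2) priors on two binomial proportions."""
    rng = np.random.default_rng(seed)
    # posterior of p_i is Beta(s_i + 1/2, n_i - s_i + 1/2)
    p1 = rng.beta(s1 + 0.5, n1 - s1 + 0.5, size=draws)
    p2 = rng.beta(s2 + 0.5, n2 - s2 + 0.5, size=draws)
    return (p1 > p2).mean()

# illustrative 2x2 table: 18/20 successes vs 12/20 successes
prob = posterior_prob_p1_gt_p2(18, 20, 12, 20)
print(prob)
```

By symmetry, equal observed tables give a probability near 1/2, and the estimate sharpens as the number of draws grows.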
KEYWORDS: measures of dispersion, standard deviation, mean deviation, median absolute deviation, mean difference, range, interquartile distance, order statistics, chi-squared distribution, Gauss, Laplace, Bienaymé, Abbe, Helmert, 62-01, 62-03
This paper attempts a brief account of the history of sample measures of dispersion, with major emphasis on early developments. The statistics considered include the standard deviation, mean deviation, median absolute deviation, mean difference, range, interquartile distance and linear functions of order statistics. The multiplicity of measures is seen to result from constant efforts to strike a balance between efficiency and ease of computation, with some recognition also of the desirability of robustness and theoretical convenience. Many individuals shaped this history, especially Gauss. The main contributors to our story are, in chronological order, Lambert, Laplace, Gauss, Bienaymé, Abbe, Helmert and Galton.
Ching Chun Li was born on October 27, 1912, in Tianjin, China. He received his B.S. degree in agronomy from the University of Nanjing, China, in 1936 and a Ph.D. in plant breeding and genetics from Cornell University in 1940. He did postgraduate work in mathematics, mathematical statistics and experimental statistics at the University of Chicago, Columbia University and North Carolina State College, 1940-1941. He is a Fellow of the American Statistical Association (elected 1969), an elected member of the International Statistical Institute, a Fellow of the American Association for the Advancement of Science and an elected member of Academia Sinica (Chinese Academy). He served as President of the American Society of Human Genetics in 1960. His tenure at the University of Pittsburgh began in 1951. He was Professor and Department Chairman, Biostatistics, from 1969 to 1975, and he was promoted to University Professor in 1975. Although he retired in 1983, he has remained active in research.