Ann. Statist. 30 (6), 1535-1575, (December 2002) DOI: 10.1214/aos/1043351246
KEYWORDS: Bell Labs, computational mathematics, computers, cryptography, data analysis, History, information theory, John W. Tukey, mathematics, neologisms, Princeton University, statistics, 01A70, 54-03, 62-03, 68-03
This article provides a historical overview of the philosophical, theoretical and practical contributions made by John Tukey to the field of simultaneous inference. His early work, culminating in the monograph "The Problem of Multiple Comparisons," established him as one of the pioneers in the field, investing it with both academic respectability and a focus on practical problems. For many years afterward, Tukey only published sporadically in the area but remained convinced that multiplicity issues were of fundamental importance. During the last decade of his life, Tukey again devoted substantial attention to multiplicity, experimenting with different graphical representations of multiple comparison procedures and exploring the implications of new approaches to controlling family-wise error rates. He leaves a rich legacy that should engage and inspire statisticians for many years to come.
The contributions of John W. Tukey to time series analysis, particularly spectrum analysis, are reviewed and discussed. The contributions include: methods, their properties, terminology, popularization, philosophy, applications and education. Much of Tukey's early work on spectrum analysis remained unpublished for many years, but the 1959 book by Blackman and Tukey made his approach accessible to a wide audience. In 1965 the Cooley-Tukey paper on the Fast Fourier Transform spurred a rapid change in signal processing. That year serves as a boundary between the two main parts of this article, a chronological review of JWT's contributions, decade by decade. The time series work of Tukey and others led to the appearance of kernel and nonparametric estimation in mainstream statistics and to the recognition of the consequent difficulties arising in naive uses of the techniques.
Although not a traditional philosopher, John Tukey contributed much to our understanding of statistical science and empirical science more broadly. The former is represented by the light he shed on the relation of drawing conclusions to making decisions, and of how simple concepts like significance and confidence serve to back up or "confirm" empirical findings. Less successfully, he attempted to sort out the ambiguities of R. A. Fisher's fiducial argument. His main effort, however, went to creating "exploratory data analysis" or EDA as a subfield of statistics with much to offer to ongoing developments in data mining and data visualization.
If there ever was a tool that could stimulate the imagination and profit from the intuition and creativity of John Tukey, it was computer graphics. John always saw graphics as being central to exploratory data analysis: "Since the aim of exploratory data analysis is to learn what seems to be, it should be no surprise that pictures play a vital role in doing it well. There is nothing better than a picture for making you think of questions you had forgotten to ask (even mentally)." Much of his work focused on static displays designed to be easily drawn by hand, but he realized that if one wanted to effectively explore multivariate data, computer graphics would be an ideal tool. PRIM-9, the first program to use interactive, dynamic graphics for viewing and dissecting multivariate data, was conceived by John during a four-month visit to the Computation Research Group of the Stanford Linear Accelerator Center in early 1972. PRIM-9 opened up a fundamentally new way of exploring multivariate data. Its basic operations--Picturing, Rotation, Isolation and Masking--have stood the test of time and form the core of numerous follow-on systems.
John's experiences with PRIM-9 gave rise to a slew of other ideas for the analysis of multivariate data, many of them not tied to interactive graphics. The best known of those is "Projection Pursuit"--automatically finding interesting low-dimensional projections of multivariate data by optimizing a projection index. John was also keenly interested in ways of detecting and modeling nonlinear structures in multivariate data which might not be manifest in projections, such as concentration of data near nonlinear lower-dimensional manifolds. Many of his proposals exist only in the form of hand-written notes and appear "far out" even today.
John's work on PRIM-9 and Projection Pursuit lent respectability to computationally oriented, nonmathematical research in Statistics. He moved the center of gravity away from an (over)emphasis on mathematical theory to a greater balance between methodology, theory, and applications and thereby helped revitalize the discipline of Statistics.
John Tukey connected the theory underlying simple random sampling without replacement, cumulants, expected mean squares and spectrum analysis. He gave us one degree of freedom for nonadditivity, and he pioneered finite population models for understanding ANOVA. He wrote widely on the nature and purpose of ANOVA, and he illustrated his approach. In this appreciation of Tukey's work on ANOVA we summarize and comment on his contributions, and refer to some relevant recent literature.
For a general definition of depth in data analysis a differential-like calculus is constructed in which the location case (the framework of Tukey's median) plays a fundamental role similar to that of linear functions in mathematical analysis. As an application, a lower bound for maximal regression depth is proved in the general multidimensional case--as conjectured by Rousseeuw and Hubert and others. This lower bound is demonstrated to have an impact on the breakdown point of the maximum depth estimator.
Tukey's median is one of the earliest known high breakdown point multivariate location statistics. Aside from its breakdown point, though, little else appears to be known about its robustness properties. In this paper we investigate other aspects of Tukey's median, and in particular we derive and study its influence function and its maximum contamination bias function. When judged by these other robustness criteria, Tukey's median again proves to be highly robust.
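For readers unfamiliar with the statistic, Tukey's median is a point of maximal halfspace depth, where the depth of a point is the smallest number of sample points in any closed halfspace whose boundary passes through it. The following is a minimal illustrative sketch for two-dimensional data only, not the procedure studied in the paper; the function names `halfspace_depth_2d` and `deepest_sample_point` are hypothetical, and the second function searches only among the sample points, so it approximates rather than computes the Tukey median.

```python
import math

def halfspace_depth_2d(p, points, tol=1e-12):
    """Halfspace (Tukey) depth of p with respect to a 2-D sample, as a count."""
    diffs = [(x - p[0], y - p[1]) for x, y in points]
    nonzero = [(dx, dy) for dx, dy in diffs if math.hypot(dx, dy) > tol]
    coincident = len(diffs) - len(nonzero)  # sample points equal to p always count
    if not nonzero:
        return coincident
    # The count of points in the closed halfplane {d : u.d >= 0} changes only
    # when the direction u is perpendicular to some difference vector.
    crit = sorted({
        round((math.atan2(dy, dx) + s * math.pi / 2) % (2 * math.pi), 12)
        for dx, dy in nonzero for s in (1, -1)
    })
    best = len(nonzero)
    for a, b in zip(crit, crit[1:] + [crit[0] + 2 * math.pi]):
        if b - a < 1e-9:  # skip numerically duplicated critical angles
            continue
        theta = (a + b) / 2  # one direction strictly inside each interval
        ux, uy = math.cos(theta), math.sin(theta)
        count = sum(1 for dx, dy in nonzero if ux * dx + uy * dy >= 0)
        best = min(best, count)
    return coincident + best

def deepest_sample_point(points):
    """Sample point of maximal depth (a crude stand-in for the Tukey median)."""
    return max(points, key=lambda q: halfspace_depth_2d(q, points))
```

In this sketch, a point outside the convex hull of the sample has depth 0, while a point near the middle of the data cloud has larger depth.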
In this paper we study the maximum asymptotic bias of the projection estimate for multivariate location based on univariate estimates of location and dispersion. In particular we study the projection estimate that uses the median and median absolute deviation about the median (MAD) as univariate location and dispersion estimates respectively. This estimator may be considered a natural affine equivariant multivariate median. For spherical distributions the maximum bias of this estimate depends only on the marginal distributions, and not on the dimension, and is approximately twice the maximum bias of the univariate median. We also show that for multivariate normal distributions, its maximum bias compares favorably with those of the Donoho-Stahel, minimum volume ellipsoid and minimum covariance determinant estimates. In all these cases the maximum bias increases with the dimension p.
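As a rough illustration of the ingredients named in this abstract, the sketch below computes a projection-based outlyingness of a candidate location t: the largest standardized distance, over a set of directions, between the projection of t and the median of the projected data, with the MAD as the scale. It assumes that the projection estimate is the point minimizing this quantity; the abstract itself does not spell out the definition. The names `mad` and `projection_outlyingness`, the restriction to two dimensions, and the finite grid of directions are all simplifications introduced here.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation about the median (no consistency factor)."""
    values = list(values)
    m = median(values)
    return median([abs(v - m) for v in values])

def projection_outlyingness(t, points, n_dirs=180):
    """Approximate sup over directions of |u.t - med(u.X)| / MAD(u.X) in 2-D."""
    worst = 0.0
    for k in range(n_dirs):
        theta = math.pi * k / n_dirs  # directions u and -u give the same value
        ux, uy = math.cos(theta), math.sin(theta)
        proj = [ux * x + uy * y for x, y in points]
        scale = mad(proj)
        if scale == 0:  # degenerate projection: skipped in this sketch
            continue
        worst = max(worst, abs(ux * t[0] + uy * t[1] - median(proj)) / scale)
    return worst
```

A location estimate in this spirit would be obtained by minimizing `projection_outlyingness` over t, for example by a numerical search started at the coordinatewise median.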
The paper presents a unified jackknife theory for a fairly general class of mixed models which includes some of the widely used mixed linear models and generalized linear mixed models as special cases. The paper develops jackknife theory for the important, but so far neglected, prediction problem for the general mixed model. For estimation of fixed parameters, a jackknife method is considered for a general class of M-estimators which includes the maximum likelihood, residual maximum likelihood and ANOVA estimators for mixed linear models and the recently developed method of simulated moments estimators for generalized linear mixed models. For both the prediction and estimation problems, a jackknife method is used to obtain estimators of the mean squared errors (MSE). Asymptotic unbiasedness of the MSE estimators is shown to hold essentially under certain moment conditions. Simulation studies undertaken support our theoretical results.
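The basic delete-one jackknife that underlies such theory can be sketched as follows. This is the textbook version for independent observations, not the mixed-model procedure developed in the paper; the function name `jackknife` and its return convention are assumptions made for illustration.

```python
def jackknife(estimator, data):
    """Delete-one jackknife: bias-corrected estimate and variance estimate.

    `estimator` maps a list of observations to a number.
    """
    n = len(data)
    theta_hat = estimator(data)
    # Recompute the estimator with each observation left out in turn.
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    bias = (n - 1) * (loo_mean - theta_hat)
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
    return theta_hat - bias, variance
```

For the sample mean, the jackknife bias estimate is zero and the variance estimate equals the usual s²/n, which gives a quick sanity check of the sketch.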