There has long been controversy about the use of marginal and conditional models, particularly in the analysis of data from longitudinal studies. We show that alleged differences in the behavior of parameters in so-called marginal and conditional models arise from a failure to compare like with like. In particular, these apparent differences are meaningless because they are mainly caused by unidentifiable constraints imposed in advance on the random effects in the models. We discuss the advantages of conditional models over marginal models. We regard the conditional model as fundamental, from which marginal predictions can be made.
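To make the contrast concrete, here is a minimal simulation sketch (standard textbook material, not taken from the article) of a logistic random-intercept model: the subject-specific (conditional) slope differs from the population-averaged (marginal) slope recovered when the random effect is ignored. The parameter values and the use of statsmodels are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_subjects, n_obs = 2000, 10
beta_cond = 1.0          # subject-specific (conditional) slope
sigma_u = 2.0            # random-intercept standard deviation (assumed)

# Simulate a logistic random-intercept model:
#   logit P(y_ij = 1 | u_i) = beta_cond * x_ij + u_i,  u_i ~ N(0, sigma_u^2)
u = rng.normal(0.0, sigma_u, size=(n_subjects, 1))
x = rng.normal(size=(n_subjects, n_obs))
p = 1.0 / (1.0 + np.exp(-(beta_cond * x + u)))
y = rng.binomial(1, p)

# A plain logistic regression that ignores the random effect estimates
# the population-averaged (marginal) effect instead.
X = sm.add_constant(x.ravel())
marginal_fit = sm.Logit(y.ravel(), X).fit(disp=0)

# The marginal slope is attenuated toward zero relative to beta_cond;
# a well-known approximation is beta_marg ~ beta_cond / sqrt(1 + 0.346 sigma_u^2).
print("conditional slope:", beta_cond)
print("fitted marginal slope:", marginal_fit.params[1])
print("attenuation approximation:", beta_cond / np.sqrt(1 + 0.346 * sigma_u**2))
```

The two slopes answer different questions about the same mechanism, which is the point of treating the conditional model as fundamental: the marginal quantity can always be derived from it by averaging over the random effects, but not conversely.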
This article presents an exposition and synthesis of the theory and some applications of the so-called indirect method of inference. These ideas have been exploited in the field of econometrics, but less so in other fields such as biostatistics and epidemiology. In the indirect method, statistical inference is based on an intermediate statistic, which typically follows an asymptotic normal distribution, but is not necessarily a consistent estimator of the parameter of interest. This intermediate statistic can be a naive estimator based on a convenient but misspecified model, a sample moment or a solution to an estimating equation. We review a procedure for indirect inference based on the generalized method of moments, which involves adjusting the naive estimator to be consistent and asymptotically normal. The objective function of this procedure is shown to be interpretable as an “indirect likelihood” based on the intermediate statistic. Many properties of the ordinary likelihood function can be extended to this indirect likelihood. This method is often more convenient computationally than maximum likelihood estimation when handling model complexities such as random effects and measurement error, and it can also serve as a basis for robust inference and model selection, with less stringent assumptions on the data generating mechanism. Many familiar estimation techniques can be viewed as examples of this approach. We describe applications to measurement error, omitted covariates and recurrent events. A dataset concerning prevention of mammary tumors in rats is analyzed using a Poisson regression model with overdispersion. A second dataset from an epidemiological study is analyzed using a logistic regression model with mismeasured covariates. A third dataset of exam scores is used to illustrate robust covariance selection in graphical models.
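As an illustration of the adjustment idea (a sketch under assumed parameter values, not an example from the article), the following simulates the classical measurement error setting: the naive OLS slope on a mismeasured covariate is the intermediate statistic, and the indirect estimate is the parameter value whose simulated naive slope matches the observed one. Common random numbers keep the simulated binding function smooth.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n = 5000
beta_true, sigma_x, sigma_u = 2.0, 1.0, 0.5   # illustrative values

# Observe w = x + u instead of the true covariate x (measurement error).
x = rng.normal(0.0, sigma_x, n)
w = x + rng.normal(0.0, sigma_u, n)
y = beta_true * x + rng.normal(0.0, 1.0, n)

def naive_slope(y, w):
    """Intermediate statistic: OLS slope of y on the mismeasured covariate.
    It is asymptotically normal but inconsistent for beta (attenuated)."""
    return np.polyfit(w, y, 1)[0]

s_obs = naive_slope(y, w)

def binding(beta, n_sim=20):
    """Mean of the intermediate statistic under a candidate beta, estimated
    by simulation; common random numbers (fixed seed) make it smooth in beta."""
    sim_rng = np.random.default_rng(42)
    sims = []
    for _ in range(n_sim):
        xs = sim_rng.normal(0.0, sigma_x, n)
        ws = xs + sim_rng.normal(0.0, sigma_u, n)
        ys = beta * xs + sim_rng.normal(0.0, 1.0, n)
        sims.append(naive_slope(ys, ws))
    return np.mean(sims)

# Indirect estimator: the beta whose implied naive slope matches the data.
beta_hat = brentq(lambda b: binding(b) - s_obs, 0.1, 5.0)
print(f"naive slope {s_obs:.3f}, indirect estimate {beta_hat:.3f}")
```

The naive slope converges to beta multiplied by the attenuation factor sigma_x^2 / (sigma_x^2 + sigma_u^2); the matching step undoes exactly that bias, which is the sense in which the naive estimator is "adjusted to be consistent."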
The Taiwanese–American Occultation Survey (TAOS) will detect objects in the Kuiper Belt by measuring the rate of occultations of stars by these objects, using an array of three to four 50 cm wide-field robotic telescopes. Thousands of stars will be monitored, resulting in hundreds of millions of photometric measurements per night. To optimize the success of TAOS, we have investigated various methods of gathering and processing the data, and developed statistical methods for detecting occultations. In this paper we discuss these methods. The resulting estimated detection efficiencies will be used to guide the choice of various operational parameters that determine the mode of actual observation when the telescopes come on line and begin routine observations. In particular, we show how real-time detection algorithms may be constructed, taking advantage of having multiple telescopes. We also discuss a retrospective method for estimating the rate at which occultations occur.
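The detection methods themselves are developed in the paper; as a flavor of how multiple telescopes help (a simplified sketch, not the article's algorithm), one can rank each telescope's flux within its own light curve and flag epochs at which all telescopes are simultaneously faint, since under independent noise the joint tail probability is the product of the per-telescope ones.

```python
import numpy as np
from scipy.stats import rankdata, gamma

def detect_occultations(flux, alpha=1e-9):
    """Flag candidate occultations in simultaneous light curves of one star.

    flux: (n_telescopes, n_times) array of photometric measurements.
    Each telescope's fluxes are ranked within its own series; a real
    occultation should give a simultaneously low rank at every telescope.
    Under the null of independent noise the normalized ranks are roughly
    Uniform(0, 1), so z = -sum(log rank) is roughly Gamma(n_telescopes, 1)
    and the per-epoch false-positive rate can be held extremely low even
    with hundreds of millions of measurements per night.
    """
    n_tel, n_t = flux.shape
    u = np.vstack([rankdata(f) / n_t for f in flux])   # low flux -> small u
    z = -np.log(u).sum(axis=0)                         # large z = jointly faint
    return np.where(z > gamma.ppf(1.0 - alpha, a=n_tel))[0]

rng = np.random.default_rng(2)
flux = rng.normal(1.0, 0.05, size=(3, 100_000))   # three telescopes, pure noise
flux[:, 50_000] *= 0.5                            # one simultaneous 50% drop
print(detect_occultations(flux))                  # expect [50000]
```

Because ranks are distribution-free, a test of this kind needs no model for the photometric noise of an individual telescope, only rough independence across telescopes.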
The Chandra X-Ray Observatory, launched by the space shuttle Columbia in July 1999, has taken its place with the Hubble Space Telescope, the Compton Gamma Ray Observatory and the Spitzer Infrared Space Telescope in NASA’s fleet of state-of-the-art space-based Great Observatories. As the world’s premier X-ray observatory, Chandra gives astronomers a powerful tool to investigate black holes, exploding stars and colliding galaxies in the hot, turbulent regions of the universe. Chandra uses four pairs of ultra-smooth high-resolution mirrors and efficient X-ray photon counters to produce images at least 30 times sharper than any previous X-ray telescope. Unlocking the information in these images, however, requires subtle statistical analysis; currently popular statistical methods typically involve Gaussian approximations (e.g., minimum χ² fitting), which are not justifiable for the high-resolution, low-count data. In this article, we employ modern Bayesian computational techniques (e.g., expectation–maximization-type algorithms, the Gibbs sampler and Metropolis–Hastings) to fit new highly structured models that account for the Poisson nature of photon counts, background contamination, image blurring due to instrumental constraints, photon absorption, photon pileup and source features such as spectral emission lines and absorption features. This application demonstrates the flexibility and power of modern Bayesian methodology and algorithms to handle highly structured models that are convolved with complex data collection mechanisms involving nonignorable missing data.
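A toy instance of the kind of model and algorithm described (a minimal sketch, not the article's models): counts blurred by a known point-spread function and contaminated by a flat background can be fit by a data-augmentation Gibbs sampler with conjugate Gamma updates. The priors and parameter values below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy structured Poisson model for X-ray counts in n detector bins:
#   y_i ~ Poisson(s * p_i + b),  p_i = known PSF weights (blurring),
#   s = total source intensity, b = flat background rate per bin.
n = 64
p = np.exp(-0.5 * ((np.arange(n) - 32) / 3.0) ** 2)
p /= p.sum()                       # point-spread function, sums to 1
s_true, b_true = 200.0, 2.0
y = rng.poisson(s_true * p + b_true)

# Data-augmentation Gibbs sampler: split each observed count into source
# vs background photons, then draw s and b from conjugate Gamma posteriors.
a_s = r_s = a_b = r_b = 0.001      # vague Gamma(shape, rate) priors (assumed)
s, b = 1.0, 1.0
draws = []
for it in range(5000):
    prob_src = s * p / (s * p + b)             # P(photon came from the source)
    y_src = rng.binomial(y, prob_src)          # augmentation step
    y_bkg = y - y_src
    s = rng.gamma(a_s + y_src.sum(), 1.0 / (r_s + 1.0))   # sum(p) = 1
    b = rng.gamma(a_b + y_bkg.sum(), 1.0 / (r_b + n))
    if it >= 1000:                             # discard burn-in
        draws.append((s, b))
print("posterior means (s, b):", np.mean(draws, axis=0))
```

The augmentation step is what makes the Poisson-sum likelihood tractable: conditional on the split, both updates are standard Gamma draws, and the same device scales up to the absorption, pileup and spectral-line components listed above.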
Cosmic microwave background (CMB) radiation can be viewed as a snapshot of the Universe 13 billion years ago, when it was only 0.002% of its current age. A flood of CMB data is becoming available thanks to satellite and balloon-borne missions, and a number of statistical issues have been raised as a consequence. A particularly relevant issue is the characterization of the statistical distribution of the CMB and, in particular, procedures to test the assumption that the generating random field is Gaussian. Gaussianity tests are of fundamental importance both to validate statistical inference procedures and to discriminate between competing scenarios for the Big Bang dynamics. Several procedures have been proposed in the cosmological literature. This article is an attempt to provide a brief survey of developments in this area.
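The simplest members of this family of procedures are moment-based tests; a minimal sketch (assuming i.i.d. pixel values, which real CMB maps violate through pixel correlations) checks skewness and kurtosis against their Gaussian values.

```python
import numpy as np
from scipy import stats

def gaussianity_test(pixels):
    """Moment-based Gaussianity test on map pixel values (a sketch only;
    the cosmological tests surveyed also use harmonic-space and topological
    statistics and must account for correlations between pixels).
    Returns p-values for zero skewness and zero excess kurtosis."""
    _, p_skew = stats.skewtest(pixels)
    _, p_kurt = stats.kurtosistest(pixels)
    return p_skew, p_kurt

rng = np.random.default_rng(4)
# Under a Gaussian field the p-values are roughly Uniform(0, 1):
print(gaussianity_test(rng.normal(size=100_000)))
# A non-Gaussian (chi-squared) field is flagged decisively:
print(gaussianity_test(rng.chisquare(4, size=100_000)))
```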
The cosmic microwave background (CMB), which permeates the entire Universe, is the radiation left over from just 380,000 years after the Big Bang. On very large scales, the CMB radiation field is smooth and isotropic, but the existence of structure in the Universe—stars, galaxies, clusters of galaxies, …—suggests that the field should fluctuate on smaller scales. Recent observations, from the Cosmic Background Explorer to the Wilkinson Microwave Anisotropy Probe, have strikingly confirmed this prediction.
CMB fluctuations provide clues to the Universe’s structure and composition shortly after the Big Bang that are critical for testing cosmological models. For example, CMB data can be used to determine what portion of the Universe is composed of ordinary matter versus the mysterious dark matter and dark energy. To this end, cosmologists usually summarize the fluctuations by the power spectrum, which gives the variance as a function of angular frequency. The spectrum’s shape, and in particular the location and height of its peaks, relates directly to the parameters in the cosmological models. Thus, a critical statistical question is how accurately these peaks can be estimated.
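For reference, the power spectrum mentioned here is conventionally defined through the spherical harmonic expansion of the temperature fluctuations (standard cosmological notation, not reproduced from the article):

```latex
% Temperature fluctuations expanded in spherical harmonics; the angular
% power spectrum C_ell is the variance at multipole (angular frequency) ell.
\frac{\Delta T}{T}(\theta,\phi)
  = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} a_{\ell m}\, Y_{\ell m}(\theta,\phi),
\qquad
\hat{C}_{\ell} = \frac{1}{2\ell+1} \sum_{m=-\ell}^{\ell} |a_{\ell m}|^{2}.
```

Under isotropy the coefficients a_{ℓm} have mean zero and common variance C_ℓ, so the estimator Ĉ_ℓ is the sample variance at multipole ℓ, which is exactly the "variance as a function of angular frequency" referred to above.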
We use recently developed techniques to construct a nonparametric confidence set for the unknown CMB spectrum. Our estimated spectrum, based on minimal assumptions, closely matches the model-based estimates used by cosmologists, but we can make a wide range of additional inferences. We apply these techniques to test various models and to extract confidence intervals on cosmological parameters of interest. Our analysis shows that, even without parametric assumptions, the first peak is resolved accurately with current data but that the second and third peaks are not.
The data complexity and volume of astronomical findings have increased in recent decades due to major technological improvements in instrumentation and data collection methods. The contemporary astronomer is flooded with terabytes of raw data that produce enormous multidimensional catalogs of objects (stars, galaxies, quasars, etc.) numbering in the billions, with hundreds of measured numbers for each object. The astronomical community thus faces a key task: to enable efficient and objective scientific exploitation of enormous multifaceted data sets and the complex links between data and astrophysical theory. In recognition of this task, the National Virtual Observatory (NVO) initiative recently emerged to federate numerous large digital sky archives, and to develop tools to explore and understand these vast volumes of data. The effective use of such integrated massive data sets presents a variety of new challenging statistical and algorithmic problems that require methodological advances. An interdisciplinary team of statisticians, astronomers and computer scientists from The Pennsylvania State University, California Institute of Technology and Carnegie Mellon University is developing statistical methodology for the NVO. A brief glimpse into the Virtual Observatory and the work of the Penn State-led team is provided here.
Sufficiency has long been regarded as the primary reduction procedure to simplify a statistical model, and the assessment of the procedure involves an implicit global repeated sampling principle. By contrast, conditional procedures are almost as old and yet appear only occasionally in the central statistical literature. Recent likelihood theory examines the form of a general large sample statistical model and finds that certain natural conditional procedures provide, in wide generality, the definitive reduction from the initial variable to a variable of the same dimension as the parameter, a variable that can be viewed as directly measuring the parameter. We begin with a discussion of two intriguing examples from the literature that compare conditional and global inference methods, and come quite extraordinarily to opposite assessments concerning the appropriateness and validity of the two approaches. We then take two simple normal examples, with and without known scaling, and progressively replace the restrictive normal location assumption by more general distributional assumptions. We find that sufficiency typically becomes inapplicable and that conditional procedures from large sample likelihood theory produce the definitive reduction for the analysis. We then examine the vector parameter case and find that the elimination of nuisance parameters requires a marginalization step, not the commonly proffered conditional calculation that is based on exponential model structure. Some general conditioning and modelling criteria are then introduced. This is followed by a survey of common ancillary examples, which are then assessed for conformity to the criteria. In turn, this leads to a discussion of the place for the global repeated sampling principle in statistical inference. It is argued that the principle in conjunction with various optimality criteria has been a primary factor in the long-standing attachment to the sufficiency approach and in the related neglect of the conditioning procedures based directly on available evidence.
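A standard example of the conditioning being discussed (a textbook illustration, not drawn from the article): in a location model the residual configuration is ancillary, and the conditional distribution of the location estimate given that configuration is the normalized likelihood.

```latex
% Location model y_i = \theta + e_i with known error density f. The
% configuration a = (y_1 - \bar{y}, \ldots, y_n - \bar{y}) has a
% distribution free of \theta (ancillary), and conditioning on it yields
% Fisher's normalized-likelihood density for the location estimate:
p(\bar{y} \mid a;\, \theta)
  = \frac{\prod_{i=1}^{n} f(y_i - \theta)}
         {\int_{-\infty}^{\infty} \prod_{i=1}^{n} f(y_i - t)\, dt}.
```

The conditioning reduces the n-dimensional data to the one-dimensional statistic ȳ, a variable of the same dimension as the parameter that directly measures it, in the sense described in the abstract.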
Donald A. S. Fraser was born in Toronto in 1925 and spent his early years in Stratford, Ontario. His father and both grandfathers were doctors, and his mother was a nurse. He was a student at St. Andrew’s College in Aurora, north of Toronto, before entering the Mathematics, Physics and Chemistry program at the University of Toronto as an undergraduate. He specialized in mathematics in the upper years and, in his final year, was a member of the winning team in the 1946 Putnam competition, standing among the top five competitors overall. For graduate studies he went to Princeton University to study mathematics, became interested in statistics and obtained a Ph.D. in 1949 under the supervision of Samuel Wilks.
He returned to the University of Toronto as an Assistant Professor in Mathematics in 1949, and stayed at Toronto for most of his career, becoming Professor in 1958 and first Chair of the Department of Statistics in 1977. He has held visiting appointments at Princeton, Stanford, Copenhagen, Wisconsin, Hawaii and Geneva, and is Adjunct Professor at the University of Waterloo. Following his formal retirement from the University of Toronto, he was Professor in the Department of Mathematics and Statistics at York University for several years; he is currently back at the University of Toronto, still teaching and supervising students. His more than 50 Ph.D. students include many university statisticians, and he has had a profound influence on the way statistics is thought about and taught, particularly in Canadian universities.
Professor Fraser is the author of several books, including The Structure of Inference (1968) and Inference and Linear Models (1979), and author or co-author of more than 200 papers. He was elected a Fellow of the Institute of Mathematical Statistics in 1954, and a Member of the International Statistical Institute and a Fellow of the American Statistical Association in 1962. In 1967, he was the first statistician to be named a Fellow of the Royal Society of Canada. He was the first recipient of the Gold Medal of the Statistical Society of Canada, inaugurated in 1985. In 1990, he received the R. A. Fisher Award of the Committee of Presidents of Statistical Societies; his award lecture at the Joint Statistical Meetings in Anaheim that year was entitled “Likelihood and Tests of Significance: Linking the Fisher Concepts.” In 1992, he received an honorary Doctor of Mathematics degree from the University of Waterloo, and in 2002 the degree of Doctor of Science, honoris causa, was conferred on him by the University of Toronto.