In causal inference, principal stratification is a framework for handling a posttreatment intermediate variable that lies between a treatment and an outcome. In this framework, the principal strata are defined by the joint potential values of the intermediate variable. Because the principal strata are not fully observable, the causal effects within them, known as principal causal effects, are not identifiable without additional assumptions. Several previous empirical studies have leveraged auxiliary variables to improve inference on principal causal effects. We establish a general theory for the identification and estimation of principal causal effects with auxiliary variables, which provides a solid foundation for statistical inference and offers further insight into model building in empirical research. In particular, we consider two commonly used assumptions for principal stratification problems: principal ignorability, and conditional independence between the auxiliary variable and the outcome given the principal strata and covariates. Under each assumption, we give nonparametric and semiparametric identification results without modeling the outcome. When neither assumption is plausible, we propose a large class of flexible parametric and semiparametric models for identifying principal causal effects. Our theory not only establishes formal identification results for several models used in previous empirical studies but also generalizes them to allow for different types of outcomes and intermediate variables.
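The simplest case shows why the strata are not fully observable. Assume a binary treatment Z and a binary intermediate variable S; the paper itself allows other types of intermediate variables. Each unit belongs to one of four strata (S(1), S(0)), but the observed pair (Z, S) reveals only one coordinate. Every observed cell therefore mixes two strata. The sketch below enumerates this. The stratum labels are borrowed from the noncompliance literature and are illustrative only.

```python
from itertools import product

# Illustrative labels for the binary-treatment, binary-intermediate case
# (standard noncompliance terminology, not taken from the abstract).
STRATUM_NAMES = {
    (1, 1): "always-taker",
    (1, 0): "complier",
    (0, 1): "defier",
    (0, 0): "never-taker",
}


def principal_strata():
    """All joint potential values (S(1), S(0)) of a binary intermediate S."""
    return list(product([1, 0], repeat=2))


def compatible_strata(z, s):
    """Strata consistent with observing treatment z and intermediate value s.

    Under z = 1 only S(1) is seen; under z = 0 only S(0) is seen, so the
    other coordinate of the stratum stays unobserved.
    """
    if z == 1:
        return [g for g in principal_strata() if g[0] == s]
    return [g for g in principal_strata() if g[1] == s]


for z, s in product([1, 0], repeat=2):
    names = [STRATUM_NAMES[g] for g in compatible_strata(z, s)]
    print(f"Z={z}, S={s}: {names}")
```

Each observed cell is compatible with two strata. This is why effects within strata need extra assumptions, such as the two considered above, or auxiliary information.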
Confidence and likelihood are fundamental statistical concepts with distinct technical interpretations and uses. Confidence is a meaningful measure of uncertainty within the context of a confidence-interval procedure, while likelihood has been used predominantly as a tool for statistical modelling and inference given observed data. Here we show that confidence is in fact an extended likelihood, which establishes a much closer correspondence between the two concepts. This result gives confidence a meaning outside the confidence-interval context and, conversely, gives the likelihood a confidence interpretation. Beyond interpretation, this connection allows technical information to be transferred in both directions. For example, extended likelihood theory gives a clear way to update or combine confidence information. In the other direction, the confidence connection means that intervals derived from the extended likelihood have the same status as confidence intervals. This gives the extended likelihood direct access to frequentist probability, an objective certification not directly available to the classical likelihood.
Gelman and Rubin’s (Statist. Sci. 7 (1992) 457–472) convergence diagnostic is one of the most popular methods for terminating a Markov chain Monte Carlo (MCMC) sampler. Since that seminal paper, researchers have developed sophisticated methods for estimating the variance of Monte Carlo averages. We show that these estimators find immediate use in the Gelman–Rubin statistic, a connection not previously established in the literature. We incorporate these estimators to upgrade both the univariate and multivariate Gelman–Rubin statistics, leading to improved stability in MCMC termination time. An immediate advantage is that the new Gelman–Rubin statistic can be calculated for a single chain. In addition, we establish a one-to-one relationship between the Gelman–Rubin statistic and effective sample size. Leveraging this relationship, we develop a principled termination criterion for the Gelman–Rubin statistic. Finally, we demonstrate the utility of the improved diagnostic through examples.
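Below is a minimal pure-Python sketch of two statistics:

- the classical Gelman–Rubin statistic for m ≥ 2 chains;
- a single-chain variant in which the between-chain term B/n is replaced by τ̂²/n, where τ̂² is a batch means estimate of the asymptotic variance of the chain average.

The single-chain formula R̂ = sqrt((n − 1)/n + τ̂²/(n s²)) is our reading of the idea described above. It may differ in detail from the paper's estimator (batch size choice, multivariate version, multiple chains). The batch size of 100 and the AR(1) test chain are arbitrary assumptions. Under this formula, R̂² = (n − 1)/n + 1/ESS with ESS = n s²/τ̂², which illustrates the one-to-one link with effective sample size.

```python
import math
import random


def gelman_rubin(chains):
    """Classical potential scale reduction factor for m >= 2 chains of length n."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and average within-chain variance W.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


def batch_means_variance(chain, batch_size):
    """Batch means estimate of the asymptotic variance tau^2 of the chain average."""
    a = len(chain) // batch_size  # number of complete batches
    used = chain[: a * batch_size]
    mean = sum(used) / len(used)
    batch_avgs = [sum(used[k * batch_size:(k + 1) * batch_size]) / batch_size
                  for k in range(a)]
    return batch_size / (a - 1) * sum((bm - mean) ** 2 for bm in batch_avgs)


def single_chain_rhat(chain, batch_size):
    """Sketch (assumption): replace B/n by tau^2/n, with tau^2 from batch means."""
    n = len(chain)
    mean = sum(chain) / n
    s2 = sum((x - mean) ** 2 for x in chain) / (n - 1)
    tau2 = batch_means_variance(chain, batch_size)
    rhat = math.sqrt((n - 1) / n + tau2 / (n * s2))
    ess = n * s2 / tau2  # univariate effective sample size
    return rhat, ess


# Toy example: a strongly autocorrelated AR(1) chain (arbitrary choice).
rng = random.Random(0)
x, chain = 0.0, []
for _ in range(10_000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    chain.append(x)
rhat, ess = single_chain_rhat(chain, batch_size=100)
print(rhat, ess)
```

Because R̂² − (n − 1)/n = 1/ESS here, a threshold on R̂ translates directly into a required effective sample size.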
Under the random design of longitudinal data, observation times are irregular, and there are two main frameworks for analyzing such data: the clustered data framework and the counting process framework. In this paper, we give a thorough comparison of the two frameworks in terms of data structure, model assumptions and estimation procedures. We find that modeling the observation times in the counting process framework yields no gain in efficiency when the observation times are correlated with covariates but independent of the longitudinal response given covariates. Simulation studies are conducted to compare the finite-sample behavior of the related estimators, and an analysis of real data from an Alzheimer’s disease study provides a further comparison.
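A minimal sketch makes the difference in data structure concrete. All class and function names here are hypothetical.

- The clustered view keeps each subject's (time, response) pairs as one cluster.
- The counting-process view summarizes the observation times through N_i(t), the number of observations made by time t. The response then enters through integrals such as ∫ Y_i(s) dN_i(s).

The toy records are made up. They do not reflect the paper's models or the Alzheimer’s disease data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subject:
    """One subject's irregularly timed record (hypothetical layout)."""
    times: List[float]      # observation times T_i1 < T_i2 < ...
    responses: List[float]  # Y_i(T_ij) measured at those times
    covariate: float        # a single baseline covariate X_i, for illustration


def as_clusters(subjects):
    """Clustered-data view: each subject is a cluster of (time, response) pairs."""
    return [list(zip(s.times, s.responses)) for s in subjects]


def counting_process(subject, t):
    """Counting-process view: N_i(t) = number of observations made by time t."""
    return sum(1 for tij in subject.times if tij <= t)


def response_integral(subject, t):
    """Integral of Y_i(s) dN_i(s) over [0, t]: the sum of responses observed by t."""
    return sum(y for tij, y in zip(subject.times, subject.responses) if tij <= t)


subjects = [
    Subject(times=[0.5, 1.2, 3.0], responses=[2.1, 2.4, 3.3], covariate=1.0),
    Subject(times=[0.8, 2.5], responses=[1.7, 1.9], covariate=0.0),
]
print(as_clusters(subjects))
print([counting_process(s, 2.0) for s in subjects])
```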
In the framework of multi-way data analysis, this paper presents symmetrical and non-symmetrical variants of three-way correspondence analysis that are suitable when a three-way contingency table is constructed from ordinal variables. In particular, such variables may be modelled using general recurrence formulae that generate orthogonal polynomial vectors, instead of the singular vectors produced by one of the possible three-way extensions of the singular value decomposition. As we shall see, these polynomials, which until now have been used to decompose two-way contingency tables with ordered variables, also constitute an alternative orthogonal basis for modelling symmetrical and non-symmetrical associations and predictabilities in three-way contingency tables. The consequences for modelling and graphing are highlighted.
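Below is a sketch of a three-term recurrence of the general kind mentioned above. It generates polynomial vectors for one ordered variable with category scores s_j and marginal proportions p_j. The vectors are orthonormal in the sense that the sum over j of p_j a_u(j) a_v(j) is 1 when u = v and 0 otherwise. The function name, the scores and the weights are illustrative assumptions, and the paper's exact formulae and normalization may differ.

```python
import math


def orthogonal_polynomials(scores, weights):
    """Orthonormal polynomial vectors a_0, ..., a_{J-1} for an ordered variable.

    a_v is a polynomial of degree v in the scores, orthonormal with respect
    to the marginal proportions p_j (Stieltjes-type three-term recurrence).
    """
    total = sum(weights)
    p = [w / total for w in weights]  # marginal proportions p_j
    J = len(scores)
    polys = [[1.0] * J]  # a_0(j) = 1, the trivial polynomial
    prev2 = [0.0] * J    # a_{-1} = 0 starts the recurrence
    for _ in range(1, J):
        prev = polys[-1]
        # Recurrence coefficients: projections of s * a_{v-1} on a_{v-1} and a_{v-2}.
        t = sum(pj * s * a * a for pj, s, a in zip(p, scores, prev))
        v = sum(pj * s * a * b for pj, s, a, b in zip(p, scores, prev, prev2))
        u = [(s - t) * a - v * b for s, a, b in zip(scores, prev, prev2)]
        norm = math.sqrt(sum(pj * x * x for pj, x in zip(p, u)))
        polys.append([x / norm for x in u])
        prev2 = prev
    return polys


# Hypothetical ordered variable with four categories and made-up margins.
P = orthogonal_polynomials([1, 2, 3, 4], [20, 35, 30, 15])
for v, a in enumerate(P):
    print(v, [round(x, 4) for x in a])
```

In a three-way analysis, one such basis would be built for each ordinal variable, and the association would be projected onto the products of these vectors.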
A great deal of interest has recently focused on conducting inference on the parameters of a high-dimensional linear model. In this paper, we consider a simple and very naïve two-step procedure for this task, in which we (i) fit a lasso model to obtain a subset of the variables, and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot use the standard statistical inference tools (such as confidence intervals and p-values) for the resulting least squares model, since we peeked at the data twice: once in running the lasso, and again in fitting the least squares model. However, we show that under a certain set of assumptions, with high probability, the set of variables selected by the lasso is identical to the one selected by the noiseless lasso and is hence deterministic. Consequently, the naïve two-step approach can yield asymptotically valid inference. We use this finding to develop the naïve confidence interval, which can be used to draw inference on the regression coefficients of the model selected by the lasso, as well as the naïve score test, which can be used to test hypotheses about the full-model regression coefficients.
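Below is a minimal pure-Python sketch of the two-step procedure:

1. fit a lasso by coordinate descent (no intercept, λ chosen by hand);
2. fit ordinary least squares on the selected columns and report textbook Wald intervals.

The normal quantile, the fixed λ, and the simulated data are assumptions made for illustration. The paper's naïve confidence interval and score test rest on conditions that this code neither reproduces nor checks.

```python
import random
from statistics import NormalDist


def soft_threshold(x, lam):
    return max(abs(x) - lam, 0.0) * (1.0 if x > 0 else -1.0)


def lasso(X, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - Xb with b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual (j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != b[j]:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - b[j])
                b[j] = new
    return b


def invert(A):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    k = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[k:] for row in M]


def naive_intervals(X, y, lam, level=0.95):
    """Step (i): lasso selection; step (ii): OLS on the selected set with Wald intervals."""
    n = len(X)
    selected = [j for j, bj in enumerate(lasso(X, y, lam)) if bj != 0.0]
    Xs = [[row[j] for j in selected] for row in X]
    k = len(selected)
    xtx = [[sum(r[a] * r[c] for r in Xs) for c in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(Xs, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][c] * xty[c] for c in range(k)) for a in range(k)]
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(Xs, y))
    sigma2 = rss / (n - k)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    out = {}
    for a, j in enumerate(selected):
        half = z * (sigma2 * inv[a][a]) ** 0.5
        out[j] = (beta[a], beta[a] - half, beta[a] + half)
    return out


# Simulated sparse model (assumption): only the first two coefficients are nonzero.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]
result = naive_intervals(X, y, lam=0.1)
print(result)
```

The second step ignores the first entirely. The paper's argument is that, with high probability, the selected set coincides with a deterministic one, so this naïvety can be asymptotically harmless.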
Users of least squares techniques are aware of the dangers posed by outliers in the data. In general, outliers may completely spoil an ordinary least squares analysis. To cope with this problem, statistical techniques have been developed that are not so easily affected by outliers. These methods are called robust or resistant. In this overview paper, we illustrate that robust solutions can be obtained by solving a reweighted least squares problem even when the initial solution is not robust. The paper relates classical results from robustness to the most recent advances in robustness for least squares kernel based regression, with an emphasis on theoretical results as well as practical examples. Software for iterative reweighting is also made freely available.
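The reweighting idea can be illustrated with a straight-line fit instead of the kernel based regression the paper treats. Start from the non-robust ordinary least squares solution. Then repeatedly solve a weighted least squares problem, with weights computed from the current residuals.

The following choices are assumptions for this sketch, not the paper's method:

- the Huber weight function, with tuning constant c = 1.345;
- a MAD-type scale estimate;
- the toy data.

The freely available software mentioned above is not used here.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def huber_weight(r, c=1.345):
    """Huber weight for a standardized residual r: 1 inside [-c, c], c/|r| outside."""
    return 1.0 if abs(r) <= c else c / abs(r)


def irls_line(x, y, iters=50, tol=1e-10):
    """Start from the (non-robust) OLS fit, then iteratively reweight."""
    a, b = weighted_line(x, y, [1.0] * len(x))  # ordinary least squares start
    for _ in range(iters):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual rescaled for Gaussian consistency.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale < 1e-12:
            break
        w = [huber_weight(r / scale) for r in resid]
        a_new, b_new = weighted_line(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


# Toy data on the line y = 2x + 1 with small alternating noise and one gross outlier.
x = list(range(20))
y = [2 * xi + 1 + (0.5 if xi % 2 else -0.5) for xi in x]
y[-1] = 100.0
ols = weighted_line(x, y, [1.0] * len(x))
robust = irls_line(x, y)
print("OLS:", ols, "IRLS:", robust)
```

The single outlier drags the OLS slope well away from 2. The reweighted fit assigns that point a small weight and recovers a slope close to the true value, even though its starting point was the non-robust OLS fit.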
The hypothesis of randomness is fundamental in statistical machine learning and in many areas of nonparametric statistics; it states that the observations are independent and come from the same unknown probability distribution. This hypothesis is close, in certain respects, to the hypothesis of exchangeability, which postulates that the distribution of the observations is invariant with respect to their permutations. This paper reviews known methods of testing the two hypotheses, concentrating on the online mode of testing, in which the observations arrive sequentially. All known online methods for testing these hypotheses are based on conformal martingales, which are defined and studied in detail. An important variety of online testing is change detection, where the use of conformal martingales leads to conformal versions of the CUSUM and Shiryaev–Roberts procedures; these versions work in the nonparametric setting where the data are assumed IID according to a completely unknown distribution before the change. The paper emphasizes conceptual and practical aspects and states two kinds of results. Validity results limit the probability of a false alarm or, in the case of change detection, the frequency of false alarms for various procedures based on conformal martingales. Efficiency results establish connections between randomness, exchangeability, and conformal martingales.
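Below is a minimal sketch of a conformal test martingale.

- Each new observation receives a smoothed conformal p-value: its randomized rank among all observations so far, under a nonconformity score.
- A mixture of power martingales, each the product over i of ε p_i^(ε−1), bets against small p-values.

Under exchangeability the mixture is a nonnegative martingale starting at 1. By Ville's inequality, the probability that it ever reaches 100 is therefore at most 1/100. The following are assumptions for illustration:

- the score function, which is simply the raw observation and so targets upward shifts;
- the ε grid;
- the simulated mean shift.

This sketch is not the conformal CUSUM or Shiryaev–Roberts procedures discussed in the paper.

```python
import random


def conformal_p_value(scores, rng):
    """Smoothed conformal p-value of the last score among all scores so far."""
    last = scores[-1]
    greater = sum(1 for a in scores if a > last)
    equal = sum(1 for a in scores if a == last)  # includes the last score itself
    theta = 1.0 - rng.random()  # uniform on (0, 1], so p is never exactly 0
    return (greater + theta * equal) / len(scores)


def mixture_power_martingale(data, epsilons=None, seed=0):
    """Path of the average of power martingales prod_i eps * p_i**(eps - 1).

    Nonconformity score: the observation itself (a hypothetical choice).
    """
    if epsilons is None:
        epsilons = [0.05 + 0.1 * k for k in range(10)]
    rng = random.Random(seed)
    values = [1.0] * len(epsilons)
    path, scores = [], []
    for z in data:
        scores.append(z)
        p = conformal_p_value(scores, rng)
        values = [v * e * p ** (e - 1) for v, e in zip(values, epsilons)]
        path.append(sum(values) / len(values))
    return path


# Simulated stream: IID N(0, 1) for 100 steps, then an upward shift to N(3, 1).
gen = random.Random(42)
stream = [gen.gauss(0, 1) for _ in range(100)] + [gen.gauss(3, 1) for _ in range(100)]
path = mixture_power_martingale(stream)
print(max(path[:100]), path[-1])
```

Averaging power martingales over ε avoids committing to one betting intensity in advance. Any average of test martingales is again a test martingale, so validity is preserved.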
Donald Andrew Dawson (Don Dawson) was born in 1937. He received a bachelor’s degree in 1958 and a master’s degree in 1959 from McGill University, and a Ph.D. in 1963 from M.I.T. under the supervision of Henry P. McKean, Jr. After a 7-year appointment as professor at McGill University, he joined Carleton University in 1970, where he remained for the rest of his career. Among his many contributions to the theory of stochastic processes, his work leading to the creation of the Dawson–Watanabe superprocess stands out as a milestone of modern probability theory. So does his analysis of the superprocess's remarkable properties in describing the evolution of populations in space and time. His numerous papers span the whole gamut of contemporary research topics, notably stochastic evolution equations, measure-valued processes, McKean–Vlasov limits, hierarchical structures and super-Brownian motion, as well as branching, catalytic and historical processes. He has over 200 refereed publications and 8 monographs, with more than 7000 citations. He is an elected Fellow of the Royal Society and of the Royal Society of Canada, a Gold Medalist of the Statistical Society of Canada, and an elected Fellow of the Institute of Mathematical Statistics. We conducted this interview to celebrate Don Dawson’s outstanding contribution to 50 years of Stochastics at Carleton University.