Consider gambler’s ruin with three players, 1, 2, and 3, having initial capitals A, B, and C units. At each round a pair of players is chosen uniformly at random and a fair coin flip is made, resulting in the transfer of one unit between these two players. Eventually, one of the players is eliminated and play continues with the remaining two. Let σ denote the elimination order (e.g., σ = (1, 3) means player 1 is eliminated first and player 3 is eliminated second, leaving player 2 with A + B + C units).
We seek approximations (and exact formulas) for the elimination order probabilities P(σ). Exact, as well as arbitrarily precise, computation of these probabilities is possible when N = A + B + C is not too large. Linear interpolation can then give reasonable approximations for large N. One frequently used approximation, the independent chip model (ICM), is shown to be inadequate. A regression adjustment is proposed, which seems to give good approximations to the elimination order probabilities.
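A minimal Monte Carlo sketch of the process just described may help fix ideas; it is not the paper's exact computation, and the function name and initial capitals are illustrative.

```python
import random
from collections import Counter

def simulate_elimination_order(capitals, rng=random):
    """Simulate one game of three-player gambler's ruin.

    capitals: dict mapping player id (1, 2, 3) to initial capital.
    Returns the elimination order as a tuple, e.g. (1, 3) means
    player 1 is eliminated first and player 3 second.
    """
    chips = dict(capitals)
    order = []
    while len(chips) > 1:
        # choose a pair of remaining players uniformly at random
        a, b = rng.sample(sorted(chips), 2)
        # fair coin flip: one unit moves between the two chosen players
        winner, loser = (a, b) if rng.random() < 0.5 else (b, a)
        chips[winner] += 1
        chips[loser] -= 1
        if chips[loser] == 0:
            order.append(loser)
            del chips[loser]
    return tuple(order)

# estimate the six elimination-order probabilities by Monte Carlo
counts = Counter(simulate_elimination_order({1: 3, 2: 4, 3: 5}) for _ in range(100_000))
for order, count in sorted(counts.items()):
    print(order, count / 100_000)
```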
This paper concerns the construction of confidence intervals in standard seroprevalence surveys. In particular, we discuss methods for constructing confidence intervals for the proportion of individuals in a population infected with a disease, using a sample of antibody test results and measurements of the test’s false positive and false negative rates. We begin by documenting erratic behavior in the coverage probabilities of standard Wald and percentile bootstrap intervals when applied to this problem. We then consider two alternative sets of intervals constructed by test inversion. The first set of intervals is approximate, using either an asymptotic or a bootstrap approximation to the finite-sample distribution of a chosen test statistic. We consider several choices of test statistic, including maximum likelihood estimators and generalized likelihood ratio statistics. We show by simulation that, at empirically relevant parameter values and sample sizes, the coverage probabilities of these intervals are close to their nominal level and the intervals are approximately equi-tailed. The second set of intervals is shown to contain the true parameter value with probability at least equal to the nominal level, but these intervals can be conservative in finite samples.
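To make the object under study concrete, here is a minimal sketch of a Wald-type (delta-method) interval of the kind whose coverage behavior the paper examines, assuming the false positive and false negative rates are themselves estimated from calibration samples; all names and the independence assumption across samples are illustrative, and this is not the paper's proposed interval.

```python
import numpy as np
from scipy.stats import norm

def wald_interval_prevalence(x, n, fp, m_fp, fn, m_fn, alpha=0.05):
    """Rough Wald interval for prevalence, corrected for test error rates.

    x / n      : positives among n sampled antibody tests
    fp / m_fp  : false positives among m_fp known-negative calibration samples
    fn / m_fn  : false negatives among m_fn known-positive calibration samples
    """
    p_hat = x / n        # observed positive rate
    fpr = fp / m_fp      # estimated false positive rate
    fnr = fn / m_fn      # estimated false negative rate
    denom = 1.0 - fnr - fpr
    pi_hat = (p_hat - fpr) / denom          # error-rate-corrected prevalence

    # delta-method variance, treating the three binomial proportions as independent
    var_p = p_hat * (1 - p_hat) / n
    var_fpr = fpr * (1 - fpr) / m_fp
    var_fnr = fnr * (1 - fnr) / m_fn
    d_p = 1.0 / denom
    d_fpr = (p_hat - 1.0 + fnr) / denom**2  # d(pi)/d(fpr)
    d_fnr = (p_hat - fpr) / denom**2        # d(pi)/d(fnr)
    se = np.sqrt(d_p**2 * var_p + d_fpr**2 * var_fpr + d_fnr**2 * var_fnr)

    z = norm.ppf(1 - alpha / 2)
    return pi_hat, (pi_hat - z * se, pi_hat + z * se)
```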
In quantum computing, a demonstration of quantum supremacy (or quantum advantage) consists of presenting a task, possibly of no practical value, whose computation is feasible on a quantum device but cannot be performed by classical computers in any reasonable amount of time. The notable claim of quantum supremacy presented by Google’s team in 2019 consists of demonstrating the ability of a quantum circuit to generate, albeit with considerable noise, bitstrings from a distribution that is considered hard to simulate on classical computers. Very recently, in 2020, a quantum supremacy claim was presented by a group from the University of Science and Technology of China, using a different technology and generating a different distribution, but sharing some statistical principles with Google’s demonstration.
Verifying that the generated data are indeed from the claimed distribution, and assessing the circuit’s noise level and its fidelity, is a statistical undertaking. The objective of this paper is to explain the relations between quantum computing and some of the statistical aspects involved in demonstrating quantum supremacy, in terms that are accessible to statisticians, computer scientists, and mathematicians. Starting with the statistical modeling and analysis in Google’s demonstration, which we explain, we study various estimators of the fidelity and different approaches to testing the distributions generated by the quantum computer. We propose different noise models and discuss their implications. A preliminary study of the Google data, focusing mostly on circuits of 12 and 14 qubits, is given in different parts of the paper.
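For readers wanting a concrete anchor, the fidelity estimator at the center of Google's analysis is the linear cross-entropy benchmarking (XEB) statistic, which compares the ideal circuit probabilities of the observed bitstrings with their value under the uniform distribution. A minimal sketch follows; function and argument names are illustrative.

```python
import numpy as np

def linear_xeb_fidelity(ideal_probs, sampled_bitstrings, n_qubits):
    """Linear cross-entropy benchmarking (XEB) fidelity estimate.

    ideal_probs        : dict mapping a bitstring to its ideal (noiseless)
                         circuit probability p_U(x), e.g. from classical
                         simulation of the circuit
    sampled_bitstrings : bitstrings actually observed on the device

    Computes F = 2**n * mean(p_U(x_i)) - 1, which is 0 for uniform noise
    and approximately 1 for a noiseless circuit.
    """
    probs = np.array([ideal_probs[x] for x in sampled_bitstrings])
    return (2 ** n_qubits) * probs.mean() - 1.0
```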
Classical multivariate statistics measures the outlyingness of a point by its Mahalanobis distance from the mean, which is based on the mean and the covariance matrix of the data. A multivariate depth function is a function which, given a point and a distribution in d-space, measures centrality by a number between 0 and 1, while satisfying certain postulates regarding invariance, monotonicity, convexity and continuity. Accordingly, numerous notions of multivariate depth have been proposed in the literature, some of which are also robust against extremely outlying data. The departure from the classical Mahalanobis distance does not come without cost: there is a trade-off between invariance, robustness and computational feasibility. In the last few years, efficient exact algorithms as well as approximate ones have been constructed and made available in R packages. Consequently, in practical applications the choice of a depth statistic is no longer restricted to one or two notions by computational limits; rather, several notions are often feasible, among which the researcher has to decide. The article debates theoretical and practical aspects of this choice, including invariance and uniqueness, robustness and computational feasibility. The complexity and speed of exact algorithms are compared. The accuracy of approximate approaches such as the random Tukey depth is discussed, as well as the application to large and high-dimensional data. Extensions to local and functional depths and connections to regression depth are briefly addressed.
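As a concrete example of the approximate approaches mentioned above, the random Tukey depth replaces the minimum over all directions in the exact halfspace (Tukey) depth by a minimum over a finite sample of random directions. A minimal sketch, with illustrative function and argument names:

```python
import numpy as np

def random_tukey_depth(x, data, n_directions=1000, rng=None):
    """Approximate the Tukey (halfspace) depth of point x in a data cloud.

    The exact Tukey depth minimizes, over all directions u, the fraction of
    data points whose projection onto u is <= the projection of x; here the
    minimum is taken over randomly drawn directions only.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    # random directions, uniformly distributed on the unit sphere
    u = rng.standard_normal((n_directions, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    proj_data = data @ u.T                      # shape (n_points, n_directions)
    proj_x = np.asarray(x, dtype=float) @ u.T   # shape (n_directions,)
    frac = (proj_data <= proj_x).mean(axis=0)   # halfspace mass per direction
    return frac.min()
```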
The reproducibility crisis has led to an increasing number of replication studies being conducted. Sample sizes for replication studies are often calculated using conditional power based on the effect estimate from the original study. However, this approach is not well suited, as it ignores the uncertainty of the original result. Bayesian methods are used in clinical trials to incorporate prior information into power calculations. We propose to adapt this methodology to the replication framework and promote the use of predictive instead of conditional power in the design of replication studies. Moreover, we describe how extensions of the methodology to sequential clinical trials can be tailored to replication studies. Conditional and predictive power calculated at an interim analysis are compared, and we argue that predictive power is a useful tool for deciding whether to stop a replication study prematurely. A recent project on the replicability of the social sciences is used to illustrate the properties of the different methods.
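As a concrete illustration of the distinction, under normal approximations both quantities have simple closed forms: conditional power plugs in the original estimate as if it were the true effect, while predictive power averages the power over the uncertainty of that estimate. The sketch below uses these standard normal-approximation formulas and is not quoted from the paper; names are illustrative.

```python
from scipy.stats import norm

def replication_powers(theta_o, se_o, se_r, alpha=0.025):
    """Conditional vs. predictive power for a one-sided replication test.

    theta_o, se_o : effect estimate and standard error of the original study
    se_r          : standard error of the planned replication study
    alpha         : one-sided significance level
    """
    z_alpha = norm.ppf(1 - alpha)
    # conditional power: treat theta_o as the true effect
    conditional = norm.cdf(theta_o / se_r - z_alpha)
    # predictive power: average power over theta ~ N(theta_o, se_o**2)
    predictive = norm.cdf((theta_o - z_alpha * se_r) / (se_o**2 + se_r**2) ** 0.5)
    return conditional, predictive
```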
Intention-to-treat (ITT) comparisons have a central place in reporting on randomized controlled trials, though there are typically additional analyses of interest such as those making adjustments for nonadherence. In our ITT reporting of results from the Women’s Health Initiative (WHI) randomized trials, we have relied primarily on highly flexible hazard ratio (Cox) regression methods. However, these methods, especially the proportional hazards special case, have been criticized for being difficult to interpret and frequently oversimplified, and for not being consistent with modern causality theories using potential outcomes. Here we address these topics and extend our use of hazard rate methods for ITT comparisons in the WHI trials.
A delay between the occurrence and the reporting of events often has practical implications, such as for the amount of capital an insurance company must hold, or for taking preventive actions in the case of infectious diseases. Accurate estimation of the number of incurred but not (yet) reported events forms an essential part of properly dealing with this phenomenon. We review the current practice for analysing such data and present a flexible regression framework to jointly estimate the occurrence and reporting of events. By linking this setting to an incomplete data problem, estimation is performed via an expectation-maximization algorithm. The resulting method is elegant, easy to understand and implement, and provides refined insights into the nowcasts. The proposed methodology is applied to a European general liability insurance portfolio.
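To fix ideas, the simplest incurred-but-not-reported correction inflates the observed count by the probability of having been reported by now. The sketch below illustrates only this basic idea and is much cruder than the joint EM-based regression proposed in the paper; names and numbers are illustrative.

```python
import numpy as np

def simple_nowcast(reported_by_delay, delay_probs):
    """Naive nowcast of the total number of events from one occurrence day.

    reported_by_delay : counts already reported, indexed by delay (0 days,
                        1 day, ...), up to the number of days elapsed so far
    delay_probs       : assumed reporting-delay probabilities (summing to 1)
    """
    k = len(reported_by_delay)
    observed = np.sum(reported_by_delay)
    p_reported_by_now = np.sum(delay_probs[:k])
    # divide the observed count by the chance of having been reported by now
    return observed / p_reported_by_now

# example: 40 events reported so far, delays assumed to follow (0.5, 0.3, 0.1, 0.1)
print(simple_nowcast([25, 10, 5], np.array([0.5, 0.3, 0.1, 0.1])))
# -> 40 / 0.9, i.e. roughly 44 events estimated to have actually occurred
```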
Under the linear regression framework, we study the variable selection problem when the underlying model is assumed to have a small number of nonzero coefficients. Nonconvex penalties of specific forms are well studied in the literature for sparse estimation. Recent work has pointed out that nearly all existing nonconvex penalties can be represented as difference-of-convex (DC) functions, that is, differences of two convex functions, even though the penalty itself may not be convex. There is a large existing literature on optimization problems whose objectives and/or constraints involve DC functions, and efficient numerical solutions have been proposed. Under the DC framework, directional-stationary (d-stationary) solutions are considered, and they are usually not unique. In this paper, we show that, under some mild conditions, a certain subset of d-stationary solutions of an optimization problem with a DC objective has ideal statistical properties: asymptotic estimation consistency, asymptotic model selection consistency, and asymptotic efficiency. Our assumptions are either weaker than or comparable with the conditions adopted in other existing works. This work shows that DC is a natural framework offering a unified approach to these existing results on nonconvex penalties, and it bridges the communities of optimization and statistics.
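As one concrete illustration (chosen here for exposition, not taken from the paper), the minimax concave penalty (MCP) with parameters λ > 0 and γ > 1 admits a simple DC decomposition into a convex ℓ1 term minus a convex Huber-type term:

```latex
p_{\lambda,\gamma}(t) =
\begin{cases}
  \lambda |t| - \dfrac{t^2}{2\gamma}, & |t| \le \gamma\lambda,\\[4pt]
  \dfrac{\gamma\lambda^2}{2},         & |t| > \gamma\lambda,
\end{cases}
\qquad
p_{\lambda,\gamma}(t) = \underbrace{\lambda |t|}_{\text{convex}}
  \;-\; \underbrace{h_{\lambda,\gamma}(t)}_{\text{convex}},
\quad
h_{\lambda,\gamma}(t) =
\begin{cases}
  \dfrac{t^2}{2\gamma},                      & |t| \le \gamma\lambda,\\[4pt]
  \lambda|t| - \dfrac{\gamma\lambda^2}{2},   & |t| > \gamma\lambda.
\end{cases}
```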
Markov chain Monte Carlo (MCMC) methods have not been broadly adopted in Bayesian neural networks (BNNs). This paper initially reviews the main challenges in sampling from the parameter posterior of a neural network via MCMC. Such challenges culminate in a lack of convergence to the parameter posterior. Nevertheless, this paper shows that a nonconverged Markov chain, generated via MCMC sampling from the parameter space of a neural network, can yield, via Bayesian marginalization, a valuable posterior predictive distribution of the output of the neural network. Classification examples based on multilayer perceptrons showcase highly accurate posterior predictive distributions. The postulate of limited scope for MCMC developments in BNNs is partially valid; an asymptotically exact parameter posterior seems less plausible, yet an accurate posterior predictive distribution remains a tenable research avenue.
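The Bayesian marginalization step referred to above is simply a Monte Carlo average of the network's output over the sampled parameters, p(y | x, D) ≈ (1/T) Σ_t p(y | x, θ_t). A minimal sketch with illustrative names:

```python
import numpy as np

def mcmc_posterior_predictive(param_samples, predict_fn, x):
    """Posterior predictive class probabilities from MCMC parameter samples.

    param_samples : iterable of network parameter draws theta_1, ..., theta_T
                    produced by an MCMC sampler (possibly not converged)
    predict_fn    : function (theta, x) -> class probabilities of the network
    x             : input to classify
    """
    preds = np.array([predict_fn(theta, x) for theta in param_samples])
    # Monte Carlo approximation of the Bayesian marginalization over parameters
    return preds.mean(axis=0)
```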
Steve Portnoy was born in Kankakee, Illinois, in 1942. He did his undergraduate studies in mathematics at the Massachusetts Institute of Technology, and then earned a master’s degree and a Ph.D. from the Department of Statistics at Stanford University in 1966 and 1969, respectively.
Steve Portnoy has had a distinguished career and is widely recognized as a preeminent mathematical statistician. He has made pioneering and influential contributions to several areas of statistics, including asymptotic theory, robust statistics, quantile regression, and statistics in biology. He has published more than 100 research articles. He is a former co-editor of the Journal of the American Statistical Association (Theory and Methods), and an elected fellow of the American Statistical Association (ASA), the Institute of Mathematical Statistics (IMS), and the American Association for the Advancement of Science (AAAS).
Steve’s professional positions have included being on the faculty of the Department of Statistics at Harvard University and of the University of Illinois at Urbana-Champaign for more than 30 years. He was a founding member of the Department of Statistics at the University of Illinois in 1985 and, before that department was established, served as the division chair for the Statistics Program in the Department of Mathematics.