We discuss how the foundations of statistical science have been presented historically in textbooks, with a focus on the first half of the twentieth century, after the field had become better defined by advances due to Francis Galton, Karl Pearson, and R. A. Fisher. Our main emphasis is on books that presented the theory underlying the subject, often identified as mathematical statistics, with primary focus on books authored by G. Udny Yule, Maurice Kendall, Samuel Wilks, and Harald Cramér. We also discuss influential books on statistical methods by R. A. Fisher and George Snedecor that showed scientists how to implement the theory. We then survey textbooks published in the quarter century after World War II, as Statistics gained more visibility as an academic subject and Departments of Statistics were formed at many universities. We also summarize how Bayesian presentations of Statistics emerged. In each section, we describe how the books were evaluated in reviews shortly after their publication. We conclude by briefly discussing the recent past, the present, and the future of textbooks on the foundations of statistical science, and include comments on this from several notable statisticians.
The emergence of Big Data raises the question of how to model economic relations when there is a large number of possible explanatory variables. We revisit the issue by comparing dense and sparse models in a Bayesian approach that allows for variable selection and shrinkage. More specifically, we discuss the results reached by Giannone, Lenza and Primiceri (2020) through a “Spike-and-Slab” prior, which suggest an “illusion of sparsity” in economic data, as no clear patterns of sparsity could be detected. We further examine the posterior distributions of the model and propose three experiments to evaluate the robustness of the adopted prior distribution. We find that the pattern of sparsity is sensitive to the prior distribution of the regression coefficients, and present evidence that the model indirectly induces variable selection and shrinkage, which suggests that the “illusion of sparsity” could be, itself, an illusion. Code is available on GitHub (github.com/bfava/IllusionOfIllusion).
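For reference, a minimal sketch of the kind of spike-and-slab prior discussed above (our notation, not necessarily that of Giannone, Lenza and Primiceri): each regression coefficient is drawn from a mixture of a point mass at zero and a Gaussian slab,

$$\beta_{j} \mid q, \gamma^{2} \;\sim\; (1-q)\,\delta_{0} + q\,\mathcal{N}(0,\gamma^{2}), \qquad j=1,\dots,k,$$

where the inclusion probability $q$ governs how sparse the model is and $\gamma^{2}$ governs the amount of shrinkage on the included coefficients; the posterior of $q$ is what indicates whether the data favor a sparse or a dense representation.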
Resting on the fact that the Wishart distribution on symmetric matrices serves as the multivariate analog of the real gamma distribution, and on the notion of mixture models, a flexible and powerful tool for treating data drawn from multiple subpopulations, we set forward multivariate analogs of the real Lindley distributions of the first and second kinds in the modern framework of symmetric cones, which can be used to model waiting-time and survival-time matrix data. Within this framework, we first construct a new class of probability distributions, named the matrix-variate Lindley distributions. Some fundamental properties of these new distributions are established, and their statistical properties, including moments, the coefficient of variation, skewness, and kurtosis, are discussed. We then develop an iterative hybrid Expectation-Maximization Fisher-Scoring (EM-FS) algorithm to estimate the parameters of the new class of distributions. Through simulation as well as comparative studies with respect to the Wishart distribution, the effectiveness and reliability of the proposed distributions are demonstrated. Finally, the usefulness and applicability of the new models are illustrated by means of two real data sets from the biological sciences and from medical image segmentation, one of the most important and popular tasks in medical image analysis.
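As background for the construction above, recall the density of the real (univariate) Lindley distribution of the first kind, which the matrix-variate version generalizes to symmetric cones:

$$f(x;\theta) \;=\; \frac{\theta^{2}}{1+\theta}\,(1+x)\,e^{-\theta x}, \qquad x>0,\ \theta>0,$$

that is, a two-component mixture of $\mathrm{Gamma}(1,\theta)$ and $\mathrm{Gamma}(2,\theta)$ densities with weights $\theta/(1+\theta)$ and $1/(1+\theta)$; the matrix-variate analog replaces the gamma components by Wishart-type densities on the cone (the exact form used by the authors may differ from this sketch).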
In this paper, we developed a full set of Bayesian inference tools for zero-and/or-one augmented beta rectangular regression models to analyze limited-augmented data, under a new parameterization. This parameterization facilitates the development of both regression models and inferential tools, and simplifies the respective computational implementations. The proposed Bayesian tools comprise parameter estimation, model fit assessment, model comparison (information criteria), residual analysis, and case influence diagnostics, all developed through MCMC algorithms. In addition, we adapted available methods of posterior predictive checking, using appropriate discrepancy measures. We conducted several simulation studies, considering situations of practical interest, to evaluate: prior sensitivity, parameter recovery of the proposed model and estimation method, the impact of transforming the observed zeros and ones and of using non-augmented models, and the behavior of the proposed model fit assessment and model comparison tools. A real psychometric data set was analyzed to illustrate the performance of the developed tools and the advantages of the proposed analysis framework.
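As a point of reference, a zero-and/or-one augmented model of this kind typically mixes point masses at the boundaries with a continuous density on the open interval; a minimal sketch (our notation, not necessarily the authors' new parameterization) is

$$f(y) \;=\; p_{0}\,\mathbb{1}\{y=0\} + p_{1}\,\mathbb{1}\{y=1\} + (1-p_{0}-p_{1})\,f_{\mathrm{BR}}(y)\,\mathbb{1}\{0<y<1\},$$

where $f_{\mathrm{BR}}$ is a beta rectangular density and $p_{0},p_{1}\ge 0$, with $p_{0}+p_{1}<1$, are the probabilities of exact zeros and ones; regression structures are then placed on the boundary probabilities and on the location of the continuous component.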
Increasing costs and non-response rates of probability samples have provoked the extensive use of non-probability samples. However, non-probability samples are subject to selection bias, which makes inference difficult. Calibration is a popular method for reducing selection bias in non-probability samples. When rich covariate information is available, a key problem is how to select covariates and estimate parameters in calibration for non-probability samples. In this paper, model-assisted SCAD calibration is proposed to make population inference from non-probability samples. A parametric model between the study variable and the covariates is first established. SCAD is then used to estimate the model parameters based on the non-probability sample. The modified forward Kullback–Leibler distance is lastly used to conduct calibration for non-probability samples based on the estimated parametric model. The theoretical properties of the model-assisted SCAD calibration estimator are also derived. Results from simulation studies show that the model-assisted SCAD calibration estimator yields the smallest bias and mean square error compared with other estimators. A real data set from the 2017 Netizen Social Awareness Survey (NSAS) is also used to demonstrate the proposed methodology.
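For context, the SCAD (smoothly clipped absolute deviation) penalty mentioned above is usually specified through its derivative: for a coefficient magnitude $\beta \ge 0$ and tuning constants $\lambda>0$ and $a>2$ (commonly $a=3.7$),

$$p'_{\lambda}(\beta) \;=\; \lambda\left\{\mathbb{1}(\beta\le\lambda) + \frac{(a\lambda-\beta)_{+}}{(a-1)\lambda}\,\mathbb{1}(\beta>\lambda)\right\},$$

so small coefficients are penalized like the lasso while large coefficients are left essentially unpenalized, which yields sparse and approximately unbiased estimates; this is the generic form of the penalty, not a restatement of the authors' specific estimating equations.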
Castellares, Lemonte, and Santos [Brazilian Journal of Probability and Statistics 34(1) (2020) 90–111] introduced a two-parameter discrete Nielsen distribution, derived its properties, and illustrated the advantages of the model in three data applications. In this note, we present corrected versions of some of these results for a particular case.
In this work, we prove that, in any dimension, a super-Brownian motion corresponding to the log-Laplace equation is absolutely continuous with respect to Lebesgue measure at any fixed time; the semigroup appearing in that equation is the transition semigroup of a standard Brownian motion. Our proof is based on properties of solutions of the log-Laplace equation. We also prove that when the initial datum is a finite, non-zero measure, the log-Laplace equation has a unique, continuous solution; moreover, this solution depends continuously on the initial data.
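The specific log-Laplace equation studied in the paper is not reproduced here. As a generic illustration of the duality involved, for the classical super-Brownian motion with transition semigroup $(S_{t})$ of standard Brownian motion one has (up to normalization conventions)

$$\mathbb{E}_{\mu}\!\left[e^{-\langle X_{t}, f\rangle}\right] \;=\; e^{-\langle \mu,\, u_{t}\rangle}, \qquad u_{t} \;=\; S_{t}f - \int_{0}^{t} S_{t-s}\!\left(u_{s}^{2}\right)ds,$$

for suitable nonnegative test functions $f$ and initial measure $\mu$; the paper may use a different nonlinearity in place of $u^{2}$, and the absolute continuity statement concerns the random measure $X_{t}$ itself.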
Inference over tails is usually performed by fitting an appropriate limiting distribution to observations that exceed a fixed threshold. However, the choice of such a threshold is critical and can affect the inferential results. Extreme value mixture models have been defined to estimate the threshold using the full dataset and to give accurate tail estimates. Such models assume that the tail behavior is constant for all observations. However, the extreme behavior of financial returns often changes considerably over time, and such changes are typically driven by sudden market shocks. Here the extreme value mixture model class is extended to formally take into account distributional extreme change-points, by allowing for the presence of regime-dependent parameters modelling the tail of the distribution. This extension formally uses the full dataset to estimate both the thresholds and the extreme change-point locations, giving uncertainty measures for both quantities. Estimation of functions of interest in extreme value analyses is performed via MCMC algorithms. Our approach is evaluated through a series of simulations, applied to real data sets, and assessed against competing approaches. The evidence demonstrates that the inclusion of different extreme regimes outperforms both static and dynamic competing approaches in financial applications.
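As a generic reference for this model class (our notation; the regime-dependent extension adds change-points on top of it), a standard extreme value mixture model combines a bulk density $h$ below a threshold $u$ with a generalized Pareto tail above it:

$$f(x) \;=\; \begin{cases} (1-\phi_{u})\,\dfrac{h(x\mid\boldsymbol{\eta})}{H(u\mid\boldsymbol{\eta})}, & x\le u,\\[1ex] \phi_{u}\, g(x\mid u,\sigma,\xi), & x> u,\end{cases}$$

where $H$ is the bulk distribution function, $g$ is the generalized Pareto density with scale $\sigma$ and shape $\xi$, and $\phi_{u}$ is the tail fraction; treating $u$ as a parameter lets the full dataset inform the threshold, and the extension described above further allows $(u,\sigma,\xi)$ to change at unknown change-point locations.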
In this paper, we provide results on the preservation of various shifted stochastic orders under the operation of mixing. These results are established with respect to the up (down) shifted hazard rate order, the up (down) shifted reversed hazard rate order, and the up (down) shifted likelihood ratio order. In addition, a discussion of the preservation of shifted stochastic orders under convolution is also presented. Some examples are provided to illustrate applications of these results.
In this paper, we propose a new approach for modeling repeated measures restricted to the interval $(0,1)$ (or $(a,b)$, with $a$ and $b$ known real numbers). More specifically, we develop Generalized Estimating Equations considering beta rectangular marginal distributions, which are more robust than the usual beta distribution against the presence of extreme observations. Diagnostic tools, including local influence analysis, are also developed. A simulation study was performed, indicating that our estimation algorithm is efficient in terms of parameter recovery. The results also suggest that our methodology outperforms the usual beta model in most of the considered scenarios, indicating an interesting advantage in the estimation of the dispersion parameter. Furthermore, the developed methodology is illustrated through the analysis of real data from an ophthalmic study.
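For reference, the beta rectangular density underlying the marginal models above is a mixture of a beta density and a uniform density on the same interval; on $(0,1)$, in a mean-precision parameterization (our notation, which may differ from the authors'),

$$f_{\mathrm{BR}}(y;\mu,\phi,\alpha) \;=\; \alpha + (1-\alpha)\, f_{B}(y;\mu,\phi), \qquad 0<y<1,\ 0\le\alpha\le 1,$$

where $f_{B}$ is the beta density with mean $\mu$ and precision $\phi$; the uniform component with weight $\alpha$ thickens both tails, which is what makes the model less sensitive to extreme observations than the plain beta.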
The joint modeling of mean and dispersion (JMMD) provides an efficient way to obtain useful models for both the mean and the dispersion, especially in robust design experiments. However, in the literature on JMMD there are few works dedicated to variable selection, and this theme remains a challenge. In this article, we propose a procedure for selecting variables in JMMD based on hypothesis testing and the quality of the model’s fit. A goodness-of-fit criterion is used, in each iteration of the selection process, as a filter for choosing the terms that will be evaluated by a hypothesis test. Three criteria were considered for checking the quality of the model fit in our variable selection procedure: the extended Akaike information criterion, the corrected Akaike information criterion, and a criterion specific to the JMMD proposed by us, a type of extended adjusted coefficient of determination. Simulation studies were carried out to verify the efficiency of our variable selection procedure. In all situations considered, the proposed procedure proved effective and quite satisfactory. The variable selection process was also applied to a real example from an industrial experiment.