Electronic Journal of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.ejs
The latest articles from Electronic Journal of Statistics on Project Euclid, a site for mathematics and statistics resources.
Language: en-us
Copyright 2010 Cornell University Library
Contact: Euclid-L@cornell.edu (Project Euclid Team)
Published: Thu, 05 Aug 2010 15:41 EDT
Last build: Fri, 03 Jun 2011 09:20 EDT
Logo: http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif (Project Euclid)
http://projecteuclid.org/
The bias and skewness of M-estimators in regression
http://projecteuclid.org/euclid.ejs/1262876992
<strong>Christopher Withers</strong>, <strong>Saralees Nadarajah</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 4, 1--14.</p><p><strong>Abstract:</strong><br/>
We consider M-estimation of a regression model with a nuisance parameter and a vector of other parameters. The unknown distribution of the residuals is not assumed to be normal or symmetric. Simple and easily estimated formulas are given for the dominant terms of the bias and skewness of the parameter estimates. For the linear model these are proportional to the skewness of the ‘independent’ variables. For a nonlinear model, its linear component plays the role of these independent variables, and a second term must be added, proportional to the covariance of its linear and quadratic components. For the least squares estimate with normal errors this term was derived by Box [1]. We also consider the effect of a large number of parameters, and the case of random independent variables.
</p><p><strong>Published: </strong>Thu, 05 Aug 2010 15:41 EDT</p>

On the prediction loss of the lasso in the partially labeled setting
https://projecteuclid.org/euclid.ejs/1539676834
<strong>Pierre C. Bellec</strong>, <strong>Arnak S. Dalalyan</strong>, <strong>Edwin Grappin</strong>, <strong>Quentin Paris</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3443--3472.</p><p><strong>Abstract:</strong><br/>
In this paper we revisit the risk bounds of the lasso estimator in the context of transductive and semi-supervised learning. In other words, the setting under consideration is that of regression with random design under partial labeling. The main goal is to obtain user-friendly bounds on the off-sample prediction risk. To this end, the simple setting of a bounded response variable and bounded (high-dimensional) covariates is considered. We propose some new adaptations of the lasso to these settings and establish oracle inequalities both in expectation and in deviation. These results provide non-asymptotic upper bounds on the risk that highlight the interplay between the bias due to the misspecification of the linear model, the bias due to approximate sparsity, and the variance. They also demonstrate that the presence of a large number of unlabeled features may have a significant positive impact in situations where the restricted eigenvalue of the design matrix vanishes or is very small.
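A minimal sketch of how unlabeled covariates can enter a lasso fit: the Gram matrix is pooled over labeled and unlabeled rows, while the correlation with the response uses labeled rows only. The function `semisupervised_lasso` and this particular pooling are our illustrative assumptions, not the paper's exact estimators:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def semisupervised_lasso(X_lab, y, X_unlab, lam, n_iter=200):
    """Coordinate-descent lasso whose Gram matrix pools labeled and
    unlabeled covariates (hypothetical adaptation; the paper's
    estimators may weight the unlabeled rows differently)."""
    X_all = np.vstack([X_lab, X_unlab])
    N, n = X_all.shape[0], X_lab.shape[0]
    G = X_all.T @ X_all / N          # pooled Gram matrix
    c = X_lab.T @ y / n              # correlation with observed labels
    p = X_lab.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual correlation for coordinate j
            r_j = c[j] - G[j] @ beta + G[j, j] * beta[j]
            beta[j] = soft_threshold(r_j, lam) / G[j, j]
    return beta
```

With a single strong coefficient, the fit recovers its support while the unlabeled rows stabilize the Gram matrix.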
</p><p><strong>Published: </strong>Tue, 16 Oct 2018 04:01 EDT</p>

Noise contrastive estimation: Asymptotic properties, formal comparison with MC-MLE
https://projecteuclid.org/euclid.ejs/1539741651
<strong>Lionel Riou-Durand</strong>, <strong>Nicolas Chopin</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3473--3518.</p><p><strong>Abstract:</strong><br/>
A statistical model is said to be un-normalised when its likelihood function involves an intractable normalising constant. Two popular methods for parameter inference for these models are MC-MLE (Monte Carlo maximum likelihood estimation), and NCE (noise contrastive estimation); both methods rely on simulating artificial data-points to approximate the normalising constant. While the asymptotics of MC-MLE have been established under general hypotheses (Geyer, 1994), this is not so for NCE. We establish consistency and asymptotic normality of NCE estimators under mild assumptions. We compare NCE and MC-MLE under several asymptotic regimes. In particular, we show that, when $m\rightarrow \infty $ while $n$ is fixed ($m$ and $n$ being respectively the number of artificial data-points, and actual data-points), the two estimators are asymptotically equivalent. Conversely, we prove that, when the artificial data-points are IID, and when $n\rightarrow \infty $ while $m/n$ converges to a positive constant, the asymptotic variance of a NCE estimator is always smaller than the asymptotic variance of the corresponding MC-MLE estimator. We illustrate the variance reduction brought by NCE through a numerical study.
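The logistic-discrimination view of NCE can be sketched on a toy un-normalised Gaussian $\exp(-(x-\mu)^{2}/2+c)$, where the log normalising constant $c$ is treated as a free parameter. The function name, learning rate and iteration count are our assumptions, not the paper's setup:

```python
import numpy as np

def nce_fit(x_data, x_noise, noise_logpdf, lr=0.05, n_iter=4000):
    """NCE for an unnormalised Gaussian exp(-(x-mu)^2/2 + c):
    logistic discrimination of actual data against noise draws,
    fitted by plain gradient ascent (illustrative toy only)."""
    n, m = len(x_data), len(x_noise)
    nu = m / n
    mu, c = 0.0, 0.0
    x = np.concatenate([x_data, x_noise])
    labels = np.concatenate([np.ones(n), np.zeros(m)])
    for _ in range(n_iter):
        log_model = -0.5 * (x - mu) ** 2 + c
        h = log_model - noise_logpdf(x) - np.log(nu)
        p = 1.0 / (1.0 + np.exp(-h))     # P(point is real data | x)
        resid = labels - p               # logistic gradient weights
        mu += lr * np.mean(resid * (x - mu))
        c += lr * np.mean(resid)
    return mu, c
```

At the optimum, `c` should approach the true log normalising constant $\log(1/\sqrt{2\pi})\approx -0.919$.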
</p><p><strong>Published: </strong>Tue, 16 Oct 2018 22:01 EDT</p>

A general family of trimmed estimators for robust high-dimensional data analysis
https://projecteuclid.org/euclid.ejs/1540195547
<strong>Eunho Yang</strong>, <strong>Aurélie C. Lozano</strong>, <strong>Aleksandr Aravkin</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3519--3553.</p><p><strong>Abstract:</strong><br/>
We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using convex and non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data.
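The plain (unregularized) Least Trimmed Squares estimator underlying the trimmed family can be sketched with the classical concentration steps; the crude initialization and step cap below are illustrative choices, not the paper's high-dimensional algorithm:

```python
import numpy as np

def least_trimmed_squares(X, y, h, n_steps=50):
    """Concentration ("C-step") iteration for Least Trimmed Squares:
    repeatedly refit OLS on the h observations with the smallest
    squared residuals. A basic sketch of the classical algorithm,
    not the paper's regularized high-dimensional variant."""
    idx = np.arange(h)                        # crude initial subset
    for _ in range(n_steps):
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        resid2 = (y - X @ beta) ** 2
        new_idx = np.argsort(resid2)[:h]      # keep h best-fitting rows
        if np.array_equal(np.sort(new_idx), np.sort(idx)):
            break
        idx = new_idx
    return beta, np.sort(idx)
```

Gross outliers receive huge residuals after the first fit and are excluded from every later subset.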
</p><p><strong>Published: </strong>Mon, 22 Oct 2018 04:06 EDT</p>

P-splines with an $\ell_{1}$ penalty for repeated measures
https://projecteuclid.org/euclid.ejs/1540951342
<strong>Brian D. Segal</strong>, <strong>Michael R. Elliott</strong>, <strong>Thomas Braun</strong>, <strong>Hui Jiang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3554--3600.</p><p><strong>Abstract:</strong><br/>
P-splines are penalized B-splines, in which finite order differences in coefficients are typically penalized with an $\ell_{2}$ norm. P-splines can be used for semiparametric regression and can include random effects to account for within-subject correlations. In addition to $\ell_{2}$ penalties, $\ell_{1}$-type penalties have been used in nonparametric and semiparametric regression to achieve greater flexibility, such as in locally adaptive regression splines, $\ell_{1}$ trend filtering, and the fused lasso additive model. However, there has been less focus on using $\ell_{1}$ penalties in P-splines, particularly for estimating conditional means.
In this paper, we demonstrate the potential benefits of using an $\ell_{1}$ penalty in P-splines with an emphasis on fitting non-smooth functions. We propose an estimation procedure using the alternating direction method of multipliers and cross validation, and provide degrees of freedom and approximate confidence bands based on a ridge approximation to the $\ell_{1}$ penalized fit. We also demonstrate potential uses through simulations and an application to electrodermal activity data collected as part of a stress study.
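A stripped-down version of the ADMM iteration can be illustrated on $\ell_{1}$ trend filtering, with the identity basis standing in for the B-spline basis (a simplification of the paper's estimator; `rho` and the fixed iteration count are our choices):

```python
import numpy as np

def l1_trend_admm(y, lam, order=1, rho=1.0, n_iter=500):
    """ADMM for min_theta 0.5*||y - theta||^2 + lam*||D theta||_1,
    where D is the order-th difference matrix. The identity basis
    stands in for the paper's B-spline basis (simplification)."""
    n = len(y)
    D = np.eye(n)
    for _ in range(order):
        D = np.diff(D, axis=0)               # finite-difference operator
    A = np.linalg.inv(np.eye(n) + rho * D.T @ D)
    theta = y.copy()
    z = D @ theta
    u = np.zeros(len(z))
    for _ in range(n_iter):
        theta = A @ (y + rho * D.T @ (z - u))            # quadratic step
        Dt = D @ theta
        z = np.sign(Dt + u) * np.maximum(np.abs(Dt + u) - lam / rho, 0.0)
        u += Dt - z                                      # dual update
    return theta
```

With a first-order penalty the fit is piecewise constant, the non-smooth regime the paper targets.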
</p><p><strong>Published: </strong>Tue, 30 Oct 2018 22:02 EDT</p>

D-learning to estimate optimal individual treatment rules
https://projecteuclid.org/euclid.ejs/1540951343
<strong>Zhengling Qi</strong>, <strong>Yufeng Liu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3601--3638.</p><p><strong>Abstract:</strong><br/>
Recent exploration of the optimal individual treatment rule (ITR) for patients has attracted a lot of attention due to the potentially heterogeneous responses of patients to different treatments. An optimal ITR is a decision function based on patients’ characteristics for the treatment that maximizes the expected clinical outcome. The current literature mainly focuses on two types of methods, model-based and classification-based methods. Model-based methods rely on the estimation of the conditional mean of the outcome instead of directly targeting the decision boundaries of the optimal ITR. As a result, they may yield suboptimal decisions. In contrast, although classification-based methods directly target the optimal ITR by converting the problem into weighted classification, they rely on using correct weights for all subjects and may therefore suffer from model misspecification. To overcome the potential drawbacks of these methods, we propose a simple and flexible one-step method to directly learn (D-learning) the optimal ITR without model or weight specifications. Multi-category D-learning is also proposed for the case with multiple treatments. A new effect measure is proposed to quantify the relative strength of a treatment for a patient. We show estimation consistency and establish tight finite sample error bounds for the proposed D-learning. Numerical studies including simulated and real data examples are used to demonstrate the competitive performance of D-learning.
</p><p><strong>Published: </strong>Tue, 30 Oct 2018 22:02 EDT</p>

Correlation structure, quadratic variations and parameter estimation for the solution to the wave equation with fractional noise
https://projecteuclid.org/euclid.ejs/1540951344
<strong>Marwa Khalil</strong>, <strong>Ciprian A. Tudor</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3639--3672.</p><p><strong>Abstract:</strong><br/>
We compute the covariance function of the solution to the linear stochastic wave equation with fractional noise in time and white noise in space. We apply our findings to analyze the correlation structure of this Gaussian process and to study the asymptotic behavior in distribution of its spatial quadratic variation. As an application, we construct a consistent estimator for the Hurst parameter.
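A simplified variation-based Hurst estimator of this kind can be sketched for fractional Brownian motion itself (the paper treats the wave-equation solution; the exact-Cholesky simulation and the moment estimator below are our toy stand-ins). Since fBm increments over a mesh $1/N$ have variance $N^{-2H}$, the quadratic variation satisfies $V_{N}\approx N^{1-2H}$, giving $\hat{H}=\tfrac{1}{2}(1-\log V_{N}/\log N)$:

```python
import numpy as np

def simulate_fbm(n, hurst, rng):
    """Exact fBm on a grid via Cholesky of its covariance (toy scale)."""
    t = np.arange(1, n + 1) / n
    s, tt = np.meshgrid(t, t)
    cov = 0.5 * (s ** (2 * hurst) + tt ** (2 * hurst)
                 - np.abs(s - tt) ** (2 * hurst))
    return np.linalg.cholesky(cov + 1e-12 * np.eye(n)) @ rng.standard_normal(n)

def hurst_from_quadratic_variation(x):
    """Moment estimator H = (1 - log V_N / log N) / 2 built from the
    quadratic variation V_N of the increments (our simplified version
    of the variation-based estimators discussed in the paper)."""
    n = len(x)
    v = np.sum(np.diff(x) ** 2)
    return 0.5 * (1.0 - np.log(v) / np.log(n))
```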
</p><p><strong>Published: </strong>Tue, 30 Oct 2018 22:02 EDT</p>

On principal components regression, random projections, and column subsampling
https://projecteuclid.org/euclid.ejs/1540951345
<strong>Martin Slawski</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3673--3712.</p><p><strong>Abstract:</strong><br/>
Principal Components Regression (PCR) is a traditional tool for dimension reduction in linear regression that has been both criticized and defended. One concern about PCR is that obtaining the leading principal components tends to be computationally demanding for large data sets. While random projections do not possess the optimality properties of the leading principal subspace, they are computationally appealing and hence have become increasingly popular in recent years. In this paper, we present an analysis showing that for random projections satisfying a Johnson-Lindenstrauss embedding property, the prediction error in subsequent regression is close to that of PCR, at the expense of requiring a slightly larger number of random projections than principal components. Column subsampling constitutes an even cheaper form of randomized dimension reduction outside the class of Johnson-Lindenstrauss transforms. We provide numerical results based on synthetic and real data as well as basic theory revealing differences and commonalities in terms of statistical performance.
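The two competing reductions can be sketched side by side; `pcr_fit` and `rp_fit` are our illustrative names, and the Gaussian sketch matrix is just one of several Johnson-Lindenstrauss choices:

```python
import numpy as np

def pcr_fit(X, y, k):
    """Regress y on the top-k principal components of X and map the
    coefficients back to the original feature space."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:k].T                      # scores on the leading PCs
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Vt[:k].T @ gamma

def rp_fit(X, y, k, rng):
    """Regress y on a k-dimensional Gaussian random projection of the
    columns, a Johnson-Lindenstrauss style surrogate for the PCs."""
    S = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    gamma, *_ = np.linalg.lstsq(X @ S, y, rcond=None)
    return S @ gamma
```

With `k` equal to the full column dimension, PCR reduces to ordinary least squares; the random projection typically needs a somewhat larger `k` for comparable prediction error.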
</p><p><strong>Published: </strong>Tue, 30 Oct 2018 22:02 EDT</p>

Minimax Euclidean separation rates for testing convex hypotheses in $\mathbb{R}^{d}$
https://projecteuclid.org/euclid.ejs/1541559861
<strong>Gilles Blanchard</strong>, <strong>Alexandra Carpentier</strong>, <strong>Maurilio Gutzeit</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3713--3735.</p><p><strong>Abstract:</strong><br/>
We consider composite-composite testing problems for the expectation in the Gaussian sequence model where the null hypothesis corresponds to a closed convex subset $\mathcal{C}$ of $\mathbb{R}^{d}$. We adopt a minimax point of view and our primary objective is to describe the smallest Euclidean distance between the null and alternative hypotheses such that there is a test with small total error probability. In particular, we focus on the dependence of this distance on the dimension $d$ and variance $\frac{1}{n}$ giving rise to the minimax separation rate. In this paper we discuss lower and upper bounds on this rate for different smooth and non-smooth choices for $\mathcal{C}$.
</p><p><strong>Published: </strong>Tue, 06 Nov 2018 22:04 EST</p>

On the post selection inference constant under restricted isometry properties
https://projecteuclid.org/euclid.ejs/1542682881
<strong>François Bachoc</strong>, <strong>Gilles Blanchard</strong>, <strong>Pierre Neuvial</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3736--3757.</p><p><strong>Abstract:</strong><br/>
Uniformly valid confidence intervals post model selection in regression can be constructed based on Post-Selection Inference (PoSI) constants. PoSI constants are minimal for orthogonal design matrices and, for generic design matrices, can be upper bounded as a function of the sparsity of the set of models under consideration.
In order to improve on these generic sparse upper bounds, we consider design matrices satisfying a Restricted Isometry Property (RIP) condition. We provide a new upper bound on the PoSI constant in this setting. This upper bound is an explicit function of the RIP constant of the design matrix, thereby giving an interpolation between the orthogonal setting and the generic sparse setting. We show that this upper bound is asymptotically optimal in many settings by constructing a matching lower bound.
</p><p><strong>Published: </strong>Mon, 19 Nov 2018 22:01 EST</p>

Forecast dominance testing via sign randomization
https://projecteuclid.org/euclid.ejs/1543374044
<strong>Werner Ehm</strong>, <strong>Fabian Krüger</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3758--3793.</p><p><strong>Abstract:</strong><br/>
We propose randomization tests of whether forecast 1 outperforms forecast 2 across a class of scoring functions. This hypothesis is of applied interest: While the prediction context often prescribes a certain class of scoring functions, it is typically hard to motivate a specific choice on statistical or substantive grounds. We investigate the asymptotic behavior of the test statistics under mild conditions, avoiding the need to assume particular dynamic properties of forecasts and realizations. The properties of the one-sided tests depend on a corresponding version of Anderson’s inequality, which we state as a conjecture of independent interest. Numerical experiments and a data example indicate that the tests have good size and power properties in practically relevant situations.
</p><p><strong>Published: </strong>Tue, 27 Nov 2018 22:01 EST</p>

Assessing the multivariate normal approximation of the maximum likelihood estimator from high-dimensional, heterogeneous data
https://projecteuclid.org/euclid.ejs/1543568429
<strong>Andreas Anastasiou</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3794--3828.</p><p><strong>Abstract:</strong><br/>
The asymptotic normality of the maximum likelihood estimator (MLE) under regularity conditions is a cornerstone of statistical theory. In this paper, we give explicit upper bounds on the distributional distance between the distribution of the MLE of a vector parameter, and the multivariate normal distribution. We work with possibly high-dimensional, independent but not necessarily identically distributed random vectors. In addition, we obtain upper bounds in cases where the MLE cannot be expressed analytically.
</p><p><strong>Published: </strong>Fri, 30 Nov 2018 04:01 EST</p>

Measuring distributional asymmetry with Wasserstein distance and Rademacher symmetrization
https://projecteuclid.org/euclid.ejs/1531468822
<strong>Adam B. Kashlak</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2091--2113.</p><p><strong>Abstract:</strong><br/>
We propose an improved version of the ubiquitous symmetrization inequality, making use of the Wasserstein distance between a measure and its reflection in order to quantify the asymmetry of the given measure. An empirical bound on this asymmetric correction term is derived through a bootstrap procedure and shown to give tighter results in practical settings than the original uncorrected inequality. Lastly, a wide range of applications is detailed, including testing for data symmetry, constructing nonasymptotic high dimensional confidence sets, bounding the variance of an empirical process, and improving constants in Nemirovski style inequalities for Banach space valued random variables.
</p><p><strong>Published: </strong>Mon, 03 Dec 2018 22:03 EST</p>

Gaussian process bandits with adaptive discretization
https://projecteuclid.org/euclid.ejs/1543892564
<strong>Shubhanshu Shekhar</strong>, <strong>Tara Javidi</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3829--3874.</p><p><strong>Abstract:</strong><br/>
In this paper, the problem of maximizing a black-box function $f:\mathcal{X}\to \mathbb{R}$ is studied in the Bayesian framework with a Gaussian Process prior. In particular, a new algorithm for this problem is proposed, and high probability bounds on its simple and cumulative regret are established. The query point selection rule in most existing methods involves an exhaustive search over an increasingly fine sequence of uniform discretizations of $\mathcal{X}$. The proposed algorithm, in contrast, adaptively refines $\mathcal{X}$ which leads to a lower computational complexity, particularly when $\mathcal{X}$ is a subset of a high dimensional Euclidean space. In addition to the computational gains, sufficient conditions are identified under which the regret bounds of the new algorithm improve upon the known results. Finally, an extension of the algorithm to the case of contextual bandits is proposed, and high probability bounds on the contextual regret are presented.
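A single query selection of the GP-UCB type can be sketched on a fixed uniform candidate set, which is exactly the exhaustive-search baseline the paper improves on by refining the discretization adaptively; the kernel, `beta` and noise level are illustrative choices:

```python
import numpy as np

def gp_ucb_step(X_obs, y_obs, candidates, beta=2.0, ls=0.5, noise=1e-4):
    """One GP-UCB query selection on a fixed candidate set: compute the
    Gaussian process posterior and pick the candidate maximizing
    mean + sqrt(beta) * sd (uniform grid here; the paper's algorithm
    refines the discretization adaptively instead)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * ls ** 2))          # squared-exp kernel
    K = k(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Kc = k(candidates, X_obs)
    mean = Kc @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum('ij,ij->i', Kc, np.linalg.solve(K, Kc.T).T)
    ucb = mean + np.sqrt(beta) * np.sqrt(np.maximum(var, 0.0))
    return candidates[np.argmax(ucb)]
```

The returned point balances a high posterior mean against high posterior uncertainty.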
</p><p><strong>Published: </strong>Mon, 03 Dec 2018 22:03 EST</p>

Principal quantile regression for sufficient dimension reduction with heteroscedasticity
https://projecteuclid.org/euclid.ejs/1531468823
<strong>Chong Wang</strong>, <strong>Seung Jun Shin</strong>, <strong>Yichao Wu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2114--2140.</p><p><strong>Abstract:</strong><br/>
Sufficient dimension reduction (SDR) is a successful tool for reducing data dimensionality without stringent model assumptions. In practice, data often display heteroscedasticity, which is of scientific importance in general but frequently overlooked since a primary goal of most existing statistical methods is to identify the conditional mean relationship among variables. In this article, we propose a new SDR method called principal quantile regression (PQR) that efficiently tackles heteroscedasticity. PQR can naturally be extended to a nonlinear version via the kernel trick. Asymptotic properties are established and an efficient solution path-based algorithm is provided. Numerical examples based on both simulated and real data demonstrate PQR’s advantageous performance over existing SDR methods. PQR still performs very competitively even in the case without heteroscedasticity.
</p><p><strong>Published: </strong>Tue, 04 Dec 2018 22:04 EST</p>

Fast learning rate of non-sparse multiple kernel learning and optimal regularization strategies
https://projecteuclid.org/euclid.ejs/1531468825
<strong>Taiji Suzuki</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2141--2192.</p><p><strong>Abstract:</strong><br/>
In this paper, we give a new generalization error bound of Multiple Kernel Learning (MKL) for a general class of regularizations, and discuss what kind of regularization gives a favorable predictive accuracy. Our main target in this paper is dense type regularizations including $\ell_{p}$-MKL. Numerical experiments have shown that sparse regularization does not necessarily outperform dense type regularizations. Motivated by this fact, this paper gives a general theoretical tool to derive fast learning rates of MKL that is applicable to arbitrary mixed-norm-type regularizations in a unifying manner. This enables us to compare the generalization performances of various types of regularizations. As a consequence, we observe that the homogeneity of the complexities of the candidate reproducing kernel Hilbert spaces (RKHSs) affects which regularization strategy ($\ell_{1}$ or dense) is preferred. In fact, in homogeneous complexity settings where the complexities of all RKHSs are the same, $\ell_{1}$-regularization is optimal among all isotropic norms. On the other hand, in inhomogeneous complexity settings, dense type regularizations can achieve a faster learning rate than sparse $\ell_{1}$-regularization. We also show that our learning rate achieves the minimax lower bound in homogeneous complexity settings.
</p><p><strong>Published: </strong>Tue, 04 Dec 2018 22:04 EST</p>

Generalized subsampling procedure for non-stationary time series
https://projecteuclid.org/euclid.ejs/1543979029
<strong>Łukasz Lenart</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3875--3907.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a generalization of the subsampling procedure for non-stationary time series. The proposed generalization is simply related to the usual subsampling procedure. We formulate sufficient conditions for the consistency of this generalization; they extend those known for the usual subsampling procedure for non-stationary time series. Finally, we demonstrate the consistency of the generalized subsampling procedure for the Fourier coefficients of the mean expansion of almost periodically correlated time series.
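The usual subsampling procedure that the generalization builds on can be sketched as a confidence interval for a time-series mean; the block length, the level and the `sqrt(b)` convergence rate below assume the standard stationary setting and are our illustrative choices:

```python
import numpy as np

def subsampling_ci(x, b, alpha=0.1):
    """Subsampling confidence interval for the mean of a time series:
    recompute the mean on all overlapping blocks of length b and use
    the empirical quantiles of sqrt(b)*(block mean - full mean) as a
    proxy for the sampling distribution of the full-sample root."""
    x = np.asarray(x)
    n = len(x)
    full = x.mean()
    blocks = np.array([x[i:i + b].mean() for i in range(n - b + 1)])
    root = np.sqrt(b) * (blocks - full)
    lo, hi = np.quantile(root, [alpha / 2, 1 - alpha / 2])
    return full - hi / np.sqrt(n), full - lo / np.sqrt(n)
```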
</p><p><strong>Published: </strong>Tue, 04 Dec 2018 22:04 EST</p>

Heterogeneity adjustment with applications to graphical model inference
https://projecteuclid.org/euclid.ejs/1543979030
<strong>Jianqing Fan</strong>, <strong>Han Liu</strong>, <strong>Weichen Wang</strong>, <strong>Ziwei Zhu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3908--3952.</p><p><strong>Abstract:</strong><br/>
Heterogeneity is unwanted variation when analyzing aggregated datasets from multiple sources. Though different methods have been proposed for heterogeneity adjustment, no systematic theory exists to justify these methods. In this work, we propose a generic framework named ALPHA (short for Adaptive Low-rank Principal Heterogeneity Adjustment) to model, estimate, and adjust heterogeneity from the original data. Once the heterogeneity is adjusted, we are able to remove the batch effects and to enhance the inferential power by aggregating the homogeneous residuals from multiple sources. Under a pervasive assumption that the latent heterogeneity factors simultaneously affect a fraction of observed variables, we provide a rigorous theory to justify the proposed framework. Our framework also allows the incorporation of informative covariates and appeals to the ‘Blessing of Dimensionality’. As an illustrative application of this generic framework, we consider the problem of estimating a high-dimensional precision matrix for graphical model inference based on multiple datasets. We also provide thorough numerical studies on both synthetic datasets and a brain imaging dataset to demonstrate the efficacy of the developed theory and methods.
</p><p><strong>Published: </strong>Tue, 04 Dec 2018 22:04 EST</p>

Model-free envelope dimension selection
https://projecteuclid.org/euclid.ejs/1531814505
<strong>Xin Zhang</strong>, <strong>Qing Mai</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2193--2216.</p><p><strong>Abstract:</strong><br/>
An envelope is a targeted dimension reduction subspace for simultaneously achieving dimension reduction and improving parameter estimation efficiency. While many envelope methods have been proposed in recent years, all envelope methods hinge on the knowledge of a key hyperparameter, the structural dimension of the envelope. How to estimate the envelope dimension consistently is of substantial interest from both theoretical and practical aspects. Moreover, very recent advances in the literature have generalized the envelope as a model-free method, which makes selecting the envelope dimension even more challenging. Likelihood-based approaches such as information criteria and likelihood-ratio tests either cannot be directly applied or have no theoretical justification. To address this critical issue of dimension selection, we propose two unified approaches – called FG and 1D selections – for determining the envelope dimension that can be applied to any envelope models and methods. The two model-free selection approaches are based on two different envelope optimization procedures: the full Grassmannian (FG) optimization and the 1D algorithm [11], and are shown to be capable of correctly identifying the structural dimension with probability tending to 1 under mild moment conditions as the sample size increases. While the FG selection unifies and generalizes the BIC and modified BIC approaches that exist in the literature, and hence provides theoretical justification for them under weak moment conditions and in a model-free context, the 1D selection is computationally more stable and efficient in finite samples. Extensive simulations and a real data analysis demonstrate the superb performance of our proposals.
</p><p><strong>Published: </strong>Fri, 07 Dec 2018 22:02 EST</p>

Prediction of dynamical time series using kernel based regression and smooth splines
https://projecteuclid.org/euclid.ejs/1532333003
<strong>Raymundo Navarrete</strong>, <strong>Divakar Viswanath</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2217--2237.</p><p><strong>Abstract:</strong><br/>
Prediction of dynamical time series with additive noise using support vector machines or kernel based regression is consistent for certain classes of discrete dynamical systems. Consistency implies that these methods are effective at computing the expected value of a point at a future time given the present coordinates. However, the present coordinates themselves are noisy, and therefore, these methods are not necessarily effective at removing noise. In this article, we consider denoising and prediction as separate problems for flows, as opposed to discrete time dynamical systems, and show that the use of smooth splines is more effective at removing noise. Combination of smooth splines and kernel based regression yields predictors that are more accurate on benchmarks typically by a factor of 2 or more. We prove that kernel based regression in combination with smooth splines converges to the exact predictor for time series extracted from any compact invariant set of any sufficiently smooth flow. As a consequence of convergence, one can find examples where the combination of kernel based regression with smooth splines is superior by even a factor of $100$. The predictors that we analyze and compute operate on delay coordinate data and not the full state vector, which is typically not observable.
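The kernel based regression step can be sketched as Gaussian-kernel ridge regression mapping (delay-coordinate) inputs to future values; the smoothing-spline denoising step is omitted here, so the inputs are assumed pre-smoothed, and the bandwidth and ridge level are our choices:

```python
import numpy as np

def kernel_ridge_predict(train_in, train_out, test_in, bandwidth, ridge=1e-4):
    """Gaussian-kernel ridge regression: the 'kernel based regression'
    component of the pipeline, applied to inputs assumed already
    denoised (a simplification of the paper's two-step method)."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    K = gram(train_in, train_in)
    alpha = np.linalg.solve(K + ridge * np.eye(len(K)), train_out)
    return gram(test_in, train_in) @ alpha
```

On a noiseless smooth target the fit essentially interpolates, which is the regime the denoising step is meant to restore.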
</p><p><strong>Published: </strong>Fri, 07 Dec 2018 22:02 EST</p>

Confidence intervals for linear unbiased estimators under constrained dependence
https://projecteuclid.org/euclid.ejs/1532333004
<strong>Peter M. Aronow</strong>, <strong>Forrest W. Crawford</strong>, <strong>José R. Zubizarreta</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2238--2252.</p><p><strong>Abstract:</strong><br/>
We propose an approach for conducting inference for linear unbiased estimators applied to dependent outcomes given constraints on their independence relations, in the form of a dependency graph. We establish the consistency of an oracle variance estimator when a dependency graph is known, along with an associated central limit theorem. We derive an integer linear program for finding an upper bound for the estimated variance when a dependency graph is unknown, but topological or degree-based constraints are available on one such graph. We develop alternative bounds, including a closed-form bound, under an additional homoskedasticity assumption. We establish a basis for Wald-type confidence intervals that are guaranteed to have asymptotically conservative coverage.
</p><p><strong>Published: </strong>Fri, 07 Dec 2018 22:02 EST</p>

Upper and lower risk bounds for estimating the Wasserstein barycenter of random measures on the real line
https://projecteuclid.org/euclid.ejs/1532333005
<strong>Jérémie Bigot</strong>, <strong>Raúl Gouet</strong>, <strong>Thierry Klein</strong>, <strong>Alfredo López</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2253--2289.</p><p><strong>Abstract:</strong><br/>
This paper is focused on the statistical analysis of probability measures $\boldsymbol{\nu }_{1},\ldots ,\boldsymbol{\nu }_{n}$ on ${\mathbb{R}}$ that can be viewed as independent realizations of an underlying stochastic process. We consider the situation of practical importance where the random measures $\boldsymbol{\nu }_{i}$ are absolutely continuous with densities $\boldsymbol{f}_{i}$ that are not directly observable. In this case, instead of the densities, we have access to datasets of real random variables $(X_{i,j})_{1\leq i\leq n;\;1\leq j\leq p_{i}}$ organized in the form of $n$ experimental units, such that $X_{i,1},\ldots ,X_{i,p_{i}}$ are iid observations sampled from a random measure $\boldsymbol{\nu }_{i}$ for each $1\leq i\leq n$. In this setting, we focus on first-order statistics methods for estimating, from such data, a meaningful structural mean measure. For the purpose of taking into account phase and amplitude variations in the observations, we argue that the notion of Wasserstein barycenter is a relevant tool. The main contribution of this paper is to characterize the rate of convergence of a (possibly smoothed) empirical Wasserstein barycenter towards its population counterpart in the asymptotic setting where both $n$ and $\min_{1\leq i\leq n}p_{i}$ may go to infinity. The optimality of this procedure is discussed from the minimax point of view with respect to the Wasserstein metric. We also highlight the connection between our approach and the curve registration problem in statistics. Some numerical experiments are used to illustrate the results of the paper on the convergence rate of empirical Wasserstein barycenters.
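On the real line, the Wasserstein barycenter's quantile function is the average of the individual quantile functions, so a raw (unsmoothed) empirical barycenter reduces to averaging empirical quantiles on a grid; the grid size and function name are our choices, and the paper's smoothing step is omitted:

```python
import numpy as np

def empirical_wasserstein_barycenter(samples, grid_size=100):
    """Empirical W2 barycenter of measures on R: average the empirical
    quantile functions of the n experimental units over a uniform
    probability grid (unsmoothed sketch of the estimator studied in
    the paper). Returns the grid and the barycenter's quantiles."""
    qs = np.linspace(0.5 / grid_size, 1 - 0.5 / grid_size, grid_size)
    quantiles = np.array([np.quantile(np.asarray(x), qs) for x in samples])
    return qs, quantiles.mean(axis=0)
```

For two Gaussian samples with means -1 and 1, the barycenter's quantiles approximate those of a standard Gaussian centered at 0.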
</p><p><strong>Published: </strong>Fri, 07 Dec 2018 22:02 EST</p>

Empirical Bayes analysis of spike and slab posterior distributions
https://projecteuclid.org/euclid.ejs/1544238109
<strong>Ismaël Castillo</strong>, <strong>Romain Mismer</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3953--4001.</p><p><strong>Abstract:</strong><br/>
In the sparse normal means model, convergence of the Bayesian posterior distribution associated with spike and slab prior distributions is considered. The key sparsity hyperparameter is calibrated via marginal maximum likelihood empirical Bayes. The plug-in posterior squared $L^{2}$ norm is shown to converge at the minimax rate for the Euclidean norm for appropriate choices of spike and slab distributions. Possible choices include standard spike and slab with heavy-tailed slab, and the spike and slab LASSO of Ročková and George with heavy-tailed slab. Surprisingly, the popular Laplace slab is shown to lead to a suboptimal rate for the empirical Bayes posterior itself. This provides a striking example where convergence of aspects of the empirical Bayes posterior, such as the posterior mean or median, does not entail convergence of the complete empirical Bayes posterior itself.
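To illustrate the spike and slab mechanism on a single coordinate, the sketch below computes the posterior inclusion probability for one observation. A Gaussian slab is used purely because its marginal has a closed form; the paper advocates heavy-tailed slabs, so this is an assumption for illustration, not the paper's setup:

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def inclusion_probability(x, w, tau2):
    """P(theta != 0 | x) under the prior theta ~ (1-w) delta_0 + w N(0, tau2)
    with likelihood x | theta ~ N(theta, 1); the slab marginal is N(0, 1+tau2)."""
    spike = (1.0 - w) * normal_pdf(x, 1.0)
    slab = w * normal_pdf(x, 1.0 + tau2)
    return slab / (spike + slab)

p_small = inclusion_probability(0.0, 0.5, 100.0)   # observation near the spike
p_large = inclusion_probability(5.0, 0.5, 100.0)   # observation clearly a signal
```

Empirical Bayes calibration would replace the fixed weight `w` by the marginal maximum likelihood estimate computed from all coordinates.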
</p>projecteuclid.org/euclid.ejs/1544238109_20181207220235Fri, 07 Dec 2018 22:02 ESTExchangeable trait allocationshttps://projecteuclid.org/euclid.ejs/1532484331<strong>Trevor Campbell</strong>, <strong>Diana Cai</strong>, <strong>Tamara Broderick</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2290--2322.</p><p><strong>Abstract:</strong><br/>
Trait allocations are a class of combinatorial structures in which data may belong to multiple groups and may have different levels of belonging in each group. Often the data are also exchangeable, i.e., their joint distribution is invariant to reordering. In clustering—a special case of trait allocation—exchangeability implies the existence of both a de Finetti representation and an exchangeable partition probability function (EPPF), distributional representations useful for computational and theoretical purposes. In this work, we develop the analogous de Finetti representation and exchangeable trait probability function (ETPF) for trait allocations, along with a characterization of all trait allocations with an ETPF. Unlike previous feature allocation characterizations, our proofs fully capture single-occurrence “dust” groups. We further introduce a novel constrained version of the ETPF that we use to establish an intuitive connection between the probability functions for clustering, feature allocations, and trait allocations. As an application of our general theory, we characterize the distribution of all edge-exchangeable graphs, a class of recently-developed models that captures realistic sparse graph sequences.
</p>projecteuclid.org/euclid.ejs/1532484331_20181211040110Tue, 11 Dec 2018 04:01 ESTNon-parametric estimation of time varying AR(1)–processes with local stationarity and periodicityhttps://projecteuclid.org/euclid.ejs/1532484332<strong>Jean-Marc Bardet</strong>, <strong>Paul Doukhan</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2323--2354.</p><p><strong>Abstract:</strong><br/>
Extending the ideas of [7], this paper provides a kernel-based non-parametric estimator for a new class of time varying AR(1) processes $(X_{t})$, with local stationarity and periodic features (with a known period $T$), defined by $X_{t}=a_{t}(t/nT)X_{t-1}+\xi_{t}$ for $t\in \mathbb{N}$ and with $a_{t+T}\equiv a_{t}$. Central limit theorems are established for kernel estimators $\widehat{a}_{s}(u)$ that reach classical minimax rates and require only low-order moment conditions on the white noise $(\xi_{t})_{t}$, up to the second order.
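A hypothetical illustration of such a kernel estimator in the simplest non-periodic case ($T=1$, local-constant least squares); all names and choices below are assumptions for the sketch, not the authors' implementation:

```python
import math, random

def epanechnikov(u):
    """Epanechnikov kernel, a common choice for local averaging."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def estimate_a(X, u, h):
    """Local least-squares kernel estimate of the AR(1) coefficient at
    rescaled time u in (0, 1), assuming X_t = a(t/n) X_{t-1} + xi_t."""
    n = len(X)
    num = den = 0.0
    for t in range(1, n):
        w = epanechnikov((t / n - u) / h)
        num += w * X[t] * X[t - 1]
        den += w * X[t - 1] ** 2
    return num / den

random.seed(0)
n = 5000
a_true = lambda u: 0.6 * math.sin(math.pi * u)   # slowly varying coefficient
X = [0.0]
for t in range(1, n):
    X.append(a_true(t / n) * X[-1] + random.gauss(0.0, 1.0))

a_hat = estimate_a(X, 0.5, 0.1)   # a_true(0.5) = 0.6
```

The bandwidth `h` plays the usual bias/variance role; the minimax rates in the paper correspond to its optimal choice.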
</p>projecteuclid.org/euclid.ejs/1532484332_20181211040110Tue, 11 Dec 2018 04:01 ESTScalable methods for Bayesian selective inferencehttps://projecteuclid.org/euclid.ejs/1532484333<strong>Snigdha Panigrahi</strong>, <strong>Jonathan Taylor</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2355--2400.</p><p><strong>Abstract:</strong><br/>
Modeled along the truncated approach in [20], selection-adjusted inference in a Bayesian regime is based on a selective posterior. Such a posterior is determined jointly by a generative model imposed on the data and the selection event that enforces a truncation on the assumed law. The effective difference between the selective posterior and the usual Bayesian framework is reflected in the use of a truncated likelihood. The normalizer of the truncated law in the adjusted framework is the probability of the selection event; this typically lacks a closed-form expression, leading to a computational bottleneck in sampling from such a posterior. The current work provides an optimization problem that approximates the otherwise intractable selective posterior and leads to scalable methods that give valid post-selective Bayesian inference. The selection procedures are posed as data queries that solve a randomized version of a convex learning program, which has the advantage of preserving more leftover information for inference.
We propose a randomization scheme under which the approximating optimization has separable constraints that result in a partially separable objective in lower dimensions for many commonly used selective queries. We show that the proposed optimization gives a valid exponential rate of decay for the selection probability on a large deviation scale under a Gaussian randomization scheme. On the implementation side, we offer a primal-dual method to solve the optimization problem leading to an approximate posterior; this allows us to exploit the usual merits of a Bayesian machinery in both low and high dimensional regimes when the underlying signal is effectively sparse. We show that the adjusted estimates empirically demonstrate better frequentist properties in comparison to the unadjusted estimates based on the usual posterior, when applied to a wide range of constrained, convex data queries.
</p>projecteuclid.org/euclid.ejs/1532484333_20181211040110Tue, 11 Dec 2018 04:01 ESTAsymptotic minimum scoring rule predictionhttps://projecteuclid.org/euclid.ejs/1532484334<strong>Federica Giummolè</strong>, <strong>Valentina Mameli</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2401--2429.</p><p><strong>Abstract:</strong><br/>
Most of the methods nowadays employed in forecast problems are based on scoring rules. There is a divergence function associated with each scoring rule, which can be used as a measure of discrepancy between probability distributions. This approach is commonly used in the literature for comparing two competing predictive distributions on the basis of their relative expected divergence from the true distribution.
In this paper we focus on the use of scoring rules as a tool for finding predictive distributions for an unknown quantity of interest. The proposed predictive distributions are asymptotic modifications of the estimative solutions, obtained by minimizing the expected divergence related to a general scoring rule.
The asymptotic properties of such predictive distributions are strictly related to the geometry induced by the considered divergence on a regular parametric model. In particular, the existence of a global optimal predictive distribution is guaranteed for invariant divergences, whose local behaviour is similar to well known $\alpha $-divergences.
We show that a wide class of divergences obtained from weighted scoring rules share invariance properties with $\alpha $-divergences. For weighted scoring rules it is thus possible to obtain a global solution to the prediction problem. Unfortunately, the divergences associated with many widely used scoring rules are not invariant. Still, for these cases we provide a locally optimal predictive distribution, within a specified parametric model.
</p>projecteuclid.org/euclid.ejs/1532484334_20181211040110Tue, 11 Dec 2018 04:01 ESTOn the role of the overall effect in exponential familieshttps://projecteuclid.org/euclid.ejs/1532484335<strong>Anna Klimova</strong>, <strong>Tamás Rudas</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2430--2453.</p><p><strong>Abstract:</strong><br/>
Exponential families of discrete probability distributions when the normalizing constant (or overall effect) is added or removed are compared in this paper. The latter setup, in which the exponential family is curved, is particularly relevant when the sample space is an incomplete Cartesian product or when it is very large, so that the computational burden is significant. The lack or presence of the overall effect has a fundamental impact on the properties of the exponential family. When the overall effect is added, the family becomes the smallest regular exponential family containing the curved one. The procedure is related to the homogenization of an inhomogeneous variety discussed in algebraic geometry, of which a statistical interpretation is given as an augmentation of the sample space. The changes in the kernel basis representation when the overall effect is included or removed are derived. The geometry of maximum likelihood estimates, also allowing zero observed frequencies, is described with and without the overall effect, and various algorithms are compared. The importance of the results is illustrated by an example from cell biology, showing that routinely including the overall effect leads to estimates which are not in the model intended by the researchers.
</p>projecteuclid.org/euclid.ejs/1532484335_20181211040110Tue, 11 Dec 2018 04:01 ESTBayes linear analysis of risks in sequential optimal design problemshttps://projecteuclid.org/euclid.ejs/1544518835<strong>Matthew Jones</strong>, <strong>Michael Goldstein</strong>, <strong>Philip Jonathan</strong>, <strong>David Randell</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4002--4031.</p><p><strong>Abstract:</strong><br/>
In a statistical or physical model, it is often the case that a set of design inputs must be selected in order to perform an experiment to collect data with which to update beliefs about a set of model parameters; frequently, the model also depends on a set of external variables which are unknown before the experiment is carried out, but which cannot be controlled. Sequential optimal design problems are concerned with selecting these design inputs in stages (at different points in time), such that the chosen design is optimal with respect to the set of possible outcomes of all future experiments which might be carried out. Such problems are computationally expensive.
We consider the calculations which must be performed in order to solve a sequential design problem, and we propose a framework using Bayes linear emulators to approximate all difficult calculations which arise; these emulators are designed so that we can easily approximate expectations of the risk by integrating the emulator directly, and so that we can efficiently search the design input space for settings which may be optimal. We also consider how the structure of the design calculation can be exploited to improve the quality of the fitted emulators. Our framework is demonstrated through application to a simple linear modelling problem, and to a more complex airborne sensing problem, in which a sequence of aircraft flight paths must be designed so as to collect data which are informative for the locations of ground-based gas sources.
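The building block of a Bayes linear emulator is the adjusted expectation $E_D(B)=E(B)+\mathrm{Cov}(B,D)\,\mathrm{Var}(D)^{-1}(D-E(D))$ and the corresponding adjusted variance. A scalar sketch of this update (illustrative only; the paper's emulators apply it to the multivariate risk calculation):

```python
def bayes_linear_adjust(eb, var_b, ed, var_d, cov_bd, d_obs):
    """Bayes linear adjusted expectation and variance of a scalar quantity B
    given an observed scalar datum D:
        E_D(B)   = E(B) + Cov(B,D) Var(D)^{-1} (D - E(D))
        Var_D(B) = Var(B) - Cov(B,D) Var(D)^{-1} Cov(D,B)."""
    gain = cov_bd / var_d
    adj_mean = eb + gain * (d_obs - ed)
    adj_var = var_b - gain * cov_bd
    return adj_mean, adj_var

# Prior: B ~ (mean 0, variance 1); datum D = B + noise of variance 1,
# so E(D) = 0, Var(D) = 2, Cov(B, D) = 1.  Observe D = 2.
m, v = bayes_linear_adjust(0.0, 1.0, 0.0, 2.0, 1.0, 2.0)
```

Only first- and second-order prior specifications are needed, which is what makes the emulation of expensive design risks tractable.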
</p>projecteuclid.org/euclid.ejs/1544518835_20181211040110Tue, 11 Dec 2018 04:01 ESTCategorizing a continuous predictor subject to measurement errorhttps://projecteuclid.org/euclid.ejs/1544518836<strong>Betsabé G. Blas Achic</strong>, <strong>Tianying Wang</strong>, <strong>Ya Su</strong>, <strong>Victor Kipnis</strong>, <strong>Kevin Dodd</strong>, <strong>Raymond J. Carroll</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4032--4056.</p><p><strong>Abstract:</strong><br/>
Epidemiologists often categorize a continuous risk predictor, even when the true risk model is not a categorical one. Nonetheless, such categorization is thought to be more robust and interpretable, and the goal is thus to fit the categorical model and interpret the categorical parameters. We address the question: with measurement error and categorization, how can we do what epidemiologists want, namely estimate the parameters of the categorical model that would have been estimated if the true predictor had been observed? We develop a general methodology for such an analysis, and illustrate it in linear and logistic regression. Simulation studies are presented and the methodology is applied to a nutrition data set. Discussion of alternative approaches is also included.
</p>projecteuclid.org/euclid.ejs/1544518836_20181211040110Tue, 11 Dec 2018 04:01 ESTA new design strategy for hypothesis testing under response adaptive randomizationhttps://projecteuclid.org/euclid.ejs/1532484336<strong>Alessandro Baldi Antognini</strong>, <strong>Alessandro Vagheggini</strong>, <strong>Maroussa Zagoraiou</strong>, <strong>Marco Novelli</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2454--2481.</p><p><strong>Abstract:</strong><br/>
The aim of this paper is to provide a new design strategy for response adaptive randomization in the case of normal response trials aimed at testing the superiority of one of two available treatments. In particular, we introduce a new test statistic based on the treatment allocation proportion ensuing from the adoption of a suitable response adaptive randomization rule, which could be more efficient and uniformly more powerful than the classical Wald test. We analyze the conditions under which the suggested strategy, derived by matching an asymptotically best response adaptive procedure and a suitably chosen target allocation, could induce a monotonically increasing power that discriminates with high precision the chosen alternatives. Moreover, we introduce and analyze new classes of targets aimed at maximizing the power of the new statistical test, showing both analytically and via simulations i) how the power function of the suggested test increases as the ethical skew of the chosen target grows, namely overcoming the usual trade-off between ethics and inference, and ii) the substantial gain of inferential precision ensured by the proposed approach.
</p>projecteuclid.org/euclid.ejs/1532484336_20181211220602Tue, 11 Dec 2018 22:06 ESTWasserstein and total variation distance between marginals of Lévy processeshttps://projecteuclid.org/euclid.ejs/1532657104<strong>Ester Mariucci</strong>, <strong>Markus Reiß</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2482--2514.</p><p><strong>Abstract:</strong><br/>
We present upper bounds for the Wasserstein distance of order $p$ between the marginals of Lévy processes, including Gaussian approximations for jumps of infinite activity. Using the convolution structure, we further derive upper bounds for the total variation distance between the marginals of Lévy processes. Connections to other metrics like Zolotarev and Toscani-Fourier distances are established. The theory is illustrated by concrete examples and an application to statistical lower bounds.
</p>projecteuclid.org/euclid.ejs/1532657104_20181211220602Tue, 11 Dec 2018 22:06 ESTA noninformative Bayesian approach for selecting a good post-stratificationhttps://projecteuclid.org/euclid.ejs/1532678418<strong>Patrick Zimmerman</strong>, <strong>Glen Meeden</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2515--2536.</p><p><strong>Abstract:</strong><br/>
In the standard design approach to survey sampling, prior information is often used to stratify the population of interest. A good choice of the strata can yield significant improvement in the resulting estimator. However, if there are several possible ways to stratify the population, it might not be clear which is best. Here we assume that before the sample is taken a limited number of possible stratifications have been defined. We will propose an objective Bayesian approach that allows one to consider these several different possible stratifications simultaneously. Given the sample, the posterior distribution will assign more weight to the good stratifications and less to the others. Empirical results suggest that the resulting estimator will typically be almost as good as the estimator based on the best stratification and better than the estimator which does not use stratification. It will also have a sensible estimate of precision.
</p>projecteuclid.org/euclid.ejs/1532678418_20181211220602Tue, 11 Dec 2018 22:06 ESTOn kernel methods for covariates that are rankingshttps://projecteuclid.org/euclid.ejs/1534233701<strong>Horia Mania</strong>, <strong>Aaditya Ramdas</strong>, <strong>Martin J. Wainwright</strong>, <strong>Michael I. Jordan</strong>, <strong>Benjamin Recht</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2537--2577.</p><p><strong>Abstract:</strong><br/>
Permutation-valued features arise in a variety of applications, either in a direct way when preferences are elicited over a collection of items, or in an indirect way when numerical ratings are converted to a ranking. To date, there has been relatively limited study of regression, classification, and testing problems based on permutation-valued features, as opposed to permutation-valued responses. This paper studies the use of reproducing kernel Hilbert space methods for learning from permutation-valued features. These methods embed the rankings into an implicitly defined function space, and allow for efficient estimation of regression and test functions in this richer space. We characterize both the feature spaces and spectral properties associated with two kernels for rankings, the Kendall and Mallows kernels. Using tools from representation theory, we explain the limited expressive power of the Kendall kernel by characterizing its degenerate spectrum, and in sharp contrast, we prove that the Mallows kernel is universal and characteristic. We also introduce families of polynomial kernels that interpolate between the Kendall (degree one) and Mallows (infinite degree) kernels. We show the practical effectiveness of our methods via applications to Eurobarometer survey data as well as the MovieLens ratings dataset.
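The Kendall kernel mentioned above has a simple pairwise form: the normalized difference between concordant and discordant item pairs. A minimal sketch (quadratic-time for clarity; an $O(n\log n)$ merge-sort variant exists):

```python
from itertools import combinations

def kendall_kernel(sigma, tau):
    """Kendall kernel between two tie-free rankings:
    (concordant pairs - discordant pairs) / (n choose 2),
    where sigma[i] is the rank assigned to item i."""
    n = len(sigma)
    s = 0
    for i, j in combinations(range(n), 2):
        s += 1 if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) > 0 else -1
    return s / (n * (n - 1) / 2)

k_same = kendall_kernel([1, 2, 3], [1, 2, 3])      # identical rankings
k_opposite = kendall_kernel([1, 2, 3], [3, 2, 1])  # fully reversed rankings
```

The kernel takes values in $[-1, 1]$, reaching the endpoints at identical and reversed rankings respectively.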
</p>projecteuclid.org/euclid.ejs/1534233701_20181211220602Tue, 11 Dec 2018 22:06 ESTRelevant change points in high dimensional time serieshttps://projecteuclid.org/euclid.ejs/1535681028<strong>Holger Dette</strong>, <strong>Josua Gösmann</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2578--2636.</p><p><strong>Abstract:</strong><br/>
This paper investigates the problem of detecting relevant change points in the mean vector, say $\mu_{t}=(\mu_{t,1},\ldots ,\mu_{t,d})^{T}$ of a high dimensional time series $(Z_{t})_{t\in \mathbb{Z}}$. While the recent literature on testing for change points in this context considers hypotheses for the equality of the means $\mu_{h}^{(1)}$ and $\mu_{h}^{(2)}$ before and after the change points in the different components, we are interested in a null hypothesis of the form \begin{equation*}H_{0}:|\mu^{(1)}_{h}-\mu^{(2)}_{h}|\leq \Delta_{h}~~~\mbox{ for all }~~h=1,\ldots ,d\end{equation*} where $\Delta_{1},\ldots ,\Delta_{d}$ are given thresholds for which a smaller difference of the means in the $h$-th component is considered to be non-relevant. This formulation of the testing problem is motivated by the fact that in many applications a modification of the statistical analysis might not be necessary if the differences between the parameters before and after the change points in the individual components are small. This problem is of particular relevance in high dimensional change point analysis, where a small change in only one component can yield a rejection by the classical procedure although all components change only in a non-relevant way.
We propose a new test for this problem based on the maximum of squared and integrated CUSUM statistics and investigate its properties as the sample size $n$ and the dimension $d$ both converge to infinity. In particular, using Gaussian approximations for the maximum of a large number of dependent random variables, we show that on certain points of the boundary of the null hypothesis a standardized version of the maximum converges weakly to a Gumbel distribution. This result is used to construct a consistent asymptotic level $\alpha $ test and a multiplier bootstrap procedure is proposed, which improves the finite sample performance of the test. The finite sample properties of the test are investigated by means of a simulation study and we also illustrate the new approach investigating data from hydrology.
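The core ingredient of the test statistic can be sketched per coordinate: an integrated squared CUSUM, maximized over coordinates. The sketch below is a bare-bones illustration (no standardization, thresholds, or bootstrap calibration from the paper):

```python
def integrated_squared_cusum(x):
    """Integrated squared CUSUM for one coordinate:
    (1/n) * sum_k [ n^{-1/2} (S_k - (k/n) S_n) ]^2, with S_k the partial sum."""
    n = len(x)
    total = sum(x)
    s, acc = 0.0, 0.0
    for k in range(1, n + 1):
        s += x[k - 1]
        c = (s - (k / n) * total) / n ** 0.5
        acc += c * c
    return acc / n

def max_statistic(components):
    """Maximum over coordinates of the integrated squared CUSUM."""
    return max(integrated_squared_cusum(col) for col in components)

flat = [0.0] * 100                  # no change in the mean
jump = [0.0] * 50 + [5.0] * 50      # mean shift at the midpoint
stat_flat = max_statistic([flat])
stat_jump = max_statistic([jump])
```

A constant series yields a statistic of (numerically) zero, while a mean shift inflates it; the paper's Gumbel limit and multiplier bootstrap then supply critical values as $d$ grows.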
</p>projecteuclid.org/euclid.ejs/1535681028_20181211220602Tue, 11 Dec 2018 22:06 ESTOn inference validity of weighted U-statistics under data heterogeneityhttps://projecteuclid.org/euclid.ejs/1535681029<strong>Fang Han</strong>, <strong>Tianchen Qian</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2637--2708.</p><p><strong>Abstract:</strong><br/>
Motivated by the challenges of studying a new correlation measure popularized for evaluating the performance of online ranking algorithms, this manuscript explores the validity of uncertainty assessment for weighted U-statistics. Without any of the commonly adopted assumptions, we verify the inference validity of Efron's bootstrap and of a new resampling procedure. Specifically, in full generality, our theory allows both kernels and weights to be asymmetric and data points to be non-identically distributed, issues that have not previously been addressed. To achieve this generality, for example, we have to carefully control the order of the “degenerate” term in U-statistics, which is no longer degenerate under the empirical measure for non-i.i.d. data. Our result applies to the motivating task, giving the region in which solid statistical inference can be made.
</p>projecteuclid.org/euclid.ejs/1535681029_20181211220602Tue, 11 Dec 2018 22:06 ESTOn the dimension effect of regularized linear discriminant analysishttps://projecteuclid.org/euclid.ejs/1536976838<strong>Cheng Wang</strong>, <strong>Binyan Jiang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2709--2742.</p><p><strong>Abstract:</strong><br/>
This paper studies the dimension effect of the linear discriminant analysis (LDA) and the regularized linear discriminant analysis (RLDA) classifiers for large dimensional data where the observation dimension $p$ is of the same order as the sample size $n$. More specifically, built on properties of the Wishart distribution and recent results in random matrix theory, we derive explicit expressions for the asymptotic misclassification errors of LDA and RLDA respectively, from which we gain insights of how dimension affects the performance of classification and in what sense. Motivated by these results, we propose adjusted classifiers by correcting the bias brought by the unequal sample sizes. The bias-corrected LDA and RLDA classifiers are shown to have smaller misclassification rates than LDA and RLDA respectively. Several interesting examples are discussed in detail and the theoretical results on dimension effect are illustrated via extensive simulation studies.
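The RLDA rule itself is simple to state: classify by the sign of $(x-\bar\mu)^{T}(\hat\Sigma+\lambda I)^{-1}(\hat\mu_{1}-\hat\mu_{2})$ with $\bar\mu$ the midpoint of the class means. A self-contained sketch (pure Python, equal priors assumed; not the paper's bias-corrected version):

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def rlda_classify(x, mu1, mu2, sigma_hat, lam):
    """RLDA: sign of (x - midpoint)' (Sigma_hat + lam I)^{-1} (mu1 - mu2)."""
    d = len(mu1)
    A = [[sigma_hat[i][j] + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    w = solve(A, [a - b for a, b in zip(mu1, mu2)])
    mid = [(a + b) / 2.0 for a, b in zip(mu1, mu2)]
    score = sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))
    return 1 if score > 0 else 2

mu1, mu2 = [1.0, 0.0], [-1.0, 0.0]
sigma_hat = [[1.0, 0.0], [0.0, 1.0]]
label_a = rlda_classify([2.0, 0.0], mu1, mu2, sigma_hat, 0.1)
label_b = rlda_classify([-2.0, 0.0], mu1, mu2, sigma_hat, 0.1)
```

The regularization $\lambda$ is what keeps the rule well defined when $p$ is of the same order as $n$ and $\hat\Sigma$ is near-singular; the paper quantifies its effect on the misclassification rate.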
</p>projecteuclid.org/euclid.ejs/1536976838_20181211220602Tue, 11 Dec 2018 22:06 ESTEstimation of conditional extreme risk measures from heavy-tailed elliptical random vectorshttps://projecteuclid.org/euclid.ejs/1544583931<strong>Antoine Usseglio-Carleve</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4057--4093.</p><p><strong>Abstract:</strong><br/>
In this work, we focus on the estimation of some conditional extreme risk measures for elliptical random vectors. In a previous paper, we proposed a methodology to approximate extreme quantiles, based on two extremal parameters. We thus propose estimators for these parameters, and study their consistency and asymptotic normality in the case of heavy-tailed distributions. From these parameters, we then construct extreme conditional quantile estimators, and give conditions that ensure consistency and asymptotic normality. Using recent results on the asymptotic relationship between quantiles and other risk measures, we deduce estimators for extreme conditional $L_{p}$-quantiles and Haezendonck-Goovaerts risk measures. Under similar conditions, consistency and asymptotic normality are provided. In order to test the effectiveness of our estimators, we propose a simulation study. A financial data example is also proposed.
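A standard univariate building block of such extreme quantile estimation is the Hill estimator of the tail index combined with Weissman extrapolation. The sketch below is the unconditional version only (the paper's conditional, elliptical construction is more involved):

```python
import math, random

def hill_estimator(x, k):
    """Hill estimator of the extreme value index gamma, based on the
    k largest order statistics."""
    xs = sorted(x, reverse=True)
    return sum(math.log(xs[i] / xs[k]) for i in range(k)) / k

def weissman_quantile(x, k, p):
    """Weissman extrapolation for the quantile at extreme level 1 - p:
    x_{(k)} * (k / (n p))^gamma_hat."""
    n = len(x)
    xs = sorted(x, reverse=True)
    return xs[k] * (k / (n * p)) ** hill_estimator(x, k)

random.seed(1)
# Exact Pareto sample with survival function x^{-2}, i.e. gamma = 0.5.
sample = [(1.0 - random.random()) ** -0.5 for _ in range(2000)]
gamma_hat = hill_estimator(sample, 200)
q_hat = weissman_quantile(sample, 200, 1e-4)   # true quantile is 100
```

The choice of `k` trades bias against variance, exactly the tuning issue the asymptotic normality results address.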
</p>projecteuclid.org/euclid.ejs/1544583931_20181211220602Tue, 11 Dec 2018 22:06 ESTInference for high-dimensional split-plot-designs: A unified approach for small to large numbers of factor levelshttps://projecteuclid.org/euclid.ejs/1536976839<strong>Paavo Sattler</strong>, <strong>Markus Pauly</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2743--2805.</p><p><strong>Abstract:</strong><br/>
Statisticians increasingly face the problem of reconsidering the adaptability of classical inference techniques. In particular, diverse types of high-dimensional data structures are observed in various research areas, exposing the limits of conventional multivariate data analysis. Such situations occur frequently, e.g., in life sciences, whenever it is easier or cheaper to repeatedly generate a large number $d$ of observations per subject than to recruit many, say $N$, subjects. In this paper, we discuss inference procedures for such situations in general heteroscedastic split-plot designs with $a$ independent groups of repeated measurements. These will, e.g., be able to answer questions about the occurrence of certain time, group and interaction effects or about particular profiles.
The test procedures are based on standardized quadratic forms involving suitably symmetrized U-statistics-type estimators which are robust against an increasing number of dimensions $d$ and/or groups $a$. We then discuss their limit distributions in a general asymptotic framework and additionally propose improved small sample approximations. Finally, the small sample performance is investigated in simulations and applicability is illustrated by a real data analysis.
</p>projecteuclid.org/euclid.ejs/1536976839_20181212220446Wed, 12 Dec 2018 22:04 ESTMass volume curves and anomaly rankinghttps://projecteuclid.org/euclid.ejs/1537257627<strong>Stephan Clémençon</strong>, <strong>Albert Thomas</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2806--2872.</p><p><strong>Abstract:</strong><br/>
This paper aims at formulating the issue of ranking multivariate unlabeled observations depending on their degree of abnormality as an unsupervised statistical learning task. In the 1-d situation, this problem is usually tackled by means of tail estimation techniques: univariate observations are viewed as all the more ‘abnormal’ as they are located far in the tail(s) of the underlying probability distribution. It would be desirable as well to have at one's disposal a scalar-valued ‘scoring’ function allowing for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as an M-estimation problem by means of a novel functional performance criterion, referred to as the Mass Volume curve (MV curve in short), whose optimal elements are strictly increasing transforms of the density almost everywhere on the support of the density. We first study the statistical estimation of the MV curve of a given scoring function and we provide a strategy to build confidence regions using a smoothed bootstrap approach. Optimization of this functional criterion over the set of piecewise constant scoring functions is next tackled. This boils down to estimating a sequence of empirical minimum volume sets whose levels are chosen adaptively from the data, so as to adjust to the variations of the optimal MV curve, while controlling the bias of its approximation by a stepwise curve. Generalization bounds are then established for the difference in sup norm between the MV curve of the empirical scoring function thus obtained and the optimal MV curve.
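One point of the MV curve pairs the empirical mass $\alpha$ of an upper level set of the scoring function with that set's Lebesgue volume. A 1-d toy sketch (grid-based volume; the smoothed bootstrap and adaptive level selection of the paper are omitted):

```python
def mv_curve_point(score, sample, alpha, grid):
    """One point of the empirical Mass Volume curve: the score level whose
    upper level set carries empirical mass >= alpha, together with the
    Lebesgue volume of that set, approximated on a regular 1-d grid."""
    scores = sorted((score(x) for x in sample), reverse=True)
    k = max(1, int(alpha * len(sample)))
    t = scores[k - 1]                      # empirical score quantile
    step = grid[1] - grid[0]
    volume = sum(step for g in grid if score(g) >= t)
    return t, volume

score = lambda x: -abs(x)                        # high score where mass concentrates
sample = [(i - 50) / 100.0 for i in range(100)]  # points on [-0.5, 0.49]
grid = [i / 100.0 - 1.0 for i in range(201)]     # grid over [-1, 1]
t, volume = mv_curve_point(score, sample, 0.5, grid)
```

A scoring function closer to (an increasing transform of) the density yields smaller volume at each mass level, which is why lower MV curves indicate better anomaly rankings.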
</p>projecteuclid.org/euclid.ejs/1537257627_20181212220446Wed, 12 Dec 2018 22:04 ESTGoodness-of-fit tests for complete spatial randomness based on Minkowski functionals of binary imageshttps://projecteuclid.org/euclid.ejs/1537257628<strong>Bruno Ebner</strong>, <strong>Norbert Henze</strong>, <strong>Michael A. Klatt</strong>, <strong>Klaus Mecke</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2873--2904.</p><p><strong>Abstract:</strong><br/>
We propose a class of goodness-of-fit tests for complete spatial randomness (CSR). In contrast to standard tests, our procedure utilizes a transformation of the data to a binary image, which is then characterized by geometric functionals. Under a suitable limiting regime, we derive the asymptotic distribution of the test statistics under the null hypothesis and almost sure limits under certain alternatives. The new tests are computationally efficient, and simulations show that they are strong competitors to other tests of CSR. The tests are applied to a real data set in gamma-ray astronomy, and immediate extensions are presented to encourage further work.
</p>projecteuclid.org/euclid.ejs/1537257628_20181212220446Wed, 12 Dec 2018 22:04 ESTPower-law partial correlation network modelshttps://projecteuclid.org/euclid.ejs/1537257629<strong>Matteo Barigozzi</strong>, <strong>Christian Brownlees</strong>, <strong>Gábor Lugosi</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2905--2929.</p><p><strong>Abstract:</strong><br/>
We introduce a class of partial correlation network models whose network structure is determined by a random graph. In particular in this work we focus on a version of the model in which the random graph has a power-law degree distribution. A number of cross-sectional dependence properties of this class of models are derived. The main result we establish is that when the random graph is power-law, the system exhibits a high degree of collinearity. More precisely, the largest eigenvalues of the inverse covariance matrix converge to an affine function of the degrees of the most interconnected vertices in the network. The result implies that the largest eigenvalues of the inverse covariance matrix are approximately power-law distributed, and that, as the system dimension increases, the eigenvalues diverge. As an empirical illustration we analyse two panels of stock returns of companies listed in the S&P 500 and S&P 1500 and show that the covariance matrices of returns exhibits empirical features that are consistent with our power-law model.
</p>projecteuclid.org/euclid.ejs/1537257629_20181212220446Wed, 12 Dec 2018 22:04 ESTOnline natural gradient as a Kalman filterhttps://projecteuclid.org/euclid.ejs/1537257630<strong>Yann Ollivier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2930--2961.</p><p><strong>Abstract:</strong><br/>
We cast Amari’s natural gradient in statistical learning as a specific case of Kalman filtering. Namely, applying an extended Kalman filter to estimate a fixed unknown parameter of a probabilistic model from a series of observations is rigorously equivalent to estimating this parameter via an online stochastic natural gradient descent on the log-likelihood of the observations.
In the i.i.d. case, this relation is a consequence of the “information filter” phrasing of the extended Kalman filter. In the recurrent (state space, non-i.i.d.) case, we prove that the joint Kalman filter over states and parameters is a natural gradient on top of real-time recurrent learning (RTRL), a classical algorithm to train recurrent models.
This exact algebraic correspondence provides relevant interpretations for natural gradient hyperparameters such as learning rates or initialization and regularization of the Fisher information matrix.
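In the i.i.d. Gaussian-mean case the correspondence can be verified by hand. The sketch below is our own toy illustration, not code from the paper: it runs the information (precision) form of the Kalman filter with a diffuse prior alongside an online natural gradient with step size 1/t, for a fixed scalar mean observed through y_t ~ N(theta, sigma^2); both recursions reduce to the running mean of the observations.

```python
def kalman_information_filter(ys, sigma2):
    """Kalman filter in information (precision) form for a static scalar
    parameter theta observed through y_t ~ N(theta, sigma2), started
    from a diffuse prior (precision 0, so the first step fits y_1)."""
    theta, tau = 0.0, 0.0
    for y in ys:
        tau += 1.0 / sigma2                    # precision accumulates
        theta += (y - theta) / (sigma2 * tau)  # Kalman gain = 1/(sigma2*tau)
    return theta


def online_natural_gradient(ys, sigma2):
    """Online natural gradient ascent on the Gaussian log-likelihood with
    step 1/t: the Fisher information of the mean is 1/sigma2, so the
    preconditioned gradient is sigma2 * (y - theta)/sigma2 = y - theta."""
    theta = 0.0
    for t, y in enumerate(ys, start=1):
        grad = (y - theta) / sigma2            # score of one observation
        theta += (1.0 / t) * sigma2 * grad     # natural gradient step
    return theta
```

With observations [1.0, 2.0, 3.0] both routines return the sample mean 2.0, whatever the value of sigma2, which is the scalar shadow of the exact equivalence stated above.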
</p>projecteuclid.org/euclid.ejs/1537257630_20181212220446Wed, 12 Dec 2018 22:04 ESTMaximum empirical likelihood estimation and related topicshttps://projecteuclid.org/euclid.ejs/1537344589<strong>Hanxiang Peng</strong>, <strong>Anton Schick</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2962--2994.</p><p><strong>Abstract:</strong><br/>
This article develops a theory of maximum empirical likelihood estimation and empirical likelihood ratio testing with irregular and estimated constraint functions that parallels the theory for parametric models and is tailored for semiparametric models. The key is a uniform local asymptotic normality condition for the local empirical likelihood ratio. This condition is shown to hold under mild assumptions on the constraint function. We discuss applications of our results to inference problems about quantiles under possible additional information on the underlying distribution, and to residual-based inference about quantiles.
</p>projecteuclid.org/euclid.ejs/1537344589_20181212220446Wed, 12 Dec 2018 22:04 ESTConsistency of variational Bayes inference for estimation and model selection in mixtureshttps://projecteuclid.org/euclid.ejs/1537344604<strong>Badr-Eddine Chérief-Abdellatif</strong>, <strong>Pierre Alquier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2995--3035.</p><p><strong>Abstract:</strong><br/>
Mixture models are widely used in Bayesian statistics and machine learning, in particular in computational biology, natural language processing and many other fields. Variational inference, a technique for approximating intractable posteriors by means of optimization algorithms, is extremely popular in practice when dealing with complex models such as mixtures. The contribution of this paper is twofold. First, we study the concentration of variational approximations of posteriors, which is still an open problem for general mixtures, and we derive consistency and rates of convergence. Second, we tackle the problem of model selection for the number of components: we study the approach already used in practice, which consists in maximizing a numerical criterion (the Evidence Lower Bound). We prove that this strategy indeed leads to strong oracle inequalities. We illustrate our theoretical results by applications to Gaussian and multinomial mixtures.
</p>projecteuclid.org/euclid.ejs/1537344604_20181212220446Wed, 12 Dec 2018 22:04 ESTBayesian variable selection for globally sparse probabilistic PCAhttps://projecteuclid.org/euclid.ejs/1537430424<strong>Charles Bouveyron</strong>, <strong>Pierre Latouche</strong>, <strong>Pierre-Alexandre Mattei</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3036--3070.</p><p><strong>Abstract:</strong><br/>
Sparse versions of principal component analysis (PCA) have established themselves as simple, yet powerful ways of selecting relevant features of high-dimensional data in an unsupervised manner. However, when several sparse principal components are computed, the interpretation of the selected variables may be difficult since each axis has its own sparsity pattern and has to be interpreted separately. To overcome this drawback, we propose a Bayesian procedure that yields several sparse components with the same sparsity pattern. This allows the practitioner to identify which original variables are most relevant to describe the data. To this end, using Roweis’ probabilistic interpretation of PCA and an isotropic Gaussian prior on the loading matrix, we provide the first exact computation of the marginal likelihood of a Bayesian PCA model. Moreover, in order to avoid the drawbacks of discrete model selection, a simple relaxation of this framework is presented. It allows one to find a path of candidate models using a variational expectation-maximization algorithm. The exact marginal likelihood can eventually be maximized over this path, relying on Occam’s razor to select the relevant variables. Since the sparsity pattern is common to all components, we call this approach globally sparse probabilistic PCA (GSPPCA). Its usefulness is illustrated on synthetic data sets and on several real unsupervised feature selection problems coming from signal processing and genomics. In particular, using unlabeled microarray data, GSPPCA is shown to infer biologically relevant subsets of genes. According to a metric based on pathway enrichment, it vastly surpasses the performance of traditional sparse PCA algorithms in this context. An R implementation of the GSPPCA algorithm is available at http://github.com/pamattei/GSPPCA .
</p>projecteuclid.org/euclid.ejs/1537430424_20181212220446Wed, 12 Dec 2018 22:04 ESTDetectability of nonparametric signals: higher criticism versus likelihood ratiohttps://projecteuclid.org/euclid.ejs/1544670253<strong>Marc Ditzhaus</strong>, <strong>Arnold Janssen</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4094--4137.</p><p><strong>Abstract:</strong><br/>
We study the signal detection problem in high dimensional noise data (possibly) containing rare and weak signals. Log-likelihood ratio (LLR) tests depend on unknown parameters, but they are needed to judge the quality of detection tests since they determine the detection regions. The popular Tukey’s higher criticism (HC) test was shown to achieve the same completely detectable region as the LLR test for various (mainly parametric) models. We present a novel technique to prove this result for very general signal models, including even nonparametric $p$-value models. Moreover, we address the following questions, which have been pending since the initial paper of Donoho and Jin: What happens on the border of the completely detectable region, the so-called detection boundary? Does HC keep its optimality there? In particular, we give a complete answer for the heteroscedastic normal mixture model. As a byproduct, we give some new insights into the LLR test’s behaviour on the detection boundary by discussing, among others, Pitman’s asymptotic efficiency as an application of Le Cam’s theory.
</p>projecteuclid.org/euclid.ejs/1544670253_20181212220446Wed, 12 Dec 2018 22:04 ESTEfficient MCMC for Gibbs random fields using pre-computationhttps://projecteuclid.org/euclid.ejs/1544670254<strong>Aidan Boland</strong>, <strong>Nial Friel</strong>, <strong>Florian Maire</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4138--4179.</p><p><strong>Abstract:</strong><br/>
Bayesian inference of Gibbs random fields (GRFs) is often referred to as a doubly intractable problem, since the normalizing constants of both the likelihood function and the posterior distribution are unavailable in closed form. The exploration of the posterior distribution of such models is typically carried out with a sophisticated Markov chain Monte Carlo (MCMC) method, the exchange algorithm [28], which requires simulations from the likelihood function at each iteration. The purpose of this paper is to consider an approach to dramatically reduce this computational overhead. To this end we introduce a novel class of algorithms which use realizations of the GRF model, simulated offline, at locations specified by a grid that spans the parameter space. This strategy dramatically speeds up posterior inference, as illustrated on several examples. However, using the pre-computed graphs introduces noise into the MCMC algorithm, which is no longer exact. We study the theoretical behaviour of the resulting approximate MCMC algorithm and derive convergence bounds using a recent theoretical development on approximate MCMC methods.
</p>projecteuclid.org/euclid.ejs/1544670254_20181212220446Wed, 12 Dec 2018 22:04 ESTThe asymptotic distribution of the isotonic regression estimator over a general countable pre-ordered sethttps://projecteuclid.org/euclid.ejs/1544670255<strong>Dragi Anevski</strong>, <strong>Vladimir Pastukhov</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4180--4208.</p><p><strong>Abstract:</strong><br/>
We study the isotonic regression estimator over a general countable pre-ordered set. We obtain the limiting distribution of the estimator and study its properties. It is proved that, under some general assumptions, the limiting distribution of the isotonized estimator is given by the concatenation of the separate isotonic regressions of certain subvectors of the unrestricted estimator’s asymptotic distribution. We also show that the isotonization preserves the rate of convergence of the underlying estimator. We apply these results to the problems of estimating a bimonotone regression function and estimating a bimonotone probability mass function.
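For the special case of a linearly ordered index set, the isotonic regression can be computed with the classical pool-adjacent-violators algorithm (PAVA). The pure-Python sketch below is our own minimal illustration of that special case; the general countable pre-ordered setting studied here requires considerably more machinery.

```python
def pava(y, w=None):
    """Weighted least-squares isotonic regression on a linear order via
    pool-adjacent-violators: returns the nondecreasing vector closest
    to y in the weighted least-squares sense."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    blocks = []  # each block: [fitted value, total weight, length]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool adjacent blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v1, w1, c1 = blocks.pop()
            v0, w0, c0 = blocks.pop()
            wt = w0 + w1
            blocks.append([(w0 * v0 + w1 * v1) / wt, wt, c0 + c1])
    fit = []
    for v, _, c in blocks:
        fit.extend([v] * c)
    return fit
```

For example, pava([1, 3, 2, 4]) pools the violating pair (3, 2) into their average, giving [1, 2.5, 2.5, 4].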
</p>projecteuclid.org/euclid.ejs/1544670255_20181212220446Wed, 12 Dec 2018 22:04 ESTA quasi-Bayesian perspective to online clusteringhttps://projecteuclid.org/euclid.ejs/1537430425<strong>Le Li</strong>, <strong>Benjamin Guedj</strong>, <strong>Sébastien Loustau</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3071--3113.</p><p><strong>Abstract:</strong><br/>
When faced with high frequency streams of data, clustering poses theoretical and algorithmic challenges. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO, see https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a convergence guarantee. Finally, numerical experiments illustrate the potential of our procedure.
</p>projecteuclid.org/euclid.ejs/1537430425_20181214220243Fri, 14 Dec 2018 22:02 ESTEstimation of the covariance function of Gaussian isotropic random fields on spheres, related Rosenblatt-type distributions and the cosmic variance problemhttps://projecteuclid.org/euclid.ejs/1537841410<strong>Nikolai N. Leonenko</strong>, <strong>Murad S. Taqqu</strong>, <strong>Gyorgy H. Terdik</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3114--3146.</p><p><strong>Abstract:</strong><br/>
We consider the problem of estimating the covariance function of an isotropic Gaussian stochastic field on the unit sphere using a single observation at each point of the discretized sphere. The spatial estimator of the covariance function is expressed in a new form which provides, on the one hand, a way to derive the characteristic function of the estimator and, on the other hand, a computationally efficient method to do so. We also describe a methodology for handling the presence of the cosmic variance, which can impair the results. In simulations, we use the pixelization scheme HEALPix.
</p>projecteuclid.org/euclid.ejs/1537841410_20181214220243Fri, 14 Dec 2018 22:02 ESTEffective sample size for spatial regression modelshttps://projecteuclid.org/euclid.ejs/1538013686<strong>Jonathan Acosta</strong>, <strong>Ronny Vallejos</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3147--3180.</p><p><strong>Abstract:</strong><br/>
We propose a new definition of effective sample size. Although the recent works of Griffith (2005, 2008) and Vallejos and Osorio (2014) provide a theoretical framework to address the reduction of information in a spatial sample due to spatial autocorrelation, the asymptotic properties of the estimators have not been studied in those works or in previous ones. In addition, the concept of effective sample size has been developed primarily for spatial regression processes with a constant mean. This paper introduces a new definition of effective sample size for general spatial regression models that is coherent with previous definitions. The asymptotic normality of the maximum likelihood estimator is obtained under an increasing domain framework. In particular, the conditions under which the limiting distribution holds are established for the Matérn covariance family. Illustrative examples accompany the discussion of the limiting results, including some cases where the asymptotic variance has a closed form. The asymptotic normality leads to an approximate hypothesis test that establishes whether there is redundant information in the sample. Simulation results support the theoretical findings and provide information about the behavior of the power of the suggested test. A real dataset in which a transect sampling scheme has been used is analyzed to estimate the effective sample size when a spatial linear regression model is assumed.
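To fix ideas, one common definition in the constant-mean case (as in Vallejos and Osorio, 2014) is $\mathrm{ESS}=\mathbf{1}^{\top }R^{-1}\mathbf{1}$ for a correlation matrix $R$. Under the simplifying assumption of equicorrelation this quantity has a closed form, since $R\mathbf{1}=(1+(n-1)\rho )\mathbf{1}$; the snippet below covers only this illustrative special case, not the general definition proposed in the paper.

```python
def ess_equicorrelated(n, rho):
    """Effective sample size 1' R^{-1} 1 when R has unit diagonal and a
    constant off-diagonal correlation rho in [0, 1]: since
    R 1 = (1 + (n - 1) rho) 1, the quadratic form collapses to
    n / (1 + (n - 1) rho)."""
    return n / (1 + (n - 1) * rho)
```

Independence (rho = 0) returns the full sample size n, while rho = 1 drives the ESS down to a single effective observation.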
</p>projecteuclid.org/euclid.ejs/1538013686_20181214220243Fri, 14 Dec 2018 22:02 ESTIntensity approximation for pairwise interaction Gibbs point processes using determinantal point processeshttps://projecteuclid.org/euclid.ejs/1538013687<strong>Jean-François Coeurjolly</strong>, <strong>Frédéric Lavancier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3181--3203.</p><p><strong>Abstract:</strong><br/>
The intensity of a Gibbs point process is usually an intractable function of the model parameters. For repulsive pairwise interaction point processes, this intensity can be expressed as the Laplace transform of some particular function. Baddeley and Nair (2012) developed the Poisson-saddlepoint approximation which consists, for basic models, in calculating this Laplace transform with respect to a homogeneous Poisson point process. In this paper, we develop an approximation which consists in calculating the same Laplace transform with respect to a specific determinantal point process. This new approximation is efficiently implemented and turns out to be more accurate than the Poisson-saddlepoint approximation, as demonstrated by some numerical examples.
</p>projecteuclid.org/euclid.ejs/1538013687_20181214220243Fri, 14 Dec 2018 22:02 ESTEarly stopping for statistical inverse problems via truncated SVD estimationhttps://projecteuclid.org/euclid.ejs/1538121641<strong>Gilles Blanchard</strong>, <strong>Marc Hoffmann</strong>, <strong>Markus Reiß</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3204--3231.</p><p><strong>Abstract:</strong><br/>
We consider truncated SVD (or spectral cut-off, projection) estimators for a prototypical statistical inverse problem in dimension $D$. Since calculating the singular value decomposition (SVD) only for the largest singular values is much less costly than the full SVD, our aim is to select a data-driven truncation level $\widehat{m}\in \{1,\ldots ,D\}$ only based on the knowledge of the first $\widehat{m}$ singular values and vectors.
We analyse in detail whether sequential early stopping rules of this type can preserve statistical optimality. Information-constrained lower bounds and matching upper bounds for a residual based stopping rule are provided, which give a clear picture in which situation optimal sequential adaptation is feasible. Finally, a hybrid two-step approach is proposed which allows for classical oracle inequalities while considerably reducing numerical complexity.
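In the diagonalized (sequence-space) form of the problem, a residual-based stopping rule is particularly transparent: the truncated SVD fit at level m reproduces the first m observed coefficients, so the squared residual is the tail sum of squares over i > m, and one stops the first time it falls below a threshold kappa. The toy sketch below is our own simplification (function name and threshold are illustrative); it captures only this sequential shape, not the calibrated rule analysed in the paper.

```python
def residual_stopping_level(y, kappa):
    """Smallest truncation level m at which the squared residual of the
    truncated SVD fit, i.e. the sum of y_i**2 over i > m, is at most
    kappa.  The residual starts at the full squared norm of the data and
    is decremented sequentially as the truncation level increases."""
    residual = sum(v * v for v in y)  # squared residual at level m = 0
    m = 0
    while m < len(y) and residual > kappa:
        residual -= y[m] * y[m]
        m += 1
    return m
```

For instance, with coefficients [3.0, 2.0, 1.0] and kappa = 5.0 the rule stops at m = 1, since the tail sum 2^2 + 1^2 = 5 already meets the threshold.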
</p>projecteuclid.org/euclid.ejs/1538121641_20181214220243Fri, 14 Dec 2018 22:02 ESTFalse discovery rate control for effect modification in observational studieshttps://projecteuclid.org/euclid.ejs/1538445643<strong>Bikram Karmakar</strong>, <strong>Ruth Heller</strong>, <strong>Dylan S. Small</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3232--3253.</p><p><strong>Abstract:</strong><br/>
In an observational study, a difference between the treatment and control group’s outcomes might reflect bias in treatment assignment rather than a true treatment effect. A sensitivity analysis determines the magnitude of this bias that would be needed to explain away as non-causal a significant treatment effect from a naive analysis that assumed no bias. Effect modification is the interaction between a treatment and a pretreatment covariate. In an observational study, there are often many possible effect modifiers, and it is desirable to be able to look at the data to identify the effect modifiers that will be tested. For observational studies, we simultaneously address the multiplicity involved in choosing, by looking at the data, which of many possible effect modifiers to test, and the need to conduct a proper sensitivity analysis. We develop an approach that provides finite sample false discovery rate control for a collection of adaptive hypotheses identified from the data in a matched-pairs design. Along with simulation studies, an empirical study is presented on the effect of cigarette smoking on the lead level in the blood, using data from the U.S. National Health and Nutrition Examination Survey. Other applications of the suggested method are briefly discussed.
</p>projecteuclid.org/euclid.ejs/1538445643_20181214220243Fri, 14 Dec 2018 22:02 ESTChange-point detection in high-dimensional covariance structurehttps://projecteuclid.org/euclid.ejs/1538705038<strong>Valeriy Avanesov</strong>, <strong>Nazar Buzun</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3254--3294.</p><p><strong>Abstract:</strong><br/>
In this paper we introduce a novel approach to the important problem of break detection. Specifically, we are interested in the detection of an abrupt change in the covariance structure of a high-dimensional random process, a problem which has applications in many areas, e.g., neuroimaging and finance. The developed approach is essentially a testing procedure involving a choice of a critical level. To that end a non-standard bootstrap scheme is proposed and theoretically justified under mild assumptions. The theoretical study features a result providing guarantees for break detection. All the theoretical results are established in a high-dimensional setting (dimensionality $p\gg n$). The multiscale nature of the approach allows for a trade-off between the sensitivity of break detection and localization. The approach can be naturally employed in an on-line setting. A simulation study demonstrates that the approach matches the nominal level of false alarm probability and exhibits high power, outperforming a recent approach.
</p>projecteuclid.org/euclid.ejs/1538705038_20181214220243Fri, 14 Dec 2018 22:02 ESTGeometric ergodicity of Pólya-Gamma Gibbs sampler for Bayesian logistic regression with a flat priorhttps://projecteuclid.org/euclid.ejs/1538705039<strong>Xin Wang</strong>, <strong>Vivekananda Roy</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3295--3311.</p><p><strong>Abstract:</strong><br/>
The logistic regression model is the most popular model for analyzing binary data. In the absence of any prior information, an improper flat prior is often used for the regression coefficients in Bayesian logistic regression models. The resulting intractable posterior density can be explored by running Polson, Scott and Windle’s (2013) data augmentation (DA) algorithm. In this paper, we establish that the Markov chain underlying Polson, Scott and Windle’s (2013) DA algorithm is geometrically ergodic. Proving this theoretical result is practically important as it ensures the existence of central limit theorems (CLTs) for sample averages under a finite second moment condition. The CLT in turn allows users of the DA algorithm to calculate standard errors for posterior estimates.
</p>projecteuclid.org/euclid.ejs/1538705039_20181214220243Fri, 14 Dec 2018 22:02 ESTSignificance testing in non-sparse high-dimensional linear modelshttps://projecteuclid.org/euclid.ejs/1538791404<strong>Yinchu Zhu</strong>, <strong>Jelena Bradic</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3312--3364.</p><p><strong>Abstract:</strong><br/>
In high-dimensional linear models, the sparsity assumption is typically made, stating that most of the parameters are equal to zero. Under the sparsity assumption, estimation and, more recently, inference have been well studied. However, in practice, the sparsity assumption is not checkable and, more importantly, is often violated; a large number of covariates might be expected to be associated with the response, indicating that possibly all, rather than just a few, parameters are non-zero. A natural example is genome-wide gene expression profiling, where all genes are believed to affect a common disease marker. We show that existing inferential methods are sensitive to the sparsity assumption and may, in turn, result in a severe lack of Type I error control. In this article, we propose a new inferential method, named CorrT, which is robust to model misspecification such as heteroscedasticity and lack of sparsity. CorrT is shown to have Type I error approaching the nominal level for any model and Type II error approaching zero for sparse and many dense models. In fact, CorrT is also shown to be optimal in a variety of frameworks: sparse, non-sparse and hybrid models where sparse and dense signals are mixed. Numerical experiments show a favorable performance of the CorrT test compared to the state-of-the-art methods.
</p>projecteuclid.org/euclid.ejs/1538791404_20181214220243Fri, 14 Dec 2018 22:02 ESTAdaptive MCMC for multiple changepoint analysis with applications to large datasetshttps://projecteuclid.org/euclid.ejs/1539050490<strong>Alan Benson</strong>, <strong>Nial Friel</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3365--3396.</p><p><strong>Abstract:</strong><br/>
We consider the problem of Bayesian inference for changepoints where the number and position of the changepoints are both unknown. In particular, we consider product partition models where it is possible to integrate out model parameters for the regime between each changepoint, leaving a posterior distribution over a latent vector indicating the presence or not of a changepoint at each observation. The same problem setting has been considered by Fearnhead (2006), where one can use filtering recursions to make exact inference. However, the complexity of this filtering recursions algorithm is quadratic in the number of observations. Our approach relies on an adaptive Markov chain Monte Carlo (MCMC) method for finite discrete state spaces. We develop an adaptive algorithm which can learn from the past states of the Markov chain in order to build proposal distributions which can quickly discover where changepoints are likely to be located. We prove that our adaptive algorithm remains ergodic with respect to the posterior distribution. Crucially, we demonstrate that our adaptive MCMC algorithm is viable for large datasets for which the filtering recursions approach is not. Moreover, we show that inference is possible in a reasonable time, thus making Bayesian changepoint detection computationally efficient.
</p>projecteuclid.org/euclid.ejs/1539050490_20181214220243Fri, 14 Dec 2018 22:02 ESTWeighted batch means estimators in Markov chain Monte Carlohttps://projecteuclid.org/euclid.ejs/1539137549<strong>Ying Liu</strong>, <strong>James M. Flegal</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3397--3442.</p><p><strong>Abstract:</strong><br/>
This paper proposes a family of weighted batch means variance estimators, which are computationally efficient and can be conveniently applied in practice. The focus is on Markov chain Monte Carlo simulations and estimation of the asymptotic covariance matrix in the Markov chain central limit theorem, where conditions ensuring strong consistency are provided. Finite sample performance is evaluated through auto-regressive, Bayesian spatial-temporal, and Bayesian logistic regression examples, where the new estimators show significant computational gains with a minor sacrifice in variance compared with existing methods.
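As a point of reference, the plain (unweighted) nonoverlapping batch means estimator that this family generalizes takes only a few lines. The sketch below is a generic textbook version with a common b = floor(sqrt(n)) default batch size, not the paper's weighted construction.

```python
import math

def batch_means_variance(x, batch_size=None):
    """Nonoverlapping batch means estimate of the asymptotic variance
    sigma^2 in the Markov chain CLT, sqrt(n)(xbar - mu) -> N(0, sigma^2):
    b times the sample variance of the batch means."""
    n = len(x)
    b = batch_size if batch_size is not None else int(math.sqrt(n))
    a = n // b  # number of full batches; leftover observations dropped
    assert a >= 2, "need at least two batches"
    means = [sum(x[i * b:(i + 1) * b]) / b for i in range(a)]
    grand = sum(means) / a
    return b * sum((m - grand) ** 2 for m in means) / (a - 1)
```

Strong consistency requires the batch size b to grow with n at a suitable rate, which is exactly the kind of condition the strong consistency results above make precise for the weighted versions.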
</p>projecteuclid.org/euclid.ejs/1539137549_20181214220243Fri, 14 Dec 2018 22:02 ESTOn predictive density estimation with additional informationhttps://projecteuclid.org/euclid.ejs/1544842900<strong>Éric Marchand</strong>, <strong>Abdolnasser Sadeghkhani</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4209--4238.</p><p><strong>Abstract:</strong><br/>
Based on independently distributed $X_{1}\sim{N}_{p}(\theta _{1},\sigma ^{2}_{1}I_{p})$ and $X_{2}\sim{N}_{p}(\theta _{2},\sigma ^{2}_{2}I_{p})$, we consider the efficiency of various predictive density estimators for $Y_{1}\sim N_{p}(\theta _{1},\sigma ^{2}_{Y}I_{p})$, with the additional information $\theta _{1}-\theta _{2}\in A$ and known $\sigma ^{2}_{1},\sigma ^{2}_{2},\sigma ^{2}_{Y}$. We provide improvements on benchmark predictive densities such as those obtained by plug-in, by maximum likelihood, or as minimum risk equivariant. Dominance results are obtained for $\alpha$-divergence losses and include Bayesian improvements for Kullback-Leibler (KL) loss in the univariate case ($p=1$). An ensemble of techniques is exploited, including variance expansion, point estimation duality, and concave inequalities. Representations for Bayesian predictive densities, and in particular for $\hat{q}_{\pi_{U,A}}$ associated with a uniform prior for $\theta =(\theta _{1},\theta _{2})$ truncated to $\{\theta\in \mathbb{R}^{2p}:\theta _{1}-\theta _{2}\in A\}$, are established and are used for the Bayesian dominance findings. Finally and interestingly, these Bayesian predictive densities also relate to skew-normal distributions, as well as to new forms of such distributions.
</p>projecteuclid.org/euclid.ejs/1544842900_20181214220243Fri, 14 Dec 2018 22:02 ESTA notion of stability for k-means clusteringhttps://projecteuclid.org/euclid.ejs/1544842901<strong>T. Le Gouic</strong>, <strong>Q. Paris</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4239--4263.</p><p><strong>Abstract:</strong><br/>
In this paper, we define and study a new notion of stability for the $k$-means clustering scheme building upon the field of quantization of a probability measure. We connect this definition of stability to a geometric feature of the underlying distribution of the data, named absolute margin condition, inspired by recent works on the subject.
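In the quantization language used here, the k-means scheme selects a codebook c = (c_1, ..., c_k) minimizing the distortion, i.e. the expected squared distance to the nearest code point. A minimal one-dimensional sketch of the empirical version of that risk (our own illustration):

```python
def empirical_distortion(points, centers):
    """Empirical quantization risk of a codebook: the average squared
    distance from each sample point to its nearest center, which is the
    k-means objective evaluated at `centers` (one-dimensional data for
    simplicity)."""
    return sum(min((p - c) ** 2 for c in centers) for p in points) / len(points)
```

Stability notions of the kind studied in the paper then compare near-minimizers of this risk: a well-separated two-cluster sample such as [0, 1, 9, 10] has a much smaller distortion under the codebook [0.5, 9.5] than under any single center.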
</p>projecteuclid.org/euclid.ejs/1544842901_20181214220243Fri, 14 Dec 2018 22:02 ESTA criterion for privacy protection in data collection and its attainment via randomized response procedureshttps://projecteuclid.org/euclid.ejs/1544842902<strong>Jichong Chai</strong>, <strong>Tapan K. Nayak</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4264--4287.</p><p><strong>Abstract:</strong><br/>
Randomized response (RR) methods have long been suggested for protecting respondents’ privacy in statistical surveys. However, how to set and achieve privacy protection goals has received little attention. We give a full development and analysis of the view that a privacy mechanism should ensure that no intruder would gain much new information about any respondent from his response. Formally, we say that a privacy breach occurs when an intruder’s prior and posterior probabilities about a property of a respondent, denoted $p$ and $p_{*}$, respectively, satisfy $p_{*}<h_{l}(p)$ or $p_{*}>h_{u}(p)$, where $h_{l}$ and $h_{u}$ are two given functions. An RR procedure protects privacy if it does not permit any privacy breach. We explore the effects of $(h_{l},h_{u})$ on the resultant privacy demand, and prove that it is precisely attainable only for certain $(h_{l},h_{u})$. This result is used to define a canonical strict privacy protection criterion and to give practical guidance on the choice of $(h_{l},h_{u})$. Then, we characterize all privacy-satisfying RR procedures, compare their effects on data utility using sufficiency of experiments, and identify the class of all admissible procedures. Finally, we establish an optimality property of a commonly used RR method.
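For a binary sensitive attribute, the breach criterion above can be checked mechanically for any given design via Bayes' rule. The sketch below is our own illustration; the function names, the Warner-type design probabilities and the example choices of (h_l, h_u) are hypothetical, not taken from the paper.

```python
def posterior(prior, p_yes_given_a, p_yes_given_not_a):
    """Intruder's posterior probability that a respondent has the
    sensitive attribute A after observing a 'yes' response, by Bayes'
    rule applied to the randomized response design."""
    marginal = prior * p_yes_given_a + (1 - prior) * p_yes_given_not_a
    return prior * p_yes_given_a / marginal


def breach_occurs(prior, p_yes_given_a, p_yes_given_not_a, h_l, h_u):
    """Privacy breach in the sense of the criterion: the posterior falls
    below h_l(prior) or rises above h_u(prior)."""
    post = posterior(prior, p_yes_given_a, p_yes_given_not_a)
    return post < h_l(prior) or post > h_u(prior)
```

For a Warner-type design answering truthfully 75% of the time, a prior of 0.5 is updated to a posterior of 0.75 after a 'yes' response; whether that counts as a breach depends entirely on the chosen pair (h_l, h_u).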
</p>projecteuclid.org/euclid.ejs/1544842902_20181214220243Fri, 14 Dec 2018 22:02 EST