Electronic Journal of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.ejs
The latest articles from Electronic Journal of Statistics on Project Euclid, a site for mathematics and statistics resources.
Copyright 2010 Cornell University Library. Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT (updated Fri, 03 Jun 2011 09:20 EDT)
Project Euclid
http://projecteuclid.org/
The bias and skewness of M-estimators in regression
http://projecteuclid.org/euclid.ejs/1262876992
<strong>Christopher Withers</strong>, <strong>Saralees Nadarajah</strong><p><strong>Source: </strong>Electron. J. Statist., Volume 4, 1--14.</p><p><strong>Abstract:</strong><br/>
We consider M-estimation of a regression model with a nuisance parameter and a vector of other parameters. The unknown distribution of the residuals is not assumed to be normal or symmetric. Simple and easily estimated formulas are given for the dominant terms of the bias and skewness of the parameter estimates. For the linear model these are proportional to the skewness of the ‘independent’ variables. For a nonlinear model, its linear component plays the role of these independent variables, and a second term must be added proportional to the covariance of its linear and quadratic components. For the least squares estimate with normal errors this term was derived by Box [1]. We also consider the effect of a large number of parameters, and the case of random independent variables.
</p>

Dimension reduction-based significance testing in nonparametric regression
https://projecteuclid.org/euclid.ejs/1526695233
<strong>Xuehu Zhu</strong>, <strong>Lixing Zhu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1468--1506.</p><p><strong>Abstract:</strong><br/>
A dimension reduction-based adaptive-to-model test is proposed for the significance of a subset of covariates in the context of a nonparametric regression model. Unlike existing locally smoothing significance tests, the new test behaves like a locally smoothing test as if the number of covariates were just that under the null hypothesis, and it can detect local alternatives distinct from the null hypothesis at a rate that depends only on the number of covariates under the null hypothesis. Thus, the curse of dimensionality is largely alleviated when nonparametric estimation is inevitably required. When there are many insignificant covariates, the new test offers substantial improvements over existing locally smoothing tests in both significance level maintenance and power enhancement. Simulation studies and a real data analysis are conducted to examine the finite sample performance of the proposed test.
</p>

Slice inverse regression with score functions
https://projecteuclid.org/euclid.ejs/1526889626
<strong>Dmitry Babichev</strong>, <strong>Francis Bach</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1507--1543.</p><p><strong>Abstract:</strong><br/>
We consider non-linear regression problems where we assume that the response depends non-linearly on a linear projection of the covariates. We propose score function extensions to sliced inverse regression problems, for both first-order and second-order score functions. We show that they provably improve estimation in the population case over the non-sliced versions, and we study finite sample estimators and their consistency given the exact score functions. We also propose to learn the score function itself, either in two steps (first learning the score function and then the effective dimension reduction space) or directly, by solving a convex optimization problem regularized by the nuclear norm. We illustrate our results on a series of experiments.
</p>

An extended empirical saddlepoint approximation for intractable likelihoods
https://projecteuclid.org/euclid.ejs/1527300140
<strong>Matteo Fasiolo</strong>, <strong>Simon N. Wood</strong>, <strong>Florian Hartig</strong>, <strong>Mark V. Bravington</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1544--1578.</p><p><strong>Abstract:</strong><br/>
The challenges posed by complex stochastic models used in computational ecology, biology and genetics have stimulated the development of approximate approaches to statistical inference. Here we focus on Synthetic Likelihood (SL), a procedure that reduces the observed and simulated data to a set of summary statistics, and quantifies the discrepancy between them through a synthetic likelihood function. SL requires little tuning, but it relies on the approximate normality of the summary statistics. We relax this assumption by proposing a novel, more flexible, density estimator: the Extended Empirical Saddlepoint approximation. In addition to proving the consistency of SL, under either the new or the Gaussian density estimator, we illustrate the method using three examples. One of these is a complex individual-based forest model for which SL offers one of the few practical possibilities for statistical inference. The examples show that the new density estimator is able to capture large departures from normality, while being scalable to high dimensions, and this in turn leads to more accurate parameter estimates, relative to the Gaussian alternative. The new density estimator is implemented by the esaddle R package, which is freely available on the Comprehensive R Archive Network (CRAN).
</p>

Modified sequential change point procedures based on estimating functions
https://projecteuclid.org/euclid.ejs/1527300141
<strong>Claudia Kirch</strong>, <strong>Silke Weber</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1579--1613.</p><p><strong>Abstract:</strong><br/>
A large class of sequential change point tests are based on estimating functions, where estimation is computationally efficient because (possibly numerical) optimization is restricted to an initial estimation step. This includes examples as diverse as mean changes, linear or non-linear autoregressive models and binary models. While the standard cumulative-sum detector (CUSUM) has recently been considered in this general setup, we consider several modifications that detect changes faster, in particular when changes occur late in the monitoring period. More precisely, we use three different types of detector statistics based on partial sums of a monitoring function, namely the modified moving-sum statistic (mMOSUM), Page’s cumulative-sum statistic (Page-CUSUM) and the standard moving-sum statistic (MOSUM). The statistics differ only in the number of observations included in the partial sum. The mMOSUM uses a bandwidth parameter that multiplicatively scales the lower bound of the moving sum. The MOSUM uses a constant bandwidth parameter, while the Page-CUSUM chooses the maximum over all possible lower bounds for the partial sums. So far, the first two schemes have only been studied in a linear model, and the MOSUM only for a mean change.
We develop the asymptotics under the null hypothesis and alternatives under mild regularity conditions for each test statistic, which include the existing theory but also many new examples. In a simulation study we compare all four types of test procedures in terms of their size, power and run length. Additionally we illustrate their behavior by applications to exchange rate data as well as the Boston homicide data.
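To make the contrast between the detector statistics concrete, here is a minimal sketch for the simplest estimating-function example, a mean change, with monitoring function psi(x) = x - mean(training sample). The function name and setup are illustrative, not the authors' implementation; the mMOSUM and Page-CUSUM variants would differ only in which lower summation bounds are considered.

```python
import numpy as np

def detectors(train, stream, h):
    """CUSUM and MOSUM detector statistics for a mean-change model.

    Monitoring function: psi(x) = x - mean(train), the estimating
    function of the sample mean.  For each monitored time k, returns
    the CUSUM detector |sum_{i<=k} psi(X_i)| and, for k >= h, the
    MOSUM detector |sum over the last h observations of psi(X_i)|.
    """
    psi = stream - np.mean(train)              # monitoring residuals
    csum = np.cumsum(psi)
    cusum = np.abs(csum)                       # standard CUSUM detector
    padded = np.concatenate([[0.0], csum])
    mosum = np.abs(padded[h:] - padded[:-h])   # moving sum over window h
    return cusum, mosum

train = np.zeros(10)                           # training mean is 0
stream = np.array([0., 0., 0., 1., 1., 1.])    # change after 3 points
cusum, mosum = detectors(train, stream, h=3)
```

The Page-CUSUM would instead take, at each k, the maximum of the partial sums over all admissible lower bounds, trading a little computation for faster late-change detection.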
</p>

On penalized estimation for dynamical systems with small noise
https://projecteuclid.org/euclid.ejs/1527300142
<strong>Alessandro De Gregorio</strong>, <strong>Stefano Maria Iacus</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1614--1630.</p><p><strong>Abstract:</strong><br/>
We consider a dynamical system with small noise for which the drift is parametrized by a finite dimensional parameter. For this model, we consider minimum distance estimation from continuous time observations under $l^{p}$-penalty imposed on the parameters in the spirit of the Lasso approach, with the aim of simultaneous estimation and model selection. We study the consistency and the asymptotic distribution of these Lasso-type estimators for different values of $p$. For $p=1,$ we also consider the adaptive version of the Lasso estimator and establish its oracle properties.
</p>

Bayesian pairwise estimation under dependent informative sampling
https://projecteuclid.org/euclid.ejs/1527300143
<strong>Matthew R. Williams</strong>, <strong>Terrance D. Savitsky</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1631--1661.</p><p><strong>Abstract:</strong><br/>
An informative sampling design leads to the selection of units whose inclusion probabilities are correlated with the response variable of interest. Inference under the population model performed on the resulting observed sample, without adjustment, will be biased for the population generative model. One approach that produces asymptotically unbiased inference employs marginal inclusion probabilities to form sampling weights used to exponentiate each likelihood contribution of a pseudo likelihood used to form a pseudo posterior distribution. Conditions for posterior consistency restrict applicable sampling designs to those under which pairwise inclusion dependencies asymptotically limit to $0$. There are many sampling designs excluded by this restriction; for example, a multi-stage design that samples individuals within households. Viewing each household as a population, the dependence among individuals does not attenuate. We propose a more targeted approach in this paper for inference focused on pairs of individuals or sampled units; for example, the substance use of one spouse in a shared household, conditioned on the substance use of the other spouse. We formulate the pseudo likelihood with weights based on pairwise or second order probabilities and demonstrate consistency, removing the requirement for asymptotic independence and replacing it with restrictions on higher order selection probabilities. Our approach provides a nearly automated estimation procedure applicable to any model specified by the data analyst. We demonstrate our method on the National Survey on Drug Use and Health.
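The basic weighting mechanism described above, exponentiating each likelihood contribution by a sampling weight, can be sketched for a toy normal-mean model. This shows only the marginal-weight pseudo likelihood, not the paper's pairwise construction based on second-order inclusion probabilities; the function names and the unit-variance assumption are illustrative.

```python
import numpy as np

def pseudo_loglik(mu, x, w):
    """Sampling-weighted pseudo log-likelihood for a toy N(mu, 1) model:
    each unit's likelihood contribution is exponentiated by its sampling
    weight, i.e. multiplied by the weight on the log scale."""
    return np.sum(w * (-0.5 * (x - mu) ** 2 - 0.5 * np.log(2.0 * np.pi)))

def pseudo_mle(x, w):
    """Closed-form maximizer of the pseudo log-likelihood:
    the weighted mean of the observations."""
    return np.sum(w * x) / np.sum(w)

x = np.array([0.0, 1.0])
w = np.array([1.0, 3.0])      # e.g. inverse marginal inclusion probabilities
mu_hat = pseudo_mle(x, w)
```

In the pseudo posterior setting the same weighted likelihood would be combined with a prior on mu; the paper replaces the marginal weights with pairwise (second-order) weights to handle within-household dependence.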
</p>

Heritability estimation in case-control studies
https://projecteuclid.org/euclid.ejs/1527559245
<strong>Anna Bonnet</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1662--1716.</p><p><strong>Abstract:</strong><br/>
In the field of genetics, the concept of heritability refers to the proportion of variations of a biological trait or disease that can be explained by genetic factors. Quantifying the heritability of a disease is a fundamental challenge in human genetics, especially when the causes are plural and not clearly identified. Although the literature regarding heritability estimation for binary traits is less rich than for quantitative traits, several methods have been proposed to estimate the heritability of complex diseases. However, to the best of our knowledge, the existing methods are not supported by theoretical grounds. Moreover, most of the methodologies do not take into account a major specificity of the data coming from medical studies, which is the oversampling of the number of patients compared to controls. We propose in this paper to investigate the theoretical properties of the Phenotype Correlation Genotype Correlation (PCGC) regression developed by Golan, Lander and Rosset (2014), which is one of the major techniques used in statistical genetics and which is very efficient in practice, despite the oversampling of patients. Our main result is the proof of the consistency of this estimator, under several assumptions that we will state and discuss. We also provide a numerical study to compare two approximations leading to two heritability estimators.
</p>

A deconvolution path for mixtures
https://projecteuclid.org/euclid.ejs/1527559246
<strong>Oscar-Hernan Madrid-Padilla</strong>, <strong>Nicholas G. Polson</strong>, <strong>James Scott</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1717--1751.</p><p><strong>Abstract:</strong><br/>
We propose a class of estimators for deconvolution in mixture models based on a simple two-step “bin-and-smooth” procedure applied to histogram counts. The method is both statistically and computationally efficient: by exploiting recent advances in convex optimization, we are able to provide a full deconvolution path that shows the estimate for the mixing distribution across a range of plausible degrees of smoothness, at far less cost than a full Bayesian analysis. This enables practitioners to conduct a sensitivity analysis with minimal effort. This is especially important for applied data analysis, given the ill-posed nature of the deconvolution problem. Our results establish the favorable theoretical properties of our estimator and show that it offers state-of-the-art performance when compared to benchmark methods across a range of scenarios.
</p>

High-dimensional inference for personalized treatment decision
https://projecteuclid.org/euclid.ejs/1529568040
<strong>X. Jessie Jeng</strong>, <strong>Wenbin Lu</strong>, <strong>Huimin Peng</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 2074--2089.</p><p><strong>Abstract:</strong><br/>
Recent developments in statistical methodology for personalized treatment decisions have utilized high-dimensional regression to take into account a large number of patient covariates, describing personalized treatment decisions through interactions between treatment and covariates. While a subset of interaction terms can be obtained by existing variable selection methods to indicate the covariates relevant for making treatment decisions, the results often lack a statistical interpretation. This paper proposes an asymptotically unbiased estimator, based on the Lasso solution, for the interaction coefficients. We derive the limiting distribution of the estimator when the baseline function of the regression model is unknown and possibly misspecified. Confidence intervals and p-values are derived to infer the effects of the patient covariates on treatment decisions. We confirm the accuracy of the proposed method and its robustness against a misspecified baseline function in simulations, and apply the method to the STAR∗D study of major depressive disorder.
</p>

Measuring distributional asymmetry with Wasserstein distance and Rademacher symmetrization
https://projecteuclid.org/euclid.ejs/1531468822
<strong>Adam B. Kashlak</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2091--2113.</p><p><strong>Abstract:</strong><br/>
We propose an improved version of the ubiquitous symmetrization inequality, making use of the Wasserstein distance between a measure and its reflection to quantify the asymmetry of the given measure. An empirical bound on this asymmetric correction term is derived through a bootstrap procedure and shown to give tighter results in practical settings than the original uncorrected inequality. Lastly, a wide range of applications are detailed, including testing for data symmetry, constructing nonasymptotic high dimensional confidence sets, bounding the variance of an empirical process, and improving constants in Nemirovski style inequalities for Banach space valued random variables.
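The asymmetry measure at the core of this construction is easy to sketch on the real line, where the 1-Wasserstein distance between two empirical measures with equally many atoms reduces to a mean absolute difference of order statistics. The reflection about the sample mean and the function name below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def w1_asymmetry(x):
    """Empirical 1-Wasserstein distance between a sample and its
    reflection about the sample mean, as a scalar asymmetry measure.
    For two empirical measures with the same number of atoms, W1 is
    the mean absolute difference of the sorted samples."""
    reflected = 2.0 * np.mean(x) - x
    return np.mean(np.abs(np.sort(x) - np.sort(reflected)))

sym = np.array([-2., -1., 1., 2.])    # symmetric about its mean
skew = np.array([0., 0., 0., 4.])     # right-skewed
```

A symmetric sample gives zero (its reflection has the same empirical law), while a skewed sample produces a positive correction term of the kind the bootstrap bound in the paper estimates.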
</p>

Principal quantile regression for sufficient dimension reduction with heteroscedasticity
https://projecteuclid.org/euclid.ejs/1531468823
<strong>Chong Wang</strong>, <strong>Seung Jun Shin</strong>, <strong>Yichao Wu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2114--2140.</p><p><strong>Abstract:</strong><br/>
Sufficient dimension reduction (SDR) is a successful tool for reducing data dimensionality without stringent model assumptions. In practice, data often display heteroscedasticity, which is of scientific importance in general but frequently overlooked, since a primary goal of most existing statistical methods is to identify the conditional mean relationship among variables. In this article, we propose a new SDR method called principal quantile regression (PQR) that efficiently tackles heteroscedasticity. PQR can naturally be extended to a nonlinear version via the kernel trick. Asymptotic properties are established and an efficient solution path-based algorithm is provided. Numerical examples based on both simulated and real data demonstrate PQR’s advantageous performance over existing SDR methods. PQR still performs very competitively even in the case without heteroscedasticity.
</p>

Fast learning rate of non-sparse multiple kernel learning and optimal regularization strategies
https://projecteuclid.org/euclid.ejs/1531468825
<strong>Taiji Suzuki</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2141--2192.</p><p><strong>Abstract:</strong><br/>
In this paper, we give a new generalization error bound for Multiple Kernel Learning (MKL) for a general class of regularizations, and discuss what kind of regularization gives a favorable predictive accuracy. Our main target in this paper is dense type regularizations, including $\ell_{p}$-MKL. Numerical experiments show that sparse regularization does not necessarily outperform dense type regularizations. Motivated by this fact, this paper gives a general theoretical tool to derive fast learning rates of MKL that is applicable to arbitrary mixed-norm-type regularizations in a unifying manner. This enables us to compare the generalization performances of various types of regularizations. As a consequence, we observe that the homogeneity of the complexities of candidate reproducing kernel Hilbert spaces (RKHSs) affects which regularization strategy ($\ell_{1}$ or dense) is preferred. In fact, in homogeneous complexity settings where the complexities of all RKHSs are the same, $\ell_{1}$-regularization is optimal among all isotropic norms. On the other hand, in inhomogeneous complexity settings, dense type regularizations can achieve a better learning rate than sparse $\ell_{1}$-regularization. We also show that our learning rate achieves the minimax lower bound in homogeneous complexity settings.
</p>

Model-free envelope dimension selection
https://projecteuclid.org/euclid.ejs/1531814505
<strong>Xin Zhang</strong>, <strong>Qing Mai</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2193--2216.</p><p><strong>Abstract:</strong><br/>
An envelope is a targeted dimension reduction subspace for simultaneously achieving dimension reduction and improving parameter estimation efficiency. While many envelope methods have been proposed in recent years, all envelope methods hinge on the knowledge of a key hyperparameter, the structural dimension of the envelope. How to estimate the envelope dimension consistently is of substantial interest from both theoretical and practical aspects. Moreover, very recent advances in the literature have generalized the envelope as a model-free method, which makes selecting the envelope dimension even more challenging. Likelihood-based approaches such as information criteria and likelihood-ratio tests either cannot be directly applied or have no theoretical justification. To address this critical issue of dimension selection, we propose two unified approaches – called FG and 1D selections – for determining the envelope dimension that can be applied to any envelope models and methods. The two model-free selection approaches are based on two different envelope optimization procedures: the full Grassmannian (FG) optimization and the 1D algorithm [11], and are shown to be capable of correctly identifying the structural dimension with probability tending to 1 under mild moment conditions as the sample size increases. While the FG selection unifies and generalizes the BIC and modified BIC approaches that exist in the literature, and hence provides theoretical justification for them under weak moment conditions in a model-free context, the 1D selection is computationally more stable and efficient in finite samples. Extensive simulations and a real data analysis demonstrate the superb performance of our proposals.
</p>

Prediction of dynamical time series using kernel based regression and smooth splines
https://projecteuclid.org/euclid.ejs/1532333003
<strong>Raymundo Navarrete</strong>, <strong>Divakar Viswanath</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2217--2237.</p><p><strong>Abstract:</strong><br/>
Prediction of dynamical time series with additive noise using support vector machines or kernel based regression is consistent for certain classes of discrete dynamical systems. Consistency implies that these methods are effective at computing the expected value of a point at a future time given the present coordinates. However, the present coordinates themselves are noisy, and therefore, these methods are not necessarily effective at removing noise. In this article, we consider denoising and prediction as separate problems for flows, as opposed to discrete time dynamical systems, and show that the use of smooth splines is more effective at removing noise. Combination of smooth splines and kernel based regression yields predictors that are more accurate on benchmarks typically by a factor of 2 or more. We prove that kernel based regression in combination with smooth splines converges to the exact predictor for time series extracted from any compact invariant set of any sufficiently smooth flow. As a consequence of convergence, one can find examples where the combination of kernel based regression with smooth splines is superior by even a factor of $100$. The predictors that we analyze and compute operate on delay coordinate data and not the full state vector, which is typically not observable.
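The two-stage pipeline described here, spline denoising followed by regression on delay coordinates, can be sketched with SciPy's `UnivariateSpline` as a stand-in smoothing spline. The function name, the choice of spline, and the fixed lag are assumptions for illustration; a real pipeline would choose the smoothing parameter by cross-validation and feed the delay vectors to a kernel regressor.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def denoise_and_embed(t, x, lag, dim, s=None):
    """Smooth a scalar time series with a smoothing spline (s is the
    smoothing parameter; s=0 interpolates), then form delay-coordinate
    vectors [y_k, y_{k-lag}, ..., y_{k-(dim-1)*lag}]."""
    y = UnivariateSpline(t, x, k=3, s=s)(t)
    start = (dim - 1) * lag
    n = len(y)
    # column d holds the series delayed by d*lag steps
    return np.column_stack([y[start - d * lag: n - d * lag] for d in range(dim)])

t = np.linspace(0.0, 1.0, 20)
x = np.sin(2 * np.pi * t)        # noiseless observable for a sanity check
emb = denoise_and_embed(t, x, lag=1, dim=3, s=0.0)
```

With `s=0` the spline interpolates the data exactly, so the embedding reproduces the delayed series; with noisy data a positive `s` trades fidelity for smoothness, which is the denoising step the abstract argues is better handled separately from prediction.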
</p>

Confidence intervals for linear unbiased estimators under constrained dependence
https://projecteuclid.org/euclid.ejs/1532333004
<strong>Peter M. Aronow</strong>, <strong>Forrest W. Crawford</strong>, <strong>José R. Zubizarreta</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2238--2252.</p><p><strong>Abstract:</strong><br/>
We propose an approach for conducting inference for linear unbiased estimators applied to dependent outcomes given constraints on their independence relations, in the form of a dependency graph. We establish the consistency of an oracle variance estimator when a dependency graph is known, along with an associated central limit theorem. We derive an integer linear program for finding an upper bound for the estimated variance when a dependency graph is unknown, but topological or degree-based constraints are available on one such graph. We develop alternative bounds, including a closed-form bound, under an additional homoskedasticity assumption. We establish a basis for Wald-type confidence intervals that are guaranteed to have asymptotically conservative coverage.
</p>

Upper and lower risk bounds for estimating the Wasserstein barycenter of random measures on the real line
https://projecteuclid.org/euclid.ejs/1532333005
<strong>Jérémie Bigot</strong>, <strong>Raúl Gouet</strong>, <strong>Thierry Klein</strong>, <strong>Alfredo López</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2253--2289.</p><p><strong>Abstract:</strong><br/>
This paper is focused on the statistical analysis of probability measures $\boldsymbol{\nu }_{1},\ldots ,\boldsymbol{\nu }_{n}$ on ${\mathbb{R}}$ that can be viewed as independent realizations of an underlying stochastic process. We consider the situation of practical importance where the random measures $\boldsymbol{\nu }_{i}$ are absolutely continuous with densities $\boldsymbol{f}_{i}$ that are not directly observable. In this case, instead of the densities, we have access to datasets of real random variables $(X_{i,j})_{1\leq i\leq n;\;1\leq j\leq p_{i}}$ organized in the form of $n$ experimental units, such that $X_{i,1},\ldots ,X_{i,p_{i}}$ are iid observations sampled from a random measure $\boldsymbol{\nu }_{i}$ for each $1\leq i\leq n$. In this setting, we focus on first-order statistics methods for estimating, from such data, a meaningful structural mean measure. For the purpose of taking into account phase and amplitude variations in the observations, we argue that the notion of Wasserstein barycenter is a relevant tool. The main contribution of this paper is to characterize the rate of convergence of a (possibly smoothed) empirical Wasserstein barycenter towards its population counterpart in the asymptotic setting where both $n$ and $\min_{1\leq i\leq n}p_{i}$ may go to infinity. The optimality of this procedure is discussed from the minimax point of view with respect to the Wasserstein metric. We also highlight the connection between our approach and the curve registration problem in statistics. Some numerical experiments are used to illustrate the results of the paper on the convergence rate of empirical Wasserstein barycenters.
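The mechanism underlying the empirical barycenter on the real line is simple enough to sketch: in one dimension, the Wasserstein-2 barycenter's quantile function is the average of the quantile functions, so for samples of equal size it suffices to average order statistics. This is only the unsmoothed, equal-sample-size case (all $p_i$ equal); the paper's smoothed estimator and its convergence rates are not reproduced here.

```python
import numpy as np

def w2_barycenter_1d(samples):
    """Wasserstein-2 barycenter of empirical measures on the real line,
    each with the same number of atoms: average the order statistics
    (equivalently, the empirical quantile functions) across samples."""
    sorted_samples = np.stack([np.sort(np.asarray(s, dtype=float)) for s in samples])
    return sorted_samples.mean(axis=0)

# two "experimental units" that are shifted copies of each other:
bary = w2_barycenter_1d([[0., 1., 2., 3.], [10., 11., 12., 13.]])
```

For shifted copies of a common shape, quantile averaging recovers that shape at the average location, which is exactly the registration-of-phase-variation behavior the abstract connects to curve registration.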
</p>

Exchangeable trait allocations
https://projecteuclid.org/euclid.ejs/1532484331
<strong>Trevor Campbell</strong>, <strong>Diana Cai</strong>, <strong>Tamara Broderick</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2290--2322.</p><p><strong>Abstract:</strong><br/>
Trait allocations are a class of combinatorial structures in which data may belong to multiple groups and may have different levels of belonging in each group. Often the data are also exchangeable, i.e., their joint distribution is invariant to reordering. In clustering—a special case of trait allocation—exchangeability implies the existence of both a de Finetti representation and an exchangeable partition probability function (EPPF), distributional representations useful for computational and theoretical purposes. In this work, we develop the analogous de Finetti representation and exchangeable trait probability function (ETPF) for trait allocations, along with a characterization of all trait allocations with an ETPF. Unlike previous feature allocation characterizations, our proofs fully capture single-occurrence “dust” groups. We further introduce a novel constrained version of the ETPF that we use to establish an intuitive connection between the probability functions for clustering, feature allocations, and trait allocations. As an application of our general theory, we characterize the distribution of all edge-exchangeable graphs, a class of recently-developed models that captures realistic sparse graph sequences.
</p>

Non-parametric estimation of time varying AR(1) processes with local stationarity and periodicity
https://projecteuclid.org/euclid.ejs/1532484332
<strong>Jean-Marc Bardet</strong>, <strong>Paul Doukhan</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2323--2354.</p><p><strong>Abstract:</strong><br/>
Extending the ideas of [7], this paper aims at providing a kernel based non-parametric estimation of a new class of time varying AR(1) processes $(X_{t})$, with local stationarity and periodic features (with a known period $T$), inducing the definition $X_{t}=a_{t}(t/nT)X_{t-1}+\xi_{t}$ for $t\in \mathbb{N}$ and with $a_{t+T}\equiv a_{t}$. Central limit theorems are established for kernel estimators $\widehat{a}_{s}(u)$ reaching classical minimax rates and only requiring low order moment conditions of the white noise $(\xi_{t})_{t}$ up to the second order.
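The model and a kernel estimator of the type studied can be sketched as follows. The phase convention (`t mod T`), the Gaussian kernel, and the ratio form of the estimator are assumptions for illustration, not the paper's exact definitions; the noiseless constant-coefficient case recovers the coefficient exactly, which makes a convenient sanity check.

```python
import numpy as np

def simulate_tvar1(a_funcs, n, noise, x0=1.0):
    """Simulate X_t = a_{t mod T}(t/(nT)) X_{t-1} + xi_t for t = 1..nT,
    where a_funcs lists the T periodic coefficient functions."""
    T = len(a_funcs)
    x = np.empty(n * T + 1)
    x[0] = x0
    for t in range(1, n * T + 1):
        x[t] = a_funcs[t % T](t / (n * T)) * x[t - 1] + noise[t - 1]
    return x

def estimate_a(x, s, T, u, h):
    """Kernel-weighted least-squares estimator of a_s(u): a ratio of
    locally weighted sums over times t congruent to s modulo T."""
    n_total = len(x) - 1
    t = np.arange(1, n_total + 1)
    w = np.exp(-0.5 * ((t / n_total - u) / h) ** 2) * (t % T == s)
    return np.sum(w * x[t] * x[t - 1]) / np.sum(w * x[t - 1] ** 2)

noise = np.zeros(10)                                   # noiseless sanity check
x = simulate_tvar1([lambda u: 0.9, lambda u: 0.9], n=5, noise=noise)
a_hat = estimate_a(x, s=1, T=2, u=0.5, h=0.2)
```

With zero noise and a constant coefficient the ratio estimator returns the coefficient for any kernel weights; with noise, the bandwidth h would govern the bias-variance trade-off at the minimax rates discussed above.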
</p>

Scalable methods for Bayesian selective inference
https://projecteuclid.org/euclid.ejs/1532484333
<strong>Snigdha Panigrahi</strong>, <strong>Jonathan Taylor</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2355--2400.</p><p><strong>Abstract:</strong><br/>
Modeled along the truncated approach in [20], selection-adjusted inference in a Bayesian regime is based on a selective posterior. Such a posterior is determined jointly by a generative model imposed on the data and the selection event that enforces a truncation on the assumed law. The effective difference between the selective posterior and the usual Bayesian framework is reflected in the use of a truncated likelihood. The normalizer of the truncated law in the adjusted framework is the probability of the selection event; this typically lacks a closed form expression, leading to a computational bottleneck in sampling from such a posterior. The current work provides an optimization problem that approximates the otherwise intractable selective posterior and leads to scalable methods that give valid post-selective Bayesian inference. The selection procedures are posed as data queries that solve a randomized version of a convex learning program, which has the advantage of preserving more left-over information for inference.
We propose a randomization scheme under which the approximating optimization has separable constraints that result in a partially separable objective in lower dimensions for many commonly used selective queries. We show that the proposed optimization gives a valid exponential rate of decay for the selection probability on a large deviation scale under a Gaussian randomization scheme. On the implementation side, we offer a primal-dual method to solve the optimization problem leading to an approximate posterior; this allows us to exploit the usual merits of a Bayesian machinery in both low and high dimensional regimes when the underlying signal is effectively sparse. We show that the adjusted estimates empirically demonstrate better frequentist properties in comparison to the unadjusted estimates based on the usual posterior, when applied to a wide range of constrained, convex data queries.
</p>

Asymptotic minimum scoring rule prediction
https://projecteuclid.org/euclid.ejs/1532484334
<strong>Federica Giummolè</strong>, <strong>Valentina Mameli</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2401--2429.</p><p><strong>Abstract:</strong><br/>
Most of the methods nowadays employed in forecasting problems are based on scoring rules. Associated with each scoring rule is a divergence function that can be used as a measure of discrepancy between probability distributions. This approach is commonly used in the literature for comparing two competing predictive distributions on the basis of their relative expected divergence from the true distribution.
In this paper we focus on the use of scoring rules as a tool for finding predictive distributions for an unknown quantity of interest. The proposed predictive distributions are asymptotic modifications of the estimative solutions, obtained by minimizing the expected divergence related to a general scoring rule.
The asymptotic properties of such predictive distributions are closely related to the geometry induced on a regular parametric model by the considered divergence. In particular, the existence of a globally optimal predictive distribution is guaranteed for invariant divergences, whose local behaviour is similar to that of the well-known $\alpha $-divergences.
We show that a wide class of divergences obtained from weighted scoring rules shares invariance properties with $\alpha $-divergences. For weighted scoring rules it is thus possible to obtain a global solution to the prediction problem. Unfortunately, the divergences associated with many widely used scoring rules are not invariant. Still, for these cases we provide a locally optimal predictive distribution within a specified parametric model.
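The scoring-rule-to-divergence construction can be sketched directly (an illustration of the standard definition d(p, q) = E_p[S(q, Y)] - E_p[S(p, Y)], not code from the paper): the logarithmic score induces the Kullback-Leibler divergence, while the Brier score induces a squared Euclidean distance.

```python
import math

# Sketch: each proper scoring rule S(q, y) induces a divergence
# d(p, q) = E_p[S(q, Y)] - E_p[S(p, Y)], which is >= 0 with equality iff
# q = p. The log score recovers KL(p || q); the Brier score recovers
# the squared L2 distance sum_k (p_k - q_k)^2.

def divergence(p, q, score):
    """d(p, q) for a scoring rule over a finite outcome space {0, ..., K-1}."""
    ep_q = sum(pi * score(q, y) for y, pi in enumerate(p))
    ep_p = sum(pi * score(p, y) for y, pi in enumerate(p))
    return ep_q - ep_p

def log_score(q, y):        # negatively oriented logarithmic score
    return -math.log(q[y])

def brier_score(q, y):
    return sum((q[k] - (1.0 if k == y else 0.0)) ** 2 for k in range(len(q)))

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
kl = divergence(p, q, log_score)        # equals KL(p || q)
l2 = divergence(p, q, brier_score)      # equals sum_k (p_k - q_k)^2
```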
</p>projecteuclid.org/euclid.ejs/1532484334_20180724220542Tue, 24 Jul 2018 22:05 EDTOn the role of the overall effect in exponential familieshttps://projecteuclid.org/euclid.ejs/1532484335<strong>Anna Klimova</strong>, <strong>Tamás Rudas</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2430--2453.</p><p><strong>Abstract:</strong><br/>
This paper compares exponential families of discrete probability distributions with and without the normalizing constant (or overall effect). The latter setup, in which the exponential family is curved, is particularly relevant when the sample space is an incomplete Cartesian product, or when it is very large and the computational burden is significant. The lack or presence of the overall effect has a fundamental impact on the properties of the exponential family. When the overall effect is added, the family becomes the smallest regular exponential family containing the curved one. The procedure is related to the homogenization of an inhomogeneous variety discussed in algebraic geometry, of which a statistical interpretation is given as an augmentation of the sample space. The changes in the kernel basis representation when the overall effect is included or removed are derived. The geometry of maximum likelihood estimates, also allowing zero observed frequencies, is described with and without the overall effect, and various algorithms are compared. The importance of the results is illustrated by an example from cell biology, showing that routinely including the overall effect leads to estimates which are not in the model intended by the researchers.
</p>projecteuclid.org/euclid.ejs/1532484335_20180724220542Tue, 24 Jul 2018 22:05 EDTA new design strategy for hypothesis testing under response adaptive randomizationhttps://projecteuclid.org/euclid.ejs/1532484336<strong>Alessandro Baldi Antognini</strong>, <strong>Alessandro Vagheggini</strong>, <strong>Maroussa Zagoraiou</strong>, <strong>Marco Novelli</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2454--2481.</p><p><strong>Abstract:</strong><br/>
The aim of this paper is to provide a new design strategy for response adaptive randomization in normal response trials aimed at testing the superiority of one of two available treatments. In particular, we introduce a new test statistic, based on the treatment allocation proportion ensuing from the adoption of a suitable response adaptive randomization rule, that can be more efficient and uniformly more powerful than the classical Wald test. We analyze the conditions under which the suggested strategy, derived by matching an asymptotically best response adaptive procedure with a suitably chosen target allocation, can induce a monotonically increasing power that discriminates with high precision between the chosen alternatives. Moreover, we introduce and analyze new classes of targets aimed at maximizing the power of the new statistical test, showing both analytically and via simulations i) how the power function of the suggested test increases as the ethical skew of the chosen target grows, thereby overcoming the usual trade-off between ethics and inference, and ii) the substantial gain in inferential precision ensured by the proposed approach.
</p>projecteuclid.org/euclid.ejs/1532484336_20180724220542Tue, 24 Jul 2018 22:05 EDTWasserstein and total variation distance between marginals of Lévy processeshttps://projecteuclid.org/euclid.ejs/1532657104<strong>Ester Mariucci</strong>, <strong>Markus Reiß</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2482--2514.</p><p><strong>Abstract:</strong><br/>
We present upper bounds for the Wasserstein distance of order $p$ between the marginals of Lévy processes, including Gaussian approximations for jumps of infinite activity. Using the convolution structure, we further derive upper bounds for the total variation distance between the marginals of Lévy processes. Connections to other metrics like Zolotarev and Toscani-Fourier distances are established. The theory is illustrated by concrete examples and an application to statistical lower bounds.
</p>projecteuclid.org/euclid.ejs/1532657104_20180726220515Thu, 26 Jul 2018 22:05 EDTA noninformative Bayesian approach for selecting a good post-stratificationhttps://projecteuclid.org/euclid.ejs/1532678418<strong>Patrick Zimmerman</strong>, <strong>Glen Meeden</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2515--2536.</p><p><strong>Abstract:</strong><br/>
In the standard design approach to survey sampling, prior information is often used to stratify the population of interest. A good choice of strata can yield significant improvement in the resulting estimator. However, if there are several possible ways to stratify the population, it might not be clear which is best. Here we assume that, before the sample is taken, a limited number of possible stratifications have been defined. We propose an objective Bayesian approach that allows one to consider these several possible stratifications simultaneously. Given the sample, the posterior distribution assigns more weight to the good stratifications and less to the others. Empirical results suggest that the resulting estimator will typically be almost as good as the estimator based on the best stratification, and better than an estimator that does not use stratification. It will also have a sensible estimate of precision.
</p>projecteuclid.org/euclid.ejs/1532678418_20180727040030Fri, 27 Jul 2018 04:00 EDTOn kernel methods for covariates that are rankingshttps://projecteuclid.org/euclid.ejs/1534233701<strong>Horia Mania</strong>, <strong>Aaditya Ramdas</strong>, <strong>Martin J. Wainwright</strong>, <strong>Michael I. Jordan</strong>, <strong>Benjamin Recht</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2537--2577.</p><p><strong>Abstract:</strong><br/>
Permutation-valued features arise in a variety of applications, either in a direct way, when preferences are elicited over a collection of items, or in an indirect way, when numerical ratings are converted to a ranking. To date, there has been relatively limited study of regression, classification, and testing problems based on permutation-valued features, as opposed to permutation-valued responses. This paper studies the use of reproducing kernel Hilbert space methods for learning from permutation-valued features. These methods embed the rankings into an implicitly defined function space, and allow for efficient estimation of regression and test functions in this richer space. We characterize both the feature spaces and spectral properties associated with two kernels for rankings, the Kendall and Mallows kernels. Using tools from representation theory, we explain the limited expressive power of the Kendall kernel by characterizing its degenerate spectrum and, in sharp contrast, we prove that the Mallows kernel is universal and characteristic. We also introduce families of polynomial kernels that interpolate between the Kendall (degree one) and Mallows (infinite degree) kernels. We show the practical effectiveness of our methods via applications to Eurobarometer survey data as well as a MovieLens ratings dataset.
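A minimal sketch of the two kernels (our illustration, not the authors' implementation): the Kendall kernel is the normalized count of concordant minus discordant item pairs, and the Mallows kernel exponentiates the negative discordance count; the rate parameter lam below is a free choice.

```python
import math
from itertools import combinations

# Rankings are tuples giving each item's rank. The Kendall kernel counts
# concordant minus discordant pairs, normalized by the number of pairs; the
# Mallows kernel is exp(-lam * number_of_discordant_pairs).

def kendall_kernel(sigma, tau):
    n = len(sigma)
    total = sum(1 if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) > 0 else -1
                for i, j in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)

def mallows_kernel(sigma, tau, lam=1.0):
    n_disc = sum(1 for i, j in combinations(range(len(sigma)), 2)
                 if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0)
    return math.exp(-lam * n_disc)

k_same = kendall_kernel((1, 2, 3, 4), (1, 2, 3, 4))   # 1.0: identical rankings
k_rev = kendall_kernel((1, 2, 3, 4), (4, 3, 2, 1))    # -1.0: reversed rankings
m_same = mallows_kernel((1, 2, 3, 4), (1, 2, 3, 4))   # 1.0: no discordant pairs
```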
</p>projecteuclid.org/euclid.ejs/1534233701_20180814040157Tue, 14 Aug 2018 04:01 EDTRelevant change points in high dimensional time serieshttps://projecteuclid.org/euclid.ejs/1535681028<strong>Holger Dette</strong>, <strong>Josua Gösmann</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2578--2636.</p><p><strong>Abstract:</strong><br/>
This paper investigates the problem of detecting relevant change points in the mean vector, say $\mu_{t}=(\mu_{t,1},\ldots ,\mu_{t,d})^{T}$, of a high dimensional time series $(Z_{t})_{t\in \mathbb{Z}}$. While the recent literature on testing for change points in this context considers hypotheses for the equality of the means $\mu_{h}^{(1)}$ and $\mu_{h}^{(2)}$ before and after the change points in the different components, we are interested in a null hypothesis of the form \begin{equation*}H_{0}:|\mu^{(1)}_{h}-\mu^{(2)}_{h}|\leq \Delta_{h}\quad \mbox{for all }h=1,\ldots ,d,\end{equation*} where $\Delta_{1},\ldots ,\Delta_{d}$ are given thresholds for which a smaller difference of the means in the $h$-th component is considered to be non-relevant. This formulation of the testing problem is motivated by the fact that in many applications a modification of the statistical analysis might not be necessary if the differences between the parameters before and after the change points in the individual components are small. This problem is of particular relevance in high dimensional change point analysis, where a small change in only one component can yield a rejection by the classical procedure although all components change only in a non-relevant way.
We propose a new test for this problem based on the maximum of squared and integrated CUSUM statistics and investigate its properties as the sample size $n$ and the dimension $d$ both converge to infinity. In particular, using Gaussian approximations for the maximum of a large number of dependent random variables, we show that on certain points of the boundary of the null hypothesis a standardized version of the maximum converges weakly to a Gumbel distribution. This result is used to construct a consistent asymptotic level $\alpha $ test and a multiplier bootstrap procedure is proposed, which improves the finite sample performance of the test. The finite sample properties of the test are investigated by means of a simulation study and we also illustrate the new approach investigating data from hydrology.
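The CUSUM building block can be sketched in one dimension (our simplification; the paper's statistic maximizes squared and integrated CUSUMs over all d components and compares them against the thresholds):

```python
# One-dimensional sketch of the squared, integrated CUSUM statistic: the
# CUSUM process U(k/n) = n^{-1/2} * (sum_{t<=k} Z_t - (k/n) sum_{t<=n} Z_t)
# fluctuates around zero under a constant mean and inflates under a shift.

def cusum_sq_integral(z):
    n = len(z)
    total = sum(z)
    partial = 0.0
    acc = 0.0
    for k in range(1, n + 1):
        partial += z[k - 1]
        u = (partial - (k / n) * total) / n ** 0.5
        acc += u * u / n              # Riemann sum of U(s)^2 over s in (0, 1]
    return acc

stat_null = cusum_sq_integral([0.0] * 100)              # no change: exactly 0
stat_alt = cusum_sq_integral([0.0] * 50 + [2.0] * 50)   # mean shift at midpoint
```

Under the null hypothesis the process stays near zero, so the integrated square is small; a mean shift makes it grow, which is what the test exploits.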
</p>projecteuclid.org/euclid.ejs/1535681028_20180830220403Thu, 30 Aug 2018 22:04 EDTOn inference validity of weighted U-statistics under data heterogeneityhttps://projecteuclid.org/euclid.ejs/1535681029<strong>Fang Han</strong>, <strong>Tianchen Qian</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2637--2708.</p><p><strong>Abstract:</strong><br/>
Motivated by challenges in studying a new correlation measure popularized for evaluating the performance of online ranking algorithms, this manuscript explores the validity of uncertainty assessment for weighted U-statistics. Dropping several commonly adopted assumptions, we verify the inference validity of Efron's bootstrap and of a new resampling procedure. Specifically, in full generality, our theory allows both kernels and weights to be asymmetric and data points to be non-identically distributed, issues that historically have not been addressed. To achieve such generality we must, for example, carefully control the order of the “degenerate” term in the U-statistics, which is no longer degenerate under the empirical measure for non-i.i.d. data. Our result applies to the motivating task, giving the region in which solid statistical inference can be made.
</p>projecteuclid.org/euclid.ejs/1535681029_20180830220403Thu, 30 Aug 2018 22:04 EDTOn the dimension effect of regularized linear discriminant analysishttps://projecteuclid.org/euclid.ejs/1536976838<strong>Cheng Wang</strong>, <strong>Binyan Jiang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2709--2742.</p><p><strong>Abstract:</strong><br/>
This paper studies the dimension effect of the linear discriminant analysis (LDA) and the regularized linear discriminant analysis (RLDA) classifiers for large dimensional data where the observation dimension $p$ is of the same order as the sample size $n$. More specifically, built on properties of the Wishart distribution and recent results in random matrix theory, we derive explicit expressions for the asymptotic misclassification errors of LDA and RLDA respectively, from which we gain insights into how dimension affects the performance of classification, and in what sense. Motivated by these results, we propose adjusted classifiers by correcting the bias caused by the unequal sample sizes. The bias-corrected LDA and RLDA classifiers are shown to have smaller misclassification rates than LDA and RLDA respectively. Several interesting examples are discussed in detail and the theoretical results on dimension effect are illustrated via extensive simulation studies.
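A minimal sketch of the RLDA rule under discussion (the standard ridge-regularized discriminant, not the paper's bias-corrected version):

```python
import numpy as np

# Sketch: regularized LDA assigns x to class 1 when
#   (x - (xbar1 + xbar2)/2)^T (S + lam I)^{-1} (xbar1 - xbar2) > 0,
# with S the pooled sample covariance. Regularization keeps the rule usable
# when p is comparable to n and S is ill-conditioned.

def rlda_fit(X1, X2, lam):
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S = ((X1 - xbar1).T @ (X1 - xbar1)
         + (X2 - xbar2).T @ (X2 - xbar2)) / (n1 + n2 - 2)
    w = np.linalg.solve(S + lam * np.eye(S.shape[0]), xbar1 - xbar2)
    b = -0.5 * w @ (xbar1 + xbar2)
    return w, b

def rlda_predict(x, w, b):
    return 1 if x @ w + b > 0 else 2

rng = np.random.default_rng(0)
X1 = rng.normal(loc=1.0, size=(50, 20))     # class 1: mean +1 in each coordinate
X2 = rng.normal(loc=-1.0, size=(50, 20))    # class 2: mean -1 in each coordinate
w, b = rlda_fit(X1, X2, lam=0.1)
```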
</p>projecteuclid.org/euclid.ejs/1536976838_20180914220051Fri, 14 Sep 2018 22:00 EDTInference for high-dimensional split-plot-designs: A unified approach for small to large numbers of factor levelshttps://projecteuclid.org/euclid.ejs/1536976839<strong>Paavo Sattler</strong>, <strong>Markus Pauly</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2743--2805.</p><p><strong>Abstract:</strong><br/>
Statisticians increasingly face the problem of reconsidering the adaptability of classical inference techniques. In particular, diverse types of high-dimensional data structures are observed in various research areas, exposing the boundaries of conventional multivariate data analysis. Such situations occur frequently, e.g., in the life sciences, whenever it is easier or cheaper to repeatedly generate a large number $d$ of observations per subject than to recruit many, say $N$, subjects. In this paper, we discuss inference procedures for such situations in general heteroscedastic split-plot designs with $a$ independent groups of repeated measurements. These will, e.g., be able to answer questions about the occurrence of certain time, group and interaction effects or about particular profiles.
The test procedures are based on standardized quadratic forms involving suitably symmetrized U-statistics-type estimators which are robust against an increasing number of dimensions $d$ and/or groups $a$. We then discuss their limit distributions in a general asymptotic framework and additionally propose improved small sample approximations. Finally, the small sample performance is investigated in simulations and applicability is illustrated by a real data analysis.
</p>projecteuclid.org/euclid.ejs/1536976839_20180914220051Fri, 14 Sep 2018 22:00 EDTMass volume curves and anomaly rankinghttps://projecteuclid.org/euclid.ejs/1537257627<strong>Stephan Clémençon</strong>, <strong>Albert Thomas</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2806--2872.</p><p><strong>Abstract:</strong><br/>
This paper aims at formulating the issue of ranking multivariate unlabeled observations by their degree of abnormality as an unsupervised statistical learning task. In the 1-d situation, this problem is usually tackled by means of tail estimation techniques: univariate observations are viewed as all the more ‘abnormal’ as they are located far in the tail(s) of the underlying probability distribution. It would also be desirable to have a scalar-valued ‘scoring’ function for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as an M-estimation problem by means of a novel functional performance criterion, referred to as the Mass Volume curve (MV curve in short), whose optimal elements are strictly increasing transforms of the density almost everywhere on the support of the density. We first study the statistical estimation of the MV curve of a given scoring function and we provide a strategy to build confidence regions using a smoothed bootstrap approach. Optimization of this functional criterion over the set of piecewise constant scoring functions is next tackled. This boils down to estimating a sequence of empirical minimum volume sets whose levels are chosen adaptively from the data, so as to adjust to the variations of the optimal MV curve, while controlling the bias of its approximation by a stepwise curve. Generalization bounds are then established for the difference in sup norm between the MV curve of the empirical scoring function thus obtained and the optimal MV curve.
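For one-dimensional data, the empirical side of the MV curve can be sketched as follows (a crude illustration that uses the interval hull of the retained points as a volume proxy; the paper works with general empirical minimum volume sets):

```python
# Sketch: for a scoring function s, the MV curve maps each mass level alpha
# to the volume of the level set of s that captures a fraction alpha of the
# data. Here we keep the alpha*n best-scored points and measure the length
# of the interval they span as a 1-d volume proxy.

def mv_curve(x, scores, alphas):
    """Empirical (mass, volume) pairs for 1-d data."""
    n = len(x)
    order = sorted(range(n), key=lambda i: -scores[i])   # best-scored first
    out = []
    for alpha in alphas:
        k = max(1, int(round(alpha * n)))
        kept = [x[i] for i in order[:k]]
        out.append((alpha, max(kept) - min(kept)))       # interval-length proxy
    return out

x = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
scores = [-abs(v) for v in x]                 # density-shaped score, peak at 0
curve = mv_curve(x, scores, [0.3, 0.6, 1.0])
```

A good scoring function keeps the volume small at every mass level, so the curve grows slowly; for a poor scoring function it grows quickly.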
</p>projecteuclid.org/euclid.ejs/1537257627_20180918040049Tue, 18 Sep 2018 04:00 EDTGoodness-of-fit tests for complete spatial randomness based on Minkowski functionals of binary imageshttps://projecteuclid.org/euclid.ejs/1537257628<strong>Bruno Ebner</strong>, <strong>Norbert Henze</strong>, <strong>Michael A. Klatt</strong>, <strong>Klaus Mecke</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2873--2904.</p><p><strong>Abstract:</strong><br/>
We propose a class of goodness-of-fit tests for complete spatial randomness (CSR). In contrast to standard tests, our procedure utilizes a transformation of the data to a binary image, which is then characterized by geometric functionals. Under a suitable limiting regime, we derive the asymptotic distribution of the test statistics under the null hypothesis and almost sure limits under certain alternatives. The new tests are computationally efficient, and simulations show that they are strong competitors to other tests of CSR. The tests are applied to a real data set in gamma-ray astronomy, and immediate extensions are presented to encourage further work.
</p>projecteuclid.org/euclid.ejs/1537257628_20180918040049Tue, 18 Sep 2018 04:00 EDTPower-law partial correlation network modelshttps://projecteuclid.org/euclid.ejs/1537257629<strong>Matteo Barigozzi</strong>, <strong>Christian Brownlees</strong>, <strong>Gábor Lugosi</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2905--2929.</p><p><strong>Abstract:</strong><br/>
We introduce a class of partial correlation network models whose network structure is determined by a random graph. In particular, in this work we focus on a version of the model in which the random graph has a power-law degree distribution. A number of cross-sectional dependence properties of this class of models are derived. The main result we establish is that when the random graph is power-law, the system exhibits a high degree of collinearity. More precisely, the largest eigenvalues of the inverse covariance matrix converge to an affine function of the degrees of the most interconnected vertices in the network. The result implies that the largest eigenvalues of the inverse covariance matrix are approximately power-law distributed, and that, as the system dimension increases, the eigenvalues diverge. As an empirical illustration we analyse two panels of stock returns of companies listed in the S&P 500 and S&P 1500 and show that the covariance matrices of returns exhibit empirical features that are consistent with our power-law model.
</p>projecteuclid.org/euclid.ejs/1537257629_20180918040049Tue, 18 Sep 2018 04:00 EDTOnline natural gradient as a Kalman filterhttps://projecteuclid.org/euclid.ejs/1537257630<strong>Yann Ollivier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2930--2961.</p><p><strong>Abstract:</strong><br/>
We cast Amari’s natural gradient in statistical learning as a specific case of Kalman filtering. Namely, applying an extended Kalman filter to estimate a fixed unknown parameter of a probabilistic model from a series of observations is rigorously equivalent to estimating this parameter via an online stochastic natural gradient descent on the log-likelihood of the observations.
In the i.i.d. case, this relation is a consequence of the “information filter” phrasing of the extended Kalman filter. In the recurrent (state space, non-i.i.d.) case, we prove that the joint Kalman filter over states and parameters is a natural gradient on top of real-time recurrent learning (RTRL), a classical algorithm to train recurrent models.
This exact algebraic correspondence provides relevant interpretations for natural gradient hyperparameters such as learning rates or initialization and regularization of the Fisher information matrix.
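The correspondence is easiest to see in the simplest case (a toy illustration assuming a N(theta, 1) model, for which the Fisher information is identically 1): the online natural gradient recursion with learning rate 1/t reproduces the Kalman-filter update for a constant state, i.e. the running mean.

```python
# Toy sketch of the natural gradient / Kalman correspondence for estimating
# the mean theta of a N(theta, 1) model. The natural gradient multiplies the
# plain log-likelihood gradient by the inverse Fisher information (= 1 here),
# and with step size 1/t the updates coincide with the running mean, which is
# also what a Kalman filter with a constant state produces.

def online_natural_gradient(observations):
    theta = 0.0
    for t, y in enumerate(observations, start=1):
        fisher_inv = 1.0            # inverse Fisher information of N(theta, 1)
        grad = y - theta            # d/dtheta of log N(y; theta, 1)
        theta += (1.0 / t) * fisher_inv * grad
    return theta

ys = [1.0, 3.0, 2.0, 2.0]
theta_hat = online_natural_gradient(ys)   # equals the sample mean, 2.0
```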
</p>projecteuclid.org/euclid.ejs/1537257630_20180918040049Tue, 18 Sep 2018 04:00 EDTMaximum empirical likelihood estimation and related topicshttps://projecteuclid.org/euclid.ejs/1537344589<strong>Hanxiang Peng</strong>, <strong>Anton Schick</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2962--2994.</p><p><strong>Abstract:</strong><br/>
This article develops a theory of maximum empirical likelihood estimation and empirical likelihood ratio testing with irregular and estimated constraint functions that parallels the theory for parametric models and is tailored for semiparametric models. The key is a uniform local asymptotic normality condition for the local empirical likelihood ratio. This condition is shown to hold under mild assumptions on the constraint function. Applications of our results to inference problems about quantiles, under possible additional information on the underlying distribution, and to residual-based inference about quantiles are discussed.
</p>projecteuclid.org/euclid.ejs/1537344589_20180919042128Wed, 19 Sep 2018 04:21 EDTConsistency of variational Bayes inference for estimation and model selection in mixtureshttps://projecteuclid.org/euclid.ejs/1537344604<strong>Badr-Eddine Chérief-Abdellatif</strong>, <strong>Pierre Alquier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2995--3035.</p><p><strong>Abstract:</strong><br/>
Mixture models are widely used in Bayesian statistics and machine learning, in particular in computational biology, natural language processing and many other fields. Variational inference, a technique for approximating intractable posteriors by means of optimization algorithms, is extremely popular in practice when dealing with complex models such as mixtures. The contribution of this paper is two-fold. First, we study the concentration of variational approximations of posteriors, which is still an open problem for general mixtures, and we derive consistency and rates of convergence. We also tackle the problem of model selection for the number of components: we study the approach already used in practice, which consists in maximizing a numerical criterion (the Evidence Lower Bound). We prove that this strategy indeed leads to strong oracle inequalities. We illustrate our theoretical results by applications to Gaussian and multinomial mixtures.
</p>projecteuclid.org/euclid.ejs/1537344604_20180919042128Wed, 19 Sep 2018 04:21 EDTBayesian variable selection for globally sparse probabilistic PCAhttps://projecteuclid.org/euclid.ejs/1537430424<strong>Charles Bouveyron</strong>, <strong>Pierre Latouche</strong>, <strong>Pierre-Alexandre Mattei</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3036--3070.</p><p><strong>Abstract:</strong><br/>
Sparse versions of principal component analysis (PCA) have established themselves as simple, yet powerful ways of selecting relevant features of high-dimensional data in an unsupervised manner. However, when several sparse principal components are computed, the interpretation of the selected variables may be difficult since each axis has its own sparsity pattern and has to be interpreted separately. To overcome this drawback, we propose a Bayesian procedure that allows one to obtain several sparse components with the same sparsity pattern. This allows the practitioner to identify which original variables are most relevant to describe the data. To this end, using Roweis’ probabilistic interpretation of PCA and an isotropic Gaussian prior on the loading matrix, we provide the first exact computation of the marginal likelihood of a Bayesian PCA model. Moreover, in order to avoid the drawbacks of discrete model selection, a simple relaxation of this framework is presented. It allows one to find a path of candidate models using a variational expectation-maximization algorithm. The exact marginal likelihood can eventually be maximized over this path, relying on Occam’s razor to select the relevant variables. Since the sparsity pattern is common to all components, we call this approach globally sparse probabilistic PCA (GSPPCA). Its usefulness is illustrated on synthetic data sets and on several real unsupervised feature selection problems coming from signal processing and genomics. In particular, using unlabeled microarray data, GSPPCA is shown to infer biologically relevant subsets of genes. According to a metric based on pathway enrichment, it vastly surpasses in this context the performance of traditional sparse PCA algorithms. An R implementation of the GSPPCA algorithm is available at http://github.com/pamattei/GSPPCA .
</p>projecteuclid.org/euclid.ejs/1537430424_20180920040056Thu, 20 Sep 2018 04:00 EDTA quasi-Bayesian perspective to online clusteringhttps://projecteuclid.org/euclid.ejs/1537430425<strong>Le Li</strong>, <strong>Benjamin Guedj</strong>, <strong>Sébastien Loustau</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3071--3113.</p><p><strong>Abstract:</strong><br/>
When faced with high-frequency streams of data, clustering poses theoretical and algorithmic challenges. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO, see https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a convergence guarantee. Finally, numerical experiments illustrate the potential of our procedure.
</p>projecteuclid.org/euclid.ejs/1537430425_20180920040056Thu, 20 Sep 2018 04:00 EDTEstimation of the covariance function of Gaussian isotropic random fields on spheres, related Rosenblatt-type distributions and the cosmic variance problemhttps://projecteuclid.org/euclid.ejs/1537841410<strong>Nikolai N. Leonenko</strong>, <strong>Murad S. Taqqu</strong>, <strong>Gyorgy H. Terdik</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3114--3146.</p><p><strong>Abstract:</strong><br/>
We consider the problem of estimating the covariance function of an isotropic Gaussian stochastic field on the unit sphere using a single observation at each point of the discretized sphere. The spatial estimator of the covariance function is expressed in a new form which provides, on the one hand, a way to derive the characteristic function of the estimator and, on the other hand, a computationally efficient method to do so. We also describe a methodology for handling the presence of the cosmic variance which can impair the results. In simulation, we use the pixelization scheme HEALPix.
</p>projecteuclid.org/euclid.ejs/1537841410_20180924221023Mon, 24 Sep 2018 22:10 EDTEffective sample size for spatial regression modelshttps://projecteuclid.org/euclid.ejs/1538013686<strong>Jonathan Acosta</strong>, <strong>Ronny Vallejos</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3147--3180.</p><p><strong>Abstract:</strong><br/>
We propose a new definition of effective sample size. Although the recent works of Griffith (2005, 2008) and Vallejos and Osorio (2014) provide a theoretical framework for addressing the reduction of information in a spatial sample due to spatial autocorrelation, the asymptotic properties of the estimates have not been studied in those works or in previous ones. In addition, the concept of effective sample size has been developed primarily for spatial regression processes with a constant mean. This paper introduces a new definition of effective sample size for general spatial regression models that is coherent with previous definitions. The asymptotic normality of the maximum likelihood estimation is obtained under an increasing domain framework. In particular, the conditions under which the limiting distribution holds are established for the Matérn covariance family. Illustrative examples accompany the discussion of the limiting results, including some cases where the asymptotic variance has a closed form. The asymptotic normality leads to an approximate hypothesis test that establishes whether there is redundant information in the sample. Simulation results support the theoretical findings and provide information about the behavior of the power of the suggested test. A real dataset in which a transect sampling scheme has been used is analyzed to estimate the effective sample size when a spatial linear regression model is assumed.
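Taking the definition ESS = 1'R^{-1}1, with R the spatial correlation matrix of the sample (our reading of the definition used in Vallejos and Osorio (2014), stated here as an assumption), the information reduction caused by autocorrelation is easy to compute:

```python
import numpy as np

# Sketch assuming ESS = 1' R^{-1} 1 for an n x n correlation matrix R.
# Independent observations give ESS = n; strong positive correlation shrinks
# ESS toward 1, quantifying the redundancy in the spatial sample.

def effective_sample_size(R):
    ones = np.ones(R.shape[0])
    return float(ones @ np.linalg.solve(R, ones))

n, rho = 5, 0.5
R_indep = np.eye(n)
R_equi = (1 - rho) * np.eye(n) + rho * np.ones((n, n))   # equicorrelation
ess_indep = effective_sample_size(R_indep)   # = n
ess_equi = effective_sample_size(R_equi)     # = n / (1 + (n - 1) * rho)
```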
</p>projecteuclid.org/euclid.ejs/1538013686_20180926220212Wed, 26 Sep 2018 22:02 EDTIntensity approximation for pairwise interaction Gibbs point processes using determinantal point processeshttps://projecteuclid.org/euclid.ejs/1538013687<strong>Jean-François Coeurjolly</strong>, <strong>Frédéric Lavancier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3181--3203.</p><p><strong>Abstract:</strong><br/>
The intensity of a Gibbs point process is usually an intractable function of the model parameters. For repulsive pairwise interaction point processes, this intensity can be expressed as the Laplace transform of some particular function. Baddeley and Nair (2012) developed the Poisson-saddlepoint approximation, which consists, for basic models, in calculating this Laplace transform with respect to a homogeneous Poisson point process. In this paper, we develop an approximation which consists in calculating the same Laplace transform with respect to a specific determinantal point process. This new approximation is efficiently implemented and turns out to be more accurate than the Poisson-saddlepoint approximation, as demonstrated by some numerical examples.
</p>projecteuclid.org/euclid.ejs/1538013687_20180926220212Wed, 26 Sep 2018 22:02 EDTEarly stopping for statistical inverse problems via truncated SVD estimationhttps://projecteuclid.org/euclid.ejs/1538121641<strong>Gilles Blanchard</strong>, <strong>Marc Hoffmann</strong>, <strong>Markus Reiß</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3204--3231.</p><p><strong>Abstract:</strong><br/>
We consider truncated SVD (or spectral cut-off, projection) estimators for a prototypical statistical inverse problem in dimension $D$. Since calculating the singular value decomposition (SVD) only for the largest singular values is much less costly than the full SVD, our aim is to select a data-driven truncation level $\widehat{m}\in \{1,\ldots ,D\}$ only based on the knowledge of the first $\widehat{m}$ singular values and vectors.
We analyse in detail whether sequential early stopping rules of this type can preserve statistical optimality. Information-constrained lower bounds and matching upper bounds for a residual based stopping rule are provided, which give a clear picture in which situation optimal sequential adaptation is feasible. Finally, a hybrid two-step approach is proposed which allows for classical oracle inequalities while considerably reducing numerical complexity.
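A residual-based sequential stopping rule of this kind can be sketched numerically. The forward operator, noise level, and the discrepancy-style threshold below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 200                                          # dimension of the discretised inverse problem
A = np.diag(1.0 / np.arange(1, D + 1) ** 0.5)    # mildly ill-posed forward operator (assumed)
x_true = 1.0 / np.arange(1, D + 1)               # unknown signal (assumed)
delta = 0.05                                     # noise level
y = A @ x_true + delta * rng.standard_normal(D)

# SVD of the forward operator; trivial for a diagonal A, but the
# code is written for a general matrix.
U, s, Vt = np.linalg.svd(A)
coeffs = U.T @ y

# Sequentially increase the truncation level m, adding one singular
# component at a time, and stop as soon as the squared residual drops
# below kappa = D * delta**2 (a discrepancy-principle-style threshold,
# chosen here purely for illustration).
kappa = D * delta ** 2
x_hat = np.zeros(D)
m_hat = D
for m in range(1, D + 1):
    x_hat += (coeffs[m - 1] / s[m - 1]) * Vt[m - 1]
    if np.sum((y - A @ x_hat) ** 2) <= kappa:
        m_hat = m
        break

print(m_hat, np.linalg.norm(x_hat - x_true))
```

Because only the first $\widehat{m}$ singular values and vectors enter the loop, a Lanczos-type partial SVD could replace the full decomposition in a large-scale implementation.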
</p>projecteuclid.org/euclid.ejs/1538121641_20180928040113Fri, 28 Sep 2018 04:01 EDTFalse discovery rate control for effect modification in observational studieshttps://projecteuclid.org/euclid.ejs/1538445643<strong>Bikram Karmakar</strong>, <strong>Ruth Heller</strong>, <strong>Dylan S. Small</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3232--3253.</p><p><strong>Abstract:</strong><br/>
In an observational study, a difference between the treatment and control groups’ outcomes might reflect bias in treatment assignment rather than a true treatment effect. A sensitivity analysis determines how large this bias would have to be to explain away, as non-causal, a significant treatment effect found by a naive analysis that assumed no bias. Effect modification is the interaction between a treatment and a pretreatment covariate. In an observational study there are often many possible effect modifiers, and it is desirable to be able to look at the data to identify the effect modifiers that will be tested. For observational studies, we simultaneously address the multiplicity involved in choosing, by looking at the data, which of many possible effect modifiers to test, and the need to conduct a proper sensitivity analysis. We develop an approach that provides finite sample false discovery rate control for a collection of adaptive hypotheses identified from the data in a matched-pairs design. Along with simulation studies, an empirical study is presented on the effect of cigarette smoking on blood lead levels using data from the U.S. National Health and Nutrition Examination Survey. Other applications of the suggested method are briefly discussed.
</p>projecteuclid.org/euclid.ejs/1538445643_20181001220107Mon, 01 Oct 2018 22:01 EDTChange-point detection in high-dimensional covariance structurehttps://projecteuclid.org/euclid.ejs/1538705038<strong>Valeriy Avanesov</strong>, <strong>Nazar Buzun</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3254--3294.</p><p><strong>Abstract:</strong><br/>
In this paper we introduce a novel approach to the important problem of break detection. Specifically, we are interested in detecting an abrupt change in the covariance structure of a high-dimensional random process, a problem with applications in many areas, e.g., neuroimaging and finance. The developed approach is essentially a testing procedure involving the choice of a critical level. To that end, a non-standard bootstrap scheme is proposed and theoretically justified under mild assumptions. The theoretical study features a result providing guarantees for break detection. All the theoretical results are established in a high-dimensional setting (dimensionality $p\gg n$). The multiscale nature of the approach allows for a trade-off between the sensitivity of break detection and localization. The approach can be naturally employed in an on-line setting. A simulation study demonstrates that the approach matches the nominal level of false alarm probability and exhibits high power, outperforming a recent approach.
</p>projecteuclid.org/euclid.ejs/1538705038_20181004220421Thu, 04 Oct 2018 22:04 EDTGeometric ergodicity of Pólya-Gamma Gibbs sampler for Bayesian logistic regression with a flat priorhttps://projecteuclid.org/euclid.ejs/1538705039<strong>Xin Wang</strong>, <strong>Vivekananda Roy</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3295--3311.</p><p><strong>Abstract:</strong><br/>
The logistic regression model is the most popular model for analyzing binary data. In the absence of any prior information, an improper flat prior is often used for the regression coefficients in Bayesian logistic regression models. The resulting intractable posterior density can be explored by running Polson, Scott and Windle’s (2013) data augmentation (DA) algorithm. In this paper, we establish that the Markov chain underlying Polson, Scott and Windle’s (2013) DA algorithm is geometrically ergodic. Proving this theoretical result is practically important as it ensures the existence of central limit theorems (CLTs) for sample averages under a finite second moment condition. The CLT in turn allows users of the DA algorithm to calculate standard errors for posterior estimates.
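The DA algorithm alternates a Pólya-Gamma draw for the latent variables with a Gaussian draw for the coefficients. The sketch below replaces Polson, Scott and Windle's (2013) exact Pólya-Gamma sampler with a truncated version of the distribution's infinite-sum representation, so it is an illustration of the algorithm's structure rather than the sampler whose geometric ergodicity the paper establishes; all variable names and the truncation level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pg_approx(b, c, K=200):
    # Approximate draw from PG(b, c) by truncating the infinite-sum
    # representation omega = (1/(2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2/(4 pi^2)),
    # with g_k ~ Gamma(b, 1) i.i.d.  Exact samplers exist; this is a sketch.
    k = np.arange(1, K + 1)
    g = rng.gamma(b, 1.0, size=(np.size(c), K))
    denom = (k - 0.5) ** 2 + (np.asarray(c)[:, None] ** 2) / (4 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

def pg_gibbs(y, X, n_iter=200):
    # Gibbs sampler for Bayesian logistic regression with a flat prior on beta:
    #   omega_i | beta ~ PG(1, x_i' beta)
    #   beta | omega   ~ N(V X' kappa, V),  V = (X' Omega X)^{-1},  kappa = y - 1/2
    n, p = X.shape
    beta = np.zeros(p)
    kappa = y - 0.5
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        omega = sample_pg_approx(1.0, X @ beta)
        V = np.linalg.inv(X.T @ (omega[:, None] * X))
        mean_beta = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(mean_beta, V)
        draws[t] = beta
    return draws

# Toy data: intercept plus one covariate.
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([0.5, -1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)
draws = pg_gibbs(y, X)
```

The geometric ergodicity result is what licenses using batch means or similar estimators on `draws` to attach standard errors to the posterior means.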
</p>projecteuclid.org/euclid.ejs/1538705039_20181004220421Thu, 04 Oct 2018 22:04 EDTSignificance testing in non-sparse high-dimensional linear modelshttps://projecteuclid.org/euclid.ejs/1538791404<strong>Yinchu Zhu</strong>, <strong>Jelena Bradic</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3312--3364.</p><p><strong>Abstract:</strong><br/>
In high-dimensional linear models, the sparsity assumption is typically made, stating that most of the parameters are equal to zero. Under the sparsity assumption, estimation and, more recently, inference have been well studied. However, in practice, the sparsity assumption is not checkable and, more importantly, is often violated: a large number of covariates might be expected to be associated with the response, indicating that possibly all, rather than just a few, parameters are non-zero. A natural example is genome-wide gene expression profiling, where all genes are believed to affect a common disease marker. We show that existing inferential methods are sensitive to the sparsity assumption and may, in turn, result in a severe lack of control of the Type-I error. In this article, we propose a new inferential method, named CorrT, which is robust to model misspecification such as heteroscedasticity and lack of sparsity. CorrT is shown to have Type I error approaching the nominal level for any model and Type II error approaching zero for sparse and many dense models. In fact, CorrT is also shown to be optimal in a variety of frameworks: sparse, non-sparse and hybrid models where sparse and dense signals are mixed. Numerical experiments show a favorable performance of the CorrT test compared to the state-of-the-art methods.
</p>projecteuclid.org/euclid.ejs/1538791404_20181005220343Fri, 05 Oct 2018 22:03 EDTAdaptive MCMC for multiple changepoint analysis with applications to large datasetshttps://projecteuclid.org/euclid.ejs/1539050490<strong>Alan Benson</strong>, <strong>Nial Friel</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3365--3396.</p><p><strong>Abstract:</strong><br/>
We consider the problem of Bayesian inference for changepoints where the number and position of the changepoints are both unknown. In particular, we consider product partition models where it is possible to integrate out model parameters for the regime between each changepoint, leaving a posterior distribution over a latent vector indicating the presence or not of a changepoint at each observation. The same problem setting has been considered by Fearnhead (2006), where one can use filtering recursions to make exact inference. However, the complexity of this filtering recursion algorithm is quadratic in the number of observations. Our approach relies on an adaptive Markov chain Monte Carlo (MCMC) method for finite discrete state spaces. We develop an adaptive algorithm which can learn from the past states of the Markov chain in order to build proposal distributions which can quickly discover where changepoints are likely to be located. We prove that our algorithm is ergodic with the posterior as its invariant distribution. Crucially, we demonstrate that our adaptive MCMC algorithm is viable for large datasets for which the filtering recursions approach is not. Moreover, we show that inference is possible in a reasonable time, thus making Bayesian changepoint detection computationally efficient.
</p>projecteuclid.org/euclid.ejs/1539050490_20181008220200Mon, 08 Oct 2018 22:01 EDTWeighted batch means estimators in Markov chain Monte Carlohttps://projecteuclid.org/euclid.ejs/1539137549<strong>Ying Liu</strong>, <strong>James M. Flegal</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3397--3442.</p><p><strong>Abstract:</strong><br/>
This paper proposes a family of weighted batch means variance estimators, which are computationally efficient and can be conveniently applied in practice. The focus is on Markov chain Monte Carlo simulations and estimation of the asymptotic covariance matrix in the Markov chain central limit theorem, where conditions ensuring strong consistency are provided. Finite sample performance is evaluated through auto-regressive, Bayesian spatial-temporal, and Bayesian logistic regression examples, where the new estimators show significant computational gains with a minor sacrifice in variance compared with existing methods.
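The weighted family proposed in the paper is not reproduced here, but the baseline it generalizes, the plain non-overlapping batch means estimator of the asymptotic variance in the Markov chain CLT, can be sketched in a few lines; the batch size choice and the AR(1) test chain are illustrative assumptions:

```python
import numpy as np

def batch_means_var(chain, batch_size):
    # Non-overlapping batch means estimate of the asymptotic variance
    # sigma^2 in the CLT  sqrt(n) (xbar - mu) -> N(0, sigma^2):
    # split the chain into batches, then rescale the variance of the
    # batch means by the batch size.
    n = len(chain)
    a = n // batch_size                      # number of batches
    trimmed = chain[: a * batch_size]
    means = trimmed.reshape(a, batch_size).mean(axis=1)
    return batch_size * means.var(ddof=1)

# AR(1) chain x_t = rho * x_{t-1} + e_t with e_t ~ N(0, 1): its
# asymptotic variance is 1 / (1 - rho)^2, which we can compare against.
rng = np.random.default_rng(2)
rho, n = 0.5, 100_000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + e[t]

est = batch_means_var(x, batch_size=int(n ** 0.5))
print(est)   # theoretical target: 1 / (1 - 0.5)^2 = 4
```

The computational appeal is clear from the sketch: one pass over the chain and a variance of `a` batch means, as opposed to the quadratic cost of full spectral variance estimators.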
</p>projecteuclid.org/euclid.ejs/1539137549_20181009221252Tue, 09 Oct 2018 22:12 EDTOn the prediction loss of the lasso in the partially labeled settinghttps://projecteuclid.org/euclid.ejs/1539676834<strong>Pierre C. Bellec</strong>, <strong>Arnak S. Dalalyan</strong>, <strong>Edwin Grappin</strong>, <strong>Quentin Paris</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3443--3472.</p><p><strong>Abstract:</strong><br/>
In this paper we revisit the risk bounds of the lasso estimator in the context of transductive and semi-supervised learning. In other words, the setting under consideration is that of regression with random design under partial labeling. The main goal is to obtain user-friendly bounds on the off-sample prediction risk. To this end, the simple setting of bounded response variable and bounded (high-dimensional) covariates is considered. We propose some new adaptations of the lasso to these settings and establish oracle inequalities both in expectation and in deviation. These results provide non-asymptotic upper bounds on the risk that highlight the interplay between the bias due to the mis-specification of the linear model, the bias due to the approximate sparsity and the variance. They also demonstrate that the presence of a large number of unlabeled features may have a significant positive impact in situations where the restricted eigenvalue of the design matrix vanishes or is very small.
</p>projecteuclid.org/euclid.ejs/1539676834_20181016040106Tue, 16 Oct 2018 04:01 EDTNoise contrastive estimation: Asymptotic properties, formal comparison with MC-MLEhttps://projecteuclid.org/euclid.ejs/1539741651<strong>Lionel Riou-Durand</strong>, <strong>Nicolas Chopin</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3473--3518.</p><p><strong>Abstract:</strong><br/>
A statistical model is said to be un-normalised when its likelihood function involves an intractable normalising constant. Two popular methods for parameter inference for these models are MC-MLE (Monte Carlo maximum likelihood estimation), and NCE (noise contrastive estimation); both methods rely on simulating artificial data-points to approximate the normalising constant. While the asymptotics of MC-MLE have been established under general hypotheses (Geyer, 1994), this is not so for NCE. We establish consistency and asymptotic normality of NCE estimators under mild assumptions. We compare NCE and MC-MLE under several asymptotic regimes. In particular, we show that, when $m\rightarrow \infty $ while $n$ is fixed ($m$ and $n$ being, respectively, the numbers of artificial and actual data-points), the two estimators are asymptotically equivalent. Conversely, we prove that, when the artificial data-points are IID, and when $n\rightarrow \infty $ while $m/n$ converges to a positive constant, the asymptotic variance of a NCE estimator is always smaller than the asymptotic variance of the corresponding MC-MLE estimator. We illustrate the variance reduction brought by NCE through a numerical study.
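NCE reduces estimation of an un-normalised model to logistic discrimination between the $n$ actual and $m$ artificial data-points. A minimal sketch for a one-dimensional Gaussian with unknown mean, treating the log normalising constant as an extra parameter: the noise distribution, sample sizes, and optimizer are all free choices made here for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Data from N(mu_true, 1); the un-normalised model is
#   f(x; mu, c) = exp(-(x - mu)^2 / 2 - c),
# with c standing in for the unknown log normalising constant
# (true value: c = log sqrt(2 pi)).
mu_true = 1.5
n, m = 2000, 2000                 # n actual, m artificial data-points
x = mu_true + rng.standard_normal(n)
sigma_noise = 3.0                 # noise distribution N(0, 3^2), a free choice
y = sigma_noise * rng.standard_normal(m)

def log_noise_pdf(u):
    return -0.5 * (u / sigma_noise) ** 2 - np.log(sigma_noise) - 0.5 * np.log(2 * np.pi)

def neg_nce_objective(theta):
    mu, c = theta
    nu = m / n
    # Log-ratio G(u) = log f(u; theta) - log(nu * p_noise(u)); the NCE
    # objective is the log-likelihood of classifying actual vs. artificial
    # points with probability sigmoid(G).
    def G(u):
        return -(u - mu) ** 2 / 2 - c - np.log(nu) - log_noise_pdf(u)
    return -(np.sum(-np.log1p(np.exp(-G(x)))) + np.sum(-np.log1p(np.exp(G(y)))))

res = minimize(neg_nce_objective, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, c_hat = res.x
```

MC-MLE would instead use the artificial points to form a Monte Carlo estimate of the normalising constant inside the likelihood; the paper's variance comparison concerns exactly this contrast.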
</p>projecteuclid.org/euclid.ejs/1539741651_20181016220131Tue, 16 Oct 2018 22:01 EDT