The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
Language: en-us. Copyright 2010 Cornell University Library. Contact: Euclid-L@cornell.edu (Project Euclid Team).
Published: Thu, 05 Aug 2010 15:41 EDT. Last build: Tue, 07 Jun 2011 09:09 EDT.
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
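The automatic multiplicity correction can be seen in a small worked example. Under one standard fully Bayes setup (a uniform prior on a common inclusion probability — an illustrative special case, not the paper's full construction), the prior probability of a specific model with $k$ of $m$ candidate variables is the Beta integral $B(k+1, m-k+1)$, so the prior odds of including any one extra variable shrink as $m$ grows:

```python
import numpy as np
from math import lgamma

def log_prior_model(k, m):
    # Log prior probability of one specific model including k of m candidate
    # variables, after integrating a Uniform(0, 1) prior over the common
    # inclusion probability p:
    #   integral of p^k (1 - p)^(m - k) dp = B(k + 1, m - k + 1)
    return lgamma(k + 1) + lgamma(m - k + 1) - lgamma(m + 2)

# Prior odds of a one-variable model versus the null model: under the uniform
# prior these work out to exactly 1/m, so adding candidate variables
# automatically penalizes inclusion -- the multiplicity-correction effect.
odds = {m: np.exp(log_prior_model(1, m) - log_prior_model(0, m))
        for m in (10, 100, 1000)}
print(odds)
```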
</p>
Published: Thu, 05 Aug 2010 15:41 EDT

Testing for independence of large dimensional vectors
https://projecteuclid.org/euclid.aos/1564797870
<strong>Taras Bodnar</strong>, <strong>Holger Dette</strong>, <strong>Nestor Parolya</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 5, 2977--3008.</p><p><strong>Abstract:</strong><br/>
In this paper, new tests for the independence of two high-dimensional vectors are investigated. We consider the case where the dimension of the vectors increases with the sample size and propose multivariate analysis of variance-type statistics for the hypothesis of a block diagonal covariance matrix. The asymptotic properties of the new test statistics are investigated under the null hypothesis and the alternative hypothesis using random matrix theory. For this purpose, we study the weak convergence of linear spectral statistics of central and (conditionally) noncentral Fisher matrices. In particular, a central limit theorem for linear spectral statistics of large dimensional (conditionally) noncentral Fisher matrices is derived which is then used to analyse the power of the tests under the alternative.
The theoretical results are illustrated by means of a simulation study where we also compare the new tests with several alternatives, in particular the commonly used corrected likelihood ratio test. It is demonstrated that the latter test does not keep its nominal level if the dimension of one sub-vector is relatively small compared to the dimension of the other sub-vector. On the other hand, the tests proposed in this paper provide a reasonable approximation of the nominal level in such situations. Moreover, we observe that one of the proposed tests is most powerful under a variety of correlation scenarios.
</p>projecteuclid.org/euclid.aos/1564797870_20190802220459Fri, 02 Aug 2019 22:04 EDTDistributed estimation of principal eigenspaceshttps://projecteuclid.org/euclid.aos/1572487381<strong>Jianqing Fan</strong>, <strong>Dong Wang</strong>, <strong>Kaizheng Wang</strong>, <strong>Ziwei Zhu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3009--3031.</p><p><strong>Abstract:</strong><br/>
Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that capture the most variation in the data. When data are stored across multiple machines, however, communication cost can prohibit the computation of PCA in a central location and distributed algorithms for PCA are thus needed. This paper proposes and studies a distributed PCA algorithm: each node machine computes the top $K$ eigenvectors and transmits them to the central server; the central server then aggregates the information from all the node machines and conducts a PCA based on the aggregated information. We investigate the bias and variance for the resulting distributed estimator of the top $K$ eigenvectors. In particular, we show that for distributions with symmetric innovation, the empirical top eigenspaces are unbiased, and hence the distributed PCA is “unbiased.” We derive the rate of convergence for distributed PCA estimators, which depends explicitly on the effective rank of covariance, eigengap, and the number of machines. We show that when the number of machines is not unreasonably large, the distributed PCA performs as well as the whole sample PCA, even without full access to the whole data. The theoretical results are verified by an extensive simulation study. We also extend our analysis to the heterogeneous case where the population covariance matrices are different across local machines but share similar top eigenstructures.
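A minimal numpy sketch of the aggregation scheme just described, assuming the server averages the local projection matrices $VV^{\top}$ and takes the top eigenvectors of the average (the toy spiked-covariance data is illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_top_k(X, K):
    # Each node machine computes the top-K eigenvectors of its local
    # sample covariance matrix.
    S = X.T @ X / X.shape[0]
    vals, vecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    return vecs[:, -K:]

def distributed_pca(batches, K):
    # Central server: average the local projection matrices V V^T and
    # eigendecompose the aggregate to recover the top-K eigenspace.
    d = batches[0].shape[1]
    agg = np.zeros((d, d))
    for X in batches:
        V = local_top_k(X, K)
        agg += V @ V.T
    agg /= len(batches)
    vals, vecs = np.linalg.eigh(agg)
    return vecs[:, -K:]

# Toy check: spiked covariance whose leading eigenvector is e_1.
d, K = 20, 1
cov = np.eye(d); cov[0, 0] = 10.0
batches = [rng.multivariate_normal(np.zeros(d), cov, size=500) for _ in range(5)]
V_hat = distributed_pca(batches, K)
e1 = np.zeros(d); e1[0] = 1.0
print(abs(float(V_hat[:, 0] @ e1)))    # alignment with the true eigenvector
```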
</p>projecteuclid.org/euclid.aos/1572487381_20191030220349Wed, 30 Oct 2019 22:03 EDTAdditive models with trend filteringhttps://projecteuclid.org/euclid.aos/1572487382<strong>Veeranjaneyulu Sadhanala</strong>, <strong>Ryan J. Tibshirani</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3032--3068.</p><p><strong>Abstract:</strong><br/>
We study additive models built with trend filtering, that is, additive models whose components are each regularized by the (discrete) total variation of their $k$th (discrete) derivative, for a chosen integer $k\geq0$. This results in $k$th degree piecewise polynomial components (e.g., $k=0$ gives piecewise constant components, $k=1$ gives piecewise linear, $k=2$ gives piecewise quadratic, etc.). Analogous to its advantages in the univariate case, additive trend filtering has favorable theoretical and computational properties, thanks in large part to the localized nature of the (discrete) total variation regularizer that it uses. On the theory side, we derive fast error rates for additive trend filtering estimates, and show these rates are minimax optimal when the underlying function is additive and has component functions whose derivatives are of bounded variation. We also show that these rates are unattainable by additive smoothing splines (and by additive models built from linear smoothers, in general). On the computational side, we use backfitting to leverage fast univariate trend filtering solvers; we also describe a new backfitting algorithm whose iterations can be run in parallel, which (as far as we can tell) is the first of its kind. Lastly, we present a number of experiments to examine the empirical performance of trend filtering.
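The backfitting strategy can be sketched generically. The sketch below substitutes a simple moving-average smoother for the univariate trend filtering solver (which requires a specialized optimizer), so it illustrates only the cyclic update structure, under that stand-in assumption:

```python
import numpy as np

def smooth_1d(x, y, window=11):
    # Stand-in univariate smoother (moving average over rank-neighbors).
    # In additive trend filtering this step would be a univariate trend
    # filtering solve instead.
    order = np.argsort(x)
    ys = np.convolve(y[order], np.ones(window) / window, mode="same")
    out = np.empty_like(y)
    out[order] = ys
    return out - ys.mean()              # center each component

def backfit(X, y, n_iter=20):
    # Cyclic backfitting: update each additive component against the
    # partial residual that removes all the other components.
    n, p = X.shape
    f = np.zeros((n, p))
    for _ in range(n_iter):
        for j in range(p):
            r = y - y.mean() - f.sum(axis=1) + f[:, j]
            f[:, j] = smooth_1d(X[:, j], r)
    return f

# Toy additive model with two components.
rng = np.random.default_rng(1)
n = 400
X = rng.uniform(-1, 1, size=(n, 2))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)
f = backfit(X, y)
```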
</p>projecteuclid.org/euclid.aos/1572487382_20191030220349Wed, 30 Oct 2019 22:03 EDTSorted concave penalized regressionhttps://projecteuclid.org/euclid.aos/1572487384<strong>Long Feng</strong>, <strong>Cun-Hui Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3069--3098.</p><p><strong>Abstract:</strong><br/>
The Lasso is biased. Concave penalized least squares estimation (PLSE) takes advantage of signal strength to reduce this bias, leading to sharper error bounds in prediction, coefficient estimation and variable selection. For prediction and estimation, the bias of the Lasso can also be reduced by taking a smaller penalty level than what selection consistency requires, but such a smaller penalty level depends on the sparsity of the true coefficient vector. The sorted $\ell_{1}$ penalized estimation (Slope) was proposed for adaptation to such smaller penalty levels. However, the advantages of concave PLSE and Slope do not subsume each other. We propose sorted concave penalized estimation to combine the advantages of concave and sorted penalizations. We prove that sorted concave penalties adaptively choose the smaller penalty level and at the same time benefit from signal strength, especially when a significant proportion of signals are stronger than the corresponding adaptively selected penalty levels. A local convex approximation for sorted concave penalties, which extends the local linear and quadratic approximations for separable concave penalties, is developed to facilitate the computation of sorted concave PLSE and proven to possess the desired prediction and estimation error bounds. Our analysis of prediction and estimation errors requires only the restricted eigenvalue condition on the design, and additionally provides selection consistency under a minimum signal strength condition. Thus, our results also sharpen existing results on concave PLSE by removing the upper sparse eigenvalue component of the sparse Riesz condition.
</p>projecteuclid.org/euclid.aos/1572487384_20191030220349Wed, 30 Oct 2019 22:03 EDTActive ranking from pairwise comparisons and when parametric assumptions do not helphttps://projecteuclid.org/euclid.aos/1572487385<strong>Reinhard Heckel</strong>, <strong>Nihar B. Shah</strong>, <strong>Kannan Ramchandran</strong>, <strong>Martin J. Wainwright</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3099--3126.</p><p><strong>Abstract:</strong><br/>
We consider sequential or active ranking of a set of $n$ items based on noisy pairwise comparisons. Items are ranked according to the probability that a given item beats a randomly chosen item, and ranking refers to partitioning the items into sets of prespecified sizes according to their scores. This notion of ranking includes as special cases the identification of the top-$k$ items and the total ordering of the items. We first analyze a sequential ranking algorithm that counts the number of comparisons won, and uses these counts to decide whether to stop, or to compare another pair of items, chosen based on confidence intervals specified by the data collected up to that point. We prove that this algorithm succeeds in recovering the ranking using a number of comparisons that is optimal up to logarithmic factors. This guarantee does not depend on the underlying pairwise probability matrix satisfying any particular structural property, unlike a significant body of past work on pairwise ranking based on parametric models such as the Thurstone or Bradley–Terry–Luce models. It has been a long-standing open question as to whether or not imposing these parametric assumptions allows for improved ranking algorithms. For stochastic comparison models, in which the pairwise probabilities are bounded away from zero, our second contribution is to resolve this issue by proving a lower bound for parametric models. This shows, perhaps surprisingly, that these popular parametric modeling choices offer at most logarithmic gains for stochastic comparisons.
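A simplified sketch of the counting idea, with uniformly random opponents and one Hoeffding radius shared by all items (the paper's algorithm chooses pairs adaptively and stops items individually; the instance and constants here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def counting_top_k(P, k, delta=0.05, max_rounds=5000):
    # Each round, every item plays one comparison against a uniformly random
    # other item; its empirical win rate estimates its score (probability of
    # beating a random opponent).  Stop once Hoeffding confidence intervals
    # separate the top-k items from the rest.
    n = P.shape[0]
    wins = np.zeros(n)
    idx = np.arange(n)
    for t in range(1, max_rounds + 1):
        opp = (idx + rng.integers(1, n, size=n)) % n   # random opponent != self
        wins += rng.random(n) < P[idx, opp]
        score = wins / t
        rad = np.sqrt(np.log(2 * n * t * (t + 1) / delta) / (2 * t))
        order = np.argsort(-score)
        if score[order[k - 1]] - rad > score[order[k]] + rad:
            break
    return set(order[:k].tolist())

# Toy instance: 4 items with well-separated strengths; P[i, j] is the
# probability that item i beats item j.
strength = np.array([1.0, 0.667, 0.333, 0.0])
P = 1.0 / (1.0 + np.exp(-8.0 * (strength[:, None] - strength[None, :])))
topk = counting_top_k(P, k=2)
print(topk)
```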
</p>projecteuclid.org/euclid.aos/1572487385_20191030220349Wed, 30 Oct 2019 22:03 EDTRandomized incomplete $U$-statistics in high dimensionshttps://projecteuclid.org/euclid.aos/1572487388<strong>Xiaohui Chen</strong>, <strong>Kengo Kato</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3127--3156.</p><p><strong>Abstract:</strong><br/>
This paper studies inference for the mean vector of a high-dimensional $U$-statistic. In the era of big data, the dimension $d$ of the $U$-statistic and the sample size $n$ of the observations tend to be both large, and the computation of the $U$-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for $U$-statistics are even more computationally expensive. To overcome such a computational bottleneck, incomplete $U$-statistics obtained by sampling fewer terms of the $U$-statistic are attractive alternatives. In this paper, we introduce randomized incomplete $U$-statistics with sparse weights whose computational cost can be made independent of the order of the $U$-statistic. We derive nonasymptotic Gaussian approximation error bounds for the randomized incomplete $U$-statistics in high dimensions, namely in cases where the dimension $d$ is possibly much larger than the sample size $n$, for both nondegenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete $U$-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite sample validity of the proposed bootstrap methods. Our methods are illustrated on the application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.
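A minimal sketch of an incomplete $U$-statistic of order 2, here with index pairs drawn with replacement and an illustrative kernel (the paper's sparse-weight sampling schemes differ in detail):

```python
import numpy as np

rng = np.random.default_rng(3)

def incomplete_u(x, h, N):
    # Randomized incomplete U-statistic of order 2: rather than averaging the
    # kernel h over all n(n-1)/2 pairs, average it over N index pairs drawn
    # uniformly at random (with replacement, discarding i == j).
    n = len(x)
    i = rng.integers(0, n, size=N)
    j = rng.integers(0, n, size=N)
    keep = i != j
    return h(x[i[keep]], x[j[keep]]).mean()

# Illustrative kernel h(x, y) = |x - y| (Gini mean difference), whose
# population value for standard normal data is 2/sqrt(pi).
x = rng.standard_normal(2000)
h = lambda a, b: np.abs(a - b)
approx = incomplete_u(x, h, N=20_000)
# Complete U-statistic for comparison (feasible here, but O(n^2)):
full = np.abs(x[:, None] - x[None, :]).sum() / (len(x) * (len(x) - 1))
print(approx, full)
```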
</p>projecteuclid.org/euclid.aos/1572487388_20191030220349Wed, 30 Oct 2019 22:03 EDTAdaptive estimation of the rank of the coefficient matrix in high-dimensional multivariate response regression modelshttps://projecteuclid.org/euclid.aos/1572487389<strong>Xin Bing</strong>, <strong>Marten H. Wegkamp</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3157--3184.</p><p><strong>Abstract:</strong><br/>
We consider the multivariate response regression problem with a regression coefficient matrix of low, unknown rank. In this setting, we analyze a new criterion for selecting the optimal reduced rank. This criterion differs notably from the one proposed in Bunea, She and Wegkamp (Ann. Statist. 39 (2011) 1282–1309) in that it does not require estimation of the unknown variance of the noise, nor does it depend on a delicate choice of a tuning parameter. We develop an iterative, fully data-driven procedure that adapts to the optimal signal-to-noise ratio. This procedure finds the true rank in a few steps with overwhelming probability. At each step, our estimate increases, while at the same time it does not exceed the true rank. Our finite sample results hold for any sample size and any dimension, even when the number of responses and of covariates grow much faster than the number of observations. We perform an extensive simulation study that confirms our theoretical findings. The new method performs better and is more stable than the procedure of Bunea, She and Wegkamp (Ann. Statist. 39 (2011) 1282–1309) in both low- and high-dimensional settings.
</p>projecteuclid.org/euclid.aos/1572487389_20191030220349Wed, 30 Oct 2019 22:03 EDTStatistical inference for autoregressive models under heteroscedasticity of unknown formhttps://projecteuclid.org/euclid.aos/1572487390<strong>Ke Zhu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3185--3215.</p><p><strong>Abstract:</strong><br/>
This paper provides an entire inference procedure for the autoregressive model under (conditional) heteroscedasticity of unknown form with a finite variance. We first establish the asymptotic normality of the weighted least absolute deviations estimator (LADE) for the model. Second, we develop the random weighting (RW) method to estimate its asymptotic covariance matrix, leading to the implementation of the Wald test. Third, we construct a portmanteau test for model checking, and use the RW method to obtain its critical values. As a special weighted LADE, the feasible adaptive LADE (ALADE) is proposed and proved to have the same efficiency as its infeasible counterpart. The importance of our entire methodology based on the feasible ALADE is illustrated by simulation results and the real data analysis on three U.S. economic data sets.
</p>projecteuclid.org/euclid.aos/1572487390_20191030220349Wed, 30 Oct 2019 22:03 EDTOn partial-sum processes of ARMAX residualshttps://projecteuclid.org/euclid.aos/1572487391<strong>Steffen Grønneberg</strong>, <strong>Benjamin Holcblat</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3216--3243.</p><p><strong>Abstract:</strong><br/>
We establish general and versatile results regarding the limit behavior of the partial-sum process of ARMAX residuals. Illustrations include ARMA with seasonal dummies, misspecified ARMAX models with autocorrelated errors, nonlinear ARMAX models, ARMA with a structural break, a wide range of ARMAX models with infinite-variance errors, weak GARCH models and the consistency of kernel estimation of the density of ARMAX errors. Our results identify the limit distributions, and provide a general algorithm to obtain pivot statistics for CUSUM tests.
</p>projecteuclid.org/euclid.aos/1572487391_20191030220349Wed, 30 Oct 2019 22:03 EDTQuantile regression under memory constrainthttps://projecteuclid.org/euclid.aos/1572487392<strong>Xi Chen</strong>, <strong>Weidong Liu</strong>, <strong>Yichen Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3244--3273.</p><p><strong>Abstract:</strong><br/>
This paper studies the inference problem in quantile regression (QR) for a large sample size $n$ but under a limited memory constraint, where the memory can only store a small batch of data of size $m$. A natural method is the naive divide-and-conquer approach, which splits data into batches of size $m$, computes the local QR estimator for each batch and then aggregates the estimators via averaging. However, this method only works when $n=o(m^{2})$ and is computationally expensive. This paper proposes a computationally efficient method, which only requires an initial QR estimator on a small batch of data and then successively refines the estimator via multiple rounds of aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we establish the asymptotic normality for the obtained estimator and show that our estimator with only a few rounds of aggregations achieves the same efficiency as the QR estimator computed on all the data. Moreover, our result allows the case that the dimensionality $p$ goes to infinity. The proposed method can also be applied to address the QR problem in a distributed computing environment (e.g., in a large-scale sensor network) or for real-time streaming data.
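The naive divide-and-conquer baseline is easy to sketch in the simplest intercept-only case, estimating a single quantile rather than a full QR fit (the paper's estimator instead refines an initial estimate over multiple aggregation rounds):

```python
import numpy as np

rng = np.random.default_rng(4)

def dc_quantile(x, tau, m):
    # Naive divide-and-conquer, intercept-only case: split the n observations
    # into batches of size m (as if only m observations fit in memory),
    # compute the local tau-quantile in each batch, then average the local
    # estimates.
    batches = x.reshape(-1, m)
    return np.quantile(batches, tau, axis=1).mean()

n, m, tau = 100_000, 1_000, 0.5
x = rng.standard_normal(n)
est = dc_quantile(x, tau, m)
print(est)    # close to the true median, 0
```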
</p>projecteuclid.org/euclid.aos/1572487392_20191030220349Wed, 30 Oct 2019 22:03 EDTSampling and estimation for (sparse) exchangeable graphshttps://projecteuclid.org/euclid.aos/1572487393<strong>Victor Veitch</strong>, <strong>Daniel M. Roy</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3274--3299.</p><p><strong>Abstract:</strong><br/>
Sparse exchangeable graphs on $\mathbb{R}_{+}$, and the associated graphex framework for sparse graphs, generalize exchangeable graphs on $\mathbb{N}$, and the associated graphon framework for dense graphs. We develop the graphex framework as a tool for statistical network analysis by identifying the sampling scheme that is naturally associated with the models of the framework, formalizing two natural notions of consistent estimation of the parameter (the graphex) underlying these models, and identifying general consistent estimators in each case. The sampling scheme is a modification of independent vertex sampling that throws away vertices that are isolated in the sampled subgraph. The estimators are variants of the empirical graphon estimator, which is known to be a consistent estimator for the distribution of dense exchangeable graphs; both can be understood as graph analogues to the empirical distribution in the i.i.d. sequence setting. Our results may be viewed as a generalization of consistent estimation via the empirical graphon from the dense graph regime to also include sparse graphs.
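The modified vertex-sampling scheme can be sketched directly on an adjacency matrix (the Erdős–Rényi toy graph is illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_and_prune(A, p):
    # The sampling scheme described above: select each vertex independently
    # with probability p, take the induced subgraph, and then throw away the
    # vertices that are isolated in the sampled subgraph.
    keep = rng.random(A.shape[0]) < p
    sub = A[np.ix_(keep, keep)]
    not_isolated = sub.sum(axis=1) > 0
    return sub[np.ix_(not_isolated, not_isolated)]

# Toy graph: a symmetric Erdos-Renyi adjacency matrix on 200 vertices.
n = 200
upper = np.triu((rng.random((n, n)) < 0.05).astype(int), 1)
A = upper + upper.T
S = sample_and_prune(A, p=0.5)
print(S.shape)    # induced subgraph with isolated vertices removed
```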
</p>projecteuclid.org/euclid.aos/1572487393_20191030220349Wed, 30 Oct 2019 22:03 EDTHypothesis testing on linear structures of high-dimensional covariance matrixhttps://projecteuclid.org/euclid.aos/1572487394<strong>Shurong Zheng</strong>, <strong>Zhao Chen</strong>, <strong>Hengjian Cui</strong>, <strong>Runze Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3300--3334.</p><p><strong>Abstract:</strong><br/>
This paper is concerned with tests of significance for high-dimensional covariance structures, and aims to develop a unified framework for testing commonly used linear covariance structures. We first construct a consistent estimator for parameters involved in the linear covariance structure, and then develop two tests for the linear covariance structures based on the entropy loss and quadratic loss used for covariance matrix estimation. To study the asymptotic properties of the proposed tests, we study the related high-dimensional random matrix theory and establish several highly useful asymptotic results. With the aid of these asymptotic results, we derive the limiting distributions of these two tests under the null and alternative hypotheses. We further show that the quadratic loss based test is asymptotically unbiased. We conduct a Monte Carlo simulation study to examine the finite sample performance of the two tests. Our simulation results show that the limiting null distributions approximate the finite-sample null distributions quite well, and the corresponding asymptotic critical values control the Type I error rate very well. Our numerical comparison implies that the proposed tests outperform existing ones in terms of controlling the Type I error rate and power. Our simulation indicates that the test based on quadratic loss seems to have better power than the test based on entropy loss.
</p>projecteuclid.org/euclid.aos/1572487394_20191030220349Wed, 30 Oct 2019 22:03 EDTOn optimal designs for nonregular modelshttps://projecteuclid.org/euclid.aos/1572487395<strong>Yi Lin</strong>, <strong>Ryan Martin</strong>, <strong>Min Yang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3335--3359.</p><p><strong>Abstract:</strong><br/>
Classically, Fisher information is the relevant object in defining optimal experimental designs. However, for models that lack certain regularity, the Fisher information does not exist, and hence, there is no notion of design optimality available in the literature. This article seeks to fill the gap by proposing a so-called Hellinger information, which generalizes Fisher information in the sense that the two measures agree in regular problems, but the former also exists for certain types of nonregular problems. We derive a Hellinger information inequality, showing that Hellinger information defines a lower bound on the local minimax risk of estimators. This provides a connection between features of the underlying model—in particular, the design—and the performance of estimators, motivating the use of this new Hellinger information for nonregular optimal design problems. Hellinger optimal designs are derived for several nonregular regression problems, with numerical results empirically demonstrating the efficiency of these designs compared to alternatives.
</p>projecteuclid.org/euclid.aos/1572487395_20191030220349Wed, 30 Oct 2019 22:03 EDTA smeary central limit theorem for manifolds with application to high-dimensional sphereshttps://projecteuclid.org/euclid.aos/1572487396<strong>Benjamin Eltzner</strong>, <strong>Stephan F. Huckemann</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3360--3381.</p><p><strong>Abstract:</strong><br/>
The central limit theorems (CLTs) for generalized Fréchet means (data descriptors assuming values in manifolds, such as intrinsic means, geodesics, etc.) on manifolds from the literature are only valid if a certain empirical process of Hessians of the Fréchet function converges suitably, as in the proof of the prototypical BP-CLT [Ann. Statist. 33 (2005) 1225–1259]. This is not valid in many realistic scenarios, and we provide a new, very general CLT. In particular, this includes scenarios where, in a suitable chart, the sample mean fluctuates asymptotically at a scale $n^{\alpha }$ with exponents $\alpha <1/2$ with a nonnormal distribution. As the BP-CLT yields only fluctuations that are, rescaled with $n^{1/2}$, asymptotically normal, just as the classical CLT for random vectors, these lower rates, somewhat loosely called smeariness, had to date been observed only on the circle. We make the concept of smeariness on manifolds precise, give an example for two-smeariness on spheres of arbitrary dimension, and show that smeariness, although “almost never” occurring, may have serious statistical implications on a continuum of sample scenarios nearby. In fact, this effect increases with dimension, striking in particular in high dimension low sample size scenarios.
</p>projecteuclid.org/euclid.aos/1572487396_20191030220349Wed, 30 Oct 2019 22:03 EDTOn testing for high-dimensional white noisehttps://projecteuclid.org/euclid.aos/1572487397<strong>Zeng Li</strong>, <strong>Clifford Lam</strong>, <strong>Jianfeng Yao</strong>, <strong>Qiwei Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3382--3412.</p><p><strong>Abstract:</strong><br/>
Testing for white noise is a classical yet important problem in statistics, especially for diagnostic checks in time series modeling and linear regression. For high-dimensional time series in the sense that the dimension $p$ is large in relation to the sample size $T$, the popular omnibus tests including the multivariate Hosking and Li–McLeod tests are extremely conservative, leading to substantial power loss. To develop more relevant tests for high-dimensional cases, we propose a portmanteau-type test statistic which is the sum of squared singular values of the first $q$ lagged sample autocovariance matrices. It, therefore, encapsulates all the serial correlations (up to the time lag $q$) within and across all component series. Using the tools from random matrix theory and assuming both $p$ and $T$ diverge to infinity, we derive the asymptotic normality of the test statistic under both the null and a specific VMA(1) alternative hypothesis. As the actual implementation of the test requires the knowledge of three characteristic constants of the population cross-sectional covariance matrix and the value of the fourth moment of the standardized innovations, nontrivial estimations are proposed for these parameters and their integration leads to a practically usable test. Extensive simulation confirms the excellent finite-sample performance of the new test with accurate size and satisfactory power for a large range of finite $(p,T)$ combinations, thereby ensuring wide applicability in practice. In particular, the new tests are consistently superior to the traditional Hosking and Li–McLeod tests.
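The raw test statistic is simple to compute; a sketch under illustrative normalizations (the paper additionally estimates the characteristic constants needed to standardize the statistic):

```python
import numpy as np

rng = np.random.default_rng(6)

def portmanteau_stat(X, q):
    # Sum of squared singular values of the first q lagged sample
    # autocovariance matrices.  The squared singular values of a matrix sum
    # to its squared Frobenius norm, so no SVD is actually needed.
    # (Centering and normalization here are illustrative choices.)
    T = X.shape[0]
    Xc = X - X.mean(axis=0)
    stat = 0.0
    for u in range(1, q + 1):
        Gamma_u = Xc[u:].T @ Xc[:-u] / T      # lag-u sample autocovariance
        stat += np.linalg.norm(Gamma_u, "fro") ** 2
    return stat

T, p = 500, 20
X = rng.standard_normal((T, p))               # white noise
Y = np.empty((T, p))                          # VAR(1) alternative, phi = 0.8
Y[0] = rng.standard_normal(p)
for t in range(1, T):
    Y[t] = 0.8 * Y[t - 1] + rng.standard_normal(p)
stat_wn, stat_ar = portmanteau_stat(X, q=2), portmanteau_stat(Y, q=2)
print(stat_wn, stat_ar)    # serial correlation inflates the statistic
```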
</p>projecteuclid.org/euclid.aos/1572487397_20191030220349Wed, 30 Oct 2019 22:03 EDTMinimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factorshttps://projecteuclid.org/euclid.aos/1572487398<strong>Kyoungjae Lee</strong>, <strong>Jaeyong Lee</strong>, <strong>Lizhen Lin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3413--3437.</p><p><strong>Abstract:</strong><br/>
In this paper we study the high-dimensional sparse directed acyclic graph (DAG) models under the empirical sparse Cholesky prior. Among our results, strong model selection consistency or graph selection consistency is obtained under more general conditions than those in the existing literature. Compared to Cao, Khare and Ghosh [Ann. Statist. 47 (2019) 319–348], the required conditions are weakened in terms of the dimensionality, sparsity and lower bound of the nonzero elements in the Cholesky factor. Furthermore, our result does not require the irrepresentable condition, which is necessary for Lasso-type methods. We also derive the posterior convergence rates for precision matrices and Cholesky factors with respect to various matrix norms. The obtained posterior convergence rates are the fastest among those of the existing Bayesian approaches. In particular, we prove that our posterior convergence rates for Cholesky factors are minimax or at least nearly minimax depending on the relative size of true sparseness for the entire dimension. The simulation study confirms that the proposed method outperforms the competing methods.
</p>projecteuclid.org/euclid.aos/1572487398_20191030220349Wed, 30 Oct 2019 22:03 EDTBootstrapping and sample splitting for high-dimensional, assumption-lean inferencehttps://projecteuclid.org/euclid.aos/1572487399<strong>Alessandro Rinaldo</strong>, <strong>Larry Wasserman</strong>, <strong>Max G’Sell</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3438--3469.</p><p><strong>Abstract:</strong><br/>
Several new methods have been recently proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and the rest for inference. In this paper, we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-lean approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension which we then use to assess the accuracy of regression inference. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. Finally, we elucidate an inference-prediction trade-off: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions.
</p>projecteuclid.org/euclid.aos/1572487399_20191030220349Wed, 30 Oct 2019 22:03 EDTJoint convergence of sample autocovariance matrices when $p/n\to 0$ with applicationhttps://projecteuclid.org/euclid.aos/1572487400<strong>Monika Bhattacharjee</strong>, <strong>Arup Bose</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3470--3503.</p><p><strong>Abstract:</strong><br/>
Consider a high-dimensional linear time series model where the dimension $p$ and the sample size $n$ grow in such a way that $p/n\to 0$. Let $\hat{\Gamma }_{u}$ be the $u$th order sample autocovariance matrix. We first show that the limiting spectral distribution (LSD) of any symmetric polynomial in $\{\hat{\Gamma }_{u},\hat{\Gamma }_{u}^{*},u\geq 0\}$ exists under independence and moment assumptions on the driving sequence together with weak assumptions on the coefficient matrices. This LSD result, with some additional effort, implies the asymptotic normality of the trace of any polynomial in $\{\hat{\Gamma }_{u},\hat{\Gamma }_{u}^{*},u\geq 0\}$. We also study similar results for several independent MA processes.
We show applications of the above results to statistical inference problems such as in estimation of the unknown order of a high-dimensional MA process and in graphical and significance tests for hypotheses on coefficient matrices of one or several such independent processes.
</p>projecteuclid.org/euclid.aos/1572487400_20191030220349Wed, 30 Oct 2019 22:03 EDTTracy–Widom limit for Kendall’s tauhttps://projecteuclid.org/euclid.aos/1572487401<strong>Zhigang Bao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3504--3532.</p><p><strong>Abstract:</strong><br/>
In this paper, we study a high-dimensional random matrix model from nonparametric statistics called the Kendall rank correlation matrix, which is a natural multivariate extension of the Kendall rank correlation coefficient. We establish the Tracy–Widom law for its largest eigenvalue. It is the first Tracy–Widom law for a nonparametric random matrix model, and also the first Tracy–Widom law for a high-dimensional U-statistic.
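A brute-force sketch of the Kendall rank correlation matrix and its largest eigenvalue (quadratic in the sample size, so only suitable for toy sizes; the data and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def kendall_matrix(X):
    # Kendall rank correlation matrix: entry (j, k) is the average of
    # sign(X[i, j] - X[l, j]) * sign(X[i, k] - X[l, k]) over pairs i < l.
    n = X.shape[0]
    i, l = np.triu_indices(n, 1)
    S = np.sign(X[i] - X[l])        # one row of pairwise signs per pair
    return S.T @ S / len(i)

# Null case: independent continuous entries, so K is close to the identity
# and its largest eigenvalue sits near the edge of the limiting spectrum.
X = rng.standard_normal((200, 50))
K = kendall_matrix(X)
lam_max = np.linalg.eigvalsh(K)[-1]
print(lam_max)
```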
</p>projecteuclid.org/euclid.aos/1572487401_20191030220349Wed, 30 Oct 2019 22:03 EDTIntrinsic Riemannian functional data analysishttps://projecteuclid.org/euclid.aos/1572487402<strong>Zhenhua Lin</strong>, <strong>Fang Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3533--3577.</p><p><strong>Abstract:</strong><br/>
In this work we develop a novel and foundational framework for analyzing general Riemannian functional data, in particular a new development of tensor Hilbert spaces along curves on a manifold. Such spaces enable us to derive the Karhunen–Loève expansion for Riemannian random processes. This framework also features an approach to compare objects from different tensor Hilbert spaces, which paves the way for asymptotic analysis in Riemannian functional data analysis. Built upon intrinsic geometric concepts such as vector field, Levi-Civita connection and parallel transport on Riemannian manifolds, the developed framework applies not only to Euclidean submanifolds but also to manifolds without a natural ambient space. As applications of this framework, we develop intrinsic Riemannian functional principal component analysis (iRFPCA) and intrinsic Riemannian functional linear regression (iRFLR) that are distinct from their traditional and ambient counterparts. We also provide estimation procedures for iRFPCA and iRFLR, and investigate their asymptotic properties within the intrinsic geometry. Numerical performance is illustrated by simulated and real examples.
</p>projecteuclid.org/euclid.aos/1572487402_20191030220349Wed, 30 Oct 2019 22:03 EDTDetecting relevant changes in the mean of nonstationary processes—A mass excess approachhttps://projecteuclid.org/euclid.aos/1572487403<strong>Holger Dette</strong>, <strong>Weichi Wu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 6, 3578--3608.</p><p><strong>Abstract:</strong><br/>
This paper considers the problem of testing whether a sequence of means $(\mu_{t})_{t=1,\ldots ,n}$ of a nonstationary time series $(X_{t})_{t=1,\ldots ,n}$ is stable in the sense that the difference between the initial mean $\mu_{1}$ and the mean $\mu_{t}$ at any other time is smaller than a given threshold, that is, $|\mu_{1}-\mu_{t}|\leq c$ for all $t=1,\ldots ,n$. A test for hypotheses of this type is developed using a bias-corrected monotone rearranged local linear estimator, and asymptotic normality of the corresponding test statistic is established. As the asymptotic variance depends on the location of the roots of the equation $|\mu_{1}-\mu_{t}|=c$, a new bootstrap procedure is proposed to obtain critical values, and its consistency is established. As a consequence we are able to quantitatively describe relevant deviations of a nonstationary sequence from its initial value. The results are illustrated by means of a simulation study and by analyzing data examples.
</p>projecteuclid.org/euclid.aos/1572487403_20191030220349Wed, 30 Oct 2019 22:03 EDTTwo-step semiparametric empirical likelihood inferencehttps://projecteuclid.org/euclid.aos/1581930123<strong>Francesco Bravo</strong>, <strong>Juan Carlos Escanciano</strong>, <strong>Ingrid Van Keilegom</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 1--26.</p><p><strong>Abstract:</strong><br/>
In both parametric and certain nonparametric statistical models, the empirical likelihood ratio satisfies a nonparametric version of Wilks’ theorem. For many semiparametric models, however, the commonly used two-step (plug-in) empirical likelihood ratio is not asymptotically distribution-free, that is, its asymptotic distribution contains unknown quantities, and hence Wilks’ theorem breaks down. This article suggests a general approach to restore Wilks’ phenomenon in two-step semiparametric empirical likelihood inferences. The main insight consists in using as the moment function in the estimating equation the influence function of the plug-in sample moment. The proposed method is general; it leads to a chi-squared limiting distribution with known degrees of freedom; it is efficient; it does not require undersmoothing; and it is less sensitive to the first-step than alternative methods, which is particularly appealing for high-dimensional settings. Several examples and simulation studies illustrate the general applicability of the procedure and its excellent finite sample performance relative to competing methods.
</p>projecteuclid.org/euclid.aos/1581930123_20200217040231Mon, 17 Feb 2020 04:02 ESTThe phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regressionhttps://projecteuclid.org/euclid.aos/1581930124<strong>Emmanuel J. Candès</strong>, <strong>Pragya Sur</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 27--42.</p><p><strong>Abstract:</strong><br/>
This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp “phase transition.” We introduce an explicit boundary curve $h_{\mathrm{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes $n$ and number of features $p$ proportioned in such a way that $p/n\rightarrow \kappa $, we show that if the problem is sufficiently high dimensional in the sense that $\kappa >h_{\mathrm{MLE}}$, then the MLE does not exist with probability one. Conversely, if $\kappa <h_{\mathrm{MLE}}$, the MLE asymptotically exists with probability one.
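A one-dimensional illustration of the underlying phenomenon (not the paper's construction): when the data are linearly separable, the logistic log-likelihood increases without bound along a direction of the parameter space, so no finite MLE exists.

```python
import math

def loglik(beta, xy):
    # logistic log-likelihood with +/-1 labels: sum of log sigma(beta * y * x)
    return sum(-math.log1p(math.exp(-beta * y * x)) for x, y in xy)

# linearly separable toy data: the label is the sign of x
data = [(-2.0, -1), (-0.5, -1), (0.3, 1), (1.7, 1)]
lls = [loglik(b, data) for b in (1.0, 10.0, 100.0)]
# the likelihood keeps increasing as beta grows, so the supremum
# is not attained at any finite beta: the MLE does not exist
assert lls[0] < lls[1] < lls[2]
```

The paper's phase transition identifies, in the regime $p/n\to\kappa$, exactly when such separation occurs with probability one in high-dimensional Gaussian designs.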
</p>projecteuclid.org/euclid.aos/1581930124_20200217040231Mon, 17 Feb 2020 04:02 ESTRerandomization in $2^{K}$ factorial experimentshttps://projecteuclid.org/euclid.aos/1581930125<strong>Xinran Li</strong>, <strong>Peng Ding</strong>, <strong>Donald B. Rubin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 43--63.</p><p><strong>Abstract:</strong><br/>
With many pretreatment covariates and treatment factors, the classical factorial experiment often fails to balance covariates across multiple factorial effects simultaneously. Therefore, it is intuitive to restrict the randomization of the treatment factors to satisfy certain covariate balance criteria, possibly conforming to the tiers of factorial effects and covariates based on their relative importance. This is rerandomization in factorial experiments. We study the asymptotic properties of this experimental design under the randomization inference framework without imposing any distributional or modeling assumptions on the covariates and outcomes. We derive the joint asymptotic sampling distribution of the usual estimators of the factorial effects, and show that it is symmetric, unimodal and more “concentrated” at the true factorial effects under rerandomization than under the classical factorial experiment. We quantify this advantage of rerandomization using the notions of “central convex unimodality” and “peakedness” of the joint asymptotic sampling distribution. We also construct conservative large-sample confidence sets for the factorial effects.
</p>projecteuclid.org/euclid.aos/1581930125_20200217040231Mon, 17 Feb 2020 04:02 ESTSparse SIR: Optimal rates and adaptive estimationhttps://projecteuclid.org/euclid.aos/1581930126<strong>Kai Tan</strong>, <strong>Lei Shi</strong>, <strong>Zhou Yu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 64--85.</p><p><strong>Abstract:</strong><br/>
Sliced inverse regression (SIR) is an innovative and effective method for sufficient dimension reduction and data visualization. Recently, an impressive range of penalized SIR methods has been proposed to estimate the central subspace in a sparse fashion. Nonetheless, few of them considered the sparse sufficient dimension reduction from a decision-theoretic point of view. To address this issue, we in this paper establish the minimax rates of convergence for estimating the sparse SIR directions under various commonly used loss functions in the literature of sufficient dimension reduction. We also discover the possible trade-off between statistical guarantee and computational performance for sparse SIR. We finally propose an adaptive estimation scheme for sparse SIR which is computationally tractable and rate optimal. Numerical studies are carried out to confirm the theoretical properties of our proposed methods.
</p>projecteuclid.org/euclid.aos/1581930126_20200217040231Mon, 17 Feb 2020 04:02 ESTRobust sparse covariance estimation by thresholding Tyler’s M-estimatorhttps://projecteuclid.org/euclid.aos/1581930127<strong>John Goes</strong>, <strong>Gilad Lerman</strong>, <strong>Boaz Nadler</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 86--110.</p><p><strong>Abstract:</strong><br/>
Estimating a high-dimensional sparse covariance matrix from a limited number of samples is a fundamental task in contemporary data analysis. Most proposals to date, however, are not robust to outliers or heavy tails. Toward bridging this gap, in this work we consider estimating a sparse shape matrix from $n$ samples following a possibly heavy-tailed elliptical distribution. We propose estimators based on thresholding either Tyler’s M-estimator or its regularized variant. We prove that in the joint limit as the dimension $p$ and the sample size $n$ tend to infinity with $p/n\to\gamma>0$, our estimators are minimax rate optimal. Results on simulated data support our theoretical analysis.
</p>projecteuclid.org/euclid.aos/1581930127_20200217040231Mon, 17 Feb 2020 04:02 ESTModel assisted variable clustering: Minimax-optimal recovery and algorithmshttps://projecteuclid.org/euclid.aos/1581930128<strong>Florentina Bunea</strong>, <strong>Christophe Giraud</strong>, <strong>Xi Luo</strong>, <strong>Martin Royer</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 111--137.</p><p><strong>Abstract:</strong><br/>
The problem of variable clustering is that of estimating groups of similar components of a $p$-dimensional vector $X=(X_{1},\ldots ,X_{p})$ from $n$ independent copies of $X$. There exists a large number of algorithms that return data-dependent groups of variables, but their interpretation is limited to the algorithm that produced them. An alternative is model-based clustering, in which one begins by defining population level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of $G$-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a $G$-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to $G$-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular $K$-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we compare our methods with another popular clustering method, spectral clustering. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach.
</p>projecteuclid.org/euclid.aos/1581930128_20200217040231Mon, 17 Feb 2020 04:02 ESTNew $G$-formula for the sequential causal effect and blip effect of treatment in sequential causal inferencehttps://projecteuclid.org/euclid.aos/1581930129<strong>Xiaoqin Wang</strong>, <strong>Li Yin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 138--160.</p><p><strong>Abstract:</strong><br/>
In sequential causal inference, two types of causal effects are of practical interest, namely, the causal effect of the treatment regime (called the sequential causal effect) and the blip effect of treatment on the potential outcome after the last treatment. The well-known $G$-formula expresses these causal effects in terms of the standard parameters. In this article, we obtain a new $G$-formula that expresses these causal effects in terms of point observable effects of treatments, analogous to treatment effects in the framework of single-point causal inference. Based on the new $G$-formula, we estimate these causal effects by maximum likelihood via the point observable effects, using methods extended from single-point causal inference. We are able to increase the precision of the estimation without introducing bias through an unsaturated model imposing constraints on the point observable effects. We are also able to reduce the number of point observable effects in the estimation via treatment assignment conditions.
</p>projecteuclid.org/euclid.aos/1581930129_20200217040231Mon, 17 Feb 2020 04:02 ESTEnvelope-based sparse partial least squareshttps://projecteuclid.org/euclid.aos/1581930130<strong>Guangyu Zhu</strong>, <strong>Zhihua Su</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 161--182.</p><p><strong>Abstract:</strong><br/>
Sparse partial least squares (SPLS) is widely used in applied sciences as a method that performs dimension reduction and variable selection simultaneously in linear regression. Several implementations of SPLS have been derived, among which the SPLS proposed in Chun and Keleş ( J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 (2010) 3–25) is very popular and highly cited. However, for all of these implementations, the theoretical properties of SPLS are largely unknown. In this paper, we propose a new version of SPLS, called the envelope-based SPLS, using a connection between envelope models and partial least squares (PLS). We establish the consistency, oracle property and asymptotic normality of the envelope-based SPLS estimator. The large-sample scenario and high-dimensional scenario are both considered. We also develop the envelope-based SPLS estimators under the context of generalized linear models, and discuss its theoretical properties including consistency, oracle property and asymptotic distribution. Numerical experiments and examples show that the envelope-based SPLS estimator has better variable selection and prediction performance over the SPLS estimator ( J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 (2010) 3–25).
</p>projecteuclid.org/euclid.aos/1581930130_20200217040231Mon, 17 Feb 2020 04:02 ESTOptimal rates for community estimation in the weighted stochastic block modelhttps://projecteuclid.org/euclid.aos/1581930131<strong>Min Xu</strong>, <strong>Varun Jog</strong>, <strong>Po-Ling Loh</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 183--204.</p><p><strong>Abstract:</strong><br/>
Community identification in a network is an important problem in fields such as social science, neuroscience and genetics. Over the past decade, stochastic block models (SBMs) have emerged as a popular statistical framework for this problem. However, SBMs have an important limitation in that they are suited only for networks with unweighted edges; in various scientific applications, disregarding the edge weights may result in a loss of valuable information. We study a weighted generalization of the SBM, in which observations are collected in the form of a weighted adjacency matrix and the weight of each edge is generated independently from an unknown probability density determined by the community membership of its endpoints. We characterize the optimal rate of misclustering error of the weighted SBM in terms of the Rényi divergence of order 1/2 between the weight distributions of within-community and between-community edges, substantially generalizing existing results for unweighted SBMs. Furthermore, we present a computationally tractable algorithm based on discretization that achieves the optimal error rate. Our method is adaptive in the sense that the algorithm, without assuming knowledge of the weight densities, performs as well as the best algorithm that knows the weight densities.
</p>projecteuclid.org/euclid.aos/1581930131_20200217040231Mon, 17 Feb 2020 04:02 ESTAdaptive risk bounds in univariate total variation denoising and trend filteringhttps://projecteuclid.org/euclid.aos/1581930132<strong>Adityanand Guntuboyina</strong>, <strong>Donovan Lieu</strong>, <strong>Sabyasachi Chatterjee</strong>, <strong>Bodhisattva Sen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 205--229.</p><p><strong>Abstract:</strong><br/>
We study trend filtering, a relatively recent method for univariate nonparametric regression. For a given integer $r\geq1$, the $r$th order trend filtering estimator is defined as the minimizer of the sum of squared errors when we constrain (or penalize) the sum of the absolute $r$th order discrete derivatives of the fitted function at the design points. For $r=1$, the estimator reduces to total variation regularization which has received much attention in the statistics and image processing literature. In this paper, we study the performance of the trend filtering estimator for every $r\geq1$, both in the constrained and penalized forms. Our main results show that in the strong sparsity setting when the underlying function is a (discrete) spline with few “knots,” the risk (under the global squared error loss) of the trend filtering estimator (with an appropriate choice of the tuning parameter) achieves the parametric $n^{-1}$-rate, up to a logarithmic (multiplicative) factor. Our results therefore provide support for the use of trend filtering, for every $r\geq1$, in the strong sparsity setting.
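The penalized criterion described above can be written down directly. A minimal sketch (our own toy illustration, not the paper's estimator) that evaluates the $r$th order trend filtering objective for candidate fits:

```python
def diff(seq, r):
    # r-th order discrete differences of a sequence
    for _ in range(r):
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return seq

def tf_objective(theta, y, lam, r):
    # penalized trend filtering criterion:
    # 0.5 * sum (y_i - theta_i)^2 + lam * sum |D^r theta|
    fit = 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, theta))
    pen = lam * sum(abs(d) for d in diff(list(theta), r))
    return fit + pen

y = [0.1, 0.0, 0.2, 1.1, 0.9, 1.0]            # noisy step function
flat = [0.5] * 6                               # constant fit: zero penalty, poor fit
step = [0.1, 0.1, 0.1, 1.0, 1.0, 1.0]          # piecewise-constant fit, one "knot"
# for r = 1 (total variation), the sparse step fit wins at moderate lambda
assert tf_objective(step, y, 0.1, 1) < tf_objective(flat, y, 0.1, 1)
```

The trend filtering estimator minimizes this objective over all $\theta$; the strong sparsity setting of the paper corresponds to a minimizer whose $r$th differences are mostly zero, as in the step fit above.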
</p>projecteuclid.org/euclid.aos/1581930132_20200217040231Mon, 17 Feb 2020 04:02 ESTSpectral and matrix factorization methods for consistent community detection in multi-layer networkshttps://projecteuclid.org/euclid.aos/1581930133<strong>Subhadeep Paul</strong>, <strong>Yuguo Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 230--250.</p><p><strong>Abstract:</strong><br/>
We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network using methods based on spectral clustering or low-rank matrix factorization. As a general theme, these “intermediate fusion” methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored. In the absence of statistical guarantees on the objective functions, it is difficult to determine if the algorithms optimizing the objectives will return good community structures. We investigate the consistency properties of the global optimizer of some of these objective functions under the multi-layer stochastic blockmodel. For this purpose, we derive several new asymptotic results showing consistency of the intermediate fusion techniques along with spectral clustering of the mean adjacency matrix under a high-dimensional setup, where the number of nodes, the number of layers and the number of communities of the multi-layer graph grow. Our numerical study shows that the intermediate fusion techniques outperform late fusion methods, namely spectral clustering on the aggregate spectral kernel and the module allegiance matrix in sparse networks, while they outperform spectral clustering of the mean adjacency matrix in multi-layer networks that contain layers with both homophilic and heterophilic communities.
</p>projecteuclid.org/euclid.aos/1581930133_20200217040231Mon, 17 Feb 2020 04:02 ESTStatistical inference for model parameters in stochastic gradient descenthttps://projecteuclid.org/euclid.aos/1581930134<strong>Xi Chen</strong>, <strong>Jason D. Lee</strong>, <strong>Xin T. Tong</strong>, <strong>Yichen Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 251--273.</p><p><strong>Abstract:</strong><br/>
The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions.
Our main contributions are twofold. First, in the fixed dimension setup, we propose two consistent estimators of the asymptotic covariance of the average iterate from SGD: (1) a plug-in estimator, and (2) a batch-means estimator, which is computationally more efficient and only uses the iterates from SGD. Both proposed estimators allow us to construct asymptotically exact confidence intervals and hypothesis tests.
Second, for high-dimensional linear regression, using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal. This gives a one-pass algorithm for computing both the sparse regression coefficients and confidence intervals, which is computationally attractive and applicable to online data.
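The batch-means idea can be sketched on a toy problem. The following is our own minimal illustration under simplifying assumptions (1-D mean estimation, Polyak averaging, equal-sized batches), not the paper's estimator or tuning:

```python
import random

def averaged_sgd(data, lr0=1.0, alpha=0.55):
    # SGD for the mean: loss (theta - x)^2 / 2, stochastic gradient theta - x,
    # polynomially decaying step sizes lr0 * k^(-alpha)
    theta, iterates = 0.0, []
    for k, x in enumerate(data, start=1):
        theta -= lr0 * k ** (-alpha) * (theta - x)
        iterates.append(theta)
    return sum(iterates) / len(iterates), iterates

def batch_means_var(iterates, n_batches=20):
    # batch-means estimate of the variance of the averaged iterate:
    # split the SGD path into batches and use the spread of batch averages
    m = len(iterates) // n_batches
    means = [sum(iterates[i * m:(i + 1) * m]) / m for i in range(n_batches)]
    grand = sum(means) / n_batches
    return sum((b - grand) ** 2 for b in means) / (n_batches - 1) / n_batches

random.seed(0)
data = [2.0 + random.gauss(0, 1) for _ in range(20000)]
avg, path = averaged_sgd(data)
assert abs(avg - 2.0) < 0.1                    # averaged iterate near true mean
assert 0 < batch_means_var(path) < 0.01        # finite, positive variance estimate
```

As in the paper, the appeal of the batch-means route is that it uses only the SGD iterates themselves, with no extra gradient or Hessian computations.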
</p>projecteuclid.org/euclid.aos/1581930134_20200217040231Mon, 17 Feb 2020 04:02 ESTBootstrap confidence regions based on M-estimators under nonstandard conditionshttps://projecteuclid.org/euclid.aos/1581930135<strong>Stephen M. S. Lee</strong>, <strong>Puyudi Yang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 274--299.</p><p><strong>Abstract:</strong><br/>
Suppose that a confidence region is desired for a subvector $\theta $ of a multidimensional parameter $\xi =(\theta ,\psi )$, based on an M-estimator $\hat{\xi }_{n}=(\hat{\theta }_{n},\hat{\psi }_{n})$ calculated from a random sample of size $n$. Under nonstandard conditions $\hat{\xi }_{n}$ often converges at a nonregular rate $r_{n}$, in which case consistent estimation of the distribution of $r_{n}(\hat{\theta }_{n}-\theta )$, a pivot commonly chosen for confidence region construction, is most conveniently effected by the $m$ out of $n$ bootstrap. The above choice of pivot has three drawbacks: (i) the shape of the region is either subjectively prescribed or controlled by a computationally intensive depth function; (ii) the region is not transformation equivariant; (iii) $\hat{\xi }_{n}$ may not be uniquely defined. To resolve the above difficulties, we propose a one-dimensional pivot derived from the criterion function, and prove that its distribution can be consistently estimated by the $m$ out of $n$ bootstrap, or by a modified version of the perturbation bootstrap. This leads to a new method for constructing confidence regions which are transformation equivariant and have shapes driven solely by the criterion function. A subsampling procedure is proposed for selecting $m$ in practice. Empirical performance of the new method is illustrated with examples drawn from different nonstandard M-estimation settings. Extension of our theory to row-wise independent triangular arrays is also explored.
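For background on why the $m$ out of $n$ bootstrap appears here, a textbook nonstandard example (not the paper's criterion-function pivot): for $\hat{\theta}_n=\max_i X_i$ with Uniform$(0,\theta)$ data, the estimator converges at rate $n$ and the standard bootstrap is inconsistent, while resampling $m\ll n$ points recovers the right limit.

```python
import random

random.seed(3)
n, m, B = 2000, 80, 500
x = [random.random() for _ in range(n)]       # Uniform(0, theta) with theta = 1
theta_hat = max(x)
# m-out-of-n bootstrap replicates of the pivot m * (theta_hat - max of resample)
reps = []
for _ in range(B):
    resample = [random.choice(x) for _ in range(m)]
    reps.append(m * (theta_hat - max(resample)))
# the true pivot n * (theta - theta_hat) is approximately Exponential with
# mean theta = 1; the m-out-of-n replicates should roughly match that scale
mean_rep = sum(reps) / B
assert 0.3 < mean_rep < 3.0
```

The paper replaces the multidimensional pivot $r_n(\hat{\theta}_n-\theta)$ with a one-dimensional pivot built from the M-estimation criterion function, but the resampling mechanism it validates is of this $m$ out of $n$ type.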
</p>projecteuclid.org/euclid.aos/1581930135_20200217040231Mon, 17 Feb 2020 04:02 ESTSparse high-dimensional regression: Exact scalable algorithms and phase transitionshttps://projecteuclid.org/euclid.aos/1581930136<strong>Dimitris Bertsimas</strong>, <strong>Bart Van Parys</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 300--323.</p><p><strong>Abstract:</strong><br/>
We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve the sparse regression problem to provable optimality, in seconds, for sample sizes $n$ and numbers of regressors $p$ in the 100,000s, that is, two orders of magnitude beyond the current state of the art. The ability to solve the problem for very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory, which suggests that the difficulty of a problem increases with problem size, the sparse regression problem becomes easier as the number of samples $n$ increases: the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact, faster than Lasso). For a small number of samples $n$, our approach takes longer to solve the problem, but the optimal solution is statistically more relevant. We argue that our exact sparse regression approach is a superior alternative to the heuristic methods available at present.
</p>projecteuclid.org/euclid.aos/1581930136_20200217040231Mon, 17 Feb 2020 04:02 ESTTesting for principal component directions under weak identifiabilityhttps://projecteuclid.org/euclid.aos/1581930137<strong>Davy Paindaveine</strong>, <strong>Julien Remy</strong>, <strong>Thomas Verdebout</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 324--345.</p><p><strong>Abstract:</strong><br/>
We consider the problem of testing, on the basis of a $p$-variate Gaussian random sample, the null hypothesis $\mathcal{H}_{0}:\boldsymbol{\theta}_{1}=\boldsymbol{\theta}_{1}^{0}$ against the alternative $\mathcal{H}_{1}:\boldsymbol{\theta}_{1}\neq \boldsymbol{\theta}_{1}^{0}$, where $\boldsymbol{\theta}_{1}$ is the “first” eigenvector of the underlying covariance matrix and $\boldsymbol{\theta}_{1}^{0}$ is a fixed unit $p$-vector. In the classical setup where eigenvalues $\lambda_{1}>\lambda_{2}\geq \cdots \geq \lambda_{p}$ are fixed, the Anderson ( Ann. Math. Stat. 34 (1963) 122–148) likelihood ratio test (LRT) and the Hallin, Paindaveine and Verdebout ( Ann. Statist. 38 (2010) 3245–3299) Le Cam optimal test for this problem are asymptotically equivalent under the null hypothesis, hence also under sequences of contiguous alternatives. We show that this equivalence does not survive asymptotic scenarios where $\lambda_{n1}/\lambda_{n2}=1+O(r_{n})$ with $r_{n}=O(1/\sqrt{n})$. For such scenarios, the Le Cam optimal test still asymptotically meets the nominal level constraint, whereas the LRT severely overrejects the null hypothesis. Consequently, the former test should be favored over the latter one whenever the two largest sample eigenvalues are close to each other. By relying on Le Cam’s asymptotic theory of statistical experiments, we study the non-null and optimality properties of the Le Cam optimal test in the aforementioned asymptotic scenarios and show that the null robustness of this test is not obtained at the expense of power. Our asymptotic investigation is extensive in the sense that it allows $r_{n}$ to converge to zero at an arbitrary rate. While we restrict to single-spiked spectra of the form $\lambda_{n1}>\lambda_{n2}=\cdots =\lambda_{np}$ to make our results as striking as possible, we extend our results to the more general elliptical case. Finally, we present an illustrative real data example.
</p>projecteuclid.org/euclid.aos/1581930137_20200217040231Mon, 17 Feb 2020 04:02 ESTThe multi-armed bandit problem: An efficient nonparametric solutionhttps://projecteuclid.org/euclid.aos/1581930138<strong>Hock Peng Chan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 346--373.</p><p><strong>Abstract:</strong><br/>
Lai and Robbins ( Adv. in Appl. Math. 6 (1985) 4–22) and Lai ( Ann. Statist. 15 (1987) 1091–1114) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback–Leibler information of the reward distributions, estimated from specified parametric families. In recent years, there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Nonparametric arm allocation procedures like $\epsilon $-greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under nonparametric settings. However, unlike UCB these nonparametric procedures are not efficient under general parametric settings. In this paper, we propose efficient nonparametric procedures.
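For readers unfamiliar with UCB allocation, a minimal sketch of the classical UCB1 index of Auer et al. (our own background illustration; the paper's efficient nonparametric procedures are different and the KL-information bounds of Lai and Robbins are sharper):

```python
import math, random

def ucb1(pulls_rewards, t):
    # UCB1 index: empirical mean plus an exploration bonus that shrinks
    # as an arm accumulates pulls; unpulled arms get infinite priority
    best, best_idx = -float("inf"), 0
    for i, (n, s) in enumerate(pulls_rewards):
        idx = float("inf") if n == 0 else s / n + math.sqrt(2 * math.log(t) / n)
        if idx > best:
            best, best_idx = idx, i
    return best_idx

random.seed(1)
means = [0.3, 0.5, 0.8]                        # Bernoulli arms; arm 2 is best
stats = [[0, 0.0] for _ in means]              # [pull count, total reward] per arm
for t in range(1, 3001):
    arm = ucb1(stats, t)
    reward = 1.0 if random.random() < means[arm] else 0.0
    stats[arm][0] += 1
    stats[arm][1] += reward
# the best arm ends up pulled far more often than the suboptimal ones
assert stats[2][0] > stats[0][0] and stats[2][0] > stats[1][0]
```

UCB1's regret guarantees hold distribution-free for bounded rewards, which is the sense in which such index policies fall short of the parametric efficiency the paper seeks to match nonparametrically.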
</p>projecteuclid.org/euclid.aos/1581930138_20200217040231Mon, 17 Feb 2020 04:02 ESTConcentration and consistency results for canonical and curved exponential-family models of random graphshttps://projecteuclid.org/euclid.aos/1581930139<strong>Michael Schweinberger</strong>, <strong>Jonathan Stewart</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 374--396.</p><p><strong>Abstract:</strong><br/>
Statistical inference for exponential-family models of random graphs with dependent edges is challenging. We stress the importance of additional structure and show that additional structure facilitates statistical inference. A simple example of a random graph with additional structure is a random graph with neighborhoods and local dependence within neighborhoods. We develop the first concentration and consistency results for maximum likelihood and $M$-estimators of a wide range of canonical and curved exponential-family models of random graphs with local dependence. All results are nonasymptotic and applicable to random graphs with finite populations of nodes, although asymptotic consistency results can be obtained as well. In addition, we show that additional structure can facilitate subgraph-to-graph estimation, and present concentration results for subgraph-to-graph estimators. As an application, we consider popular curved exponential-family models of random graphs, with local dependence induced by transitivity and parameter vectors whose dimensions depend on the number of nodes.
</p>projecteuclid.org/euclid.aos/1581930139_20200217040231Mon, 17 Feb 2020 04:02 ESTThe numerical bootstraphttps://projecteuclid.org/euclid.aos/1581930140<strong>Han Hong</strong>, <strong>Jessie Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 397--412.</p><p><strong>Abstract:</strong><br/>
This paper proposes a numerical bootstrap method that is consistent in many cases where the standard bootstrap is known to fail and where the $m$-out-of-$n$ bootstrap and subsampling have been the most commonly used inference approaches. We provide asymptotic analysis under both fixed and drifting parameter sequences, and we compare the approximation error of the numerical bootstrap with that of the $m$-out-of-$n$ bootstrap and subsampling. Finally, we discuss applications of the numerical bootstrap, such as constrained and unconstrained M-estimators converging at both regular and nonstandard rates, Laplace-type estimators, and test statistics for partially identified models.
</p>projecteuclid.org/euclid.aos/1581930140_20200217040231Mon, 17 Feb 2020 04:02 ESTConsistent selection of the number of change-points via sample-splittinghttps://projecteuclid.org/euclid.aos/1581930141<strong>Changliang Zou</strong>, <strong>Guanghui Wang</strong>, <strong>Runze Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 413--439.</p><p><strong>Abstract:</strong><br/>
In multiple change-point analysis, one of the major challenges is to estimate the number of change-points. Most existing approaches attempt to minimize a Schwarz information criterion which balances a term quantifying model fit with a penalization term accounting for model complexity, which increases with the number of change-points and limits overfitting. However, different penalization terms are required to adapt to different contexts of multiple change-point problems, and the optimal penalization magnitude usually varies with the model and error distribution. We propose a data-driven selection criterion that is applicable to most kinds of popular change-point detection methods, including binary segmentation and optimal partitioning algorithms. The key idea is to select the number of change-points that minimizes the squared prediction error, which measures the fit of a specified model on a new sample. We develop a cross-validation estimation scheme based on an order-preserved sample-splitting strategy, and establish its asymptotic selection consistency under some mild conditions. Effectiveness of the proposed selection criterion is demonstrated in a variety of numerical experiments and real-data examples.
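As a rough illustration of order-preserved sample-splitting (a simplified sketch, not the authors' implementation; `best_segmentation` is a generic optimal-partitioning routine written here for the purpose), one can fit candidate segmentations on the odd-indexed half of the series and pick the number of change-points minimizing the squared prediction error on the even-indexed half:

```python
import numpy as np

def seg_costs(y):
    # cost[i, j] = SSE of fitting a constant mean to y[i:j]
    n = len(y)
    cum = np.concatenate([[0.0], np.cumsum(y)])
    cum2 = np.concatenate([[0.0], np.cumsum(np.square(y))])
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            s, s2, m = cum[j] - cum[i], cum2[j] - cum2[i], j - i
            cost[i, j] = s2 - s * s / m
    return cost

def best_segmentation(y, k):
    # Optimal partitioning of y into k+1 constant segments by dynamic
    # programming; returns the k change-point locations.
    n = len(y)
    cost = seg_costs(y)
    dp = np.full((k + 2, n + 1), np.inf)
    arg = np.zeros((k + 2, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for seg in range(1, k + 2):
        for j in range(1, n + 1):
            cands = dp[seg - 1, :j] + cost[:j, j]
            arg[seg, j] = int(np.argmin(cands))
            dp[seg, j] = cands[arg[seg, j]]
    bps, j = [], n
    for seg in range(k + 1, 0, -1):  # backtrack segment boundaries
        j = arg[seg, j]
        bps.append(j)
    return sorted(bps[:-1])  # the last appended boundary is 0; drop it

def cv_num_changepoints(x, k_max=5):
    # Order-preserved split: odd-indexed points train, even-indexed validate.
    train, test = x[0::2], x[1::2]
    errs = []
    for k in range(k_max + 1):
        edges = [0] + best_segmentation(train, k) + [len(train)]
        pred = np.concatenate([
            np.full(edges[i + 1] - edges[i], train[edges[i]:edges[i + 1]].mean())
            for i in range(len(edges) - 1)
        ])
        m = min(len(pred), len(test))
        errs.append(float(np.sum(np.square(test[:m] - pred[:m]))))
    return int(np.argmin(errs))
```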
</p>projecteuclid.org/euclid.aos/1581930141_20200217040231Mon, 17 Feb 2020 04:02 ESTUniformly valid confidence intervals post-model-selectionhttps://projecteuclid.org/euclid.aos/1581930142<strong>François Bachoc</strong>, <strong>David Preinerstorfer</strong>, <strong>Lukas Steinberger</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 440--463.</p><p><strong>Abstract:</strong><br/>
We suggest general methods to construct asymptotically uniformly valid confidence intervals post-model-selection. The constructions are based on principles recently proposed by Berk et al. ( Ann. Statist. 41 (2013) 802–837). In particular, the candidate models used can be misspecified, the target of inference is model-specific, and coverage is guaranteed for any data-driven model selection procedure. After developing a general theory, we apply our methods to practically important situations where the candidate set of models, from which a working model is selected, consists of fixed design homoskedastic or heteroskedastic linear models, or of binary regression models with general link functions. In an extensive simulation study, we find that the proposed confidence intervals perform remarkably well, even when compared to existing methods that are tailored only for specific model selection procedures.
</p>projecteuclid.org/euclid.aos/1581930142_20200217040231Mon, 17 Feb 2020 04:02 ESTEfficient estimation of linear functionals of principal componentshttps://projecteuclid.org/euclid.aos/1581930143<strong>Vladimir Koltchinskii</strong>, <strong>Matthias Löffler</strong>, <strong>Richard Nickl</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 464--490.</p><p><strong>Abstract:</strong><br/>
We study principal component analysis (PCA) for mean zero i.i.d. Gaussian observations $X_{1},\dots,X_{n}$ in a separable Hilbert space $\mathbb{H}$ with unknown covariance operator $\Sigma $. The complexity of the problem is characterized by its effective rank $\mathbf{r}(\Sigma):=\frac{\operatorname{tr}(\Sigma)}{\|\Sigma \|}$, where $\operatorname{tr}(\Sigma)$ denotes the trace of $\Sigma $ and $\|\Sigma\|$ denotes its operator norm. We develop a method of bias reduction in the problem of estimation of linear functionals of eigenvectors of $\Sigma $. Under the assumption that $\mathbf{r}(\Sigma)=o(n)$, we establish the asymptotic normality and asymptotic properties of the risk of the resulting estimators and prove matching minimax lower bounds, showing their semiparametric optimality.
</p>projecteuclid.org/euclid.aos/1581930143_20200217040231Mon, 17 Feb 2020 04:02 ESTOptimal prediction in the linearly transformed spiked modelhttps://projecteuclid.org/euclid.aos/1581930144<strong>Edgar Dobriban</strong>, <strong>William Leeb</strong>, <strong>Amit Singer</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 491--513.</p><p><strong>Abstract:</strong><br/>
We consider the linearly transformed spiked model, where the observations $Y_{i}$ are noisy linear transforms of unobserved signals of interest $X_{i}$: \begin{equation*}Y_{i}=A_{i}X_{i}+\varepsilon_{i},\end{equation*} for $i=1,\ldots ,n$. The transform matrices $A_{i}$ are also observed. We model the unobserved signals (or regression coefficients) $X_{i}$ as vectors lying on an unknown low-dimensional space. Given only $Y_{i}$ and $A_{i}$, how should we predict or recover their values?
The naive approach of performing regression for each observation separately is inaccurate due to the large noise level. Instead, we develop optimal methods for predicting $X_{i}$ by “borrowing strength” across the different samples. Our linear empirical Bayes methods scale to large datasets and rely on weak moment assumptions.
We show that this model has wide-ranging applications in signal processing, deconvolution, cryo-electron microscopy, and missing data with noise. For missing data, we show in simulations that our methods are more robust to noise and to unequal sampling than well-known matrix completion methods.
</p>projecteuclid.org/euclid.aos/1581930144_20200217040231Mon, 17 Feb 2020 04:02 ESTAverages of unlabeled networks: Geometric characterization and asymptotic behaviorhttps://projecteuclid.org/euclid.aos/1581930145<strong>Eric D. Kolaczyk</strong>, <strong>Lizhen Lin</strong>, <strong>Steven Rosenberg</strong>, <strong>Jackson Walters</strong>, <strong>Jie Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 514--538.</p><p><strong>Abstract:</strong><br/>
It is becoming increasingly common to see large collections of network data objects, that is, data sets in which a network is viewed as a fundamental unit of observation. As a result, there is a pressing need to develop network-based analogues of even many of the most basic tools already standard for scalar and vector data. In this paper, our focus is on averages of unlabeled, undirected networks with edge weights. Specifically, we (i) characterize a certain notion of the space of all such networks, (ii) describe key topological and geometric properties of this space relevant to doing probability and statistics thereupon, and (iii) use these properties to establish the asymptotic behavior of a generalized notion of an empirical mean under sampling from a distribution supported on this space. Our results rely on a combination of tools from geometry, probability theory and statistical shape analysis. In particular, the lack of vertex labeling necessitates working with a quotient space modding out permutations of labels. This results in a nontrivial geometry for the space of unlabeled networks, which in turn is found to have important implications on the types of probabilistic and statistical results that may be obtained and the techniques needed to obtain them.
</p>projecteuclid.org/euclid.aos/1581930145_20200217040231Mon, 17 Feb 2020 04:02 ESTMarkov equivalence of marginalized local independence graphshttps://projecteuclid.org/euclid.aos/1581930146<strong>Søren Wengel Mogensen</strong>, <strong>Niels Richard Hansen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 539--559.</p><p><strong>Abstract:</strong><br/>
Symmetric independence relations are often studied using graphical representations. Ancestral graphs or acyclic directed mixed graphs with $m$-separation provide classes of symmetric graphical independence models that are closed under marginalization. Asymmetric independence relations appear naturally for multivariate stochastic processes, for instance, in terms of local independence. However, no class of graphs representing such asymmetric independence relations, which is also closed under marginalization, has been developed. We develop the theory of directed mixed graphs with $\mu $-separation and show that this provides a graphical independence model class which is closed under marginalization and which generalizes previously considered graphical representations of local independence.
Several graphs may encode the same set of independence relations and this means that in many cases only an equivalence class of graphs can be identified from observational data. For statistical applications, it is therefore pivotal to characterize graphs that induce the same independence relations. Our main result is that for directed mixed graphs with $\mu $-separation each equivalence class contains a maximal element which can be constructed from the independence relations alone. Moreover, we introduce the directed mixed equivalence graph as the maximal graph with dashed and solid edges. This graph encodes all information about the edges that is identifiable from the independence relations, and furthermore it can be computed efficiently from the maximal graph.
</p>projecteuclid.org/euclid.aos/1581930146_20200217040231Mon, 17 Feb 2020 04:02 ESTAsymptotic genealogies of interacting particle systems with an application to sequential Monte Carlohttps://projecteuclid.org/euclid.aos/1581930147<strong>Jere Koskela</strong>, <strong>Paul A. Jenkins</strong>, <strong>Adam M. Johansen</strong>, <strong>Dario Spanò</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 560--583.</p><p><strong>Abstract:</strong><br/>
We study weighted particle systems in which new generations are resampled from current particles with probabilities proportional to their weights. This covers a broad class of sequential Monte Carlo (SMC) methods, widely-used in applied statistics and cognate disciplines. We consider the genealogical tree embedded into such particle systems, and identify conditions, as well as an appropriate time-scaling, under which they converge to the Kingman $n$-coalescent in the infinite system size limit in the sense of finite-dimensional distributions. Thus, the tractable $n$-coalescent can be used to predict the shape and size of SMC genealogies, as we illustrate by characterising the limiting mean and variance of the tree height. SMC genealogies are known to be connected to algorithm performance, so that our results are likely to have applications in the design of new methods as well. Our conditions for convergence are strong, but we show by simulation that they do not appear to be necessary.
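The resampling mechanism that generates these genealogies is easy to simulate. The sketch below (a hypothetical helper, not code from the paper) performs multinomial resampling generation by generation and traces the ancestral lineages of the final particles, whose coalescence the Kingman $n$-coalescent approximates in the limit:

```python
import numpy as np

def resample_genealogy(weights_by_gen, rng=None):
    # One multinomial resampling step per generation: each new particle picks
    # a parent with probability proportional to the parent's weight.
    rng = np.random.default_rng(rng)
    parents = []
    for w in weights_by_gen:
        p = np.asarray(w, dtype=float)
        parents.append(rng.choice(len(p), size=len(p), p=p / p.sum()))
    # Trace each final particle's lineage back to generation 0.
    lineage = np.arange(len(weights_by_gen[0]))
    ancestry = [lineage]
    for par in reversed(parents):
        lineage = par[lineage]
        ancestry.append(lineage)
    # ancestry[t][i]: ancestor at generation t of final particle i
    return ancestry[::-1]

# Even with neutral (equal) weights, lineages coalesce as generations pass.
anc = resample_genealogy([np.ones(100)] * 50, rng=1)
n_founders = len(set(anc[0]))  # distinct generation-0 ancestors
```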
</p>projecteuclid.org/euclid.aos/1581930147_20200217040231Mon, 17 Feb 2020 04:02 ESTAlmost sure uniqueness of a global minimum without convexityhttps://projecteuclid.org/euclid.aos/1581930148<strong>Gregory Cox</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 584--606.</p><p><strong>Abstract:</strong><br/>
This paper establishes that the argmin of a random objective function is unique almost surely. We first formulate a general result that proves almost sure uniqueness without convexity of the objective function. The general result is then applied to a variety of applications in statistics. Four applications are discussed, including uniqueness of M-estimators, both classical likelihood and penalized likelihood estimators, and two applications of the argmin theorem, threshold regression and weak identification.
</p>projecteuclid.org/euclid.aos/1581930148_20200217040231Mon, 17 Feb 2020 04:02 ESTPenalized generalized empirical likelihood with a diverging number of general estimating equations for censored datahttps://projecteuclid.org/euclid.aos/1581930149<strong>Niansheng Tang</strong>, <strong>Xiaodong Yan</strong>, <strong>Xingqiu Zhao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 48, Number 1, 607--627.</p><p><strong>Abstract:</strong><br/>
This article considers simultaneous variable selection and parameter estimation as well as hypothesis testing in censored survival models where a parametric likelihood is not available. For the problem, we utilize certain growing dimensional general estimating equations and propose a penalized generalized empirical likelihood, where the general estimating equations are constructed based on the semiparametric efficiency bound of estimation with given moment conditions. The proposed penalized generalized empirical likelihood estimators enjoy the oracle properties, and the estimator of any fixed dimensional vector of nonzero parameters achieves the semiparametric efficiency bound asymptotically. Furthermore, we show that the penalized generalized empirical likelihood ratio test statistic has an asymptotic central chi-square distribution. The conditions of local and restricted global optimality of weighted penalized generalized empirical likelihood estimators are also discussed. We present a two-layer iterative algorithm for efficient implementation, and investigate its convergence property. The performance of the proposed methods is demonstrated by extensive simulation studies, and a real data example is provided for illustration.
</p>projecteuclid.org/euclid.aos/1581930149_20200217040231Mon, 17 Feb 2020 04:02 ESTOn estimation of isotonic piecewise constant signalshttps://projecteuclid.org/euclid.aos/1590480028<strong>Chao Gao</strong>, <strong>Fang Han</strong>, <strong>Cun-Hui Zhang</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 629--654.</p><p><strong>Abstract:</strong><br/>
Consider a sequence of real data points $X_{1},\ldots ,X_{n}$ with underlying means $\theta ^{*}_{1},\dots ,\theta ^{*}_{n}$. This paper starts by studying the setting in which $\theta ^{*}_{i}$ is both piecewise constant and monotone as a function of the index $i$. For this, we establish the exact minimax rate of estimating such monotone functions, and thus give a nontrivial answer to an open problem in the shape-constrained analysis literature. The minimax rate under the loss of the sum of squared errors involves an interesting iterated logarithmic dependence on the dimension, a phenomenon that is revealed through characterizing the interplay between the isotonic shape constraint and model selection complexity. We then develop a penalized least-squares procedure for estimating the vector $\theta ^{*}=(\theta^{*}_{1},\dots ,\theta ^{*}_{n})^{\mathsf{T}}$. This estimator is shown to achieve the derived minimax rate adaptively. For the proposed estimator, we further allow the model to be misspecified and derive oracle inequalities with the optimal rates, and show that there exists a computationally efficient algorithm to compute the exact solution.
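For background, the plain isotonic least-squares fit (before any piecewise-constant penalization) is computable by the classical pool-adjacent-violators algorithm. A minimal sketch, not the paper's penalized estimator:

```python
def pava(y):
    # Pool-adjacent-violators: least-squares fit of a nondecreasing sequence.
    # Each block stores (sum, count); adjacent blocks are merged while their
    # means violate monotonicity (compared via cross-multiplication).
    blocks = []
    for v in y:
        blocks.append((float(v), 1))
        while len(blocks) > 1 and \
                blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s2, c2 = blocks.pop()
            s1, c1 = blocks.pop()
            blocks.append((s1 + s2, c1 + c2))
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out
```

The fitted values are automatically piecewise constant, with one level per pooled block.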
</p>projecteuclid.org/euclid.aos/1590480028_20200526040045Tue, 26 May 2020 04:00 EDTMultidimensional multiscale scanning in exponential families: Limit theory and statistical consequenceshttps://projecteuclid.org/euclid.aos/1590480029<strong>Claudia König</strong>, <strong>Axel Munk</strong>, <strong>Frank Werner</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 655--678.</p><p><strong>Abstract:</strong><br/>
We consider the problem of finding anomalies in a $d$-dimensional field of independent random variables $\{Y_{i}\}_{i\in \{1,\ldots,n\}^{d}}$, each distributed according to a one-dimensional natural exponential family $\mathcal{F}=\{F_{\theta }\}_{\theta \in \Theta }$. Given some baseline parameter $\theta _{0}\in \Theta $, the field is scanned using local likelihood ratio tests to detect from a (large) given system of regions $\mathcal{R}$ those regions $R\subset \{1,\ldots,n\}^{d}$ with $\theta _{i}\neq \theta _{0}$ for some $i\in R$. We provide a unified methodology which controls the overall familywise error rate (FWER) of making a wrong detection at a given level.
Fundamental to our method is a Gaussian approximation of the distribution of the underlying multiscale test statistic with explicit rate of convergence. From this, we obtain a weak limit theorem which can be seen as a generalized weak invariance principle for nonidentically distributed data and is of independent interest. Furthermore, we give an asymptotic expansion of the procedure's power, which yields minimax optimality in the case of Gaussian observations.
</p>projecteuclid.org/euclid.aos/1590480029_20200526040045Tue, 26 May 2020 04:00 EDTDesigns for estimating the treatment effect in networks with interferencehttps://projecteuclid.org/euclid.aos/1590480030<strong>Ravi Jagadeesan</strong>, <strong>Natesh S. Pillai</strong>, <strong>Alexander Volfovsky</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 679--712.</p><p><strong>Abstract:</strong><br/>
In this paper, we introduce new, easily implementable designs for drawing causal inference from randomized experiments on networks with interference. Inspired by the idea of matching in observational studies, we introduce the notion of considering a treatment assignment as a “quasi-coloring” on a graph. Our idea of a perfect quasi-coloring strives to match every treated unit on a given network with a distinct control unit that has an identical number of treated and control neighbors. For a wide range of interference functions encountered in applications, we show both by theory and simulations that the classical Neymanian estimator for the direct effect has desirable properties for our designs.
</p>projecteuclid.org/euclid.aos/1590480030_20200526040045Tue, 26 May 2020 04:00 EDTLearning a tree-structured Ising model in order to make predictionshttps://projecteuclid.org/euclid.aos/1590480031<strong>Guy Bresler</strong>, <strong>Mina Karzand</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 713--737.</p><p><strong>Abstract:</strong><br/>
We study the problem of learning a tree Ising model from samples such that subsequent predictions made using the model are accurate. The prediction task considered in this paper is that of predicting the values of a subset of variables given values of some other subset of variables. Virtually all previous work on graphical model learning has focused on recovering the true underlying graph. We define a distance (“small set TV” or ssTV) between distributions $P$ and $Q$ by taking the maximum, over all subsets $\mathcal{S}$ of a given size, of the total variation between the marginals of $P$ and $Q$ on $\mathcal{S}$; this distance captures the accuracy of the prediction task of interest. We derive nonasymptotic bounds on the number of samples needed to get a distribution (from the same class) with small ssTV relative to the one generating the samples. One of the main messages of this paper is that far fewer samples are needed than for recovering the underlying tree, which means that accurate predictions are possible using the wrong tree.
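For small state spaces the ssTV distance can be computed by brute force. The sketch below (hypothetical function names, binary variables for concreteness) enumerates all size-$k$ subsets and takes the worst-case total variation between the corresponding marginals:

```python
import itertools

def marginal(table, subset):
    # Marginal pmf over the coordinates in `subset` of a joint pmf given as
    # {state tuple: probability}.
    marg = {}
    for state, pr in table.items():
        key = tuple(state[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + pr
    return marg

def sstv(p_table, q_table, d, k):
    # Small-set TV: max over size-k subsets S of the total variation between
    # the marginals of P and Q on S.
    best = 0.0
    for subset in itertools.combinations(range(d), k):
        mp, mq = marginal(p_table, subset), marginal(q_table, subset)
        keys = set(mp) | set(mq)
        tv = 0.5 * sum(abs(mp.get(s, 0.0) - mq.get(s, 0.0)) for s in keys)
        best = max(best, tv)
    return best
```

Two distributions can agree on all singleton marginals yet differ sharply on pairs, which is exactly what a larger subset size detects.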
</p>projecteuclid.org/euclid.aos/1590480031_20200526040045Tue, 26 May 2020 04:00 EDTOn the nonparametric maximum likelihood estimator for Gaussian location mixture densities with application to Gaussian denoisinghttps://projecteuclid.org/euclid.aos/1590480032<strong>Sujayam Saha</strong>, <strong>Adityanand Guntuboyina</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 738--762.</p><p><strong>Abstract:</strong><br/>
We study the nonparametric maximum likelihood estimator (NPMLE) for estimating Gaussian location mixture densities in $d$-dimensions from independent observations. Unlike usual likelihood-based methods for fitting mixtures, NPMLEs are based on convex optimization. We prove finite sample results on the Hellinger accuracy of every NPMLE. Our results imply, in particular, that every NPMLE achieves near parametric risk (up to logarithmic multiplicative factors) when the true density is a discrete Gaussian mixture without any prior information on the number of mixture components. NPMLEs can naturally be used to yield empirical Bayes estimates of the oracle Bayes estimator in the Gaussian denoising problem. We prove bounds for the accuracy of the empirical Bayes estimate as an approximation to the oracle Bayes estimator. Here our results imply that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising in clustering situations without any prior knowledge of the number of clusters.
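In practice the infinite-dimensional convex program behind the NPMLE is often approximated by restricting the mixing distribution to a fixed grid of atoms. The sketch below (a standard fixed-grid EM scheme for the one-dimensional case, not the paper's method) illustrates this; the problem is convex in the mixing weights and each update increases the likelihood:

```python
import numpy as np

def npmle_grid(x, grid, n_iter=500):
    # Fixed-grid approximation to the NPMLE of a Gaussian location mixture:
    # maximize sum_i log( sum_k w_k * phi(x_i - grid_k) ) over mixing weights
    # w on the simplex, via the classical EM / multiplicative update.
    L = np.exp(-0.5 * (x[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)
    w = np.full(len(grid), 1.0 / len(grid))
    for _ in range(n_iter):
        denom = L @ w                       # mixture density at each x_i
        w = w * (L.T @ (1.0 / denom)) / len(x)
    return w
```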
</p>projecteuclid.org/euclid.aos/1590480032_20200526040045Tue, 26 May 2020 04:00 EDTPrediction error after model searchhttps://projecteuclid.org/euclid.aos/1590480033<strong>Xiaoying Tian</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 763--784.</p><p><strong>Abstract:</strong><br/>
Estimation of the prediction error of a linear estimation rule is difficult if the data analyst also uses data to select a set of variables and constructs the estimation rule using only the selected variables. In this work, we propose an asymptotically unbiased estimator for the prediction error after model search. Under some additional mild assumptions, we show that our estimator converges to the true prediction error in $L^{2}$ at the rate of $O(n^{-1/2})$, with $n$ being the number of data points. Our estimator applies to general selection procedures, not requiring analytical forms for the selection. The number of variables to select from can grow as an exponential factor of $n$, allowing applications in high-dimensional data. It also allows model misspecifications, not requiring linear underlying models. One application of our method is that it provides an estimator for the degrees of freedom for many discontinuous estimation rules like best subset selection or relaxed Lasso. Connection to Stein’s Unbiased Risk Estimator is discussed. We consider in-sample prediction errors in this work, with some extension to out-of-sample errors in low-dimensional, linear models. Examples such as best subset selection and relaxed Lasso are considered in simulations, where our estimator outperforms both $C_{p}$ and cross validation in various settings.
</p>projecteuclid.org/euclid.aos/1590480033_20200526040045Tue, 26 May 2020 04:00 EDTJoint estimation of parameters in Ising modelhttps://projecteuclid.org/euclid.aos/1590480034<strong>Promit Ghosal</strong>, <strong>Sumit Mukherjee</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 785--810.</p><p><strong>Abstract:</strong><br/>
We study joint estimation of the inverse temperature and magnetization parameters $(\beta ,B)$ of an Ising model with a nonnegative coupling matrix $A_{n}$ of size $n\times n$, given one sample from the Ising model. We give a general bound on the rate of consistency of the bivariate pseudo-likelihood estimator. Using this, we show that estimation at rate $n^{-1/2}$ is always possible if $A_{n}$ is the adjacency matrix of a bounded degree graph. If $A_{n}$ is the scaled adjacency matrix of a graph whose average degree goes to $+\infty $, the situation is a bit more delicate. In this case, estimation at rate $n^{-1/2}$ is still possible if the graph is not regular (in an asymptotic sense). Finally, we show that consistent estimation of both parameters is impossible if the graph is Erdős–Rényi with parameter $p>0$ independent of $n$, thus confirming that estimation is harder on approximately regular graphs with large degree.
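The pseudo-likelihood replaces the intractable Ising likelihood with a product of conditional distributions, each logistic in $\beta (A_{n}x)_{i}+B$. A minimal sketch (crude grid search, zero-diagonal coupling matrix assumed; not the authors' analysis):

```python
import numpy as np

def neg_log_pl(beta, B, x, A):
    # Negative log-pseudolikelihood for spins x in {-1,+1}^n with
    # zero-diagonal coupling matrix A:
    #   P(x_i | x_{-i}) = exp(x_i * m_i) / (2 cosh m_i),
    # where m_i = beta * (A x)_i + B.
    m = beta * (A @ x) + B
    return float(np.sum(np.logaddexp(m, -m) - x * m))  # log(2 cosh m) - x*m

def fit_pl(x, A):
    # Crude grid search for the bivariate pseudo-likelihood estimator.
    best = min((neg_log_pl(b, B, x, A), b, B)
               for b in np.linspace(0.0, 1.0, 21)
               for B in np.linspace(-1.0, 1.0, 21))
    return best[1], best[2]
```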
</p>projecteuclid.org/euclid.aos/1590480034_20200526040045Tue, 26 May 2020 04:00 EDTModel-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional datahttps://projecteuclid.org/euclid.aos/1590480035<strong>Zhiqiang Tan</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 811--837.</p><p><strong>Abstract:</strong><br/>
Consider the problem of estimating average treatment effects when a large number of covariates are used to adjust for possible confounding through outcome regression and propensity score models. We develop new methods and theory to obtain not only doubly robust point estimators for average treatment effects, which remain consistent if either the propensity score model or the outcome regression model is correctly specified, but also model-assisted confidence intervals, which are valid when the propensity score model is correctly specified but the outcome model may be misspecified. With a linear outcome model, the confidence intervals are doubly robust, that is, being also valid when the outcome model is correctly specified but the propensity score model may be misspecified. Our methods involve regularized calibrated estimators with Lasso penalties but carefully chosen loss functions, for fitting propensity score and outcome regression models. We provide high-dimensional analysis to establish the desired properties of our methods under comparable sparsity conditions to previous results, which give valid confidence intervals when both the propensity score and outcome models are correctly specified. We present simulation studies and an empirical application which demonstrate advantages of the proposed methods compared with related methods based on regularized maximum likelihood estimation. The methods are implemented in the R package $\mathtt{RCAL}$.
</p>projecteuclid.org/euclid.aos/1590480035_20200526040045Tue, 26 May 2020 04:00 EDTHurst function estimationhttps://projecteuclid.org/euclid.aos/1590480036<strong>Jinqi Shen</strong>, <strong>Tailen Hsing</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 838--862.</p><p><strong>Abstract:</strong><br/>
This paper considers a wide range of issues concerning the estimation of the Hurst function of a multifractional Brownian motion when the process is observed on a regular grid. A theoretical lower bound for the minimax risk of this inference problem is established for a wide class of smooth Hurst functions. We also propose a new nonparametric estimator and show that it is rate optimal. Implementation issues of the estimator including how to overcome the presence of a nuisance parameter and choose the tuning parameter from data will be considered. An extensive numerical study is conducted to compare our approach with other approaches.
</p>projecteuclid.org/euclid.aos/1590480036_20200526040045Tue, 26 May 2020 04:00 EDTFundamental limits of detection in the spiked Wigner modelhttps://projecteuclid.org/euclid.aos/1590480037<strong>Ahmed El Alaoui</strong>, <strong>Florent Krzakala</strong>, <strong>Michael Jordan</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 863--885.</p><p><strong>Abstract:</strong><br/>
We study the fundamental limits of detecting the presence of an additive rank-one perturbation, or spike, to a Wigner matrix. When the spike comes from a prior that is i.i.d. across coordinates, we prove that the log-likelihood ratio of the spiked model against the nonspiked one is asymptotically normal below a certain reconstruction threshold which is not necessarily of a “spectral” nature, and that it is degenerate above. This establishes the maximal region of contiguity between the planted and null models. It is known that this threshold also marks a phase transition for estimating the spike: the latter task is possible above the threshold and impossible below. Therefore, both estimation and detection undergo the same transition in this random matrix model. Further information on the performance of the optimal test is also provided. Our proofs are based on Gaussian interpolation methods and a rigorous incarnation of the cavity method, as devised by Guerra and Talagrand in their study of the Sherrington–Kirkpatrick spin-glass model.
</p>projecteuclid.org/euclid.aos/1590480037_20200526040045Tue, 26 May 2020 04:00 EDT$\alpha $-variational inference with statistical guaranteeshttps://projecteuclid.org/euclid.aos/1590480038<strong>Yun Yang</strong>, <strong>Debdeep Pati</strong>, <strong>Anirban Bhattacharya</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 886--905.</p><p><strong>Abstract:</strong><br/>
We provide statistical guarantees for a family of variational approximations to Bayesian posterior distributions, called $\alpha $-VB, which has close connections with variational approximations of tempered posteriors in the literature. The standard variational approximation is a special case of $\alpha $-VB with $\alpha =1$. When $\alpha \in (0,1]$, a novel class of variational inequalities are developed for linking the Bayes risk under the variational approximation to the objective function in the variational optimization problem, implying that maximizing the evidence lower bound in variational inference has the effect of minimizing the Bayes risk within the variational density family. Operating in a frequentist setup, the variational inequalities imply that point estimates constructed from the $\alpha $-VB procedure converge at an optimal rate to the true parameter in a wide range of problems. We illustrate our general theory with a number of examples, including the mean-field variational approximation to (low)-high-dimensional Bayesian linear regression with spike and slab priors, Gaussian mixture models and latent Dirichlet allocation.
</p>projecteuclid.org/euclid.aos/1590480038_20200526040045Tue, 26 May 2020 04:00 EDTRobust machine learning by median-of-means: Theory and practicehttps://projecteuclid.org/euclid.aos/1590480039<strong>Guillaume Lecué</strong>, <strong>Matthieu Lerasle</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 906--931.</p><p><strong>Abstract:</strong><br/>
Median-of-means (MOM) based procedures have been recently introduced in learning theory (Lugosi and Mendelson (2019); Lecué and Lerasle (2017)). These estimators outperform classical least-squares estimators when data are heavy-tailed and/or corrupted. However, none of these procedures can be implemented in practice, which is the major issue with current MOM procedures ( Ann. Statist. 47 (2019) 783–794).
In this paper, we introduce minmax MOM estimators and show that they achieve the same sub-Gaussian deviation bounds as the alternatives (Lugosi and Mendelson (2019); Lecué and Lerasle (2017)), both in small and high-dimensional statistics. In particular, these estimators are efficient under moments assumptions on data that may have been corrupted by a few outliers.
Besides these theoretical guarantees, the definition of minmax MOM estimators suggests simple and systematic modifications of standard algorithms used to approximate least-squares estimators and their regularized versions. As a proof of concept, we perform an extensive simulation study of these algorithms for robust versions of the LASSO.
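The building block of all of these procedures, the basic median-of-means mean estimator, takes a few lines (a sketch of the classical estimator, not the paper's minmax MOM construction):

```python
import numpy as np

def median_of_means(x, n_blocks):
    # Split the sample into n_blocks groups, average within each group,
    # and return the median of the group means. A minority of corrupted
    # blocks cannot move the median.
    blocks = np.array_split(np.asarray(x, dtype=float), n_blocks)
    return float(np.median([b.mean() for b in blocks]))
```

A single gross outlier ruins the empirical mean but leaves the MOM estimate untouched, since it can contaminate at most one block.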
</p>projecteuclid.org/euclid.aos/1590480039_20200526040045Tue, 26 May 2020 04:00 EDTConsistent maximum likelihood estimation using subsets with applications to multivariate mixed modelshttps://projecteuclid.org/euclid.aos/1590480040<strong>Karl Oskar Ekvall</strong>, <strong>Galin L. Jones</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 932--952.</p><p><strong>Abstract:</strong><br/>
We present new results for consistency of maximum likelihood estimators with a focus on multivariate mixed models. Our theory builds on the idea of using subsets of the full data to establish consistency of estimators based on the full data. It requires neither that the data consist of independent observations, nor that the observations can be modeled as a stationary stochastic process. Compared to existing asymptotic theory using the idea of subsets, we substantially weaken the assumptions, bringing them closer to what suffices in classical settings. We apply our theory in two multivariate mixed models for which it was unknown whether maximum likelihood estimators are consistent. The models we consider have nonstochastic predictors and multivariate responses which are possibly mixed-type (some discrete and some continuous).
</p>projecteuclid.org/euclid.aos/1590480040_20200526040045Tue, 26 May 2020 04:00 EDTConvergence of eigenvector empirical spectral distribution of sample covariance matriceshttps://projecteuclid.org/euclid.aos/1590480041<strong>Haokai Xi</strong>, <strong>Fan Yang</strong>, <strong>Jun Yin</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 953--982.</p><p><strong>Abstract:</strong><br/>
The eigenvector empirical spectral distribution (VESD) is a useful tool in studying the limiting behavior of eigenvalues and eigenvectors of covariance matrices. In this paper, we study the convergence rate of the VESD of sample covariance matrices to the deformed Marčenko–Pastur (MP) distribution. Consider sample covariance matrices of the form $\Sigma ^{1/2}XX^{*}\Sigma ^{1/2}$, where $X=(x_{ij})$ is an $M\times N$ random matrix whose entries are independent random variables with mean zero and variance $N^{-1}$, and $\Sigma $ is a deterministic positive-definite matrix. We prove that the Kolmogorov distance between the expected VESD and the deformed MP distribution is bounded by $N^{-1+\epsilon }$ for any fixed $\epsilon >0$, provided that the entries $\sqrt{N}x_{ij}$ have uniformly bounded 6th moments and $|N/M-1|\ge \tau $ for some constant $\tau >0$. This result improves the previous one obtained in (Ann. Statist. 41 (2013) 2572–2607), which gave the convergence rate $O(N^{-1/2})$ assuming i.i.d. $X$ entries, bounded 10th moment, $\Sigma =I$ and $M<N$. Moreover, we also prove that under the finite $8$th moment assumption, the convergence rate of the VESD is $O(N^{-1/2+\epsilon })$ almost surely for any fixed $\epsilon >0$, which improves the previous bound $N^{-1/4+\epsilon }$ in (Ann. Statist. 41 (2013) 2572–2607).
</p>projecteuclid.org/euclid.aos/1590480041_20200526040045Tue, 26 May 2020 04:00 EDTD-optimal designs for multinomial logistic modelshttps://projecteuclid.org/euclid.aos/1590480042<strong>Xianwei Bu</strong>, <strong>Dibyen Majumdar</strong>, <strong>Jie Yang</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 983--1000.</p><p><strong>Abstract:</strong><br/>
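In the undeformed case $\Sigma =I$, the limiting law above is the classical Marčenko–Pastur distribution, whose support is easy to check numerically. The following sketch (illustrative dimensions, not taken from the paper) samples a Gaussian $X$ with variance-$N^{-1}$ entries and verifies that the spectrum of $XX^{*}$ concentrates on the MP bulk $[(1-\sqrt{c})^{2},(1+\sqrt{c})^{2}]$ with $c=M/N$:

```python
import numpy as np

# Empirical check of the Marchenko-Pastur support for Sigma = I.
rng = np.random.default_rng(0)
M, N = 400, 800                          # |N/M - 1| bounded away from 0
X = rng.standard_normal((M, N)) / np.sqrt(N)   # entries have variance 1/N
eig = np.linalg.eigvalsh(X @ X.T)        # spectrum of the sample covariance

c = M / N
lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
# All eigenvalues fall inside the MP bulk up to O(N^{-2/3}) edge fluctuations.
```

The condition $|N/M-1|\ge \tau $ in the theorem keeps $c$ away from 1, i.e., keeps the lower bulk edge $(1-\sqrt{c})^{2}$ away from zero, where the hard-edge behavior of the spectrum would be different.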
We consider optimal designs for general multinomial logistic models, which cover baseline-category, cumulative, adjacent-categories and continuation-ratio logit models, with proportional odds, nonproportional odds or partial proportional odds assumptions. We derive the corresponding Fisher information matrices in three different forms to facilitate their calculations, determine the conditions for their positive definiteness, and search for optimal designs. We conclude that, unlike the designs for binary responses, a feasible design for a multinomial logistic model may contain fewer experimental settings than parameters, which is of practical significance. We also conclude that even for a minimally supported design, a uniform allocation, which is typically used in practice, is not optimal in general for a multinomial logistic model. We develop efficient algorithms for searching D-optimal designs. Using examples based on real experiments, we show that the efficiency of an experiment can be significantly improved if our designs are adopted.
</p>projecteuclid.org/euclid.aos/1590480042_20200526040045Tue, 26 May 2020 04:00 EDTA unified study of nonparametric inference for monotone functionshttps://projecteuclid.org/euclid.aos/1590480043<strong>Ted Westling</strong>, <strong>Marco Carone</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1001--1024.</p><p><strong>Abstract:</strong><br/>
The problem of nonparametric inference on a monotone function has been extensively studied in many particular cases. Estimators considered have often been of so-called Grenander type, being representable as the left derivative of the greatest convex minorant or least concave majorant of an estimator of a primitive function. In this paper, we provide general conditions for consistency and pointwise convergence in distribution of a class of generalized Grenander-type estimators of a monotone function. This broad class allows the minorization or majorization operation to be performed on a data-dependent transformation of the domain, possibly yielding benefits in practice. Additionally, we provide simpler conditions and more concrete distributional theory in the important case that the primitive estimator and data-dependent transformation function are asymptotically linear. We use our general results in the context of various well-studied problems, and show that we readily recover classical results established separately in each case. More importantly, we show that our results allow us to tackle more challenging problems involving parameters for which the use of flexible learning strategies appears necessary. In particular, we study inference on monotone density and hazard functions using informatively right-censored data, extending the classical work on independent censoring, and on a covariate-marginalized conditional mean function, extending the classical work on monotone regression functions.
</p>projecteuclid.org/euclid.aos/1590480043_20200526040045Tue, 26 May 2020 04:00 EDTInference for Archimax copulashttps://projecteuclid.org/euclid.aos/1590480044<strong>Simon Chatelain</strong>, <strong>Anne-Laure Fougères</strong>, <strong>Johanna G. Nešlehová</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1025--1051.</p><p><strong>Abstract:</strong><br/>
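The classical member of the Grenander class described above is the maximum likelihood estimator of a nonincreasing density: the left derivative of the least concave majorant of the empirical CDF. The sketch below (plain NumPy; function and variable names are illustrative) builds the majorant with a monotone-chain hull scan over the empirical CDF points:

```python
import numpy as np

def grenander(x):
    """Grenander estimator of a nonincreasing density on [0, inf):
    left derivative of the least concave majorant (LCM) of the
    empirical CDF.  Returns (knots, slopes); the estimated density
    equals slopes[i] on the interval (knots[i], knots[i+1]]."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # Vertices of the empirical CDF, with the origin prepended.
    pts = np.column_stack([np.concatenate([[0.0], x]),
                           np.arange(n + 1) / n])
    # Upper convex hull (= LCM) via the monotone-chain cross-product test.
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop the middle vertex if it lies on or below the chord.
            if (x2 - x1) * (p[1] - y1) >= (p[0] - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append((p[0], p[1]))
    hull = np.array(hull)
    slopes = np.diff(hull[:, 1]) / np.diff(hull[:, 0])
    return hull[:, 0], slopes
```

By construction the slopes are nonincreasing and the resulting piecewise-constant density integrates to one; the generalized estimators studied in the paper apply the same majorant operation after a data-dependent reparametrization of the domain.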
Archimax copula models can account for any type of asymptotic dependence between extremes and at the same time capture joint risks at medium levels. An Archimax copula is characterized by two functional parameters: the stable tail dependence function $\ell $, and the Archimedean generator $\psi $ which distorts the extreme-value dependence structure. This article develops semiparametric inference for Archimax copulas: a nonparametric estimator of $\ell $ and a moment-based estimator of $\psi $ assuming the latter belongs to a parametric family. Conditions under which $\psi $ and $\ell $ are identifiable are derived. The asymptotic behavior of the estimators is then established under broad regularity conditions; performance in small samples is assessed through a comprehensive simulation study. The Archimax copula model with the Clayton generator is then used to analyze monthly rainfall maxima at three stations in French Brittany. The model is seen to fit the data very well, both in the lower and in the upper tail. The nonparametric estimator of $\ell $ reveals asymmetric extremal dependence between the stations, which reflects heavy precipitation patterns in the area. Technical proofs, simulation results and $\mathsf{R}$ code are provided in the Online Supplement.
</p>projecteuclid.org/euclid.aos/1590480044_20200526040045Tue, 26 May 2020 04:00 EDTAdmissible Bayes equivariant estimation of location vectors for spherically symmetric distributions with unknown scalehttps://projecteuclid.org/euclid.aos/1590480045<strong>Yuzo Maruyama</strong>, <strong>William E. Strawderman</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1052--1071.</p><p><strong>Abstract:</strong><br/>
This paper investigates estimation of the mean vector under invariant quadratic loss for a spherically symmetric location family with a residual vector with density of the form $f(x,u)=\eta ^{(p+n)/2}f(\eta \{\|x-\theta \|^{2}+\|u\|^{2}\})$, where $\eta $ is unknown. We show that the natural estimator $x$ is admissible for $p=1,2$. Also, for $p\geq 3$, we find classes of generalized Bayes estimators that are admissible within the class of equivariant estimators of the form $\{1-\xi (x/\|u\|)\}x$. In the Gaussian case, a variant of the James–Stein estimator, $[1-\{(p-2)/(n+2)\}/\{\|x\|^{2}/\|u\|^{2}+(p-2)/(n+2)+1\}]x$, which dominates the natural estimator $x$, is also admissible within this class. We also study the related regression model.
</p>projecteuclid.org/euclid.aos/1590480045_20200526040045Tue, 26 May 2020 04:00 EDTWorst-case versus average-case design for estimation from partial pairwise comparisonshttps://projecteuclid.org/euclid.aos/1590480046<strong>Ashwin Pananjady</strong>, <strong>Cheng Mao</strong>, <strong>Vidya Muthukumar</strong>, <strong>Martin J. Wainwright</strong>, <strong>Thomas A. Courtade</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1072--1097.</p><p><strong>Abstract:</strong><br/>
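The admissible Gaussian-case estimator singled out above has a closed form that is easy to evaluate. The following sketch simply transcribes the abstract's formula (function name illustrative): $x$ plays the role of the $p$-dimensional observation and $u$ the $n$-dimensional residual vector supplying the scale information.

```python
import numpy as np

def js_variant(x, u):
    """Variant James-Stein estimator from the abstract:
    [1 - {(p-2)/(n+2)} / {||x||^2/||u||^2 + (p-2)/(n+2) + 1}] x."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    p, n = len(x), len(u)
    c = (p - 2) / (n + 2)
    ratio = np.dot(x, x) / np.dot(u, u)
    return (1.0 - c / (ratio + c + 1.0)) * x
```

For $p\geq 3$ the shrinkage factor lies strictly between 0 and 1, so the estimator pulls $x$ toward the origin without ever reversing its sign, matching the equivariant form $\{1-\xi (x/\|u\|)\}x$.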
Pairwise comparison data arises in many domains, including tournament rankings, web search and preference elicitation. Given noisy comparisons of a fixed subset of pairs of items, we study the problem of estimating the underlying comparison probabilities under the assumption of strong stochastic transitivity (SST). We also consider the noisy sorting subclass of the SST model. We show that when the assignment of items to the topology is arbitrary, these permutation-based models, unlike their parametric counterparts, do not admit consistent estimation for most comparison topologies used in practice. We then demonstrate that consistent estimation is possible when the assignment of items to the topology is randomized, thus establishing a dichotomy between worst-case and average-case designs. We propose two computationally efficient estimators in the average-case setting and analyze their risk, showing that it depends on the comparison topology only through the degree sequence of the topology. We also provide explicit classes of graphs for which the rates achieved by these estimators are optimal. Our results are corroborated by simulations on multiple comparison topologies.
</p>projecteuclid.org/euclid.aos/1590480046_20200526040045Tue, 26 May 2020 04:00 EDTNonasymptotic upper bounds for the reconstruction error of PCAhttps://projecteuclid.org/euclid.aos/1590480047<strong>Markus Reiß</strong>, <strong>Martin Wahl</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1098--1123.</p><p><strong>Abstract:</strong><br/>
We analyse the reconstruction error of principal component analysis (PCA) and prove nonasymptotic upper bounds for the corresponding excess risk. These bounds unify and improve existing upper bounds from the literature. In particular, they give oracle inequalities under mild eigenvalue conditions. The bounds reveal that the excess risk differs significantly from usually considered subspace distances based on canonical angles. Our approach relies on the analysis of empirical spectral projectors combined with concentration inequalities for weighted empirical covariance operators and empirical eigenvalues.
</p>projecteuclid.org/euclid.aos/1590480047_20200526040045Tue, 26 May 2020 04:00 EDTLasso guarantees for $\beta$-mixing heavy-tailed time serieshttps://projecteuclid.org/euclid.aos/1590480048<strong>Kam Chung Wong</strong>, <strong>Zifan Li</strong>, <strong>Ambuj Tewari</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1124--1142.</p><p><strong>Abstract:</strong><br/>
Many theoretical results for lasso require the samples to be i.i.d. Recent work has provided guarantees for lasso assuming that the time series is generated by a sparse Vector Autoregressive (VAR) model with Gaussian innovations. Proofs of these results rely critically on the fact that the true data generating mechanism (DGM) is a finite-order Gaussian VAR. This assumption is quite brittle: linear transformations, including selecting a subset of variables, can lead to the violation of this assumption. In order to break free from such assumptions, we derive nonasymptotic inequalities for estimation error and prediction error of the lasso estimate of the best linear predictor without assuming any special parametric form of the DGM. Instead, we rely only on (strict) stationarity and geometrically decaying $\beta$-mixing coefficients to establish error bounds for lasso for sub-Weibull random vectors. The class of sub-Weibull random variables that we introduce includes sub-Gaussian and subexponential random variables but also includes random variables with tails heavier than an exponential. We also show that, for Gaussian processes, the $\beta$-mixing condition can be relaxed to summability of the $\alpha$-mixing coefficients. Our work provides an alternative proof of the consistency of lasso for sparse Gaussian VAR models. But the applicability of our results extends to non-Gaussian and nonlinear time series models as the examples we provide demonstrate.
</p>projecteuclid.org/euclid.aos/1590480048_20200526040045Tue, 26 May 2020 04:00 EDTHigh-frequency analysis of parabolic stochastic PDEshttps://projecteuclid.org/euclid.aos/1590480049<strong>Carsten Chong</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1143--1167.</p><p><strong>Abstract:</strong><br/>
We consider the problem of estimating stochastic volatility for a class of second-order parabolic stochastic PDEs. Assuming that the solution is observed at high temporal frequency, we use limit theorems for multipower variations and related functionals to construct consistent nonparametric estimators and asymptotic confidence bounds for the integrated volatility process. As a byproduct of our analysis, we also obtain feasible estimators for the regularity of the spatial covariance function of the noise.
</p>projecteuclid.org/euclid.aos/1590480049_20200526040045Tue, 26 May 2020 04:00 EDTFunctional data analysis in the Banach space of continuous functionshttps://projecteuclid.org/euclid.aos/1590480050<strong>Holger Dette</strong>, <strong>Kevin Kokot</strong>, <strong>Alexander Aue</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1168--1192.</p><p><strong>Abstract:</strong><br/>
Functional data analysis is typically conducted within the $L^{2}$-Hilbert space framework. There is by now a fully developed statistical toolbox allowing for the principled application of the functional data machinery to real-world problems, often based on dimension reduction techniques such as functional principal component analysis. At the same time, there have recently been a number of publications that sidestep dimension reduction steps and focus on a fully functional $L^{2}$-methodology. This paper goes one step further and develops data analysis methodology for functional time series in the space of all continuous functions. The work is motivated by the fact that objects with rather different shapes may still have a small $L^{2}$-distance and are therefore identified as similar when using an $L^{2}$-metric. However, in applications it is often desirable to use metrics reflecting the visualization of the curves in the statistical analysis. The methodological contributions are focused on developing two-sample and change-point tests as well as confidence bands, as these procedures appear to be conducive to the proposed setting. Particular interest is put on relevant differences; that is, on not trying to test for exact equality, but rather for prespecified deviations under the null hypothesis.
The procedures are justified through large-sample theory. To ensure practicability, nonstandard bootstrap procedures are developed and investigated addressing particular features that arise in the problem of testing relevant hypotheses. The finite sample properties are explored through a simulation study and an application to annual temperature profiles.
</p>projecteuclid.org/euclid.aos/1590480050_20200526040045Tue, 26 May 2020 04:00 EDTMean estimation with sub-Gaussian rates in polynomial timehttps://projecteuclid.org/euclid.aos/1590480051<strong>Samuel B. Hopkins</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1193--1213.</p><p><strong>Abstract:</strong><br/>
We study polynomial time algorithms for estimating the mean of a heavy-tailed multivariate random vector. We assume only that the random vector $X$ has finite mean and covariance. In this setting, the radius of confidence intervals achieved by the empirical mean is large compared to the case that $X$ is Gaussian or sub-Gaussian.
We offer the first polynomial time algorithm to estimate the mean with sub-Gaussian-size confidence intervals under such mild assumptions. Our algorithm is based on a new semidefinite programming relaxation of a high-dimensional median. Previous estimators which assumed only existence of finitely many moments of $X$ either sacrifice sub-Gaussian performance or are only known to be computable via brute-force search procedures requiring time exponential in the dimension.
</p>projecteuclid.org/euclid.aos/1590480051_20200526040045Tue, 26 May 2020 04:00 EDTBootstrapping max statistics in high dimensions: Near-parametric rates under weak variance decay and application to functional and multinomial datahttps://projecteuclid.org/euclid.aos/1590480052<strong>Miles E. Lopes</strong>, <strong>Zhenhua Lin</strong>, <strong>Hans-Georg Müller</strong>. <p><strong>Source: </strong>Annals of Statistics, Volume 48, Number 2, 1214--1229.</p><p><strong>Abstract:</strong><br/>
In recent years, bootstrap methods have drawn attention for their ability to approximate the laws of “max statistics” in high-dimensional problems. A leading example of such a statistic is the coordinatewise maximum of a sample average of $n$ random vectors in $\mathbb{R}^{p}$. Existing results for this statistic show that the bootstrap can work when $n\ll p$, and rates of approximation (in Kolmogorov distance) have been obtained with only logarithmic dependence on $p$. Nevertheless, one of the challenging aspects of this setting is that established rates tend to scale like $n^{-1/6}$ as a function of $n$.
The main purpose of this paper is to demonstrate that improvement in rate is possible when extra model structure is available. Specifically, we show that if the coordinatewise variances of the observations exhibit decay, then a nearly $n^{-1/2}$ rate can be achieved, independent of $p$. Furthermore, a surprising aspect of this dimension-free rate is that it holds even when the decay is very weak. Lastly, we provide examples showing how these ideas can be applied to inference problems dealing with functional and multinomial data.
</p>projecteuclid.org/euclid.aos/1590480052_20200526040045Tue, 26 May 2020 04:00 EDT
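A generic version of the bootstrap discussed above is easy to sketch. The following Gaussian-multiplier bootstrap for the coordinatewise max of a sample mean is a baseline illustration only; it does not implement the paper's variance-decay analysis, and all names and sizes are illustrative choices.

```python
import numpy as np

def multiplier_bootstrap_max(X, n_boot=2000, rng=None):
    """Gaussian-multiplier bootstrap for max_j sqrt(n)*(mean_j - mu_j):
    reweight the centered rows with i.i.d. N(0,1) multipliers and record
    the coordinatewise maximum of the resulting weighted average."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center each coordinate
    draws = np.empty(n_boot)
    for b in range(n_boot):
        g = rng.standard_normal(n)   # multiplier weights
        draws[b] = (g @ Xc).max() / np.sqrt(n)
    return draws

# 95% bootstrap critical value for simultaneous inference on p means.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
draws = multiplier_bootstrap_max(X, rng=1)
crit = np.quantile(draws, 0.95)
```

The 0.95 quantile of the bootstrap draws serves as a critical value for the max statistic; the paper's point is that when the coordinatewise variances decay, the Kolmogorov-distance error of such approximations can improve from the usual $n^{-1/6}$ scaling to nearly $n^{-1/2}$, independently of $p$.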