The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
Language: en-us. Copyright 2010 Cornell University Library. Contact: Euclid-L@cornell.edu (Project Euclid Team).
Published: Thu, 05 Aug 2010 15:41 EDT. Last updated: Tue, 07 Jun 2011 09:09 EDT.
Logo: http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
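The automatic multiplicity adjustment discussed above can be seen in a small numerical sketch (not the paper's own code). The fully Bayes prior below integrates a Uniform(0, 1) prior over the inclusion probability, one standard choice, and is compared with a fixed inclusion probability, which applies no correction:

```python
from math import comb

def model_prior_fixed(k, m, p=0.5):
    """Prior mass of one particular model with k of m candidate variables
    included, under a fixed prior inclusion probability p (no correction)."""
    return p ** k * (1 - p) ** (m - k)

def model_prior_fully_bayes(k, m):
    """Same model's prior mass after integrating p ~ Uniform(0, 1):
    int_0^1 p^k (1-p)^(m-k) dp = 1 / ((m + 1) * C(m, k))."""
    return 1.0 / ((m + 1) * comb(m, k))

# Prior odds of adding one more variable to a one-variable model:
for m in (10, 100, 1000):
    fb = model_prior_fully_bayes(2, m) / model_prior_fully_bayes(1, m)
    fixed = model_prior_fixed(2, m) / model_prior_fixed(1, m)
    print(f"m={m}: fully Bayes odds {fb:.4f}, fixed-p odds {fixed:.4f}")
```

The fully Bayes prior odds of the larger model shrink like 2/(m - 1) as the number of candidate variables m grows, while the fixed-p prior applies the same odds at every m: a toy instance of the multiplicity-correction effect the paper analyzes.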
</p>

Minimax rates of community detection in stochastic block models
http://projecteuclid.org/euclid.aos/1473685275
<strong>Anderson Y. Zhang</strong>, <strong>Harrison H. Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2252--2280.</p><p><strong>Abstract:</strong><br/>
Recently, network analysis has gained more and more attention in statistics, as well as in computer science, probability and applied mathematics. Community detection for the stochastic block model (SBM) is probably the most studied topic in network analysis. Many methodologies have been proposed, and some beautiful and significant phase transition results have been obtained in various settings. In this paper, we provide a general minimax theory for community detection. It gives minimax rates of the mis-match ratio for a wide range of settings, including homogeneous and inhomogeneous SBMs, dense and sparse networks, and finite and growing numbers of communities. The minimax rates are exponential, different from the polynomial rates we often see in the statistical literature. An immediate consequence of the result is to establish a threshold phenomenon for strong consistency (exact recovery) as well as weak consistency (partial recovery). We obtain the upper bound by a range of penalized likelihood-type approaches. The lower bound is achieved by a novel reduction from the global mis-match ratio to a local clustering problem for one node, through an exchangeability property.
</p>

From sparse to dense functional data and beyond
http://projecteuclid.org/euclid.aos/1473685276
<strong>Xiaoke Zhang</strong>, <strong>Jane-Ling Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2281--2321.</p><p><strong>Abstract:</strong><br/>
Nonparametric estimation of mean and covariance functions is important in functional data analysis. We investigate the performance of local linear smoothers for both mean and covariance functions with a general weighting scheme, which includes two commonly used schemes, equal weight per observation (OBS) and equal weight per subject (SUBJ), as two special cases. We provide a comprehensive analysis of their asymptotic properties on a unified platform for all types of sampling plans, be they dense, sparse or neither. Three types of asymptotic properties are investigated in this paper: asymptotic normality, $L^{2}$ convergence and uniform convergence. The asymptotic theories are unified in two respects: (1) the weighting scheme is very general; (2) the magnitude of the number $N_{i}$ of measurements for the $i$th subject relative to the sample size $n$ can vary freely. Based on the relative order of $N_{i}$ to $n$, functional data are partitioned into three types: non-dense, dense and ultra-dense functional data for the OBS and SUBJ schemes. These two weighting schemes are compared both theoretically and numerically. We also propose a new class of weighting schemes in terms of a mixture of the OBS and SUBJ weights, whose theoretical and numerical performances are examined and compared.
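The two schemes are easy to write down concretely. A minimal sketch follows; the `MIX` option and its `alpha` knob are hypothetical stand-ins for the mixture class the abstract proposes, not the paper's own parameterization:

```python
import numpy as np

def scheme_weights(N, scheme="OBS", alpha=0.5):
    """Per-observation weights w_i attached to subject i's N_i measurements.

    OBS:  w_i = 1 / sum(N)      (equal weight per observation)
    SUBJ: w_i = 1 / (n * N_i)   (equal weight per subject)
    MIX:  a convex combination of the two; `alpha` is a hypothetical
          mixing knob, a simplified stand-in for the proposed class."""
    N = np.asarray(N, dtype=float)
    n = len(N)
    obs = np.full(n, 1.0 / N.sum())
    subj = 1.0 / (n * N)
    if scheme == "OBS":
        return obs
    if scheme == "SUBJ":
        return subj
    return alpha * obs + (1 - alpha) * subj

# Under every scheme the total weight is 1: sum_i N_i * w_i == 1.
N = [3, 50, 7]
for s in ("OBS", "SUBJ", "MIX"):
    w = scheme_weights(N, s)
    print(s, float(np.dot(N, w)))
```

With balanced designs (all N_i equal) the two schemes coincide; they diverge exactly when the N_i are unbalanced, which is the sparse-to-dense tension the paper studies.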
</p>

Influential features PCA for high dimensional clustering
http://projecteuclid.org/euclid.aos/1479891617
<strong>Jiashun Jin</strong>, <strong>Wanjie Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2323--2359.</p><p><strong>Abstract:</strong><br/>
We consider a clustering problem where we observe feature vectors $X_{i}\in R^{p}$, $i=1,2,\ldots,n$, from $K$ possible classes. The class labels are unknown and the main interest is to estimate them. We are primarily interested in the modern regime of $p\gg n$, where classical clustering methods face challenges.
We propose Influential Features PCA (IF-PCA) as a new clustering procedure. In IF-PCA, we select a small fraction of features with the largest Kolmogorov–Smirnov (KS) scores, obtain the first $(K-1)$ left singular vectors of the post-selection normalized data matrix, and then estimate the labels by applying the classical $k$-means procedure to these singular vectors. In this procedure, the only tuning parameter is the threshold in the feature selection step. We set the threshold in a data-driven fashion by adapting the recent notion of Higher Criticism. As a result, IF-PCA is a tuning-free clustering method.
We apply IF-PCA to $10$ gene microarray data sets. The method has competitive clustering performance. In particular, in three of the data sets, the error rates of IF-PCA are only $29\%$ or less of the error rates of other methods. We have also rediscovered the empirical-null phenomenon of Efron [ J. Amer. Statist. Assoc. 99 (2004) 96–104] on microarray data.
With delicate analysis, especially post-selection eigen-analysis, we derive tight probability bounds on the Kolmogorov–Smirnov statistics and show that IF-PCA yields clustering consistency in a broad context. The clustering problem is connected to the problems of sparse PCA and low-rank matrix recovery, but it is different in important ways. We reveal an interesting phase transition phenomenon associated with these problems and identify the range of interest for each.
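The three steps of the procedure (KS-based feature selection, post-selection SVD, k-means) can be sketched compactly. In this sketch the data-driven Higher Criticism threshold is replaced by a user-supplied cutoff, and k-means uses a simple deterministic farthest-point initialization; both are simplifications of the paper's procedure:

```python
import numpy as np
from math import erf, sqrt

def ks_score(x):
    """KS distance between the standardized feature and the N(0, 1) CDF."""
    x = np.sort((x - x.mean()) / x.std())
    n = len(x)
    F = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in x])
    hi = np.arange(1, n + 1) / n
    lo = np.arange(0, n) / n
    return max(np.max(hi - F), np.max(F - lo))

def kmeans(V, K, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [V[np.argmax((V ** 2).sum(1))]]
    for _ in range(K - 1):
        d = np.stack([((V - c) ** 2).sum(1) for c in centers]).min(0)
        centers.append(V[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((V[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = V[labels == k].mean(0)
    return labels

def if_pca(X, K, threshold):
    """Keep features with KS score above the threshold, take the first K-1
    left singular vectors of the standardized selected columns, cluster."""
    scores = np.array([ks_score(X[:, j]) for j in range(X.shape[1])])
    Z = X[:, scores > threshold]
    Z = (Z - Z.mean(0)) / Z.std(0)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return kmeans(U[:, :K - 1], K)
```

On synthetic data where a few columns carry a class-dependent mean shift, the standardized signal columns look bimodal rather than normal, so their KS scores stand out, and the retained columns drive the singular vectors.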
</p>

Discussion of “Influential features PCA for high dimensional clustering”
http://projecteuclid.org/euclid.aos/1479891618
<strong>Ery Arias-Castro</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2360--2365.</p>

Discussion of “Influential features PCA for high dimensional clustering”
http://projecteuclid.org/euclid.aos/1479891619
<strong>Boaz Nadler</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2366--2371.</p>

Discussion of “Influential features PCA for high dimensional clustering”
http://projecteuclid.org/euclid.aos/1479891620
<strong>T. Tony Cai</strong>, <strong>Linjun Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2372--2381.</p>

Discussion of “Influential features PCA for high dimensional clustering”
http://projecteuclid.org/euclid.aos/1479891621
<strong>Natalia A. Stepanova</strong>, <strong>Alexandre B. Tsybakov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2382--2386.</p>

Rejoinder: “Influential features PCA for high dimensional clustering”
http://projecteuclid.org/euclid.aos/1479891622
<strong>Jiashun Jin</strong>, <strong>Wanjie Wang</strong>.
<p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2387--2400.</p>

Nonparametric estimation of dynamics of monotone trajectories
http://projecteuclid.org/euclid.aos/1479891623
<strong>Debashis Paul</strong>, <strong>Jie Peng</strong>, <strong>Prabir Burman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2401--2432.</p><p><strong>Abstract:</strong><br/>
We study a class of nonlinear nonparametric inverse problems. Specifically, we propose a nonparametric estimator of the dynamics of a monotonically increasing trajectory defined on a finite time interval. Under suitable regularity conditions, we show that in terms of $L^{2}$-loss, the optimal rate of convergence for the proposed estimator is the same as that for the estimation of the derivative of a function. We conduct simulation studies to examine the finite sample behavior of the proposed estimator and apply it to the Berkeley growth data.
</p>

Causal inference with a graphical hierarchy of interventions
http://projecteuclid.org/euclid.aos/1479891624
<strong>Ilya Shpitser</strong>, <strong>Eric Tchetgen Tchetgen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2433--2466.</p><p><strong>Abstract:</strong><br/>
Identifying causal parameters from observational data is fraught with subtleties due to the issues of selection bias and confounding. In addition, more complex questions of interest, such as effects of treatment on the treated and mediated effects may not always be identified even in data where treatment assignment is known and under investigator control, or may be identified under one causal model but not another.
Increasingly complex effects of interest, coupled with a diversity of causal models in use, have resulted in a fragmented view of identification. This fragmentation makes it unnecessarily difficult to determine whether a given parameter is identified (and in what model), and what assumptions must hold for this to be the case. This, in turn, complicates the development of estimation theory and sensitivity analysis procedures.
In this paper, we give a unifying view of a large class of causal effects of interest, including novel effects not previously considered, in terms of a hierarchy of interventions, and show that identification theory for this large class reduces to an identification theory of random variables under interventions from this hierarchy. Moreover, we show that one type of intervention in the hierarchy is naturally associated with queries identified under the Finest Fully Randomized Causally Interpretable Structure Tree Graph (FFRCISTG) model of Robins (via the extended g-formula), and another is naturally associated with queries identified under the Non-Parametric Structural Equation Model with Independent Errors (NPSEM-IE) of Pearl, via a more general functional we call the edge g-formula.
Our results motivate the study of estimation theory for the edge g-formula, since we show it arises both in mediation analysis, and in settings where treatment assignment has unobserved causes, such as models associated with Pearl’s front-door criterion.
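As a concrete instance of the functionals involved, the classical front-door formula that the last sentence alludes to can be checked by simulation. The structural equations below are made-up numbers for illustration, not an example from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# X -> M -> Y with a hidden confounder U affecting X and Y (no direct X -> Y edge).
U = rng.integers(0, 2, n)
X = (rng.random(n) < 0.2 + 0.6 * U).astype(int)
M = (rng.random(n) < 0.1 + 0.7 * X).astype(int)
Y = (rng.random(n) < 0.1 + 0.5 * M + 0.3 * U).astype(int)

def front_door(x):
    """Front-door functional: P(Y=1 | do(X=x)) = sum_m P(m|x) sum_{x'} P(Y=1|m,x') P(x')."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = (M[X == x] == m).mean()
        inner = sum((Y[(M == m) & (X == xp)] == 1).mean() * (X == xp).mean()
                    for xp in (0, 1))
        total += p_m_given_x * inner
    return total

def do_x(x):
    """Ground truth by re-simulating under the intervention do(X=x)."""
    M2 = (rng.random(n) < 0.1 + 0.7 * x).astype(int)
    Y2 = (rng.random(n) < 0.1 + 0.5 * M2 + 0.3 * U).astype(int)
    return Y2.mean()
```

The front-door estimate agrees with the re-simulated interventional truth even though U is never observed, whereas the naive conditional P(Y=1 | X=1) is confounded by U.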
</p>

Consistent model selection criteria for quadratically supported risks
http://projecteuclid.org/euclid.aos/1479891625
<strong>Yongdai Kim</strong>, <strong>Jong-June Jeon</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2467--2496.</p><p><strong>Abstract:</strong><br/>
In this paper, we study asymptotic properties of model selection criteria for high-dimensional regression models where the number of covariates is much larger than the sample size. In particular, we consider a class of loss functions called the class of quadratically supported risks, which is large enough to include the quadratic loss, Huber loss, quantile loss and logistic loss. We provide sufficient conditions for the model selection criteria, which are applicable to the class of quadratically supported risks. Our results extend most previous sufficient conditions for model selection consistency. In addition, sufficient conditions for path consistency of the Lasso and nonconvex penalized estimators are presented. Here, path consistency means that the probability that the solution path includes the true model converges to 1. Path consistency makes it practically feasible to apply consistent model selection criteria to high-dimensional data. A data-adaptive model selection procedure is proposed which is selection consistent and performs well for finite samples. Results of simulation studies as well as real data analysis are presented to compare the finite sample performances of the proposed data-adaptive model selection criterion with other competitors.
</p>

On the computational complexity of high-dimensional Bayesian variable selection
http://projecteuclid.org/euclid.aos/1479891626
<strong>Yun Yang</strong>, <strong>Martin J. Wainwright</strong>, <strong>Michael I. Jordan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2497--2532.</p><p><strong>Abstract:</strong><br/>
We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix. We then demonstrate that the statistical criterion of posterior concentration need not imply the computational desideratum of rapid mixing of the MCMC algorithm. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee both variable-selection consistency and rapid mixing of a particular Metropolis–Hastings algorithm. The mixing time is linear in the number of covariates up to a logarithmic factor. Our proof controls the spectral gap of the Markov chain by constructing a canonical path ensemble that is inspired by the steps taken by greedy algorithms for variable selection.
</p>

Family-Wise Separation Rates for multiple testing
http://projecteuclid.org/euclid.aos/1479891627
<strong>Magalie Fromont</strong>, <strong>Matthieu Lerasle</strong>, <strong>Patricia Reynaud-Bouret</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2533--2563.</p><p><strong>Abstract:</strong><br/>
Starting from a parallel between some minimax adaptive tests of a single null hypothesis, based on aggregation approaches, and some tests of multiple hypotheses, we propose a new evaluation criterion related to the second-kind error, as the core of an emergent minimax theory for multiple tests. Aggregation-based tests, proposed for instance by Baraud [ Bernoulli 8 (2002) 577–606], Baraud, Huet and Laurent [ Ann. Statist. 31 (2003) 225–251] or Fromont and Laurent [ Ann. Statist. 34 (2006) 680–720], are justified through their first-kind error rate, which is controlled by the prescribed level on the one hand, and through their separation rates over various classes of alternatives, rates which are minimax on the other hand. We show that some of these tests can be viewed as the first steps of classical step-down multiple testing procedures, and accordingly be evaluated from the multiple testing point of view also, through a control of their Family-Wise Error Rate (FWER). Conversely, many multiple testing procedures, from the historical ones of Bonferroni and Holm, to more recent ones like min-$p$ procedures or randomized procedures such as the ones proposed by Romano and Wolf [ J. Amer. Statist. Assoc. 100 (2005) 94–108], can be investigated from the minimax adaptive testing point of view. To this end, we extend the notion of separation rate to the multiple testing field, by defining the weak Family-Wise Separation Rate and its stronger counterpart, the Family-Wise Separation Rate (FWSR). As for nonparametric tests of a single null hypothesis, we prove that these new concepts allow an accurate analysis of the second-kind error of a multiple testing procedure, leading to clear definitions of minimax and minimax adaptive multiple tests. Some illustrations in classical Gaussian frameworks corroborate several expected results under particular conditions on the tested hypotheses, but also lead to new questions and perspectives.
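For concreteness, the historical Holm procedure mentioned above, a step-down method that controls the FWER at level alpha, takes only a few lines:

```python
def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: controls the FWER at level alpha.
    Returns the indices of rejected hypotheses, in order of rejection."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = []
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (m - step):
            rejected.append(i)
        else:
            break  # step-down: stop at the first non-rejection
    return rejected

print(holm([0.001, 0.2, 0.012, 0.04], alpha=0.05))  # [0, 2]
```

Note the shrinking denominator m - step: the first comparison is Bonferroni's alpha/m, and each rejection relaxes the bar, which is why Holm uniformly dominates Bonferroni at the same FWER level.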
</p>

Minimax optimal rates of estimation in high dimensional additive models
http://projecteuclid.org/euclid.aos/1479891628
<strong>Ming Yuan</strong>, <strong>Ding-Xuan Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2564--2593.</p><p><strong>Abstract:</strong><br/>
We establish minimax optimal rates of convergence for estimation in a high dimensional additive model assuming that it is approximately sparse. Our results reveal a behavior universal to this class of high dimensional problems. In the sparse regime, when the components are sufficiently smooth or the dimensionality is sufficiently large, the optimal rates are identical to those for high dimensional linear regression and, therefore, there is no additional cost to entertain a nonparametric model. Otherwise, in the so-called smooth regime, the rates coincide with the optimal rates for estimating a univariate function and, therefore, they are immune to the “curse of dimensionality.”
</p>

On marginal sliced inverse regression for ultrahigh dimensional model-free feature selection
http://projecteuclid.org/euclid.aos/1479891629
<strong>Zhou Yu</strong>, <strong>Yuexiao Dong</strong>, <strong>Jun Shao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2594--2623.</p><p><strong>Abstract:</strong><br/>
Model-free variable selection has been implemented under the sufficient dimension reduction framework since the seminal paper of Cook [ Ann. Statist. 32 (2004) 1062–1092]. In this paper, we extend the marginal coordinate test for sliced inverse regression (SIR) in Cook (2004) and propose a novel marginal SIR utility for the purpose of ultrahigh dimensional feature selection. Two distinct procedures, the Dantzig selector and sparse precision matrix estimation, are incorporated to obtain two versions of sample-level marginal SIR utilities. Both procedures lead to model-free variable selection consistency with the predictor dimensionality $p$ diverging at an exponential rate in the sample size $n$. As a special case of marginal SIR, we ignore the correlation among the predictors and propose marginal independence SIR. Marginal independence SIR is closely related to many existing independence screening procedures in the literature, and achieves model-free screening consistency in the ultrahigh dimensional setting. The finite sample performances of the proposed procedures are studied through synthetic examples and an application to the small round blue cell tumors data.
</p>

Faithful variable screening for high-dimensional convex regression
http://projecteuclid.org/euclid.aos/1479891630
<strong>Min Xu</strong>, <strong>Minhua Chen</strong>, <strong>John Lafferty</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2624--2660.</p><p><strong>Abstract:</strong><br/>
We study the problem of variable selection in convex nonparametric regression. Under the assumption that the true regression function is convex and sparse, we develop a screening procedure to select a subset of variables that contains the relevant variables. Our approach is a two-stage quadratic programming method that estimates a sum of one-dimensional convex functions, followed by one-dimensional concave regression fits on the residuals. In contrast to previous methods for sparse additive models, the optimization is finite dimensional and requires no tuning parameters for smoothness. Under appropriate assumptions, we prove that the procedure is faithful in the population setting, yielding no false negatives. We give a finite sample statistical analysis, and introduce algorithms for efficiently carrying out the required quadratic programs. The approach leads to computational and statistical advantages over fitting a full model, and provides an effective, practical approach to variable screening in convex regression.
</p>

High-dimensional generalizations of asymmetric least squares regression and their applications
http://projecteuclid.org/euclid.aos/1479891631
<strong>Yuwen Gu</strong>, <strong>Hui Zou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2661--2694.</p><p><strong>Abstract:</strong><br/>
Asymmetric least squares regression is an important method that has wide applications in statistics, econometrics and finance. The existing work on asymmetric least squares only considers the traditional low dimension and large sample setting. In this paper, we systematically study the Sparse Asymmetric LEast Squares (SALES) regression under high dimensions where the penalty functions include the Lasso and nonconvex penalties. We develop a unified efficient algorithm for fitting SALES and establish its theoretical properties. As an important application, SALES is used to detect heteroscedasticity in high-dimensional data. Another method for detecting heteroscedasticity is the sparse quantile regression. However, both SALES and the sparse quantile regression may fail to tell which variables are important for the conditional mean and which variables are important for the conditional scale/variance, especially when there are variables that are important for both the mean and the scale. To that end, we further propose a COupled Sparse Asymmetric LEast Squares (COSALES) regression which can be efficiently solved by an algorithm similar to that for solving SALES. We establish theoretical properties of COSALES. In particular, COSALES using the SCAD penalty or MCP is shown to consistently identify the two important subsets for the mean and scale simultaneously, even when the two subsets overlap. We demonstrate the empirical performance of SALES and COSALES by simulated and real data.
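The population object behind asymmetric least squares is the expectile. A minimal intercept-only sketch (not the SALES algorithm itself, which handles penalized high-dimensional regression) computes it by iteratively reweighted averaging:

```python
import numpy as np

def expectile(y, tau, iters=100):
    """The tau-expectile: minimizer of the asymmetric least squares loss
    sum_i |tau - 1{y_i < mu}| * (y_i - mu)^2, computed by iteratively
    reweighted averaging (a sketch of the intercept-only problem)."""
    y = np.asarray(y, float)
    mu = y.mean()
    for _ in range(iters):
        w = np.where(y < mu, 1 - tau, tau)  # asymmetric weights
        mu_new = np.average(y, weights=w)
        if abs(mu_new - mu) < 1e-12:
            break
        mu = mu_new
    return mu
```

At tau = 0.5 the weights are symmetric and the expectile is the ordinary mean; tau away from 0.5 tilts the fit toward the upper or lower part of the distribution, which is what makes contrasting expectile levels informative about heteroscedasticity.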
</p>

Sub-Gaussian mean estimators
http://projecteuclid.org/euclid.aos/1479891632
<strong>Luc Devroye</strong>, <strong>Matthieu Lerasle</strong>, <strong>Gabor Lugosi</strong>, <strong>Roberto I. Oliveira</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2695--2725.</p><p><strong>Abstract:</strong><br/>
We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a nonasymptotic point of view. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.
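One standard construction in this line of work, used as a baseline in this literature (the paper's own estimators are more refined), is the median-of-means:

```python
import numpy as np

def median_of_means(x, k):
    """Split the sample into k blocks, average each block, take the median
    of the block means: a standard route to sub-Gaussian-type deviation
    bounds even for heavy-tailed data with finite variance."""
    x = np.asarray(x, float)
    blocks = np.array_split(x, k)
    return float(np.median([b.mean() for b in blocks]))
```

A single wild outlier corrupts at most one block mean, and the median ignores it, which is exactly why the estimator tolerates heavy tails while the empirical mean does not.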
</p>

Convergence rates of parameter estimation for some weakly identifiable finite mixtures
http://projecteuclid.org/euclid.aos/1479891633
<strong>Nhat Ho</strong>, <strong>XuanLong Nguyen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2726--2755.</p><p><strong>Abstract:</strong><br/>
We establish minimax lower bounds and maximum likelihood convergence rates of parameter estimation for mean-covariance multivariate Gaussian mixtures, shape-rate Gamma mixtures and some variants of finite mixture models, including the setting where the number of mixing components is bounded but unknown. These models belong to what we call “weakly identifiable” classes, which exhibit specific interactions among mixing parameters driven by the algebraic structures of the class of kernel densities and their partial derivatives. Accordingly, both the minimax bounds and the maximum likelihood parameter estimation rates in these models, obtained under some compactness conditions on the parameter space, are shown to be typically much slower than the usual $n^{-1/2}$ or $n^{-1/4}$ rates of convergence.
</p>

Global rates of convergence in log-concave density estimation
http://projecteuclid.org/euclid.aos/1479891634
<strong>Arlene K. H. Kim</strong>, <strong>Richard J. Samworth</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2756--2779.</p><p><strong>Abstract:</strong><br/>
The estimation of a log-concave density on $\mathbb{R}^{d}$ represents a central problem in the area of nonparametric inference under shape constraints. In this paper, we study the performance of log-concave density estimators with respect to global loss functions, and adopt a minimax approach. We first show that no statistical procedure based on a sample of size $n$ can estimate a log-concave density with respect to the squared Hellinger loss function with supremum risk smaller than order $n^{-4/5}$, when $d=1$, and order $n^{-2/(d+1)}$ when $d\geq2$. In particular, this reveals a sense in which, when $d\geq3$, log-concave density estimation is fundamentally more challenging than the estimation of a density with two bounded derivatives (a problem to which it has been compared). Second, we show that for $d\leq3$, the Hellinger $\varepsilon$-bracketing entropy of a class of log-concave densities with small mean and covariance matrix close to the identity grows like $\max\{\varepsilon^{-d/2},\varepsilon^{-(d-1)}\}$ (up to a logarithmic factor when $d=2$). This enables us to prove that when $d\leq3$ the log-concave maximum likelihood estimator achieves the minimax optimal rate (up to logarithmic factors when $d=2,3$) with respect to squared Hellinger loss.
</p>

Tensor decompositions and sparse log-linear models
http://projecteuclid.org/euclid.aos/1487667616
<strong>James E. Johndrow</strong>, <strong>Anirban Bhattacharya</strong>, <strong>David B. Dunson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 1--38.</p><p><strong>Abstract:</strong><br/>
Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
</p>

A lava attack on the recovery of sums of dense and sparse signals
http://projecteuclid.org/euclid.aos/1487667617
<strong>Victor Chernozhukov</strong>, <strong>Christian Hansen</strong>, <strong>Yuan Liao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 39--76.</p><p><strong>Abstract:</strong><br/>
Common high-dimensional methods for prediction rely on having either a sparse signal model, a model in which most parameters are zero and there are a small number of nonzero parameters that are large in magnitude, or a dense signal model, a model with no large parameters and very many small nonzero parameters. We consider a generalization of these two basic models, termed here a “sparse $+$ dense” model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein’s unbiased estimator for lava’s prediction risk. A simulation example compares the performance of lava to lasso, ridge and elastic net in a regression example using data-dependent penalty parameters and illustrates lava’s improved performance relative to these benchmarks.
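In the Gaussian sequence model the lava program separates across coordinates and has a closed form. Under one convenient normalization of the penalties, minimizing (y - s - d)^2 + 2*lam1*|s| + lam2*d^2 over the sparse part s and dense part d (the paper's scaling of the penalties may differ; this is a sketch), profiling out d gives:

```python
import numpy as np

def soft(y, t):
    """Soft-thresholding operator."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def lava_1d(y, lam1, lam2):
    """Coordinate-wise lava fit in the Gaussian sequence model under the
    normalization (y - s - d)^2 + 2*lam1*|s| + lam2*d^2 (an assumed scaling).
    Setting the d-gradient to zero gives d = (y - s)/(1 + lam2); substituting
    back, s solves a rescaled soft-thresholding problem."""
    s = soft(y, lam1 * (1 + lam2) / lam2)
    d = (y - s) / (1 + lam2)
    return s + d, s, d
```

The two limits recover the classical estimators: as lam2 grows the dense part vanishes and the fit tends to the lasso, while as lam1 grows the sparse part vanishes and the fit tends to the ridge shrinkage y/(1 + lam2). In between, small coordinates are ridge-shrunk rather than zeroed, which is the "sparse + dense" behavior.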
</p>

Statistical guarantees for the EM algorithm: From population to sample-based analysis
http://projecteuclid.org/euclid.aos/1487667618
<strong>Sivaraman Balakrishnan</strong>, <strong>Martin J. Wainwright</strong>, <strong>Bin Yu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 77--120.</p><p><strong>Abstract:</strong><br/>
The EM algorithm is a widely used tool in maximum-likelihood estimation in incomplete data problems. Existing theoretical work has focused on conditions under which the iterates or likelihood values converge, and the associated rates of convergence. Such guarantees do not distinguish whether the ultimate fixed point is a near global optimum or a bad local optimum of the sample likelihood, nor do they relate the obtained fixed point to the global optima of the idealized population likelihood (obtained in the limit of infinite data). This paper develops a theoretical framework for quantifying when and how quickly EM-type iterates converge to a small neighborhood of a given global optimum of the population likelihood. For correctly specified models, such a characterization yields rigorous guarantees on the performance of certain two-stage estimators in which a suitable initial pilot estimator is refined with iterations of the EM algorithm. Our analysis is divided into two parts: a treatment of the EM and first-order EM algorithms at the population level, followed by results that apply to these algorithms on a finite set of samples. Our conditions allow for a characterization of the region of convergence of EM-type iterates to a given population fixed point, that is, the region of the parameter space over which convergence is guaranteed to a point within a small neighborhood of the specified population fixed point. We verify our conditions and give tight characterizations of the region of convergence for three canonical problems of interest: symmetric mixture of two Gaussians, symmetric mixture of two regressions and linear regression with covariates missing completely at random.
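For the first canonical problem, the symmetric mixture 0.5*N(theta, 1) + 0.5*N(-theta, 1), the EM update has a well-known closed form, which makes the role of the initial pilot estimator easy to see (a small sketch, not the paper's analysis):

```python
import numpy as np

def em_symmetric_mixture(y, theta0, iters=200):
    """EM iterates for the symmetric two-component Gaussian mixture
    0.5*N(theta, 1) + 0.5*N(-theta, 1), run on a sample y.  Combining the
    E- and M-steps gives the closed-form update
        theta <- mean( tanh(theta * y_i) * y_i ),
    where tanh(theta*y) is the difference of the two posterior weights."""
    theta = float(theta0)
    for _ in range(iters):
        theta = float(np.mean(np.tanh(theta * y) * y))
    return theta
```

Note that theta = 0 is an exact fixed point of the update, so an EM run started there never moves: the requirement of a suitable initial pilot estimator in the two-stage guarantees above is not vacuous.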
</p>projecteuclid.org/euclid.aos/1487667618_20170221040055Tue, 21 Feb 2017 04:00 ESTNormal approximation and concentration of spectral projectors of sample covariancehttp://projecteuclid.org/euclid.aos/1487667619<strong>Vladimir Koltchinskii</strong>, <strong>Karim Lounici</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 121--157.</p><p><strong>Abstract:</strong><br/>
Let $X,X_{1},\dots,X_{n}$ be i.i.d. Gaussian random variables in a separable Hilbert space $\mathbb{H}$ with zero mean and covariance operator $\Sigma=\mathbb{E}(X\otimes X)$, and let $\hat{\Sigma}:=n^{-1}\sum_{j=1}^{n}(X_{j}\otimes X_{j})$ be the sample (empirical) covariance operator based on $(X_{1},\dots,X_{n})$. Denote by $P_{r}$ the spectral projector of $\Sigma$ corresponding to its $r$th eigenvalue $\mu_{r}$ and by $\hat{P}_{r}$ the empirical counterpart of $P_{r}$. The main goal of the paper is to obtain tight bounds on
\[\sup_{x\in\mathbb{R}}\vert\mathbb{P} \{\frac{\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}-\mathbb{E}\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}}{\operatorname{Var}^{1/2}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})}\leq x\}-\Phi (x)\vert ,\] where $\Vert \cdot \Vert_{2}$ denotes the Hilbert–Schmidt norm and $\Phi$ is the standard normal distribution function. Such accuracy of normal approximation of the distribution of squared Hilbert–Schmidt error is characterized in terms of so-called effective rank of $\Sigma$ defined as ${\mathbf{r}}(\Sigma)=\frac{\operatorname{tr}(\Sigma)}{\Vert \Sigma \Vert_{\infty}}$, where $\operatorname{tr}(\Sigma)$ is the trace of $\Sigma$ and $\Vert \Sigma \Vert_{\infty}$ is its operator norm, as well as another parameter characterizing the size of $\operatorname{Var}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})$. Other results include nonasymptotic bounds and asymptotic representations for the mean squared Hilbert–Schmidt norm error $\mathbb{E}\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}$ and the variance $\operatorname{Var}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})$, and concentration inequalities for $\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}$ around its expectation.
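A minimal numerical sketch of the quantities involved — the effective rank ${\mathbf{r}}(\Sigma)=\operatorname{tr}(\Sigma)/\Vert\Sigma\Vert_{\infty}$ and the squared Hilbert–Schmidt error of the top spectral projector — for an assumed diagonal covariance; dimensions and eigenvalues are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 50, 50000
evals = np.concatenate(([5.0], np.full(p - 1, 1.0)))   # well-separated top eigenvalue
Sigma = np.diag(evals)

# effective rank r(Sigma) = tr(Sigma) / ||Sigma||_inf (operator norm)
eff_rank = np.trace(Sigma) / np.max(evals)

X = rng.standard_normal((n, p)) * np.sqrt(evals)        # Gaussian sample with cov Sigma
Sigma_hat = X.T @ X / n                                 # sample covariance

def top_projector(M):
    _, V = np.linalg.eigh(M)
    v = V[:, -1]                  # eigenvector of the largest eigenvalue
    return np.outer(v, v)

P1, P1_hat = top_projector(Sigma), top_projector(Sigma_hat)
hs_err_sq = np.sum((P1_hat - P1) ** 2)  # squared Hilbert-Schmidt norm of the error
```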
</p>projecteuclid.org/euclid.aos/1487667619_20170221040055Tue, 21 Feb 2017 04:00 ESTA general theory of hypothesis tests and confidence regions for sparse high dimensional modelshttp://projecteuclid.org/euclid.aos/1487667620<strong>Yang Ning</strong>, <strong>Han Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 158--195.</p><p><strong>Abstract:</strong><br/>
We consider the problem of uncertainty assessment for low dimensional components in high dimensional models. Specifically, we propose a novel decorrelated score function to handle the impact of high dimensional nuisance parameters. We consider both hypothesis tests and confidence regions for generic penalized M-estimators. Unlike most existing inferential methods which are tailored for individual models, our method provides a general framework for high dimensional inference and is applicable to a wide variety of applications. In particular, we apply this general framework to study five illustrative examples: linear regression, logistic regression, Poisson regression, Gaussian graphical model and additive hazards model. For hypothesis testing, we develop general theorems to characterize the limiting distributions of the decorrelated score test statistic under both null hypothesis and local alternatives. These results provide asymptotic guarantees on the type I errors and local powers. For confidence region construction, we show that the decorrelated score function can be used to construct point estimators that are asymptotically normal and semiparametrically efficient. We further generalize this framework to handle the settings of misspecified models. Thorough numerical results are provided to back up the developed theory.
</p>projecteuclid.org/euclid.aos/1487667620_20170221040055Tue, 21 Feb 2017 04:00 ESTA Bayesian approach for envelope modelshttp://projecteuclid.org/euclid.aos/1487667621<strong>Kshitij Khare</strong>, <strong>Subhadip Pal</strong>, <strong>Zhihua Su</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 196--222.</p><p><strong>Abstract:</strong><br/>
The envelope model is a new paradigm to address estimation and prediction in multivariate analysis. Using sufficient dimension reduction techniques, it has the potential to achieve substantial efficiency gains compared to standard models. This model was first introduced by [Statist. Sinica 20 (2010) 927–960] for multivariate linear regression, and has since been adapted to many other contexts. However, a Bayesian approach for analyzing envelope models has not yet been investigated in the literature. In this paper, we develop a comprehensive Bayesian framework for estimation and model selection in envelope models in the context of multivariate linear regression. Our framework has the following attractive features. First, we use the matrix Bingham distribution to construct a prior on the orthogonal basis matrix of the envelope subspace. This prior respects the manifold structure of the envelope model, and can directly incorporate prior information about the envelope subspace through the specification of hyperparameters. This feature has potential applications in the broader Bayesian sufficient dimension reduction area. Second, sampling from the resulting posterior distribution can be achieved by using a block Gibbs sampler with standard associated conditionals. This in turn facilitates computationally efficient estimation and model selection. Third, unlike the current frequentist approach, our approach can accommodate situations where the sample size is smaller than the number of responses. Lastly, the Bayesian approach inherently offers comprehensive uncertainty characterization through the posterior distribution. We illustrate the utility of our approach on simulated and real datasets.
</p>projecteuclid.org/euclid.aos/1487667621_20170221040055Tue, 21 Feb 2017 04:00 ESTMonge–Kantorovich depth, quantiles, ranks and signshttp://projecteuclid.org/euclid.aos/1487667622<strong>Victor Chernozhukov</strong>, <strong>Alfred Galichon</strong>, <strong>Marc Hallin</strong>, <strong>Marc Henry</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 223--256.</p><p><strong>Abstract:</strong><br/>
We propose new concepts of statistical depth, multivariate quantiles, vector quantiles and ranks, ranks and signs, based on canonical transportation maps between a distribution of interest on $\mathbb{R}^{d}$ and a reference distribution on the $d$-dimensional unit ball. The new depth concept, called Monge–Kantorovich depth, specializes to halfspace depth for $d=1$ and in the case of spherical distributions, but for more general distributions, differs from the latter in the ability of its contours to account for non-convex features of the distribution of interest. We propose empirical counterparts to the population versions of those Monge–Kantorovich depth contours, quantiles, ranks, signs and vector quantiles and ranks, and show their consistency by establishing a uniform convergence property for empirical (forward and reverse) transport maps, which is the main theoretical result of this paper.
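For a small sample, the empirical transport map can be approximated by an optimal matching (squared-distance cost) between the data and a reference sample drawn uniformly on the unit ball; the Monge–Kantorovich rank of an observation is then the norm of its matched reference point. The sketch below is only a discrete illustration of this idea, with illustrative sizes, not the paper's estimator:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
n, d = 200, 2
X = rng.standard_normal((n, d))          # distribution of interest (spherical Gaussian)

# reference sample: uniform on the unit disk
r = np.sqrt(rng.uniform(size=n))
ang = rng.uniform(0.0, 2.0 * np.pi, size=n)
U = np.column_stack((r * np.cos(ang), r * np.sin(ang)))

# empirical transport map = optimal matching for the squared-distance cost
cost = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)
row, col = linear_sum_assignment(cost)
ranks = np.linalg.norm(U[col], axis=1)   # MK rank of X[i] = norm of its matched point
```

For a spherical distribution the matching is (approximately) radial, so ranks increase with the distance of the observation from the center.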
</p>projecteuclid.org/euclid.aos/1487667622_20170221040055Tue, 21 Feb 2017 04:00 ESTIdentifying the number of factors from singular values of a large sample auto-covariance matrixhttp://projecteuclid.org/euclid.aos/1487667623<strong>Zeng Li</strong>, <strong>Qinwen Wang</strong>, <strong>Jianfeng Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 257--288.</p><p><strong>Abstract:</strong><br/>
Identifying the number of factors in a high-dimensional factor model has attracted much attention in recent years and a general solution to the problem is still lacking. A promising ratio estimator based on singular values of lagged sample auto-covariance matrices has been recently proposed in the literature with a reasonably good performance under some specific assumption on the strength of the factors. Inspired by this ratio estimator and as a first main contribution, this paper proposes a complete theory of such sample singular values for both the factor part and the noise part under the large-dimensional scheme where the dimension and the sample size proportionally grow to infinity. In particular, we provide an exact description of the phase transition phenomenon that determines whether a factor is strong enough to be detected with the observed sample singular values. Based on these findings and as a second main contribution of the paper, we propose a new estimator of the number of factors which is strongly consistent for the detection of all significant factors (which are the only theoretically detectable ones). In particular, factors are assumed to have the minimum strength above the phase transition boundary which is of the order of a constant; they are thus not required to grow to infinity together with the dimension (as assumed in most of the existing papers on high-dimensional factor models). An empirical Monte-Carlo study, as well as the analysis of stock returns data, attests to the very good performance of the proposed estimator. In all the tested cases, the new estimator largely outperforms the existing estimator using the same ratios of singular values.
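A generic ratio estimator of this kind (a sketch of the idea, not the paper's exact procedure) can be demonstrated in a few lines: simulate a factor model, take the singular values of the lag-1 sample auto-covariance matrix, and locate the sharpest drop between consecutive singular values. All parameter choices below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
p, T, k = 30, 2000, 3
Lam = rng.standard_normal((p, k)) * 2.0            # strong factor loadings
f = np.zeros((T, k))
for t in range(1, T):                              # AR(1) factor dynamics
    f[t] = 0.7 * f[t - 1] + rng.standard_normal(k)
X = f @ Lam.T + rng.standard_normal((T, p))        # observed series

# lag-1 sample auto-covariance matrix and its singular values
M = X[1:].T @ X[:-1] / (T - 1)
s = np.linalg.svd(M, compute_uv=False)

# ratio estimator: the number of factors is where s_{i+1}/s_i drops most sharply
R = p // 2
k_hat = int(np.argmin(s[1:R + 1] / s[:R])) + 1
```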
</p>projecteuclid.org/euclid.aos/1487667623_20170221040055Tue, 21 Feb 2017 04:00 ESTConsistency of spectral hypergraph partitioning under planted partition modelhttp://projecteuclid.org/euclid.aos/1487667624<strong>Debarghya Ghoshdastidar</strong>, <strong>Ambedkar Dukkipati</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 289--315.</p><p><strong>Abstract:</strong><br/>
Hypergraph partitioning lies at the heart of a number of problems in machine learning and network sciences. Many algorithms for hypergraph partitioning have been proposed that extend standard approaches for graph partitioning to the case of hypergraphs. However, theoretical aspects of such methods have seldom received attention in the literature as compared to the extensive studies on the guarantees of graph partitioning. For instance, consistency results of spectral graph partitioning under the stochastic block model are well known. In this paper, we present a planted partition model for sparse random nonuniform hypergraphs that generalizes the stochastic block model. We derive an error bound for a spectral hypergraph partitioning algorithm under this model using matrix concentration inequalities. To the best of our knowledge, this is the first consistency result related to partitioning nonuniform hypergraphs.
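One simple spectral scheme of this flavor (a clique-expansion variant, not necessarily the algorithm analyzed in the paper) can be sketched on a planted-partition 3-uniform hypergraph; all edge probabilities and sizes are illustrative:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10)
n = 60
labels = np.repeat([0, 1], n // 2)              # planted bipartition

# planted-partition hypergraph: within-community triples are more likely
hyperedges = []
for trio in combinations(range(n), 3):
    same = labels[trio[0]] == labels[trio[1]] == labels[trio[2]]
    if rng.uniform() < (0.05 if same else 0.005):
        hyperedges.append(trio)

# clique expansion: each hyperedge adds weight to every pair of its nodes
A = np.zeros((n, n))
for e in hyperedges:
    for i, j in combinations(e, 2):
        A[i, j] += 1.0
        A[j, i] += 1.0

# bipartition from the second eigenvector of the normalized Laplacian
d = A.sum(axis=1)
Lap = np.eye(n) - A / np.sqrt(np.outer(d, d))
_, vecs = np.linalg.eigh(Lap)
pred = (vecs[:, 1] > 0).astype(int)
err = min(np.mean(pred != labels), np.mean(pred == labels))  # up to label switching
```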
</p>projecteuclid.org/euclid.aos/1487667624_20170221040055Tue, 21 Feb 2017 04:00 ESTOracle inequalities for network models and sparse graphon estimationhttp://projecteuclid.org/euclid.aos/1487667625<strong>Olga Klopp</strong>, <strong>Alexandre B. Tsybakov</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 316--354.</p><p><strong>Abstract:</strong><br/>
Inhomogeneous random graph models encompass many network models such as stochastic block models and latent position models. We consider the problem of statistical estimation of the matrix of connection probabilities based on the observations of the adjacency matrix of the network. Taking the stochastic block model as an approximation, we construct estimators of network connection probabilities—the ordinary block constant least squares estimator, and its restricted version. We show that they satisfy oracle inequalities with respect to the block constant oracle. As a consequence, we derive optimal rates of estimation of the probability matrix. Our results cover the important setting of sparse networks. Another consequence consists in establishing upper bounds on the minimax risks for graphon estimation in the $L_{2}$ norm when the probability matrix is sampled according to a graphon model. These bounds include an additional term accounting for the “agnostic” error induced by the variability of the latent unobserved variables of the graphon model. In this setting, the optimal rates are influenced not only by the bias and variance components as in usual nonparametric problems but also include the third component, which is the agnostic error. The results shed light on the differences between estimation under the empirical loss (the probability matrix estimation) and under the integrated loss (the graphon estimation).
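Given oracle block labels (a simplification; the least squares estimator discussed above also optimizes over labelings), the block constant estimate is just a per-block average of the adjacency matrix. A minimal sketch under an assumed two-block model:

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 200, 2
z = rng.integers(K, size=n)                      # latent block labels
B = np.array([[0.6, 0.1], [0.1, 0.5]])           # block connection probabilities
P = B[z][:, z]
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                      # symmetric adjacency, no self-loops

# block constant estimate given the labels: average A over each pair of blocks
Theta_hat = np.zeros_like(B)
for a in range(K):
    for b in range(K):
        block = A[np.ix_(z == a, z == b)]
        if a == b:
            m = block.shape[0]
            Theta_hat[a, b] = block.sum() / (m * (m - 1))  # exclude the zero diagonal
        else:
            Theta_hat[a, b] = block.mean()
P_hat = Theta_hat[z][:, z]                       # estimated probability matrix
```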
</p>projecteuclid.org/euclid.aos/1487667625_20170221040055Tue, 21 Feb 2017 04:00 ESTApproximate group context treehttp://projecteuclid.org/euclid.aos/1487667626<strong>Alexandre Belloni</strong>, <strong>Roberto I. Oliveira</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 355--385.</p><p><strong>Abstract:</strong><br/>
We study a variable length Markov chain model associated with a group of stationary processes that share the same context tree but each process has potentially different conditional probabilities. We propose a new model selection and estimation method which is computationally efficient. We develop oracle and adaptivity inequalities, as well as model selection properties, that hold under continuity of the transition probabilities and polynomial $\beta$-mixing. In particular, model misspecification is allowed.
These results are applied to interesting families of processes. For Markov processes, we obtain uniform rate of convergence for the estimation error of transition probabilities as well as perfect model selection results. For chains of infinite order with complete connections, we obtain explicit uniform rates of convergence on the estimation of conditional probabilities, which have an explicit dependence on the processes’ continuity rates. Similar guarantees are also derived for renewal processes.
Our results are shown to be applicable to discrete stochastic dynamic programming problems and to dynamic discrete choice models. We also apply our estimator to a linguistic study, based on recent work by Galves et al. [ Ann. Appl. Stat. 6 (2012) 186–209], of the rhythmic differences between Brazilian and European Portuguese.
</p>projecteuclid.org/euclid.aos/1487667626_20170221040055Tue, 21 Feb 2017 04:00 ESTFlexible results for quadratic forms with applications to variance components estimationhttp://projecteuclid.org/euclid.aos/1487667627<strong>Lee H. Dicker</strong>, <strong>Murat A. Erdogdu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 386--414.</p><p><strong>Abstract:</strong><br/>
We derive convenient uniform concentration bounds and finite sample multivariate normal approximation results for quadratic forms, then describe some applications involving variance components estimation in linear random-effects models. Random-effects models and variance components estimation are classical topics in statistics, with a corresponding well-established asymptotic theory. However, our finite sample results for quadratic forms provide additional flexibility for easily analyzing random-effects models in nonstandard settings, which are becoming more important in modern applications (e.g., genomics). For instance, in addition to deriving novel non-asymptotic bounds for variance components estimators in classical linear random-effects models, we provide a concentration bound for variance components estimators in linear models with correlated random-effects and discuss an application involving sparse random-effects models. Our general concentration bound is a uniform version of the Hanson–Wright inequality. The main normal approximation result in the paper is derived using Reinert and Röllin’s [ Ann. Probab. 37 (2009) 2150–2173] embedding technique for Stein’s method of exchangeable pairs.
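The flavor of Hanson–Wright-type concentration is easy to check empirically: for a standard Gaussian vector $z$, the quadratic form $z^{\top}Az$ concentrates around $\operatorname{tr}(A)$ at the scale of the Frobenius norm $\Vert A\Vert_{F}$. The sketch below is a simulation with illustrative sizes, not the paper's uniform bound:

```python
import numpy as np

rng = np.random.default_rng(11)
p, reps = 200, 2000
A = rng.standard_normal((p, p))
A = (A + A.T) / 2.0                     # symmetric matrix defining the quadratic form
fro = np.linalg.norm(A)                 # Frobenius norm ||A||_F

Z = rng.standard_normal((reps, p))
q = np.einsum('ij,jk,ik->i', Z, A, Z)   # quadratic forms z' A z, one per replicate
dev = np.abs(q - np.trace(A))           # deviation from the mean E[z'Az] = tr(A)

tail_frac = np.mean(dev > 4.0 * fro)    # empirical tail probability at t = 4 ||A||_F
```

Since $\operatorname{Var}(z^{\top}Az)=2\Vert A\Vert_{F}^{2}$ here, deviations beyond a few multiples of $\Vert A\Vert_{F}$ are rare, as the Hanson–Wright bound predicts.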
</p>projecteuclid.org/euclid.aos/1487667627_20170221040055Tue, 21 Feb 2017 04:00 ESTExtreme eigenvalues of large-dimensional spiked Fisher matrices with applicationhttp://projecteuclid.org/euclid.aos/1487667628<strong>Qinwen Wang</strong>, <strong>Jianfeng Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 415--460.</p><p><strong>Abstract:</strong><br/>
Consider two $p$-variate populations, not necessarily Gaussian, with covariance matrices $\Sigma_{1}$ and $\Sigma_{2}$, respectively. Let $S_{1}$ and $S_{2}$ be the corresponding sample covariance matrices with degrees of freedom $m$ and $n$. When the difference $\Delta$ between $\Sigma_{1}$ and $\Sigma_{2}$ is of small rank compared to $p,m$ and $n$, the Fisher matrix $S:=S_{2}^{-1}S_{1}$ is called a spiked Fisher matrix. When $p,m$ and $n$ grow to infinity proportionally, we establish a phase transition for the extreme eigenvalues of the Fisher matrix: a displacement formula showing that when the eigenvalues of $\Delta$ (spikes) are above (or under) a critical value, the associated extreme eigenvalues of $S$ will converge to some point outside the support of the global limit (LSD) of other eigenvalues (become outliers); otherwise, they will converge to the edge points of the LSD. Furthermore, we derive central limit theorems for those outlier eigenvalues of $S$. The limiting distributions are found to be Gaussian if and only if the corresponding population spike eigenvalues in $\Delta$ are simple. Two applications are introduced. The first application uses the largest eigenvalue of the Fisher matrix to test the equality between two high-dimensional covariance matrices, and an explicit power function is found under the spiked alternative. The second application is in the field of signal detection, where an estimator for the number of signals is proposed while the covariance structure of the noise is arbitrary.
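A quick numerical sketch of the outlier phenomenon (Gaussian data, rank-one $\Delta$, illustrative dimensions): with a strong enough spike, the top eigenvalue of $S_{2}^{-1}S_{1}$ separates from the bulk of the spectrum, while without a spike it stays near the bulk edge:

```python
import numpy as np

rng = np.random.default_rng(7)
p, m, n = 100, 500, 500

def fisher_top_eig(spike):
    # Sigma2 = I; Sigma1 = I + spike * e1 e1' (rank-one difference Delta)
    diag1 = np.ones(p)
    diag1[0] += spike
    X1 = rng.standard_normal((m, p)) * np.sqrt(diag1)
    X2 = rng.standard_normal((n, p))
    S1 = X1.T @ X1 / m
    S2 = X2.T @ X2 / n
    # largest eigenvalue of the Fisher matrix S2^{-1} S1
    return np.linalg.eigvals(np.linalg.solve(S2, S1)).real.max()

null_top = fisher_top_eig(0.0)     # no spike: near the bulk edge
spiked_top = fisher_top_eig(10.0)  # strong spike: a clear outlier
```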
</p>projecteuclid.org/euclid.aos/1487667628_20170221040055Tue, 21 Feb 2017 04:00 ESTMimicking counterfactual outcomes to estimate causal effectshttp://projecteuclid.org/euclid.aos/1494921947<strong>Judith J. Lok</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 461--499.</p><p><strong>Abstract:</strong><br/>
In observational studies, treatment may be adapted to covariates at several times without a fixed protocol, in continuous time. Treatment influences covariates, which influence treatment, which influences covariates and so on. Then even time-dependent Cox models cannot be used to estimate the net treatment effect. Structural nested models have been applied in this setting. Structural nested models are based on counterfactuals: the outcome a person would have had had treatment been withheld after a certain time. Previous work on continuous-time structural nested models assumes that counterfactuals depend deterministically on observed data, while conjecturing that this assumption can be relaxed. This article proves that one can mimic counterfactuals by constructing random variables, solutions to a differential equation, that have the same distribution as the counterfactuals, even given past observed data. These “mimicking” variables can be used to estimate the parameters of structural nested models without assuming the treatment effect to be deterministic.
</p>projecteuclid.org/euclid.aos/1494921947_20170516040557Tue, 16 May 2017 04:05 EDTLikelihood-based model selection for stochastic block modelshttp://projecteuclid.org/euclid.aos/1494921948<strong>Y. X. Rachel Wang</strong>, <strong>Peter J. Bickel</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 500--528.</p><p><strong>Abstract:</strong><br/>
The stochastic block model (SBM) provides a popular framework for modeling community structures in networks. However, more attention has been devoted to problems concerning estimating the latent node labels and the model parameters than the issue of choosing the number of blocks. We consider an approach based on the log likelihood ratio statistic and analyze its asymptotic properties under model misspecification. We show the limiting distribution of the statistic in the case of underfitting is normal and obtain its convergence rate in the case of overfitting. These conclusions remain valid when the average degree grows at a polylog rate. The results enable us to derive the correct order of the penalty term for model complexity and arrive at a likelihood-based model selection criterion that is asymptotically consistent. Our analysis can also be extended to a degree-corrected block model (DCSBM). In practice, the likelihood function can be estimated using more computationally efficient variational methods or consistent label estimation algorithms, allowing the criterion to be applied to large networks.
</p>projecteuclid.org/euclid.aos/1494921948_20170516040557Tue, 16 May 2017 04:05 EDTMultiple testing of local maxima for detection of peaks in random fieldshttp://projecteuclid.org/euclid.aos/1494921949<strong>Dan Cheng</strong>, <strong>Armin Schwartzman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 529--556.</p><p><strong>Abstract:</strong><br/>
A topological multiple testing scheme is presented for detecting peaks in images under stationary ergodic Gaussian noise, where tests are performed at local maxima of the smoothed observed signals. The procedure generalizes the one-dimensional scheme of Schwartzman, Gavrilov and Adler [ Ann. Statist. 39 (2011) 3290–3319] to Euclidean domains of arbitrary dimension. Two methods are developed according to two different ways of computing p-values: (i) using the exact distribution of the height of local maxima, available explicitly when the noise field is isotropic [ Extremes 18 (2015) 213–240; Expected number and height distribution of critical points of smooth isotropic Gaussian random fields (2015) Preprint]; (ii) using an approximation to the overshoot distribution of local maxima above a pre-threshold, applicable when the exact distribution is unknown, such as when the stationary noise field is nonisotropic [ Extremes 18 (2015) 213–240]. The algorithms, combined with the Benjamini–Hochberg procedure for thresholding p-values, provide asymptotic strong control of the False Discovery Rate (FDR) and power consistency, with specific rates, as the search space and signal strength get large. The optimal smoothing bandwidth and optimal pre-threshold are obtained to achieve maximum power. Simulations show that FDR levels are maintained in nonasymptotic conditions. The methods are illustrated in the analysis of functional magnetic resonance images of the brain.
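A one-dimensional sketch in the spirit of method (ii): smooth the observed signal, keep local maxima above a pre-threshold $v$, assign approximate overshoot p-values (here the crude exponential approximation $p=\exp\{-v(z-v)\}$ stands in for the exact distributions from the cited Extremes papers), and apply Benjamini–Hochberg at level 0.05. All constants are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(5)
T, bw, v = 4000, 5.0, 2.0                      # length, smoothing bandwidth, pre-threshold
mu = np.zeros(T)
mu[[800, 2000, 3200]] = 40.0                   # three spikes -> three true peaks

ys = gaussian_filter1d(mu + rng.standard_normal(T), bw)   # smoothed observed signal
w = gaussian_filter1d((np.arange(T) == T // 2).astype(float), bw)
sd = np.sqrt((w ** 2).sum())                   # sd of the smoothed noise

idx = np.arange(1, T - 1)
maxima = idx[(ys[idx] > ys[idx - 1]) & (ys[idx] > ys[idx + 1])]   # local maxima
z = ys[maxima] / sd
maxima, z = maxima[z > v], z[z > v]            # keep maxima above the pre-threshold

pvals = np.exp(-v * (z - v))                   # approximate overshoot p-values

# Benjamini-Hochberg step-up at level 0.05
m = len(pvals)
order = np.argsort(pvals)
ok = np.nonzero(np.sort(pvals) <= 0.05 * np.arange(1, m + 1) / m)[0]
detected = np.sort(maxima[order[:ok[-1] + 1]]) if ok.size else np.array([], int)
```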
</p>projecteuclid.org/euclid.aos/1494921949_20170516040557Tue, 16 May 2017 04:05 EDTA rate optimal procedure for recovering sparse differences between high-dimensional means under dependencehttp://projecteuclid.org/euclid.aos/1494921950<strong>Jun Li</strong>, <strong>Ping-Shou Zhong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 557--590.</p><p><strong>Abstract:</strong><br/>
The paper considers the problem of recovering the sparse different components between two high-dimensional means of column-wise dependent random vectors. We show that dependence can be utilized to lower the identification boundary for signal recovery. Moreover, an optimal convergence rate for the marginal false nondiscovery rate (mFNR) is established under dependence. The convergence rate is faster than the optimal rate without dependence. To recover the sparse signal bearing dimensions, we propose a Dependence-Assisted Thresholding and Excising (DATE) procedure, which is shown to be rate optimal for the mFNR with the marginal false discovery rate (mFDR) controlled at a pre-specified level. Extensions of the DATE to recover the differences in contrasts among multiple population means and differences between two covariance matrices are also provided. Simulation studies and a case study are given to demonstrate the performance of the proposed signal identification procedure.
</p>projecteuclid.org/euclid.aos/1494921950_20170516040557Tue, 16 May 2017 04:05 EDTOnline estimation of the geometric median in Hilbert spaces: Nonasymptotic confidence ballshttp://projecteuclid.org/euclid.aos/1494921951<strong>Hervé Cardot</strong>, <strong>Peggy Cénac</strong>, <strong>Antoine Godichon-Baggioni</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 591--614.</p><p><strong>Abstract:</strong><br/>
Estimation procedures based on recursive algorithms are interesting and powerful techniques that are able to deal rapidly with very large samples of high dimensional data. The collected data may be contaminated by noise so that robust location indicators, such as the geometric median, may be preferred to the mean. In this context, an estimator of the geometric median based on a fast and efficient averaged nonlinear stochastic gradient algorithm has been developed by [ Bernoulli 19 (2013) 18–43]. This work aims at studying more precisely the nonasymptotic behavior of this nonlinear algorithm by giving nonasymptotic confidence balls in general separable Hilbert spaces. This new result is based on the derivation of improved $L^{2}$ rates of convergence as well as an exponential inequality for the nearly martingale terms of the recursive nonlinear Robbins–Monro algorithm.
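The averaged stochastic gradient recursion itself is short. The sketch below uses an illustrative step sequence $\gamma_{n}=c/n^{\alpha}$ with $\alpha\in(1/2,1)$ (the constants `c = 2.0` and `alpha = 0.66` are arbitrary choices, not those of the cited work) and heavy-tailed data for which the geometric median is a natural robust target:

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 20000, 5
median_star = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
# symmetric heavy-tailed (t_2) noise: infinite variance, but the geometric
# median of the distribution is still median_star
X = median_star + rng.standard_t(df=2.0, size=(n, d))

m = np.zeros(d)          # Robbins-Monro iterate
m_bar = np.zeros(d)      # its running (Polyak-Ruppert) average: the estimator
for i, x in enumerate(X, start=1):
    gamma = 2.0 / i ** 0.66                    # step sequence c / n^alpha
    diff = x - m
    norm = np.linalg.norm(diff)
    if norm > 0:
        m = m + gamma * diff / norm            # nonlinear stochastic gradient step
    m_bar = m_bar + (m - m_bar) / i            # online averaging
```

One data point is processed per step, so the procedure never stores the sample — the feature that makes such recursive estimators attractive for very large data streams.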
</p>projecteuclid.org/euclid.aos/1494921951_20170516040557Tue, 16 May 2017 04:05 EDTConfidence intervals for high-dimensional linear regression: Minimax rates and adaptivityhttp://projecteuclid.org/euclid.aos/1494921952<strong>T. Tony Cai</strong>, <strong>Zijian Guo</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 615--646.</p><p><strong>Abstract:</strong><br/>
Confidence sets play a fundamental role in statistical inference. In this paper, we consider confidence intervals for high-dimensional linear regression with random design. We first establish the convergence rates of the minimax expected length for confidence intervals in the oracle setting where the sparsity parameter is given. The focus is then on the problem of adaptation to sparsity for the construction of confidence intervals. Ideally, an adaptive confidence interval should have its length automatically adjusted to the sparsity of the unknown regression vector, while maintaining a pre-specified coverage probability. It is shown that such a goal is in general not attainable, except when the sparsity parameter is restricted to a small region over which the confidence intervals have the optimal length of the usual parametric rate. It is further demonstrated that the lack of adaptivity is not due to the conservativeness of the minimax framework, but is fundamentally caused by the difficulty of learning the bias accurately.
</p>projecteuclid.org/euclid.aos/1494921952_20170516040557Tue, 16 May 2017 04:05 EDTEstimating the effect of joint interventions from observational data in sparse high-dimensional settingshttp://projecteuclid.org/euclid.aos/1494921953<strong>Preetam Nandy</strong>, <strong>Marloes H. Maathuis</strong>, <strong>Thomas S. Richardson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 647--674.</p><p><strong>Abstract:</strong><br/>
We consider the estimation of joint causal effects from observational data. In particular, we propose new methods to estimate the effect of multiple simultaneous interventions (e.g., multiple gene knockouts), under the assumption that the observational data come from an unknown linear structural equation model with independent errors. We derive asymptotic variances of our estimators when the underlying causal structure is partly known, as well as high-dimensional consistency when the causal structure is fully unknown and the joint distribution is multivariate Gaussian. We also propose a generalization of our methodology to the class of nonparanormal distributions. We evaluate the estimators in simulation studies and also illustrate them on data from the DREAM4 challenge.
</p>projecteuclid.org/euclid.aos/1494921953_20170516040557Tue, 16 May 2017 04:05 EDTIdentifiability of restricted latent class models with binary responseshttp://projecteuclid.org/euclid.aos/1494921954<strong>Gongjun Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 675--707.</p><p><strong>Abstract:</strong><br/>
Statistical latent class models are widely used in social and psychological research, yet it is often difficult to establish the identifiability of the model parameters. In this paper, we consider the identifiability issue of a family of restricted latent class models, where the restriction structures are needed to reflect pre-specified assumptions on the related assessment. We establish the identifiability results in the strict sense and specify which types of restriction structure would give the identifiability of the model parameters. The results not only guarantee the validity of many of the popularly used models, but also provide a guideline for the related experimental design, where in the current applications the design is usually experience based and identifiability is not guaranteed. Theoretically, we develop a new technique to establish the identifiability result, which may be extended to other restricted latent class models.
</p>projecteuclid.org/euclid.aos/1494921954_20170516040557Tue, 16 May 2017 04:05 EDTA Bernstein-type inequality for some mixing processes and dynamical systems with an application to learninghttp://projecteuclid.org/euclid.aos/1494921955<strong>Hanyuan Hang</strong>, <strong>Ingo Steinwart</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 708--743.</p><p><strong>Abstract:</strong><br/>
We establish a Bernstein-type inequality for a class of stochastic processes that includes the classical geometrically $\phi$-mixing processes, Rio’s generalization of these processes and many time-discrete dynamical systems. Modulo a logarithmic factor and some constants, our Bernstein-type inequality coincides with the classical Bernstein inequality for i.i.d. data. We further use this new Bernstein-type inequality to derive an oracle inequality for generic regularized empirical risk minimization algorithms and data generated by such processes. Applying this oracle inequality to support vector machines using the Gaussian kernels for binary classification, we obtain essentially the same rate as for i.i.d. processes; for least squares and quantile regression, it turns out that the resulting learning rates match, up to some arbitrarily small extra term in the exponent, the optimal rates for i.i.d. processes.
</p>projecteuclid.org/euclid.aos/1494921955_20170516040557Tue, 16 May 2017 04:05 EDTConsistency of likelihood estimation for Gibbs point processeshttp://projecteuclid.org/euclid.aos/1494921956<strong>David Dereudre</strong>, <strong>Frédéric Lavancier</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 744--770.</p><p><strong>Abstract:</strong><br/>
Strong consistency of the maximum likelihood estimator (MLE) for parametric Gibbs point process models is established. The setting is very general. It includes pairwise potentials, finite and infinite multibody interactions and geometrical interactions, where the range can be finite or infinite. The Gibbs interaction may depend linearly or nonlinearly on the parameters, a particular case being hardcore parameters and interaction range parameters. As important examples, we deduce the consistency of the MLE for all parameters of the Strauss model, the hardcore Strauss model, the Lennard–Jones model and the area-interaction model.
</p>projecteuclid.org/euclid.aos/1494921956_20170516040557Tue, 16 May 2017 04:05 EDTTests for high-dimensional data based on means, spatial signs and spatial rankshttp://projecteuclid.org/euclid.aos/1494921957<strong>Anirvan Chakraborty</strong>, <strong>Probal Chaudhuri</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 771--799.</p><p><strong>Abstract:</strong><br/>
Tests based on mean vectors and spatial signs and ranks for a zero mean in one-sample problems and for the equality of means in two-sample problems have been studied in the recent literature for high-dimensional data with the dimension larger than the sample size. For the above testing problems, we show that under suitable sequences of alternatives, the powers of the mean-based tests and the tests based on spatial signs and ranks tend to be the same as the data dimension tends to infinity for any sample size when the coordinate variables satisfy appropriate mixing conditions. Further, their limiting powers do not depend on the heaviness of the tails of the distributions. This is in striking contrast to the asymptotic results obtained in the classical multivariate setting. On the other hand, we show that in the presence of stronger dependence among the coordinate variables, the spatial-sign- and rank-based tests for high-dimensional data can be asymptotically more powerful than the mean-based tests if, in addition to the data dimension, the sample size also tends to infinity. The sizes of some mean-based tests for high-dimensional data studied in the recent literature are observed to be significantly different from their nominal levels. This is due to the inadequacy of the asymptotic approximations used for the distributions of those test statistics. However, our asymptotic approximations for the tests based on spatial signs and ranks are observed to work well when the tests are applied on a variety of simulated and real datasets.
</p>projecteuclid.org/euclid.aos/1494921957_20170516040557Tue, 16 May 2017 04:05 EDTInference on the mode of weak directional signals: A Le Cam perspective on hypothesis testing near singularitieshttp://projecteuclid.org/euclid.aos/1494921958<strong>Davy Paindaveine</strong>, <strong>Thomas Verdebout</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 800--832.</p><p><strong>Abstract:</strong><br/>
We revisit, from an original and challenging perspective, the problem of testing the null hypothesis that the mode of a directional signal is equal to a given value. Motivated by a real data example where the signal is weak, we consider this problem under asymptotic scenarios for which the signal strength goes to zero at an arbitrary rate $\eta_{n}$. Both under the null and the alternative, we focus on rotationally symmetric distributions. We show that, while they are asymptotically equivalent under fixed signal strength, the classical Wald and Watson tests exhibit very different (null and nonnull) behaviours when the signal becomes arbitrarily weak. To fully characterize how challenging the problem is as a function of $\eta_{n}$, we adopt a Le Cam, convergence-of-statistical-experiments, point of view and show that the resulting limiting experiments crucially depend on $\eta_{n}$. In the light of these results, the Watson test is shown to be adaptively rate-consistent and essentially adaptively Le Cam optimal. Throughout, our theoretical findings are illustrated via Monte Carlo simulations. The practical relevance of our results is also shown on the real data example that motivated the present work.
</p>projecteuclid.org/euclid.aos/1494921958_20170516040557Tue, 16 May 2017 04:05 EDTAsymptotic behaviour of the empirical Bayes posteriors associated to maximum marginal likelihood estimatorhttp://projecteuclid.org/euclid.aos/1494921959<strong>Judith Rousseau</strong>, <strong>Botond Szabo</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 833--865.</p><p><strong>Abstract:</strong><br/>
We consider the asymptotic behaviour of the marginal maximum likelihood empirical Bayes posterior distribution in a general setting. First, we characterize the set where the maximum marginal likelihood estimator is located with high probability. Then we provide oracle-type upper and lower bounds for the contraction rates of the empirical Bayes posterior. We also show that the hierarchical Bayes posterior achieves the same contraction rate as the maximum marginal likelihood empirical Bayes posterior. We demonstrate the applicability of our general results for various models and prior distributions by deriving upper and lower bounds for the contraction rates of the corresponding empirical and hierarchical Bayes posterior distributions.
</p>projecteuclid.org/euclid.aos/1494921959_20170516040557Tue, 16 May 2017 04:05 EDTStatistical consistency and asymptotic normality for high-dimensional robust $M$-estimatorshttp://projecteuclid.org/euclid.aos/1494921960<strong>Po-Ling Loh</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 866--896.</p><p><strong>Abstract:</strong><br/>
We study theoretical properties of regularized robust $M$-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: When the derivative of the loss function is bounded and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an $\ell_{1}$-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support; hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex $M$-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex $M$-estimator to achieve consistency and a nonconvex $M$-estimator to increase efficiency. We conclude with simulation results that corroborate our theoretical findings.
</p>projecteuclid.org/euclid.aos/1494921960_20170516040557Tue, 16 May 2017 04:05 EDTInteraction pursuit in high-dimensional multi-response regression via distance correlationhttp://projecteuclid.org/euclid.aos/1494921961<strong>Yinfei Kong</strong>, <strong>Daoji Li</strong>, <strong>Yingying Fan</strong>, <strong>Jinchi Lv</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 897--922.</p><p><strong>Abstract:</strong><br/>
Feature interactions can contribute to a large proportion of variation in many prediction models. In the era of big data, the coexistence of high dimensionality in both responses and covariates poses unprecedented challenges in identifying important interactions. In this paper, we suggest a two-stage interaction identification method, called the interaction pursuit via distance correlation (IPDC), in the setting of high-dimensional multi-response interaction models that exploits feature screening applied to transformed variables with distance correlation followed by feature selection. Such a procedure is computationally efficient, generally applicable beyond the heredity assumption, and effective even when the number of responses diverges with the sample size. Under mild regularity conditions, we show that this method enjoys nice theoretical properties including the sure screening property, support union recovery and oracle inequalities in prediction and estimation for both interactions and main effects. The advantages of our method are supported by several simulation studies and real data analysis.
</p>projecteuclid.org/euclid.aos/1494921961_20170516040557Tue, 16 May 2017 04:05 EDTMinimax estimation of linear and quadratic functionals on sparsity classeshttp://projecteuclid.org/euclid.aos/1497319684<strong>Olivier Collier</strong>, <strong>Laëtitia Comminges</strong>, <strong>Alexandre B. Tsybakov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 923--958.</p><p><strong>Abstract:</strong><br/>
For the Gaussian sequence model, we obtain nonasymptotic minimax rates of estimation of the linear, quadratic and the $\ell_{2}$-norm functionals on classes of sparse vectors and construct optimal estimators that attain these rates. The main object of interest is the class $B_{0}(s)$ of $s$-sparse vectors $\theta=(\theta_{1},\dots,\theta_{d})$, for which we also provide completely adaptive estimators (independent of $s$ and of the noise variance $\sigma $) having logarithmically slower rates than the minimax ones. Furthermore, we obtain the minimax rates on the $\ell_{q}$-balls $B_{q}(r)=\{\theta\in\mathbb{R}^{d}:\|\theta\|_{q}\le r\}$ where $0<q\le2$, and $\|\theta\|_{q}=(\sum_{i=1}^{d}|\theta_{i}|^{q})^{1/q}$. This analysis shows that there are, in general, three zones in the rates of convergence that we call the sparse zone, the dense zone and the degenerate zone, while a fourth zone appears for estimation of the quadratic functional. We show that, as opposed to estimation of $\theta$, the correct logarithmic terms in the optimal rates for the sparse zone scale as $\log(d/s^{2})$ and not as $\log(d/s)$. For the class $B_{0}(s)$, the rates of estimation of the linear functional and of the $\ell_{2}$-norm have a simple elbow at $s=\sqrt{d}$ (boundary between the sparse and the dense zones) and exhibit similar performances, whereas the estimation of the quadratic functional $Q(\theta)$ reveals more complex effects: the minimax risk on $B_{0}(s)$ is infinite and the sparseness assumption needs to be combined with a bound on the $\ell_{2}$-norm. Finally, we apply our results on estimation of the $\ell_{2}$-norm to the problem of testing against sparse alternatives. In particular, we obtain a nonasymptotic analog of the Ingster–Donoho–Jin theory revealing some effects that were not captured by the previous asymptotic analysis.
</p>projecteuclid.org/euclid.aos/1497319684_20170612220820Mon, 12 Jun 2017 22:08 EDTSupport consistency of direct sparse-change learning in Markov networkshttp://projecteuclid.org/euclid.aos/1497319685<strong>Song Liu</strong>, <strong>Taiji Suzuki</strong>, <strong>Raissa Relator</strong>, <strong>Jun Sese</strong>, <strong>Masashi Sugiyama</strong>, <strong>Kenji Fukumizu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 959--990.</p><p><strong>Abstract:</strong><br/>
We study the problem of learning sparse structure changes between two Markov networks $P$ and $Q$. Rather than fitting two Markov networks separately to two sets of data and figuring out their differences, a recent work proposed to learn changes directly via estimating the ratio between two Markov network models. In this paper, we give sufficient conditions for successful change detection with respect to the sample size $n_{p},n_{q}$, the dimension of data $m$ and the number of changed edges $d$. When using an unbounded density ratio model, we prove that the true sparse changes can be consistently identified for $n_{p}=\Omega(d^{2}\log\frac{m^{2}+m}{2})$ and $n_{q}=\Omega({n_{p}^{2}})$, with an exponentially decaying upper-bound on learning error. Such sample complexity can be improved to $\min(n_{p},n_{q})=\Omega(d^{2}\log\frac{m^{2}+m}{2})$ when the boundedness of the density ratio model is assumed. Our theoretical guarantee can be applied to a wide range of discrete/continuous Markov networks.
</p>projecteuclid.org/euclid.aos/1497319685_20170612220820Mon, 12 Jun 2017 22:08 EDTRandomized sketches for kernels: Fast and optimal nonparametric regressionhttp://projecteuclid.org/euclid.aos/1497319686<strong>Yun Yang</strong>, <strong>Mert Pilanci</strong>, <strong>Martin J. Wainwright</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 991--1023.</p><p><strong>Abstract:</strong><br/>
Kernel ridge regression (KRR) is a standard method for performing nonparametric regression over reproducing kernel Hilbert spaces. Given $n$ samples, the time and space complexity of computing the KRR estimate scale as $\mathcal{O}(n^{3})$ and $\mathcal{O}(n^{2})$, respectively, and so are prohibitive in many cases. We propose approximations of KRR based on $m$-dimensional randomized sketches of the kernel matrix, and study how small the projection dimension $m$ can be chosen while still preserving minimax optimality of the approximate KRR estimate. For various classes of randomized sketches, including those based on Gaussian and randomized Hadamard matrices, we prove that it suffices to choose the sketch dimension $m$ proportional to the statistical dimension (modulo logarithmic factors). Thus, we obtain fast and minimax optimal approximations to the KRR estimate for nonparametric regression. In doing so, we prove a novel lower bound on the minimax risk of kernel regression in terms of the localized Rademacher complexity.
</p>projecteuclid.org/euclid.aos/1497319686_20170612220820Mon, 12 Jun 2017 22:08 EDTTesting uniformity on high-dimensional spheres against monotone rotationally symmetric alternativeshttp://projecteuclid.org/euclid.aos/1497319687<strong>Christine Cutting</strong>, <strong>Davy Paindaveine</strong>, <strong>Thomas Verdebout</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1024--1058.</p><p><strong>Abstract:</strong><br/>
We consider the problem of testing uniformity on high-dimensional unit spheres. We are primarily interested in nonnull issues. We show that rotationally symmetric alternatives lead to two Local Asymptotic Normality (LAN) structures. The first one is for fixed modal location $\mathbf{{\theta}}$ and allows us to derive locally asymptotically most powerful tests under specified $\mathbf{{\theta}}$. The second one, which addresses the Fisher–von Mises–Langevin (FvML) case, relates to the unspecified-$\mathbf{{\theta}}$ problem and shows that the high-dimensional Rayleigh test is locally asymptotically most powerful invariant. Under mild assumptions, we derive the asymptotic nonnull distribution of this test, which allows us to extend beyond the FvML case the asymptotic powers obtained there from Le Cam’s third lemma. Throughout, we allow the dimension $p$ to go to infinity in an arbitrary way as a function of the sample size $n$. Some of our results also strengthen the local optimality properties of the Rayleigh test in low dimensions. We perform a Monte Carlo study to illustrate our asymptotic results. Finally, we treat an application related to testing for sphericity in high dimensions.
</p>projecteuclid.org/euclid.aos/1497319687_20170612220820Mon, 12 Jun 2017 22:08 EDTNonlinear sufficient dimension reduction for functional datahttp://projecteuclid.org/euclid.aos/1497319688<strong>Bing Li</strong>, <strong>Jun Song</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1059--1095.</p><p><strong>Abstract:</strong><br/>
We propose a general theory and estimation procedures for nonlinear sufficient dimension reduction where both the predictor and the response may be random functions. The relation between the response and predictor can be arbitrary and the sets of observed time points can vary from subject to subject. The functional and nonlinear nature of the problem leads to the construction of two functional spaces: the first representing the functional data, assumed to be a Hilbert space, and the second characterizing nonlinearity, assumed to be a reproducing kernel Hilbert space. A particularly attractive feature of our construction is that the two spaces are nested, in the sense that the kernel for the second space is determined by the inner product of the first. We propose two estimators for this general dimension reduction problem, and establish the consistency and convergence rate for one of them. These asymptotic results are flexible enough to accommodate both fully and partially observed functional data. We investigate the performance of our estimators by simulations, and apply them to datasets on speech recognition and handwritten symbols.
</p>projecteuclid.org/euclid.aos/1497319688_20170612220820Mon, 12 Jun 2017 22:08 EDTNetwork vector autoregressionhttp://projecteuclid.org/euclid.aos/1497319689<strong>Xuening Zhu</strong>, <strong>Rui Pan</strong>, <strong>Guodong Li</strong>, <strong>Yuewen Liu</strong>, <strong>Hansheng Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1096--1123.</p><p><strong>Abstract:</strong><br/>
We consider here a large-scale social network with a continuous response observed for each node at equally spaced time points. The responses from different nodes constitute an ultra-high dimensional vector, whose time series dynamics are to be investigated. In addition, the network structure is also taken into consideration, for which we propose a network vector autoregressive (NAR) model. The NAR model expresses each node’s response at a given time point as a linear combination of (a) its previous value, (b) the average of its connected neighbors, (c) a set of node-specific covariates and (d) an independent noise. The corresponding coefficients are referred to as the momentum effect, the network effect and the nodal effect, respectively. Conditions for strict stationarity of the NAR models are obtained. In order to estimate the NAR model, an ordinary least squares type estimator is developed, and its asymptotic properties are investigated. We further illustrate the usefulness of the NAR model through a number of interesting potential applications. Simulation studies and an empirical example are presented.
</p>projecteuclid.org/euclid.aos/1497319689_20170612220820Mon, 12 Jun 2017 22:08 EDTOn coverage and local radial rates of credible setshttp://projecteuclid.org/euclid.aos/1497319690<strong>Eduard Belitser</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1124--1151.</p><p><strong>Abstract:</strong><br/>
In the mildly ill-posed inverse signal-in-white-noise model, we construct confidence sets as credible balls with respect to the empirical Bayes posterior resulting from a certain two-level hierarchical prior. The quality of the posterior is characterized by the contraction rate, which we allow to be local, that is, depending on the parameter. The issue of optimality of the constructed confidence sets is addressed via a trade-off between their “size” (the local radial rate) and their coverage probability. We introduce the excessive bias restriction (EBR), more general than the self-similarity and polished tail conditions recently studied in the literature. Under EBR, we establish the confidence optimality of our credible set with some local (oracle) radial rate. We also derive the oracle estimation inequality and the oracle posterior contraction rate. The obtained local results are more powerful than global ones: adaptive minimax results for a number of smoothness scales follow as a consequence, in particular the ones considered by Szabó et al. [Ann. Statist. 43 (2015) 1391–1428].
</p>projecteuclid.org/euclid.aos/1497319690_20170612220820Mon, 12 Jun 2017 22:08 EDTTotal positivity in Markov structureshttp://projecteuclid.org/euclid.aos/1497319691<strong>Shaun Fallat</strong>, <strong>Steffen Lauritzen</strong>, <strong>Kayvan Sadeghi</strong>, <strong>Caroline Uhler</strong>, <strong>Nanny Wermuth</strong>, <strong>Piotr Zwiernik</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1152--1184.</p><p><strong>Abstract:</strong><br/>
We discuss properties of distributions that are multivariate totally positive of order two ($\mathrm{MTP}_{2}$) related to conditional independence. In particular, we show that any independence model generated by an $\mathrm{MTP}_{2}$ distribution is a compositional semi-graphoid which is upward-stable and singleton-transitive. In addition, we prove that any $\mathrm{MTP}_{2}$ distribution satisfying an appropriate support condition is faithful to its concentration graph. Finally, we analyze factorization properties of $\mathrm{MTP}_{2}$ distributions and discuss ways of constructing $\mathrm{MTP}_{2}$ distributions; in particular, we give conditions on the log-linear parameters of a discrete distribution which ensure $\mathrm{MTP}_{2}$ and characterize conditional Gaussian distributions which satisfy $\mathrm{MTP}_{2}$.
</p>projecteuclid.org/euclid.aos/1497319691_20170612220820Mon, 12 Jun 2017 22:08 EDTTests for covariance structures with high-dimensional repeated measurementshttp://projecteuclid.org/euclid.aos/1497319692<strong>Ping-Shou Zhong</strong>, <strong>Wei Lan</strong>, <strong>Peter X. K. Song</strong>, <strong>Chih-Ling Tsai</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1185--1213.</p><p><strong>Abstract:</strong><br/>
In regression analysis with repeated measurements, such as longitudinal data and panel data, structured covariance matrices characterized by a small number of parameters have been widely used and play an important role in parameter estimation and statistical inference. To assess the adequacy of a specified covariance structure, one often adopts the classical likelihood-ratio test when the dimension of the repeated measurements ($p$) is smaller than the sample size ($n$). However, this assessment becomes quite challenging when $p$ is bigger than $n$, since the classical likelihood-ratio test is no longer applicable. This paper proposes an adjusted goodness-of-fit test to examine a broad range of covariance structures under the scenario of “large $p$, small $n$.” Analytical examples are presented to illustrate the effectiveness of the adjustment. In addition, large sample properties of the proposed test are established. Moreover, simulation studies and a real data example are provided to demonstrate the finite sample performance and the practical utility of the test.
</p>projecteuclid.org/euclid.aos/1497319692_20170612220820Mon, 12 Jun 2017 22:08 EDTWeak signal identification and inference in penalized model selectionhttp://projecteuclid.org/euclid.aos/1497319693<strong>Peibei Shi</strong>, <strong>Annie Qu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1214--1253.</p><p><strong>Abstract:</strong><br/>
Weak signal identification and inference are very important in the area of penalized model selection, yet they are underdeveloped and not well studied. Existing inference procedures for penalized estimators are mainly focused on strong signals. In this paper, we propose an identification procedure for weak signals in finite samples, and provide a transition phase between noise and strong signal strengths. We also introduce a new two-step inferential method to construct better confidence intervals for the identified weak signals. Our theory development assumes that variables are orthogonally designed. Both theory and numerical studies indicate that the proposed method leads to better confidence coverage for weak signals, compared with those using asymptotic inference. In addition, the proposed method outperforms the perturbation and bootstrap resampling approaches. We illustrate our method on HIV antiretroviral drug susceptibility data to identify genetic mutations associated with HIV drug resistance.
</p>projecteuclid.org/euclid.aos/1497319693_20170612220820Mon, 12 Jun 2017 22:08 EDTSemimartingale detection and goodness-of-fit testshttp://projecteuclid.org/euclid.aos/1497319694<strong>Adam D. Bull</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1254--1283.</p><p><strong>Abstract:</strong><br/>
In quantitative finance, we often fit a parametric semimartingale model to asset prices. To ensure our model is correct, we must then perform goodness-of-fit tests. In this paper, we give a new goodness-of-fit test for volatility-like processes, which is easily applied to a variety of semimartingale models. In each case, we reduce the problem to the detection of a semimartingale observed under noise. In this setting, we then describe a wavelet-thresholding test, which obtains adaptive and near-optimal detection rates.
</p>projecteuclid.org/euclid.aos/1497319694_20170612220820Mon, 12 Jun 2017 22:08 EDTTesting for time-varying jump activity for pure jump semimartingaleshttp://projecteuclid.org/euclid.aos/1497319695<strong>Viktor Todorov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1284--1311.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a test for deciding whether the jump activity index of a discretely observed Itô semimartingale of pure-jump type (i.e., one without a diffusion) varies over a fixed interval of time. The asymptotic setting is based on observations within a fixed time interval with mesh of the observation grid shrinking to zero. The test is derived for semimartingales whose “spot” jump compensator around zero is like that of a stable process, but importantly the stability index can vary over the time interval. The test is based on forming a sequence of local estimators of the jump activity over blocks of shrinking time span and contrasting their variability around a global activity estimator based on the whole data set. The local and global jump activity estimates are constructed from the real part of the empirical characteristic function of the increments of the process scaled by local power variations. We derive the asymptotic distribution of the test statistic under the null hypothesis of constant jump activity and show that the test has asymptotic power of one against fixed alternatives of processes with time-varying jump activity.
</p>projecteuclid.org/euclid.aos/1497319695_20170612220820Mon, 12 Jun 2017 22:08 EDTOperational time and in-sample density forecastinghttp://projecteuclid.org/euclid.aos/1497319696<strong>Young K. Lee</strong>, <strong>Enno Mammen</strong>, <strong>Jens P. Nielsen</strong>, <strong>Byeong U. Park</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1312--1341.</p><p><strong>Abstract:</strong><br/>
In this paper, we consider a new structural model for in-sample density forecasting. In-sample density forecasting is to estimate a structured density on a region where data are observed and then reuse the estimated structured density on some region where data are not observed. Our structural assumption is that the density is a product of one-dimensional functions with one function sitting on the scale of a transformed space of observations. The transformation involves another unknown one-dimensional function, so that our model is formulated via a known smooth function of three underlying unknown one-dimensional functions. We present an innovative way of estimating the one-dimensional functions and show that all the estimators of the three components achieve the optimal one-dimensional rate of convergence. We illustrate how one can use our approach by analyzing a real dataset, and also verify the tractable finite sample performance of the method via a simulation study.
</p>projecteuclid.org/euclid.aos/1497319696_20170612220820Mon, 12 Jun 2017 22:08 EDTAsymptotics of empirical eigenstructure for high dimensional spiked covariancehttp://projecteuclid.org/euclid.aos/1497319697<strong>Weichen Wang</strong>, <strong>Jianqing Fan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1342--1374.</p><p><strong>Abstract:</strong><br/>
We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size and dimensionality play in principal component analysis. Our results are a natural extension of those in [Statist. Sinica 17 (2007) 1617–1642] to a more general setting and solve the rates of convergence problems in [Statist. Sinica 26 (2016) 1747–1770]. They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called Shrinkage Principal Orthogonal complEment Thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks for large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies.
</p>projecteuclid.org/euclid.aos/1497319697_20170612220820Mon, 12 Jun 2017 22:08 EDT