The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Tue, 07 Jun 2011 09:09 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
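A toy computation makes the multiplicity-correction mechanism concrete. The sketch below is our illustration (not the authors' code): it contrasts a fixed Bernoulli(1/2) inclusion prior, which imposes no multiplicity penalty, with the beta-binomial prior induced by a uniform distribution on model size, under which the prior cost of adding a variable grows with the number of candidate predictors $p$.

```python
import math

def bernoulli_prior(k, p, w=0.5):
    # Fixed-inclusion-probability prior: P(model with k variables) = w^k (1-w)^(p-k).
    return w ** k * (1 - w) ** (p - k)

def beta_binomial_prior(k, p):
    # Uniform prior on model size, then uniform over models of that size
    # (the a = b = 1 beta-binomial prior): P(model) = 1 / ((p + 1) * C(p, k)).
    return 1.0 / ((p + 1) * math.comb(p, k))

def penalty(prior, k, p):
    # Prior odds of a k-variable model versus the same model plus one extra variable.
    return prior(k, p) / prior(k + 1, p)

# The Bernoulli prior charges nothing extra as p grows; the beta-binomial
# prior's penalty for one more variable scales like (p - k) / (k + 1),
# an automatic multiplicity correction.
penalties = {p: (penalty(bernoulli_prior, 2, p), penalty(beta_binomial_prior, 2, p))
             for p in (10, 100, 1000)}
```

The fixed-probability prior's odds against one more variable stay at 1 for every $p$, while the beta-binomial penalty grows roughly linearly in $p$.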
</p>
Thu, 05 Aug 2010 15:41 EDT
Nonasymptotic analysis of semiparametric regression models with high-dimensional parametric coefficients
https://projecteuclid.org/euclid.aos/1509436835
<strong>Ying Zhu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2274--2298.</p><p><strong>Abstract:</strong><br/>
We consider a two-step, projection-based Lasso procedure for estimating a partially linear regression model where the number of coefficients in the linear component can exceed the sample size and these coefficients belong to the $l_{q}$-“balls” for $q\in[0,1]$. Our theoretical results regarding the properties of the estimators are nonasymptotic. In particular, we establish a new nonasymptotic “oracle” result: Although the error of the nonparametric projection per se (with respect to the prediction norm) has the scaling $t_{n}$ in the first step, it only contributes a scaling $t_{n}^{2}$ in the $l_{2}$-error of the second-step estimator for the linear coefficients. This new “oracle” result holds for a large family of nonparametric least squares procedures and regularized nonparametric least squares procedures for the first-step estimation, and the driver behind it is the projection strategy. We specialize our analysis to the estimation of a semiparametric sample selection model and provide a simple method with theoretical guarantees for choosing the regularization parameter in practice.
</p>
Tue, 31 Oct 2017 04:00 EDT
A likelihood ratio framework for high-dimensional semiparametric regression
https://projecteuclid.org/euclid.aos/1513328574
<strong>Yang Ning</strong>, <strong>Tianqi Zhao</strong>, <strong>Han Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2299--2327.</p><p><strong>Abstract:</strong><br/>
We propose a new inferential framework for high-dimensional semiparametric generalized linear models. This framework addresses a variety of challenging problems in high-dimensional data analysis, including incomplete data, selection bias and heterogeneity. Our work has three main contributions: (i) We develop a regularized statistical chromatography approach to infer the parameter of interest under the proposed semiparametric generalized linear model without needing to estimate the unknown base measure function. (ii) We propose a new likelihood ratio based framework to construct post-regularization confidence regions and tests for the low-dimensional components of high-dimensional parameters. Unlike existing post-regularization inferential methods, our approach is based on a novel directional likelihood. (iii) We develop new concentration inequalities and normal approximation results for U-statistics with unbounded kernels, which are of independent interest. We further extend the theoretical results to the problems of missing data and multiple-dataset inference. Extensive simulation studies and real data analysis are provided to illustrate the proposed approach.
</p>
Fri, 15 Dec 2017 04:03 EST
A new perspective on boosting in linear regression via subgradient optimization and relatives
https://projecteuclid.org/euclid.aos/1513328575
<strong>Robert M. Freund</strong>, <strong>Paul Grigas</strong>, <strong>Rahul Mazumder</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2328--2364.</p><p><strong>Abstract:</strong><br/>
We analyze boosting algorithms [ Ann. Statist. 29 (2001) 1189–1232; Ann. Statist. 28 (2000) 337–407; Ann. Statist. 32 (2004) 407–499] in linear regression from a new perspective: that of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm ($\text{FS}_{\varepsilon}$) and least squares boosting [LS-BOOST$(\varepsilon)$], can be viewed as subgradient descent to minimize the loss function defined as the maximum absolute correlation between the features and residuals. We also propose a minor modification of $\text{FS}_{\varepsilon}$ that yields an algorithm for the LASSO, and that may be easily extended to an algorithm that computes the LASSO path for different values of the regularization parameter. Furthermore, we show that these new algorithms for the LASSO may also be interpreted as the same master algorithm (subgradient descent), applied to a regularized version of the maximum absolute correlation loss function. We derive novel, comprehensive computational guarantees for several boosting algorithms in linear regression (including LS-BOOST$(\varepsilon)$ and $\text{FS}_{\varepsilon}$) by using techniques of first-order methods in convex optimization. Our computational guarantees inform us about the statistical properties of boosting algorithms. In particular, they provide, for the first time, a precise theoretical description of the amount of data-fidelity and regularization imparted by running a boosting algorithm with a prespecified learning rate for a fixed but arbitrary number of iterations, for any dataset.
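For intuition, the incremental forward stagewise algorithm analyzed above can be sketched in a few lines (our minimal illustration, not the paper's implementation): each iteration nudges the coefficient of the feature most correlated with the current residual by $\pm\varepsilon$.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_iter=2000):
    """Incremental forward stagewise (FS_eps): at each step, move the
    coefficient of the feature most correlated with the residual by +/- eps."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()
    for _ in range(n_iter):
        corr = X.T @ r                    # correlations with current residual
        j = int(np.argmax(np.abs(corr)))  # most correlated feature
        delta = eps * np.sign(corr[j])
        beta[j] += delta
        r -= delta * X[:, j]              # update the residual
    return beta

# Toy check: with orthonormal columns and no noise, FS_eps walks each
# coefficient toward its least-squares value in steps of size eps.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(50, 3)))
beta_true = np.array([2.0, -1.0, 0.5])
y = Q @ beta_true
beta_hat = forward_stagewise(Q, y, eps=0.01, n_iter=2000)
```

After convergence the iterates stay within roughly one step size $\varepsilon$ of the least-squares solution, which is the slow, heavily regularized behavior the paper quantifies.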
</p>
Fri, 15 Dec 2017 04:03 EST
On the validity of resampling methods under long memory
https://projecteuclid.org/euclid.aos/1513328576
<strong>Shuyang Bai</strong>, <strong>Murad S. Taqqu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2365--2399.</p><p><strong>Abstract:</strong><br/>
For long-memory time series, inference based on resampling is of crucial importance, since the asymptotic distribution can often be non-Gaussian and is difficult to determine statistically. However, due to the strong dependence, establishing the asymptotic validity of resampling methods is nontrivial. In this paper, we derive an efficient bound for the canonical correlation between two finite blocks of a long-memory time series. We show how this bound can be applied to establish the asymptotic consistency of subsampling procedures for general statistics under long memory. It allows the subsample size $b$ to be $o(n)$, where $n$ is the sample size, irrespective of the strength of the memory. We are then able to improve many results found in the literature. We also consider applications of subsampling procedures under long memory to the sample covariance, M-estimation and empirical processes.
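The generic subsampling scheme whose validity is at issue can be sketched as follows; this is Politis–Romano-style subsampling with a user-supplied rate function, and the i.i.d. toy series with the $\sqrt{n}$ rate is our assumption for illustration, not the long-memory setting of the paper.

```python
import numpy as np

def subsampling_dist(x, statistic, b, tau):
    """Approximate the law of tau(n) * (theta_hat_n - theta) by the values
    tau(b) * (theta_hat_b - theta_hat_n) over all contiguous length-b blocks."""
    n = len(x)
    theta_n = statistic(x)
    vals = np.array([tau(b) * (statistic(x[i:i + b]) - theta_n)
                     for i in range(n - b + 1)])
    return theta_n, vals

# Toy use with an i.i.d. series and the usual sqrt rate; under long memory
# the correct rate differs, which is exactly the delicate point the paper
# addresses when it allows the subsample size b to be o(n).
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
theta_n, vals = subsampling_dist(x, np.mean, b=50, tau=np.sqrt)
lo, hi = np.quantile(vals, [0.025, 0.975])
ci = (theta_n - hi / np.sqrt(len(x)), theta_n - lo / np.sqrt(len(x)))
```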
</p>
Fri, 15 Dec 2017 04:03 EST
CoCoLasso for high-dimensional error-in-variables regression
https://projecteuclid.org/euclid.aos/1513328577
<strong>Abhirup Datta</strong>, <strong>Hui Zou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2400--2426.</p><p><strong>Abstract:</strong><br/>
Much theoretical and applied work has been devoted to high-dimensional regression with clean data. However, we often face corrupted data in many applications where missing data and measurement errors cannot be ignored. Loh and Wainwright [ Ann. Statist. 40 (2012) 1637–1664] proposed a nonconvex modification of the Lasso for doing high-dimensional regression with noisy and missing data. It is generally agreed that the virtues of convexity contribute fundamentally to the success and popularity of the Lasso. In light of this, we propose a new method named CoCoLasso that is convex and can handle a general class of corrupted datasets. We establish the estimation error bounds of CoCoLasso and its asymptotic sign-consistent selection property. We further elucidate how standard cross-validation techniques can be misleading in the presence of measurement error and develop a novel calibrated cross-validation technique by using the basic idea in CoCoLasso. The calibrated cross-validation is of independent interest. We demonstrate the superior performance of our method over the nonconvex approach by simulation studies.
</p>
Fri, 15 Dec 2017 04:03 EST
Consistent parameter estimation for LASSO and approximate message passing
https://projecteuclid.org/euclid.aos/1513328578
<strong>Ali Mousavi</strong>, <strong>Arian Maleki</strong>, <strong>Richard G. Baraniuk</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2427--2454.</p><p><strong>Abstract:</strong><br/>
This paper studies the optimal tuning of the regularization parameter in LASSO or the threshold parameters in approximate message passing (AMP). Considering a model in which the design matrix and noise are zero-mean i.i.d. Gaussian, we propose a data-driven approach for estimating the regularization parameter of LASSO and the threshold parameters in AMP. Our estimates are consistent, that is, they converge to their asymptotically optimal values in probability as $n$, the number of observations, and $p$, the ambient dimension of the sparse vector, grow to infinity, while $n/p$ converges to a fixed number $\delta$. As a byproduct of our analysis, we will shed light on the asymptotic properties of the solution paths of LASSO and AMP.
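For orientation, a bare-bones AMP iteration with one common residual-based threshold rule (threshold $=\alpha\hat{\sigma}_{t}$) looks like the sketch below; the tuning rule and the toy problem are our assumptions, and choosing such threshold parameters optimally is precisely what the paper studies.

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding, the denoiser for sparse signals.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp(A, y, alpha=2.0, n_iter=30):
    """AMP with threshold alpha * sigma_t, where sigma_t is estimated
    from the current residual (one common heuristic tuning rule)."""
    n, p = A.shape
    x, z = np.zeros(p), y.astype(float).copy()
    for _ in range(n_iter):
        sigma = np.linalg.norm(z) / np.sqrt(n)      # noise-level estimate
        x_new = soft(x + A.T @ z, alpha * sigma)
        # Onsager correction: (||x_new||_0 / n) times the previous residual.
        z = y - A @ x_new + (np.count_nonzero(x_new) / n) * z
        x = x_new
    return x

# Toy sparse-recovery problem: n = 200 noiseless Gaussian measurements
# of a 10-sparse vector in dimension p = 400 (delta = n/p = 0.5).
rng = np.random.default_rng(2)
n, p, k = 200, 400, 10
A = rng.normal(size=(n, p)) / np.sqrt(n)
x0 = np.zeros(p)
x0[:k] = 3.0 * rng.normal(size=k)
y = A @ x0
x_hat = amp(A, y)
```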
</p>
Fri, 15 Dec 2017 04:03 EST
Support recovery without incoherence: A case for nonconvex regularization
https://projecteuclid.org/euclid.aos/1513328579
<strong>Po-Ling Loh</strong>, <strong>Martin J. Wainwright</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2455--2482.</p><p><strong>Abstract:</strong><br/>
We develop a new primal-dual witness proof framework that may be used to establish variable selection consistency and $\ell_{\infty}$-bounds for sparse regression problems, even when the loss function and regularizer are nonconvex. We use this method to prove two theorems concerning support recovery and $\ell_{\infty}$-guarantees for a regression estimator in a general setting. Notably, our theory applies to all potential stationary points of the objective and certifies that the stationary point is unique under mild conditions. Our results provide a strong theoretical justification for the use of nonconvex regularization: For certain nonconvex regularizers with vanishing derivative away from the origin, any stationary point can be used to recover the support without requiring the typical incoherence conditions present in $\ell_{1}$-based methods. We also derive corollaries illustrating the implications of our theorems for composite objective functions involving losses such as least squares, nonconvex modified least squares for errors-in-variables linear regression, the negative log likelihood for generalized linear models and the graphical Lasso. We conclude with empirical studies that corroborate our theoretical predictions.
</p>
Fri, 15 Dec 2017 04:03 EST
Optimal design of fMRI experiments using circulant (almost-)orthogonal arrays
https://projecteuclid.org/euclid.aos/1513328580
<strong>Yuan-Lung Lin</strong>, <strong>Frederick Kin Hing Phoa</strong>, <strong>Ming-Hung Kao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2483--2510.</p><p><strong>Abstract:</strong><br/>
Functional magnetic resonance imaging (fMRI) is a pioneering technology for studying brain activity in response to mental stimuli. Although efficient designs for these fMRI experiments are important for rendering precise statistical inference on brain functions, they are not systematically constructed. Designs with the circulant property are crucial for estimating a hemodynamic response function (HRF) and discussing fMRI experimental optimality. In this paper, we develop a theory that not only successfully explains the structure of a circulant design, but also provides a method of constructing efficient fMRI designs systematically. We further provide a class of two-level circulant designs with good performance (statistically optimal), which can be used to estimate the HRF of a stimulus type and to compare two HRFs. Some efficient three- and four-level circulant designs are also provided, and we prove the existence of a class of circulant orthogonal arrays.
</p>
Fri, 15 Dec 2017 04:03 EST
Adaptive Bernstein–von Mises theorems in Gaussian white noise
https://projecteuclid.org/euclid.aos/1513328581
<strong>Kolyan Ray</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2511--2536.</p><p><strong>Abstract:</strong><br/>
We investigate Bernstein–von Mises theorems for adaptive nonparametric Bayesian procedures in the canonical Gaussian white noise model. We consider both a Hilbert space and multiscale setting with applications in $L^{2}$ and $L^{\infty}$, respectively. This provides a theoretical justification for plug-in procedures, for example the use of certain credible sets for sufficiently smooth linear functionals. We use this general approach to construct optimal frequentist confidence sets based on the posterior distribution. We also provide simulations to numerically illustrate our approach and obtain a visual representation of the geometries involved.
</p>
Fri, 15 Dec 2017 04:03 EST
Targeted sequential design for targeted learning inference of the optimal treatment rule and its mean reward
https://projecteuclid.org/euclid.aos/1513328582
<strong>Antoine Chambaz</strong>, <strong>Wenjing Zheng</strong>, <strong>Mark J. van der Laan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2537--2564.</p><p><strong>Abstract:</strong><br/>
This article studies the targeted sequential inference of an optimal treatment rule (TR) and its mean reward in the nonexceptional case, that is, assuming that there is no stratum of the baseline covariates where treatment is neither beneficial nor harmful, and under a companion margin assumption. Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually infers the mean reward under the current estimate of the optimal TR. This data-adaptive statistical parameter is worthy of interest on its own. Our main result is a central limit theorem which enables the construction of confidence intervals on both mean rewards, under the current estimate of the optimal TR and under the optimal TR itself. The asymptotic variance of the estimator takes the form of the variance of an efficient influence curve at a limiting distribution, which makes it possible to discuss the efficiency of inference. As a by-product, we also derive confidence intervals on two cumulated pseudo-regrets, a key notion in the study of bandit problems. A simulation study illustrates the procedure. One of the cornerstones of the theoretical study is a new maximal inequality for martingales with respect to the uniform entropy integral.
</p>
Fri, 15 Dec 2017 04:03 EST
Nonparametric goodness-of-fit tests for uniform stochastic ordering
https://projecteuclid.org/euclid.aos/1513328583
<strong>Chuan-Fa Tang</strong>, <strong>Dewei Wang</strong>, <strong>Joshua M. Tebbs</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2565--2589.</p><p><strong>Abstract:</strong><br/>
We propose $L^{p}$ distance-based goodness-of-fit (GOF) tests for uniform stochastic ordering with two continuous distributions $F$ and $G$, both of which are unknown. Our tests are motivated by the fact that when $F$ and $G$ are uniformly stochastically ordered, the ordinal dominance curve $R=FG^{-1}$ is star-shaped. We derive asymptotic distributions and prove that our testing procedure has a unique least favorable configuration of $F$ and $G$ for $p\in [1,\infty]$. We use simulation to assess finite-sample performance and demonstrate that a modified, one-sample version of our procedure (e.g., with $G$ known) is more powerful than the one-sample GOF test suggested by Arcones and Samaniego [ Ann. Statist. 28 (2000) 116–150]. We also discuss sample size determination. We illustrate our methods using data from a pharmacology study evaluating the effects of administering caffeine to prematurely born infants.
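The ordinal dominance curve underlying these tests is straightforward to estimate empirically. The sketch below is our illustration, not the paper's procedure: for the ordered exponential pair chosen here, the true ODC is $R(u)=1-(1-u)^{2}$, so $R(u)/u=2-u$ is nonincreasing and $R$ is star-shaped.

```python
import numpy as np

def empirical_odc(f_sample, g_sample, grid):
    # R(u) = F(G^{-1}(u)) with F and G replaced by their empirical versions.
    g_quantiles = np.quantile(g_sample, grid)   # empirical G^{-1}(u)
    f_sorted = np.sort(f_sample)
    # Empirical F evaluated at those quantiles, via a right-side rank count.
    return np.searchsorted(f_sorted, g_quantiles, side="right") / len(f_sample)

# Exponential samples with scales 1/2 and 1: the true ODC is
# R(u) = 1 - (1 - u)^2, a star-shaped curve.
rng = np.random.default_rng(3)
f = rng.exponential(scale=0.5, size=5000)
g = rng.exponential(scale=1.0, size=5000)
grid = np.linspace(0.05, 0.95, 19)
R = empirical_odc(f, g, grid)
```

An $L^{p}$ test statistic of the kind studied here would then measure the distance between this empirical curve and its best star-shaped approximation.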
</p>
Fri, 15 Dec 2017 04:03 EST
Selecting the number of principal components: Estimation of the true rank of a noisy matrix
https://projecteuclid.org/euclid.aos/1513328584
<strong>Yunjin Choi</strong>, <strong>Jonathan Taylor</strong>, <strong>Robert Tibshirani</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2590--2617.</p><p><strong>Abstract:</strong><br/>
Principal component analysis (PCA) is a well-known tool in multivariate statistics. One significant challenge in using PCA is the choice of the number of principal components. In order to address this challenge, we propose distribution-based methods with exact type I error control for hypothesis testing and construction of confidence intervals for signals in a noisy matrix with finite samples. Assuming Gaussian noise, we derive exact type I error control based on the conditional distribution of the singular values of a Gaussian matrix, utilizing a post-selection inference framework and extending the approach of Taylor, Loftus and Tibshirani (2013) to the PCA setting. In simulation studies, we find that our proposed methods compare well to existing approaches.
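As a point of contrast with the exact post-selection approach described here, a naive baseline simply thresholds singular values at the approximate Gaussian noise bulk edge $\sigma(\sqrt{n}+\sqrt{p})$. The slack factor and the toy data below are our assumptions for illustration, not the paper's method.

```python
import numpy as np

def estimate_rank(Y, sigma, slack=1.1):
    """Naive rank estimate for Y = signal + noise with known noise level:
    count singular values above the approximate Gaussian noise bulk edge
    sigma * (sqrt(n) + sqrt(p)), inflated by a small heuristic slack factor."""
    n, p = Y.shape
    s = np.linalg.svd(Y, compute_uv=False)
    return int(np.sum(s > slack * sigma * (np.sqrt(n) + np.sqrt(p))))

# Rank-3 signal plus unit-variance Gaussian noise; the signal singular
# values are chosen far above the noise edge so the cutoff is clear-cut.
rng = np.random.default_rng(4)
n, p, r, sigma = 100, 80, 3, 1.0
U = rng.normal(size=(n, r))
V = rng.normal(size=(p, r))
M = 10.0 * (U @ V.T) / np.sqrt(p)
Y = M + sigma * rng.normal(size=(n, p))
```

Such a hard cutoff gives no finite-sample error guarantee near the edge, which is exactly the gap the conditional, post-selection tests are designed to fill.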
</p>
Fri, 15 Dec 2017 04:03 EST
Extended conditional independence and applications in causal inference
https://projecteuclid.org/euclid.aos/1513328585
<strong>Panayiota Constantinou</strong>, <strong>A. Philip Dawid</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2618--2653.</p><p><strong>Abstract:</strong><br/>
The goal of this paper is to integrate the notions of stochastic conditional independence and variation conditional independence under a more general notion of extended conditional independence. We show that under appropriate assumptions the calculus that applies for the two cases separately (axioms of a separoid) still applies for the extended case. These results provide a rigorous basis for a wide range of statistical concepts, including ancillarity and sufficiency, and, in particular, the Decision Theoretic framework for statistical causality, which uses the language and calculus of conditional independence in order to express causal properties and make causal inferences.
</p>
Fri, 15 Dec 2017 04:03 EST
A weight-relaxed model averaging approach for high-dimensional generalized linear models
https://projecteuclid.org/euclid.aos/1513328586
<strong>Tomohiro Ando</strong>, <strong>Ker-chau Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2654--2679.</p><p><strong>Abstract:</strong><br/>
Model averaging has long been proposed as a powerful alternative to model selection in regression analysis. However, how well it performs in high-dimensional regression is still poorly understood. Recently, Ando and Li [ J. Amer. Statist. Assoc. 109 (2014) 254–265] introduced a new method of model averaging that allows the number of predictors to increase as the sample size increases. One notable feature of Ando and Li’s method is the relaxation on the total model weights so that weak signals can be efficiently combined from high-dimensional linear models. It is natural to ask if Ando and Li’s method and results can be extended to nonlinear models. Because all candidate models should be treated as working models, the existence of a theoretical target of the quasi maximum likelihood estimator under model misspecification needs to be established first. In this paper, we consider generalized linear models as our candidate models. We establish a general result to show the existence of pseudo-true regression parameters under model misspecification. We derive proper conditions for the leave-one-out cross-validation weight selection to achieve asymptotic optimality. Technically, the pseudo-true target parameters between working models are not linearly linked. To overcome these difficulties, we employ a novel strategy of decomposing and bounding the bias and variance terms in our proof. We conduct simulations to illustrate the merits of our model averaging procedure over several existing methods, including the lasso and group lasso methods, the Akaike and Bayesian information criterion model-averaging methods and some other state-of-the-art regularization methods.
</p>
Fri, 15 Dec 2017 04:03 EST
Structural similarity and difference testing on multiple sparse Gaussian graphical models
https://projecteuclid.org/euclid.aos/1513328587
<strong>Weidong Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2680--2707.</p><p><strong>Abstract:</strong><br/>
We present a new framework for inferring structural similarities and differences among multiple high-dimensional Gaussian graphical models (GGMs) corresponding to the same set of variables under distinct experimental conditions. The new framework adopts the partial correlation coefficients to characterize the potential changes of dependency strengths between two variables. A hierarchical method is further developed to recover edges with different or similar dependency strengths across multiple GGMs. In particular, we first construct two-sample test statistics for testing the equality of partial correlation coefficients and conduct large-scale multiple tests to estimate the substructure of differential dependencies. After removing the differential substructure from the original GGMs, a follow-up multiple testing procedure is used to detect the substructure of similar dependencies among GGMs. In each step, the false discovery rate is controlled asymptotically at a desired level. Power results are proved, which demonstrate that our method is more powerful at finding common edges than the common approach that estimates GGMs separately. The performance of the proposed hierarchical method is illustrated on simulated datasets.
</p>
Fri, 15 Dec 2017 04:03 EST
Estimating a probability mass function with unknown labels
https://projecteuclid.org/euclid.aos/1513328588
<strong>Dragi Anevski</strong>, <strong>Richard D. Gill</strong>, <strong>Stefan Zohren</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2708--2735.</p><p><strong>Abstract:</strong><br/>
In the context of a species sampling problem, we discuss a nonparametric maximum likelihood estimator for the underlying probability mass function. The estimator is known in the computer science literature as the high profile estimator. We prove strong consistency and derive rates of convergence for an extended model version of the estimator. We also study a sieved estimator for which similar consistency results are derived. Numerical computation of the sieved estimator is of great interest for practical problems, such as forensic DNA analysis, and we present a computational algorithm based on the stochastic approximation of the expectation maximisation algorithm. As an interesting byproduct of the numerical analyses, we introduce an algorithm for bounded isotonic regression for which we also prove convergence.
</p>
Fri, 15 Dec 2017 04:03 EST
Optimal sequential detection in multi-stream data
https://projecteuclid.org/euclid.aos/1513328589
<strong>Hock Peng Chan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2736--2763.</p><p><strong>Abstract:</strong><br/>
Consider a large number of detectors each generating a data stream. The task is to detect, online, distribution changes in a small fraction of the data streams. Previous approaches to this problem include the use of mixture likelihood ratios and sums of CUSUMs. We provide here extensions and modifications of these approaches that are optimal in detecting normal mean shifts. We show how the (optimal) detection delay depends on the fraction of data streams undergoing distribution changes as the number of detectors goes to infinity. There are three detection domains. In the first domain, for moderately large fractions, immediate detection is possible. In the second domain, for smaller fractions, the detection delay grows logarithmically with the number of detectors, with an asymptotic constant extending those in sparse normal mixture detection. In the third domain, for even smaller fractions, the detection delay lies in the framework of the classical detection delay formula of Lorden. We show that the optimal detection delay is achieved by the sum of detectability score transformations of either the partial scores or CUSUM scores of the data streams.
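The "sum of CUSUMs" baseline that this work extends can be sketched as follows; the shift size, threshold and toy data are our choices for illustration, not the paper's optimized procedure.

```python
import numpy as np

def cusum_scores(stream, mu=1.0):
    """One-sided CUSUM for an upward mean shift of size mu in N(0,1) data:
    W_t = max(0, W_{t-1} + mu * x_t - mu**2 / 2)."""
    w, out = 0.0, []
    for xt in stream:
        w = max(0.0, w + mu * xt - mu * mu / 2.0)
        out.append(w)
    return np.array(out)

def sum_cusum_first_alarm(streams, threshold, mu=1.0):
    """Global monitor: alarm at the first time the CUSUM scores summed
    across all streams cross the threshold; None if they never do."""
    total = sum(cusum_scores(s, mu) for s in streams)
    hits = np.nonzero(total > threshold)[0]
    return int(hits[0]) if hits.size else None

# 100 streams of length 200; 5 of them shift from mean 0 to mean 1 at t = 100.
rng = np.random.default_rng(5)
data = rng.normal(size=(100, 200))
data[:5, 100:] += 1.0
alarm = sum_cusum_first_alarm(data, threshold=150.0)
```

Summing raw CUSUM scores dilutes a sparse signal across many null streams; the detectability score transformations proposed in the paper are designed to mitigate exactly this.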
</p>
Fri, 15 Dec 2017 04:03 EST
Chernoff index for Cox test of separate parametric families
https://projecteuclid.org/euclid.aos/1519268422
<strong>Xiaoou Li</strong>, <strong>Jingchen Liu</strong>, <strong>Zhiliang Ying</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 1--29.</p><p><strong>Abstract:</strong><br/>
The asymptotic efficiency of a generalized likelihood ratio test proposed by Cox is studied under the large deviations framework for error probabilities developed by Chernoff. In particular, two separate parametric families of hypotheses are considered [In Proc. 4th Berkeley Sympos. Math. Statist. and Prob. (1961) 105–123; J. Roy. Statist. Soc. Ser. B 24 (1962) 406–424]. The significance level is set such that the maximal type I and type II error probabilities for the generalized likelihood ratio test decay exponentially fast with the same rate. We derive the analytic form of such a rate that is also known as the Chernoff index [ Ann. Math. Stat. 23 (1952) 493–507], a relative efficiency measure when there is no preference between the null and the alternative hypotheses. We further extend the analysis to approximate error probabilities when the two families are not completely separated. Discussions are provided concerning the implications of the present result on model selection.
</p>
Wed, 21 Feb 2018 22:00 EST
Optimal bounds for aggregation of affine estimators
https://projecteuclid.org/euclid.aos/1519268423
<strong>Pierre C. Bellec</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 30--59.</p><p><strong>Abstract:</strong><br/>
We study the problem of aggregation of estimators when the estimators are not independent of the data used for aggregation and no sample splitting is allowed. If the estimators are deterministic vectors, it is well known that the minimax rate of aggregation is of order $\log(M)$, where $M$ is the number of estimators to aggregate. It is proved that for affine estimators, the minimax rate of aggregation is unchanged: it is possible to handle the linear dependence between the affine estimators and the data used for aggregation at no extra cost. The minimax rate is not impacted either by the variance of the affine estimators, or any other measure of their statistical complexity. The minimax rate is attained with a penalized procedure over the convex hull of the estimators, for a penalty that is inspired from the $Q$-aggregation procedure. The results follow from the interplay between the penalty, strong convexity and concentration.
</p>
Wed, 21 Feb 2018 22:00 EST
Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics
https://projecteuclid.org/euclid.aos/1519268424
<strong>T. Tony Cai</strong>, <strong>Anru Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 60--89.</p><p><strong>Abstract:</strong><br/>
Perturbation bounds for singular spaces, in particular Wedin’s $\mathop{\mathrm{sin}}\nolimits \Theta$ theorem, are a fundamental tool in many fields including high-dimensional statistics, machine learning and applied mathematics. In this paper, we establish separate perturbation bounds, measured in both spectral and Frobenius $\mathop{\mathrm{sin}}\nolimits \Theta$ distances, for the left and right singular subspaces. Lower bounds, which show that the individual perturbation bounds are rate-optimal, are also given. The new perturbation bounds are applicable to a wide range of problems. In this paper, we consider in detail applications to low-rank matrix denoising and singular space estimation, high-dimensional clustering and canonical correlation analysis (CCA). In particular, separate matching upper and lower bounds are obtained for estimating the left and right singular spaces. To the best of our knowledge, this is the first result that gives different optimal rates for the left and right singular spaces under the same perturbation.
</p>
Wed, 21 Feb 2018 22:00 EST
Exact formulas for the normalizing constants of Wishart distributions for graphical models
https://projecteuclid.org/euclid.aos/1519268425
<strong>Caroline Uhler</strong>, <strong>Alex Lenkoski</strong>, <strong>Donald Richards</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 90--118.</p><p><strong>Abstract:</strong><br/>
Gaussian graphical models have received considerable attention during the past four decades from the statistical and machine learning communities. In Bayesian treatments of this model, the $G$-Wishart distribution serves as the conjugate prior for inverse covariance matrices satisfying graphical constraints. While it is straightforward to posit the unnormalized densities, the normalizing constants of these distributions have been known only for graphs that are chordal, or decomposable. Up until now, it was unknown whether the normalizing constant for a general graph could be represented explicitly, and a considerable body of computational literature emerged that attempted to avoid this apparent intractability. We close this question by providing an explicit representation of the $G$-Wishart normalizing constant for general graphs.
</p>projecteuclid.org/euclid.aos/1519268425_20180221220038Wed, 21 Feb 2018 22:00 ESTConsistent parameter estimation for LASSO and approximate message passinghttps://projecteuclid.org/euclid.aos/1519268426<strong>Ali Mousavi</strong>, <strong>Arian Maleki</strong>, <strong>Richard G. Baraniuk</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 119--148.</p><p><strong>Abstract:</strong><br/>
This paper studies the optimal tuning of the regularization parameter in LASSO or the threshold parameters in approximate message passing (AMP). Considering a model in which the design matrix and noise are zero-mean i.i.d. Gaussian, we propose a data-driven approach for estimating the regularization parameter of LASSO and the threshold parameters in AMP. Our estimates are consistent, that is, they converge to their asymptotically optimal values in probability as $n$, the number of observations, and $p$, the ambient dimension of the sparse vector, grow to infinity, while $n/p$ converges to a fixed number $\delta$. As a byproduct of our analysis, we shed light on the asymptotic properties of the solution paths of LASSO and AMP.
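Both the LASSO proximal step and the AMP iteration reduce, coordinate-wise, to soft thresholding; a minimal illustrative sketch (the threshold value below is arbitrary, not the data-driven estimate proposed in the paper):

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |.|: the elementwise shrinkage used by
    LASSO solvers (e.g., ISTA) and by AMP threshold steps."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

# larger thresholds set more coordinates exactly to zero, which is why
# tuning lam (or the AMP thresholds) drives the sparsity of the solution
print([soft_threshold(v, 1.0) for v in [3.0, 0.5, -2.0]])  # [2.0, 0.0, -1.0]
```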
</p>projecteuclid.org/euclid.aos/1519268426_20180221220038Wed, 21 Feb 2018 22:00 ESTOn semidefinite relaxations for the block modelhttps://projecteuclid.org/euclid.aos/1519268427<strong>Arash A. Amini</strong>, <strong>Elizaveta Levina</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 149--179.</p><p><strong>Abstract:</strong><br/>
The stochastic block model (SBM) is a popular tool for community detection in networks, but fitting it by maximum likelihood (MLE) involves a computationally infeasible optimization problem. We propose a new semidefinite programming (SDP) solution to the problem of fitting the SBM, derived as a relaxation of the MLE. We place our SDP and previously proposed SDPs in a unified framework, as relaxations of the MLE over various subclasses of the SBM, which also reveals a connection to the well-known problem of sparse PCA. Our main relaxation, which we call SDP-1, is tighter than other recently proposed SDP relaxations, and thus previously established theoretical guarantees carry over. However, we show that SDP-1 exactly recovers true communities over a wider class of SBMs than those covered by current results. In particular, the assumption of strong assortativity of the SBM, implicit in consistency conditions for previously proposed SDPs, can be relaxed to weak assortativity for our approach, thus significantly broadening the class of SBMs covered by the consistency results. We also show that strong assortativity is indeed a necessary condition for exact recovery for previously proposed SDP approaches, and not an artifact of the proofs. Our analysis of SDPs is based on primal-dual witness constructions, which provide some insight into the nature of the solutions of various SDPs. In particular, we show how to combine features from SDP-1 and already available SDPs to achieve the most flexibility in terms of both assortativity and block-size constraints, as our relaxation has the tendency to produce communities of similar sizes. This tendency makes it the ideal tool for fitting network histograms, a method gaining popularity in the graphon estimation literature, as we illustrate on an example of a social network of dolphins. We also provide empirical evidence that SDPs outperform spectral methods for fitting SBMs with a large number of blocks.
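The strong-versus-weak assortativity distinction can be made concrete on the block probability matrix $B$; a hypothetical sketch with the usual definitions from the SBM literature (not the authors' code):

```python
def strongly_assortative(B):
    # every within-block probability exceeds every between-block probability
    k = len(B)
    min_diag = min(B[i][i] for i in range(k))
    max_off = max(B[i][j] for i in range(k) for j in range(k) if i != j)
    return min_diag > max_off

def weakly_assortative(B):
    # each block's within probability only has to beat its own between
    # probabilities, a strictly weaker row-wise requirement
    k = len(B)
    return all(B[i][i] > B[i][j] for i in range(k) for j in range(k) if i != j)

# weakly but not strongly assortative: block 2 is internally sparse (0.2),
# while blocks 0 and 1 are linked more densely (0.3) than that
B = [[0.9, 0.3, 0.1],
     [0.3, 0.4, 0.1],
     [0.1, 0.1, 0.2]]
print(strongly_assortative(B), weakly_assortative(B))  # False True
```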
</p>projecteuclid.org/euclid.aos/1519268427_20180221220038Wed, 21 Feb 2018 22:00 ESTPathwise coordinate optimization for sparse learning: Algorithm and theoryhttps://projecteuclid.org/euclid.aos/1519268428<strong>Tuo Zhao</strong>, <strong>Han Liu</strong>, <strong>Tong Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 180--218.</p><p><strong>Abstract:</strong><br/>
The pathwise coordinate optimization is one of the most important computational frameworks for high dimensional convex and nonconvex sparse learning problems. It differs from the classical coordinate optimization algorithms in three salient features: warm start initialization, active set updating, and strong rule for coordinate preselection. Such a complex algorithmic structure grants superior empirical performance, but also poses significant challenges to theoretical analysis. To tackle this long-standing problem, we develop a new theory showing that these three features play pivotal roles in guaranteeing the outstanding statistical and computational performance of the pathwise coordinate optimization framework. Particularly, we analyze the existing pathwise coordinate optimization algorithms and provide new theoretical insights into them. The obtained insights further motivate the development of several modifications to improve the pathwise coordinate optimization framework, which guarantees linear convergence to a unique sparse local optimum with optimal statistical properties in parameter estimation and support recovery. This is the first result on the computational and statistical guarantees of the pathwise coordinate optimization framework in high dimensions. Thorough numerical experiments are provided to support our theory.
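The warm-start feature can be sketched with a bare-bones pathwise coordinate descent for the LASSO; a hypothetical pure-Python illustration (no active sets or strong rules), not the modified framework analyzed in the paper:

```python
def soft(x, lam):
    # soft-thresholding: the closed-form coordinate update for the LASSO
    return max(abs(x) - lam, 0.0) * (1.0 if x >= 0 else -1.0)

def lasso_cd_path(X, y, lams, sweeps=50):
    # minimize (1/2n)||y - X b||^2 + lam ||b||_1 along a decreasing lambda
    # path, warm-starting each solve from the previous solution
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    path = []
    for lam in sorted(lams, reverse=True):
        for _ in range(sweeps):
            for j in range(p):
                # partial residual excluding coordinate j
                r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                     for i in range(n)]
                rho = sum(X[i][j] * r[i] for i in range(n)) / n
                z = sum(X[i][j] ** 2 for i in range(n)) / n
                beta[j] = soft(rho, lam) / z
        path.append((lam, beta[:]))
    return path

# tiny design with orthogonal columns, so the path is exact
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2, 1, 2, 1]
print(lasso_cd_path(X, y, [0.5, 0.0]))  # [(0.5, [1.0, 0.0]), (0.0, [2.0, 1.0])]
```

Note how the second (unpenalized) solve starts from the lam = 0.5 solution rather than from zero; that reuse is the warm start the abstract refers to.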
</p>projecteuclid.org/euclid.aos/1519268428_20180221220038Wed, 21 Feb 2018 22:00 ESTConditional mean and quantile dependence testing in high dimensionhttps://projecteuclid.org/euclid.aos/1519268429<strong>Xianyang Zhang</strong>, <strong>Shun Yao</strong>, <strong>Xiaofeng Shao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 219--246.</p><p><strong>Abstract:</strong><br/>
Motivated by applications in biological science, we propose a novel test to assess the conditional mean dependence of a response variable on a large number of covariates. Our procedure is built on the martingale difference divergence recently proposed in Shao and Zhang [ J. Amer. Statist. Assoc. 109 (2014) 1302–1318], and it is able to detect certain types of departure from the null hypothesis of conditional mean independence without making any specific model assumptions. Theoretically, we establish the asymptotic normality of the proposed test statistic under suitable assumptions on the eigenvalues of a Hermitian operator, which is constructed based on the characteristic function of the covariates. These conditions can be simplified under banded dependence structure on the covariates or Gaussian design. To account for heterogeneity within the data, we further develop a testing procedure for conditional quantile independence at a given quantile level and provide an asymptotic justification. Empirically, our test of conditional mean independence delivers comparable results to the competitor, which was constructed under the linear model framework, when the underlying model is linear. It significantly outperforms the competitor when the conditional mean admits a nonlinear form.
</p>projecteuclid.org/euclid.aos/1519268429_20180221220038Wed, 21 Feb 2018 22:00 ESTHigh-dimensional asymptotics of prediction: Ridge regression and classificationhttps://projecteuclid.org/euclid.aos/1519268430<strong>Edgar Dobriban</strong>, <strong>Stefan Wager</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 247--279.</p><p><strong>Abstract:</strong><br/>
We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where $p,n\to\infty$ and $p/n\to\gamma>0$, and allow for arbitrary covariance among the features. For both methods, we provide an explicit and efficiently computable expression for the limiting predictive risk, which depends only on the spectrum of the feature-covariance matrix, the signal strength and the aspect ratio $\gamma$. Especially in the case of regularized discriminant analysis, we find that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix, suggesting that analyses based on the operator norm of the covariance matrix may not be sharp. Our results also uncover an exact inverse relation between the limiting predictive risk and the limiting estimation risk in high-dimensional linear models. The analysis builds on recent advances in random matrix theory.
</p>projecteuclid.org/euclid.aos/1519268430_20180221220038Wed, 21 Feb 2018 22:00 ESTTesting independence in high dimensions with sums of rank correlationshttps://projecteuclid.org/euclid.aos/1519268431<strong>Dennis Leung</strong>, <strong>Mathias Drton</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 280--307.</p><p><strong>Abstract:</strong><br/>
We treat the problem of testing independence between $m$ continuous variables when $m$ can be larger than the available sample size $n$. We consider three types of test statistics that are constructed as sums or sums of squares of pairwise rank correlations. In the asymptotic regime where both $m$ and $n$ tend to infinity, a martingale central limit theorem is applied to show that the null distributions of these statistics converge to Gaussian limits, which are valid with no specific distributional or moment assumptions on the data. Using the framework of U-statistics, our result covers not only a variety of rank correlations, including Kendall’s tau and a dominating term of Spearman’s rank correlation coefficient (rho), but also degenerate U-statistics such as Hoeffding’s $D$, or the $\tau^{*}$ of Bergsma and Dassios [ Bernoulli 20 (2014) 1006–1028]. As in the classical theory for U-statistics, the test statistics need to be scaled differently when the rank correlations used to construct them are degenerate U-statistics. The power of the considered tests is explored in rate-optimality theory under a Gaussian equicorrelation alternative as well as in numerical experiments for specific cases of more general alternatives.
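One of the pairwise rank correlations being summed, Kendall's tau, is simple to compute directly; a minimal sketch assuming no ties (illustrative only, with made-up data):

```python
from itertools import combinations

def kendall_tau(x, y):
    # proportion of concordant minus discordant pairs (no ties assumed)
    n = len(x)
    s = sum(1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
            for i, j in combinations(range(n), 2))
    return 2 * s / (n * (n - 1))

# a sum-of-squares statistic aggregates tau over all m*(m-1)/2 variable
# pairs; here m = 3 columns of n = 4 observations
cols = [[1, 2, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1]]
stat = sum(kendall_tau(a, b) ** 2 for a, b in combinations(cols, 2))
print(round(stat, 4))  # 1.2222
```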
</p>projecteuclid.org/euclid.aos/1519268431_20180221220038Wed, 21 Feb 2018 22:00 ESTHigh dimensional censored quantile regressionhttps://projecteuclid.org/euclid.aos/1519268432<strong>Qi Zheng</strong>, <strong>Limin Peng</strong>, <strong>Xuming He</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 308--343.</p><p><strong>Abstract:</strong><br/>
Censored quantile regression (CQR) has emerged as a useful regression tool for survival analysis. Some commonly used CQR methods can be characterized by stochastic integral-based estimating equations in a sequential manner across quantile levels. In this paper, we analyze CQR in a high dimensional setting where the regression functions over a continuum of quantile levels are of interest. We propose a two-step penalization procedure, which accommodates stochastic integral-based estimating equations and addresses the challenges due to the recursive nature of the procedure. We establish the uniform convergence rates for the proposed estimators, and investigate weak convergence and variable selection properties. We conduct numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposals.
</p>projecteuclid.org/euclid.aos/1519268432_20180221220038Wed, 21 Feb 2018 22:00 ESTLocal M-estimation with discontinuous criterion for dependent and limited observationshttps://projecteuclid.org/euclid.aos/1519268433<strong>Myung Hwan Seo</strong>, <strong>Taisuke Otsu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 344--369.</p><p><strong>Abstract:</strong><br/>
We examine the asymptotic properties of local M-estimators under three sets of high-level conditions. These conditions are sufficiently general to cover the minimum volume predictive region, the conditional maximum score estimator for a panel data discrete choice model and many other widely used estimators in statistics and econometrics. Specifically, they allow for discontinuous criterion functions of weakly dependent observations which may be localized by kernel smoothing and contain nuisance parameters with growing dimension. Furthermore, the localization can occur around parameter values rather than around a fixed point and the observations may take limited values which lead to set estimators. Our theory produces three different nonparametric cube root rates for local M-estimators and enables valid inference building on novel maximal inequalities for weakly dependent observations. The standard cube root asymptotics is included as a special case. The results are illustrated by various examples such as the Hough transform estimator with diminishing bandwidth, the maximum score-type set estimator and many others.
</p>projecteuclid.org/euclid.aos/1519268433_20180221220038Wed, 21 Feb 2018 22:00 ESTMixture inner product spaces and their application to functional data analysishttps://projecteuclid.org/euclid.aos/1519268434<strong>Zhenhua Lin</strong>, <strong>Hans-Georg Müller</strong>, <strong>Fang Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 370--400.</p><p><strong>Abstract:</strong><br/>
We introduce the concept of mixture inner product spaces associated with a given separable Hilbert space, which feature an infinite-dimensional mixture of finite-dimensional vector spaces and are dense in the underlying Hilbert space. Any Hilbert valued random element can be arbitrarily closely approximated by mixture inner product space valued random elements. While this concept can be applied to data in any infinite-dimensional Hilbert space, the case of functional data that are random elements in the $L^{2}$ space of square integrable functions is of special interest. For functional data, mixture inner product spaces provide a new perspective, where each realization of the underlying stochastic process falls into one of the component spaces and is represented by a finite number of basis functions, the number of which corresponds to the dimension of the component space. In the mixture representation of functional data, the number of included mixture components used to represent a given random element in $L^{2}$ is specifically adapted to each random trajectory and may be arbitrarily large. Key benefits of this novel approach are, first, that it provides a new perspective on the construction of a probability density in function space under mild regularity conditions, and second, that individual trajectories possess a trajectory-specific dimension that corresponds to a latent random variable, making it possible to use a larger number of components for less smooth and a smaller number for smoother trajectories. This enables flexible and parsimonious modeling of heterogeneous trajectory shapes. We establish estimation consistency of the functional mixture density and introduce an algorithm for fitting the functional mixture model based on a modified expectation-maximization algorithm. 
Simulations confirm that in comparison to traditional functional principal component analysis the proposed method achieves similar or better data recovery while using fewer components on average. Its practical merits are also demonstrated in an analysis of egg-laying trajectories for medflies.
</p>projecteuclid.org/euclid.aos/1519268434_20180221220038Wed, 21 Feb 2018 22:00 ESTBayesian estimation of sparse signals with a continuous spike-and-slab priorhttps://projecteuclid.org/euclid.aos/1519268435<strong>Veronika Ročková</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 401--437.</p><p><strong>Abstract:</strong><br/>
We introduce a new framework for estimation of sparse normal means, bridging the gap between popular frequentist strategies (LASSO) and popular Bayesian strategies (spike-and-slab). The main thrust of this paper is to introduce the family of Spike-and-Slab LASSO (SS-LASSO) priors, which form a continuum between the Laplace prior and the point-mass spike-and-slab prior. We establish several appealing frequentist properties of SS-LASSO priors, contrasting them with these two limiting cases. First, we adopt the penalized likelihood perspective on Bayesian modal estimation and introduce the framework of Bayesian penalty mixing with spike-and-slab priors. We show that the SS-LASSO global posterior mode is (near) minimax rate-optimal under squared error loss, similarly to the LASSO. Going further, we introduce an adaptive two-step estimator which can achieve provably sharper performance than the LASSO. Second, we show that the whole posterior keeps pace with the global mode and concentrates at the (near) minimax rate, a property that is known <em>not to hold</em> for the single Laplace prior. The minimax-rate optimality is obtained with a suitable class of independent product priors (for known levels of sparsity) as well as with dependent mixing priors (adapting to the unknown levels of sparsity). Up to now, the rate-optimal posterior concentration has been established only for spike-and-slab priors with a point mass at zero. Thus, the SS-LASSO priors, despite being continuous, possess similar optimality properties as the “theoretically ideal” point-mass mixtures. These results provide valuable theoretical justification for our proposed class of priors, underpinning their intuitive appeal and practical potential.
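The continuous spike-and-slab form can be illustrated as a two-component Laplace mixture; a hypothetical sketch with made-up hyperparameters, not the authors' implementation:

```python
import math

def laplace_pdf(b, lam):
    # density of the double-exponential (Laplace) distribution
    return 0.5 * lam * math.exp(-lam * abs(b))

def ss_lasso_log_prior(b, theta=0.2, lam_spike=20.0, lam_slab=1.0):
    # mixture of a sharp Laplace "spike" and a diffuse Laplace "slab";
    # lam_spike -> infinity recovers the point-mass spike-and-slab, while
    # lam_spike == lam_slab recovers the single Laplace (LASSO) prior
    spike = (1 - theta) * laplace_pdf(b, lam_spike)
    slab = theta * laplace_pdf(b, lam_slab)
    return math.log(spike + slab)

# near zero the spike dominates; far from zero the slab takes over, so
# large signals are shrunk far less than under a single Laplace prior
print(ss_lasso_log_prior(0.0) > ss_lasso_log_prior(3.0))  # True
```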
</p>projecteuclid.org/euclid.aos/1519268435_20180221220038Wed, 21 Feb 2018 22:00 ESTOn the asymptotic theory of new bootstrap confidence boundshttps://projecteuclid.org/euclid.aos/1519268436<strong>Charl Pretorius</strong>, <strong>Jan W. H. Swanepoel</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 438--456.</p><p><strong>Abstract:</strong><br/>
We propose a new method, based on sample splitting, for constructing bootstrap confidence bounds for a parameter appearing in the regular smooth function model. It has been demonstrated in the literature, for example, by Hall [ Ann. Statist. 16 (1988) 927–985; The Bootstrap and Edgeworth Expansion (1992) Springer], that the well-known percentile-$t$ method for constructing bootstrap confidence bounds typically incurs a coverage error of order $O(n^{-1})$, with $n$ being the sample size. Our version of the percentile-$t$ bound reduces this coverage error to order $O(n^{-3/2})$ and in some cases to $O(n^{-2})$. Furthermore, whereas the standard percentile bounds typically incur coverage error of $O(n^{-1/2})$, the new bounds have reduced error of $O(n^{-1})$. In the case where the parameter of interest is the population mean, we derive for each confidence bound the exact coefficient of the leading term in an asymptotic expansion of the coverage error, although similar results may be obtained for other parameters such as the variance, the correlation coefficient, and the ratio of two means. We show that equal-tailed confidence intervals with coverage error at most $O(n^{-2})$ may be obtained from the newly proposed bounds, as opposed to the typical error $O(n^{-1})$ of the standard intervals. It is also shown that the good properties of the new percentile-$t$ method carry over to regression problems. Results of independent interest are derived, such as a generalisation of a delta method by Cramér [ Mathematical Methods of Statistics (1946) Princeton Univ. Press] and Hurt [ Apl. Mat. 21 (1976) 444–456], and an expression for a polynomial appearing in an Edgeworth expansion of the distribution of a Studentised statistic for the slope parameter in a regression model. A small simulation study illustrates the behavior of the confidence bounds for small to moderate sample sizes.
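For context, the standard (non-split) percentile-t interval for a mean can be sketched in a few lines; purely illustrative with made-up data, not the sample-splitting refinement proposed here:

```python
import random
import statistics

def percentile_t_interval(x, level=0.95, B=400, seed=1):
    # equal-tailed percentile-t interval for the population mean
    rng = random.Random(seed)
    n = len(x)
    mean = statistics.fmean(x)
    se = statistics.stdev(x) / n ** 0.5
    tstats = []
    for _ in range(B):
        xb = [x[rng.randrange(n)] for _ in range(n)]
        sb = statistics.stdev(xb) / n ** 0.5
        if sb > 0:  # skip degenerate resamples
            tstats.append((statistics.fmean(xb) - mean) / sb)
    tstats.sort()
    alpha = 1 - level
    lo_t = tstats[int((1 - alpha / 2) * len(tstats)) - 1]
    hi_t = tstats[int((alpha / 2) * len(tstats))]
    # note the quantile flip: the upper bootstrap-t quantile gives the
    # lower endpoint, and vice versa
    return mean - lo_t * se, mean - hi_t * se

x = [2.1, 3.4, 1.9, 4.0, 2.8, 3.1, 2.2, 3.7, 2.5, 3.0]
lo, hi = percentile_t_interval(x)
print(round(lo, 3), round(hi, 3))
```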
</p>projecteuclid.org/euclid.aos/1519268436_20180221220038Wed, 21 Feb 2018 22:00 ESTStrong orthogonal arrays of strength two plushttps://projecteuclid.org/euclid.aos/1522742425<strong>Yuanzhen He</strong>, <strong>Ching-Shui Cheng</strong>, <strong>Boxin Tang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 457--468.</p><p><strong>Abstract:</strong><br/>
Strong orthogonal arrays were recently introduced and studied in He and Tang [ Biometrika 100 (2013) 254–260] as a class of space-filling designs for computer experiments. To enjoy the benefits of better space-filling properties, when compared to ordinary orthogonal arrays, strong orthogonal arrays need to have strength three or higher, which may require run sizes that are too large for experimenters to afford. To address this problem, we introduce a new class of arrays, called strong orthogonal arrays of strength two plus. These arrays, while being more economical than strong orthogonal arrays of strength three, still enjoy the better two-dimensional space-filling property of the latter. Among the many results we have obtained on the characterizations and constructions of strong orthogonal arrays of strength two plus, worth special mention is their intimate connection with second-order saturated designs.
</p>projecteuclid.org/euclid.aos/1522742425_20180403220226Tue, 03 Apr 2018 22:02 EDTStatistical inference for spatial statistics defined in the Fourier domainhttps://projecteuclid.org/euclid.aos/1522742426<strong>Suhasini Subba Rao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 469--499.</p><p><strong>Abstract:</strong><br/>
A class of Fourier-based statistics for irregularly spaced spatial data is introduced. Examples include the Whittle likelihood, a parametric estimator of the covariance function based on the $L_{2}$-contrast function and a simple nonparametric estimator of the spatial autocovariance which is a nonnegative function. The Fourier-based statistic is a quadratic form of a discrete Fourier-type transform of the spatial data. Evaluation of the statistic is computationally tractable, requiring $O(nb)$ operations, where $b$ is the number of Fourier frequencies used in the definition of the statistic and $n$ is the sample size. The asymptotic sampling properties of the statistic are derived using both increasing domain and fixed-domain spatial asymptotics. These results are used to construct a statistic which is asymptotically pivotal.
</p>projecteuclid.org/euclid.aos/1522742426_20180403220226Tue, 03 Apr 2018 22:02 EDTOn the inference about the spectral distribution of high-dimensional covariance matrix based on high-frequency noisy observationshttps://projecteuclid.org/euclid.aos/1522742427<strong>Ningning Xia</strong>, <strong>Xinghua Zheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 500--525.</p><p><strong>Abstract:</strong><br/>
In practice, observations are often contaminated by noise, making the resulting sample covariance matrix a signal-plus-noise sample covariance matrix. Aiming to make inferences about the spectral distribution of the population covariance matrix under such a situation, we establish an asymptotic relationship that describes how the limiting spectral distribution of (signal) sample covariance matrices depends on that of signal-plus-noise-type sample covariance matrices. As an application, we consider inferences about the spectral distribution of integrated covolatility (ICV) matrices of high-dimensional diffusion processes based on high-frequency data with microstructure noise. The (slightly modified) pre-averaging estimator is a signal-plus-noise sample covariance matrix, and the aforementioned result, together with a (generalized) connection between the spectral distribution of signal sample covariance matrices and that of the population covariance matrix, enables us to propose a two-step procedure to consistently estimate the spectral distribution of ICV for a class of diffusion processes. An alternative approach is further proposed, which possesses several desirable properties: it is more robust, it eliminates the effects of microstructure noise, and the asymptotic relationship that enables consistent estimation of the spectral distribution of ICV is the standard Marčenko–Pastur equation. The performance of the two approaches is examined via simulation studies under both synchronous and asynchronous observation settings.
</p>projecteuclid.org/euclid.aos/1522742427_20180403220226Tue, 03 Apr 2018 22:02 EDTOnline rules for control of false discovery rate and false discovery exceedancehttps://projecteuclid.org/euclid.aos/1522742428<strong>Adel Javanmard</strong>, <strong>Andrea Montanari</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 526--554.</p><p><strong>Abstract:</strong><br/>
Multiple hypothesis testing is a core problem in statistical inference and arises in almost every scientific field. Given a set of null hypotheses $\mathcal{H}(n)=(H_{1},\ldots,H_{n})$, Benjamini and Hochberg [ J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 (1995) 289–300] introduced the false discovery rate ($\mathrm{FDR}$), which is the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls $\mathrm{FDR}$ below a pre-assigned significance level. Nowadays $\mathrm{FDR}$ is the criterion of choice for large-scale multiple hypothesis testing.
In this paper we consider the problem of controlling $\mathrm{FDR}$ in an online manner. Concretely, we consider an ordered—possibly infinite—sequence of null hypotheses $\mathcal{H}=(H_{1},H_{2},H_{3},\ldots)$ where, at each step $i$, the statistician must decide whether to reject hypothesis $H_{i}$ having access only to the previous decisions. This model was introduced by Foster and Stine [ J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 (2008) 429–444].
We study a class of generalized alpha investing procedures, first introduced by Aharoni and Rosset [ J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 (2014) 771–794]. We prove that any rule in this class controls online $\mathrm{FDR}$, provided $p$-values corresponding to true nulls are independent of the other $p$-values. Earlier work only established $\mathrm{mFDR}$ control. Next, we obtain conditions under which generalized alpha investing controls $\mathrm{FDR}$ in the presence of general $p$-value dependencies. We also develop a modified set of procedures that allow one to control the false discovery exceedance (the tail of the proportion of false discoveries). Finally, we evaluate the performance of online procedures on both synthetic and real data, comparing them with offline approaches, such as adaptive Benjamini–Hochberg.
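An alpha-investing rule can be sketched as a wealth process that spends on each test and earns on each discovery; a simplified, hypothetical parameterization for illustration only (the payout and spending choices below are made up, and this bare sketch carries none of the paper's FDR guarantees):

```python
def alpha_investing(pvalues, w0=0.05, payout=0.05, spend_frac=0.1):
    # online testing: decide on each p-value using only past decisions
    wealth = w0
    decisions = []
    for p in pvalues:
        alpha = spend_frac * wealth      # bet a fraction of current wealth
        if p <= alpha:
            decisions.append(True)
            wealth += payout             # a discovery replenishes wealth
        else:
            decisions.append(False)
            wealth -= alpha / (1 - alpha)  # a non-rejection costs wealth
        wealth = max(wealth, 0.0)
    return decisions

print(alpha_investing([0.001, 0.9, 0.004, 0.5]))  # [True, False, True, False]
```

Because wealth rises after rejections and falls otherwise, the rule adapts its per-test significance level to the observed stream, which is what makes online FDR control possible.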
</p>projecteuclid.org/euclid.aos/1522742428_20180403220226Tue, 03 Apr 2018 22:02 EDTFrequency domain minimum distance inference for possibly noninvertible and noncausal ARMA modelshttps://projecteuclid.org/euclid.aos/1522742429<strong>Carlos Velasco</strong>, <strong>Ignacio N. Lobato</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 555--579.</p><p><strong>Abstract:</strong><br/>
This article introduces frequency domain minimum distance procedures for performing inference in general, possibly noncausal and/or noninvertible, autoregressive moving average (ARMA) models. We use information from higher-order moments to achieve identification of the location of the roots of the AR and MA polynomials for non-Gaussian time series. We propose a minimum distance estimator that optimally combines the information contained in second, third, and fourth moments. Contrary to existing estimators, the proposed one is consistent under general assumptions, and may improve on the efficiency of estimators based only on second-order moments. Our procedures are also applicable to processes for which either the third- or the fourth-order spectral density is the zero function.
</p>projecteuclid.org/euclid.aos/1522742429_20180403220226Tue, 03 Apr 2018 22:02 EDTOn consistency and sparsity for sliced inverse regression in high dimensionshttps://projecteuclid.org/euclid.aos/1522742430<strong>Qian Lin</strong>, <strong>Zhigen Zhao</strong>, <strong>Jun S. Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 580--610.</p><p><strong>Abstract:</strong><br/>
We provide here a framework to analyze the phase transition phenomenon of sliced inverse regression (SIR), a supervised dimension reduction technique introduced by Li [ J. Amer. Statist. Assoc. 86 (1991) 316–342]. Under mild conditions, the asymptotic ratio $\rho=\lim p/n$ is the phase transition parameter and the SIR estimator is consistent if and only if $\rho=0$. When the dimension $p$ is greater than $n$, we propose a diagonal thresholding screening SIR (DT-SIR) algorithm. This method provides us with an estimate of the eigenspace of $\operatorname{var}(\mathbb{E}[\boldsymbol{x}|y])$, the covariance matrix of the conditional expectation. The desired dimension reduction space is then obtained by multiplying the inverse of the covariance matrix on the eigenspace. Under certain sparsity assumptions on both the covariance matrix of predictors and the loadings of the directions, we prove the consistency of DT-SIR in estimating the dimension reduction space in high-dimensional data analysis. Extensive numerical experiments demonstrate superior performance of the proposed method in comparison to its competitors.
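The core SIR quantity $\operatorname{var}(\mathbb{E}[\boldsymbol{x}|y])$ is estimated by slicing on $y$; a minimal scalar-$x$ sketch of that between-slice variance (illustrative only, not the DT-SIR screening step):

```python
def sir_between_slice_variance(x, y, H=5):
    # estimate var(E[x|y]) for scalar x by slicing on y: sort by y, split
    # into H slices, and take the weighted variance of the slice means
    n = len(x)
    order = sorted(range(n), key=lambda i: y[i])
    xbar = sum(x) / n
    total = 0.0
    for h in range(H):
        idx = order[h * n // H:(h + 1) * n // H]
        m = sum(x[i] for i in idx) / len(idx)
        total += len(idx) * (m - xbar) ** 2
    return total / n

x = list(range(10))
print(sir_between_slice_variance(x, x))           # strong dependence: 8.0
print(sir_between_slice_variance([1.0] * 10, x))  # no dependence: 0.0
```

In the multivariate case the same slice means form a matrix whose eigenspace DT-SIR estimates, with diagonal thresholding used to screen coordinates when $p > n$.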
</p>projecteuclid.org/euclid.aos/1522742430_20180403220226Tue, 03 Apr 2018 22:02 EDTRegularization and the small-ball method I: Sparse recoveryhttps://projecteuclid.org/euclid.aos/1522742431<strong>Guillaume Lecué</strong>, <strong>Shahar Mendelson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 611--641.</p><p><strong>Abstract:</strong><br/>
We obtain bounds on estimation error rates for regularization procedures of the form \begin{equation*}\hat{f}\in\mathop{\operatorname{argmin}}_{f\in F}(\frac{1}{N}\sum_{i=1}^{N}(Y_{i}-f(X_{i}))^{2}+\lambda \Psi(f))\end{equation*} when $\Psi$ is a norm and $F$ is convex.
Our approach gives a common framework that may be used in the analysis of learning problems and regularization problems alike. In particular, it sheds some light on the role various notions of sparsity have in regularization and on their connection with the size of subdifferentials of $\Psi$ in a neighborhood of the true minimizer.
As “proof of concept” we extend the known estimates for the LASSO, SLOPE and trace norm regularization.
</p>projecteuclid.org/euclid.aos/1522742431_20180403220226Tue, 03 Apr 2018 22:02 EDTGaussian and bootstrap approximations for high-dimensional U-statistics and their applicationshttps://projecteuclid.org/euclid.aos/1522742432<strong>Xiaohui Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 642--678.</p><p><strong>Abstract:</strong><br/>
This paper studies the Gaussian and bootstrap approximations for the probabilities of a nondegenerate U-statistic belonging to the hyperrectangles in $\mathbb{R}^{d}$ when the dimension $d$ is large. A two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution is proposed. Subject to mild moment conditions on the kernel, we establish the explicit rate of convergence uniformly in the class of all hyperrectangles in $\mathbb{R}^{d}$ that decays polynomially in sample size for a high-dimensional scaling limit, where the dimension can be much larger than the sample size. We also provide computable approximation methods for the quantiles of the maxima of centered U-statistics. Specifically, we provide a unified perspective for the empirical bootstrap, the randomly reweighted bootstrap and the Gaussian multiplier bootstrap with the jackknife estimator of covariance matrix as randomly reweighted quadratic forms and we establish their validity. We show that all three methods are inferentially first-order equivalent for high-dimensional U-statistics in the sense that they achieve the same uniform rate of convergence over all $d$-dimensional hyperrectangles. In particular, they are asymptotically valid when the dimension $d$ can be as large as $O(e^{n^{c}})$ for some constant $c\in(0,1/7)$.
The bootstrap methods are applied to statistical applications for high-dimensional non-Gaussian data including: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap selected tuning parameter, we show that for a class of sub-Gaussian data, error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold. In addition, we also show that the Gaussian-like convergence rates can be achieved for heavy-tailed data, which are less conservative than those obtained by the Bonferroni technique that ignores the dependency in the underlying data distribution.
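The Gaussian multiplier bootstrap for degree-two U-statistics described in this abstract can be illustrated with a minimal sketch (our own construction for intuition, not code from the paper): the bootstrap statistic applies independent Gaussian multipliers to estimated Hájek projections of the kernel and takes the coordinate-wise maximum.

```python
import numpy as np

def multiplier_bootstrap_ustat(X, kernel, B=500, rng=None):
    """Gaussian multiplier bootstrap for the max coordinate of a
    degree-2 U-statistic (illustrative sketch; the function name and
    interface are ours). `kernel(x, y)` returns a vector in R^d."""
    rng = np.random.default_rng(rng)
    n = len(X)
    # all pairwise kernel evaluations h(X_i, X_j), shape (n, n, d)
    H = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    mask = ~np.eye(n, dtype=bool)
    U = H[mask].mean(axis=0)                      # U-statistic, coordinate-wise
    # estimated Hajek projections g_i ~ E[h(X_i, X) | X_i] - U
    G = (H.sum(axis=1) - np.diagonal(H, axis1=0, axis2=1).T) / (n - 1) - U
    # multiplier bootstrap draws of the max statistic
    stats = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)
        stats[b] = np.abs(2.0 * e @ G / np.sqrt(n)).max()
    return U, stats
```

An empirical quantile of `stats` then serves as a critical value for the maximum of the centered U-statistic, as in the data-dependent tuning applications mentioned above.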
</p>projecteuclid.org/euclid.aos/1522742432_20180403220226Tue, 03 Apr 2018 22:02 EDTSelective inference with a randomized responsehttps://projecteuclid.org/euclid.aos/1522742433<strong>Xiaoying Tian</strong>, <strong>Jonathan Taylor</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 679--710.</p><p><strong>Abstract:</strong><br/>
Inspired by sample splitting and the reusable holdout introduced in the field of differential privacy, we consider selective inference with a randomized response. We discuss two major advantages of using a randomized response for model selection. First, the selectively valid tests are more powerful after randomized selection. Second, it allows consistent estimation and weak convergence of selective inference procedures. Under independent sampling, we prove a selective (or privatized) central limit theorem that transfers procedures valid under asymptotic normality without selection to their corresponding selective counterparts. This allows selective inference in nonparametric settings. Finally, we propose a framework of inference after combining multiple randomized selection procedures. We focus on the classical asymptotic setting, leaving the interesting high-dimensional asymptotic questions for future work.
</p>projecteuclid.org/euclid.aos/1522742433_20180403220226Tue, 03 Apr 2018 22:02 EDTMultiscale blind source separationhttps://projecteuclid.org/euclid.aos/1522742434<strong>Merle Behr</strong>, <strong>Chris Holmes</strong>, <strong>Axel Munk</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 711--744.</p><p><strong>Abstract:</strong><br/>
We provide a new methodology for statistical recovery of single linear mixtures of piecewise constant signals (sources) with unknown mixing weights and change points in a multiscale fashion. We show exact recovery within an $\varepsilon$-neighborhood of the mixture when the sources take only values in a known finite alphabet. Based on this we provide the SLAM (Separates Linear Alphabet Mixtures) estimators for the mixing weights and sources. For Gaussian error, we obtain uniform confidence sets and optimal rates (up to log-factors) for all quantities. SLAM is efficiently computed as a nonconvex optimization problem by a dynamic program tailored to the finite alphabet assumption. Its performance is investigated in a simulation study. Finally, it is applied to assign copy-number aberrations from genetic sequencing data to different clones and to estimate their proportions.
</p>projecteuclid.org/euclid.aos/1522742434_20180403220226Tue, 03 Apr 2018 22:02 EDTSharp oracle inequalities for Least Squares estimators in shape restricted regressionhttps://projecteuclid.org/euclid.aos/1522742435<strong>Pierre C. Bellec</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 745--780.</p><p><strong>Abstract:</strong><br/>
The performance of Least Squares (LS) estimators is studied in shape-constrained regression models under Gaussian and sub-Gaussian noise. General bounds on the performance of LS estimators over closed convex sets are provided. These results have the form of sharp oracle inequalities that account for the model misspecification error. In the presence of misspecification, these bounds imply that the LS estimator estimates the projection of the true parameter at the same rate as in the well-specified case.
In isotonic and unimodal regression, the LS estimator achieves the nonparametric rate $n^{-2/3}$ as well as a parametric rate of order $k/n$ up to logarithmic factors, where $k$ is the number of constant pieces of the true parameter. In univariate convex regression, the LS estimator satisfies an adaptive risk bound of order $q/n$ up to logarithmic factors, where $q$ is the number of affine pieces of the true regression function. This adaptive risk bound holds for any collection of design points. While Guntuboyina and Sen [ Probab. Theory Related Fields 163 (2015) 379–411] established that the nonparametric rate of convex regression is of order $n^{-4/5}$ for equispaced design points, we show that the nonparametric rate of convex regression can be as slow as $n^{-2/3}$ for some worst-case design points. This phenomenon can be explained as follows: Although convexity brings more structure than unimodality, for some worst-case design points this extra structure is uninformative and the nonparametric rates of unimodal regression and convex regression are both $n^{-2/3}$. Higher order cones, such as the cone of $\beta $-monotone sequences, are also studied.
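For concreteness, the isotonic Least Squares estimator whose risk is bounded above can be computed exactly by the classical pool-adjacent-violators algorithm; the sketch below is our own illustration, not code from the paper.

```python
def isotonic_ls(y):
    """Least Squares isotonic fit via pool-adjacent-violators (PAVA):
    the Euclidean projection of y onto the cone of nondecreasing
    sequences (illustrative sketch)."""
    means, weights = [], []           # blocks of pooled values
    for v in y:
        m, w = float(v), 1
        # merge adjacent blocks while monotonicity is violated
        while means and means[-1] > m:
            m = (means[-1] * weights[-1] + m * w) / (weights[-1] + w)
            w += weights.pop()
            means.pop()
        means.append(m)
        weights.append(w)
    fit = []
    for m, w in zip(means, weights):
        fit.extend([m] * w)
    return fit
```

The number of constant pieces of the returned fit plays the role of $k$ in the adaptive $k/n$-type risk bound quoted above.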
</p>projecteuclid.org/euclid.aos/1522742435_20180403220226Tue, 03 Apr 2018 22:02 EDTOracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert spacehttps://projecteuclid.org/euclid.aos/1522742436<strong>Shaogao Lv</strong>, <strong>Huazhen Lin</strong>, <strong>Heng Lian</strong>, <strong>Jian Huang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 781--813.</p><p><strong>Abstract:</strong><br/>
This paper considers the estimation of the sparse additive quantile regression (SAQR) in high-dimensional settings. Given the nonsmooth nature of the quantile loss function and the nonparametric complexities of the component function estimation, it is challenging to analyze the theoretical properties of ultrahigh-dimensional SAQR. We propose a regularized learning approach with a two-fold Lasso-type regularization in a reproducing kernel Hilbert space (RKHS) for SAQR. We establish nonasymptotic oracle inequalities for the excess risk of the proposed estimator without any coherence conditions. If additional assumptions including an extension of the restricted eigenvalue condition are satisfied, the proposed method enjoys sharp oracle rates without the light tail requirement. In particular, the proposed estimator achieves the minimax lower bounds established for sparse additive mean regression. As a by-product, we also establish the concentration inequality for estimating the population mean when the general Lipschitz loss is involved. The practical effectiveness of the new method is demonstrated by competitive numerical results.
</p>projecteuclid.org/euclid.aos/1522742436_20180403220226Tue, 03 Apr 2018 22:02 EDTI-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical errorhttps://projecteuclid.org/euclid.aos/1522742437<strong>Jianqing Fan</strong>, <strong>Han Liu</strong>, <strong>Qiang Sun</strong>, <strong>Tong Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 814--841.</p><p><strong>Abstract:</strong><br/>
We propose a computational framework named iterative local adaptive majorize-minimization (I-LAMM) to simultaneously control algorithmic complexity and statistical error when fitting high-dimensional models. I-LAMM is a two-stage algorithmic implementation of the local linear approximation to a family of folded concave penalized quasi-likelihood. The first stage solves a convex program with a crude precision tolerance to obtain a coarse initial estimator, which is further refined in the second stage by iteratively solving a sequence of convex programs with smaller precision tolerances. Theoretically, we establish a phase transition: the first stage has a sublinear iteration complexity, while the second stage achieves an improved linear rate of convergence. Though this framework is completely algorithmic, it provides solutions with optimal statistical performances and controlled algorithmic complexity for a large family of nonconvex optimization problems. The iteration effects on statistical errors are clearly demonstrated via a contraction property. Our theory relies on a localized version of the sparse/restricted eigenvalue condition, which allows us to analyze a large family of loss and penalty functions and provide optimality guarantees under very weak assumptions (e.g., I-LAMM requires much weaker minimal signal strength than other procedures). Thorough numerical results are provided to support the obtained theory.
</p>projecteuclid.org/euclid.aos/1522742437_20180403220226Tue, 03 Apr 2018 22:02 EDTOn Bayesian index policies for sequential resource allocationhttps://projecteuclid.org/euclid.aos/1522742438<strong>Emilie Kaufmann</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 842--865.</p><p><strong>Abstract:</strong><br/>
This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on quantiles of posterior distributions, is asymptotically optimal when the reward distributions belong to a one-dimensional exponential family, for a large class of prior distributions. We also show that the Bayesian literature gives new insight on what kind of exploration rates could be used in frequentist, UCB-type algorithms. Indeed, approximations of the Bayesian optimal solution or the Finite-Horizon Gittins indices provide a justification for the kl-UCB$^{+}$ and kl-UCB-H$^{+}$ algorithms, whose asymptotic optimality is also established.
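A minimal simulation of the Bayes-UCB index policy for Bernoulli rewards may help fix ideas: each round plays the arm with the largest posterior quantile of order $1-1/(t(\log n)^{c})$. This sketch assumes uniform Beta(1,1) priors and $c=0$ by default; the function name and interface are ours, not the paper's.

```python
import numpy as np
from scipy.stats import beta

def bayes_ucb(means, horizon, c=0, rng=None):
    """Bayes-UCB for Bernoulli bandits (illustrative sketch): play the
    arm with the largest posterior quantile of order 1 - 1/(t (log n)^c),
    with Beta(1,1) priors. `means` are the true arm parameters."""
    rng = np.random.default_rng(rng)
    K = len(means)
    succ = np.ones(K)                  # Beta posterior parameter a
    fail = np.ones(K)                  # Beta posterior parameter b
    pulls = np.zeros(K, dtype=int)
    for t in range(1, horizon + 1):
        level = 1.0 - 1.0 / (t * max(np.log(horizon), 1.0) ** c)
        idx = beta.ppf(level, succ, fail)   # posterior quantiles per arm
        a = int(np.argmax(idx))
        r = rng.random() < means[a]         # Bernoulli reward
        succ[a] += r
        fail[a] += 1 - r
        pulls[a] += 1
    return pulls
```

With well-separated arms, the policy concentrates its pulls on the best arm, consistent with the asymptotic optimality established in the paper.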
</p>projecteuclid.org/euclid.aos/1522742438_20180403220226Tue, 03 Apr 2018 22:02 EDTTesting independence with high-dimensional correlated sampleshttps://projecteuclid.org/euclid.aos/1522742439<strong>Xi Chen</strong>, <strong>Weidong Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 866--894.</p><p><strong>Abstract:</strong><br/>
Testing independence among a number of (ultra) high-dimensional random samples is a fundamental and challenging problem. By arranging $n$ identically distributed $p$-dimensional random vectors into a $p\times n$ data matrix, we investigate the problem of testing independence among columns under the matrix-variate normal modeling of data. We propose a computationally simple and tuning-free test statistic, characterize its limiting null distribution, analyze the statistical power and prove its minimax optimality. As an important by-product of the test statistic, a ratio-consistent estimator for the quadratic functional of a covariance matrix from correlated samples is developed. We further study the effect of correlation among samples on an important high-dimensional inference problem—large-scale multiple testing of Pearson’s correlation coefficients. Indeed, blindly using classical inference results based on the assumed independence of samples will lead to many false discoveries, which suggests the need for conducting independence testing before applying existing methods. To address the challenge arising from correlation among samples, we propose a “sandwich estimator” of Pearson’s correlation coefficient by de-correlating the samples. Based on this approach, the resulting multiple testing procedure asymptotically controls the overall false discovery rate at the nominal level while maintaining good statistical power. Both simulated and real data experiments are carried out to demonstrate the advantages of the proposed methods.
</p>projecteuclid.org/euclid.aos/1522742439_20180403220226Tue, 03 Apr 2018 22:02 EDTDetecting rare and faint signals via thresholding maximum likelihood estimatorshttps://projecteuclid.org/euclid.aos/1522742440<strong>Yumou Qiu</strong>, <strong>Song Xi Chen</strong>, <strong>Dan Nettleton</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 895--923.</p><p><strong>Abstract:</strong><br/>
Motivated by the analysis of RNA sequencing (RNA-seq) data for genes differentially expressed across multiple conditions, we consider detecting rare and faint signals in high-dimensional response variables. We address the signal detection problem under a general framework, which includes generalized linear models for count-valued responses as special cases. We propose a test statistic that carries out a multi-level thresholding on maximum likelihood estimators (MLEs) of the signals, based on a new Cramér-type moderate deviation result for multidimensional MLEs. Based on the multi-level thresholding test, a multiple testing procedure is proposed for signal identification. Numerical simulations and a case study on maize RNA-seq data are conducted to demonstrate the effectiveness of the proposed approaches on signal detection and identification.
</p>projecteuclid.org/euclid.aos/1522742440_20180403220226Tue, 03 Apr 2018 22:02 EDTHigh-dimensional $A$-learning for optimal dynamic treatment regimeshttps://projecteuclid.org/euclid.aos/1525313071<strong>Chengchun Shi</strong>, <strong>Ailin Fan</strong>, <strong>Rui Song</strong>, <strong>Wenbin Lu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 925--957.</p><p><strong>Abstract:</strong><br/>
Precision medicine is a medical paradigm that focuses on finding the most effective treatment decision based on individual patient information. For many complex diseases, such as cancer, treatment decisions need to be tailored over time according to patients’ responses to previous treatments. Such an adaptive strategy is referred to as a dynamic treatment regime. A major challenge in deriving an optimal dynamic treatment regime arises when an extraordinarily large number of prognostic factors, such as patient’s genetic information, demographic characteristics, medical history and clinical measurements over time, are available, but not all of them are necessary for making treatment decisions. This makes variable selection an emerging need in precision medicine.
In this paper, we propose a penalized multi-stage $A$-learning for deriving the optimal dynamic treatment regime when the number of covariates is of the nonpolynomial (NP) order of the sample size. To preserve the double robustness property of the $A$-learning method, we adopt the Dantzig selector, which directly penalizes the $A$-learning estimating equations. Oracle inequalities of the proposed estimators for the parameters in the optimal dynamic treatment regime and error bounds on the difference between the value functions of the estimated optimal dynamic treatment regime and the true optimal dynamic treatment regime are established. Empirical performance of the proposed approach is evaluated by simulations and illustrated with an application to data from the STAR∗D study.
</p>projecteuclid.org/euclid.aos/1525313071_20180502220435Wed, 02 May 2018 22:04 EDTTest for high-dimensional regression coefficients using refitted cross-validation variance estimationhttps://projecteuclid.org/euclid.aos/1525313072<strong>Hengjian Cui</strong>, <strong>Wenwen Guo</strong>, <strong>Wei Zhong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 958--988.</p><p><strong>Abstract:</strong><br/>
Testing a hypothesis for high-dimensional regression coefficients is of fundamental importance in statistical theory and applications. In this paper, we develop a new test for the overall significance of coefficients in high-dimensional linear regression models based on an estimated U-statistic of order two. With the aid of the martingale central limit theorem, we prove that the asymptotic distributions of the proposed test are normal under two different distribution assumptions. Refitted cross-validation (RCV) variance estimation is utilized to avoid the overestimation of the variance and enhance the empirical power. We examine the finite-sample performance of the proposed test via Monte Carlo simulations, which show that the new test based on the RCV estimator achieves higher power, especially for the sparse cases. We also demonstrate an application through an empirical analysis of a microarray data set on Yorkshire gilts.
</p>projecteuclid.org/euclid.aos/1525313072_20180502220435Wed, 02 May 2018 22:04 EDTAre discoveries spurious? Distributions of maximum spurious correlations and their applicationshttps://projecteuclid.org/euclid.aos/1525313073<strong>Jianqing Fan</strong>, <strong>Qi-Man Shao</strong>, <strong>Wen-Xin Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 989--1017.</p><p><strong>Abstract:</strong><br/>
Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries from these data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions about the exogeneity of the covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable $Y$ with the best $s$ linear combinations of $p$ covariates $\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent. When the covariance matrix of $\mathbf{X}$ possesses the restricted eigenvalue property, we derive such distributions for both a finite $s$ and a diverging $s$, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of $\mathbf{X}$. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where the residuals are from regularized fits. Our approach is then used to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of the covariates. The former provides a baseline for guarding against false discoveries and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated with both numerical examples and real data analysis.
</p>projecteuclid.org/euclid.aos/1525313073_20180502220435Wed, 02 May 2018 22:04 EDTAdaptive estimation of planar convex setshttps://projecteuclid.org/euclid.aos/1525313074<strong>T. Tony Cai</strong>, <strong>Adityanand Guntuboyina</strong>, <strong>Yuting Wei</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1018--1049.</p><p><strong>Abstract:</strong><br/>
In this paper, we consider adaptive estimation of an unknown planar compact, convex set from noisy measurements of its support function. Both the problem of estimating the support function at a point and that of estimating the whole convex set are studied. For pointwise estimation, we consider the problem in a general nonasymptotic framework, which evaluates the performance of a procedure at each individual set, instead of the worst case performance over a large parameter space as in conventional minimax theory. A data-driven adaptive estimator is proposed and is shown to be optimally adaptive to every compact, convex set. For estimating the whole convex set, we propose estimators that are shown to adaptively achieve the optimal rate of convergence. In both of these problems, our analysis makes no smoothness assumptions on the boundary of the unknown convex set.
</p>projecteuclid.org/euclid.aos/1525313074_20180502220435Wed, 02 May 2018 22:04 EDTConsistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysishttps://projecteuclid.org/euclid.aos/1525313075<strong>Zhidong Bai</strong>, <strong>Kwok Pui Choi</strong>, <strong>Yasunori Fujikoshi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1050--1076.</p><p><strong>Abstract:</strong><br/>
In this paper, we study the problem of estimating the number of significant components in principal component analysis (PCA), which corresponds to the number of dominant eigenvalues of the covariance matrix of $p$ variables. Our purpose is to examine the consistency of the estimation criteria AIC and BIC based on the model selection criteria by Akaike [In 2nd International Symposium on Information Theory (1973) 267–281, Akadémia Kiado] and Schwarz [ Estimating the dimension of a model 6 (1978) 461–464] under a high-dimensional asymptotic framework. Using random matrix theory techniques, we derive sufficient conditions for the criterion to be strongly consistent for the case when the dominant population eigenvalues are bounded, and when the dominant eigenvalues tend to infinity. Moreover, the asymptotic results are obtained without normality assumption on the population distribution. Simulation studies are also conducted, and results show that the sufficient conditions in our theorems are essential.
</p>projecteuclid.org/euclid.aos/1525313075_20180502220435Wed, 02 May 2018 22:04 EDTOn the systematic and idiosyncratic volatility with large panel high-frequency datahttps://projecteuclid.org/euclid.aos/1525313076<strong>Xin-Bing Kong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1077--1108.</p><p><strong>Abstract:</strong><br/>
In this paper, we separate the integrated (spot) volatility of an individual Itô process into integrated (spot) systematic and idiosyncratic volatilities, and estimate them by aggregation of local factor analysis (localization) with large-dimensional high-frequency data. We show that, when both the sampling frequency $n$ and the dimensionality $p$ go to infinity and $p\geq C\sqrt{n}$ for some constant $C$, our estimators of integrated (spot) systematic and idiosyncratic volatilities are $\sqrt{n}$-consistent ($n^{1/4}$-consistent for spot estimates), the best rate achieved in estimating the integrated (spot) volatility which is readily identified even with univariate high-frequency data. However, when $Cn^{1/4}\leq p<C\sqrt{n}$, aggregation of $n^{1/4}$-consistent local estimates of systematic and idiosyncratic volatilities results in $p$-consistent (not $\sqrt{n}$-consistent) estimates of integrated systematic and idiosyncratic volatilities. Even more interesting, when $p<Cn^{1/4}$, the integrated estimate has the same convergence rate as the spot estimate, both being $p$-consistent. This reveals a distinctive feature from aggregating local estimates in the low-dimensional high-frequency data setting. We also present estimators of the integrated (spot) idiosyncratic volatility matrices as well as their inverse matrices under some sparsity assumption. We finally present a factor-based estimator of the inverse of the spot volatility matrix. Numerical studies including the Monte Carlo experiments and real data analysis justify the performance of our estimators.
</p>projecteuclid.org/euclid.aos/1525313076_20180502220435Wed, 02 May 2018 22:04 EDTBall Divergence: Nonparametric two sample testhttps://projecteuclid.org/euclid.aos/1525313077<strong>Wenliang Pan</strong>, <strong>Yuan Tian</strong>, <strong>Xueqin Wang</strong>, <strong>Heping Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1109--1137.</p><p><strong>Abstract:</strong><br/>
In this paper, we first introduce Ball Divergence, a novel measure of the difference between two probability measures in separable Banach spaces, and show that the Ball Divergence of two probability measures is zero if and only if these two probability measures are identical without any moment assumption. Using Ball Divergence, we present a metric rank test procedure to detect the equality of distribution measures underlying independent samples. It is therefore robust to outliers or heavy-tailed data. We show that this multivariate two sample test statistic is consistent with the Ball Divergence, and it converges to a mixture of $\chi^{2}$ distributions under the null hypothesis and a normal distribution under the alternative hypothesis. Importantly, we prove its consistency against a general alternative hypothesis. Moreover, this result does not depend on the ratio of the two imbalanced sample sizes, ensuring that it can be applied to imbalanced data. Numerical studies confirm that our test is superior to several existing tests in terms of Type I error and power. We conclude our paper with two applications of our method: one is for virtual screening in drug development process and the other is for genome wide expression analysis in hormone replacement therapy.
</p>projecteuclid.org/euclid.aos/1525313077_20180502220435Wed, 02 May 2018 22:04 EDTA smooth block bootstrap for quantile regression with time serieshttps://projecteuclid.org/euclid.aos/1525313078<strong>Karl B. Gregory</strong>, <strong>Soumendra N. Lahiri</strong>, <strong>Daniel J. Nordman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1138--1166.</p><p><strong>Abstract:</strong><br/>
Quantile regression allows for broad (conditional) characterizations of a response distribution beyond conditional means and is of increasing interest in economic and financial applications. Because quantile regression estimators have complex limiting distributions, several bootstrap methods for the independent data setting have been proposed, many of which involve smoothing steps to improve bootstrap approximations. Currently, no similar advances in smoothed bootstraps exist for quantile regression with dependent data. To this end, we establish a smooth tapered block bootstrap procedure for approximating the distribution of quantile regression estimators for time series. This bootstrap involves two rounds of smoothing in resampling: individual observations are resampled via kernel smoothing techniques and resampled data blocks are smoothed by tapering. The smooth bootstrap results in performance improvements over previous unsmoothed versions of the block bootstrap as well as normal approximations based on Powell’s kernel variance estimator, which are common in application. Our theoretical results correct errors in proofs for earlier and simpler versions of the (unsmoothed) moving blocks bootstrap for quantile regression and broaden the validity of block bootstraps for this problem under weak conditions. We illustrate the smooth bootstrap through numerical studies and examples.
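As a point of reference for the procedure above, here is a sketch of the plain (unsmoothed, untapered) moving-blocks resampling step for a time series; the paper's method additionally kernel-smooths the resampled observations and tapers block ends, both of which we omit here.

```python
import numpy as np

def moving_block_bootstrap(series, block_len, rng=None):
    """One moving-blocks resample of a time series (illustrative sketch
    of the unsmoothed building block; names are ours)."""
    rng = np.random.default_rng(rng)
    n = len(series)
    n_blocks = -(-n // block_len)                    # ceil(n / block_len)
    # choose random block start points and concatenate the blocks
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    resample = np.concatenate([series[s:s + block_len] for s in starts])
    return resample[:n]                              # trim to original length
```

Re-fitting the quantile regression on many such resamples approximates the sampling distribution of the estimator while preserving short-range dependence within blocks.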
</p>projecteuclid.org/euclid.aos/1525313078_20180502220435Wed, 02 May 2018 22:04 EDTAsymptotic distribution-free tests for semiparametric regressions with dependent datahttps://projecteuclid.org/euclid.aos/1525313079<strong>Juan Carlos Escanciano</strong>, <strong>Juan Carlos Pardo-Fernández</strong>, <strong>Ingrid Van Keilegom</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1167--1196.</p><p><strong>Abstract:</strong><br/>
This article proposes a new general methodology for constructing nonparametric and semiparametric Asymptotically Distribution-Free (ADF) tests for semiparametric hypotheses in regression models for possibly dependent data coming from a strictly stationary process. Classical tests based on the difference between the estimated distributions of the restricted and unrestricted regression errors are not ADF. In this article, we introduce a novel transformation of this difference that leads to ADF tests with well-known critical values. The general methodology is illustrated with applications to testing for parametric models against nonparametric or semiparametric alternatives, and semiparametric constrained mean–variance models. Several Monte Carlo studies and an empirical application show that the finite sample performance of the proposed tests is satisfactory in moderate sample sizes.
</p>projecteuclid.org/euclid.aos/1525313079_20180502220435Wed, 02 May 2018 22:04 EDTGradient-based structural change detection for nonstationary time series M-estimationhttps://projecteuclid.org/euclid.aos/1525313080<strong>Weichi Wu</strong>, <strong>Zhou Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1197--1224.</p><p><strong>Abstract:</strong><br/>
We consider structural change testing for a wide class of time series M-estimation with nonstationary predictors and errors. Flexible predictor-error relationships, including exogenous, state-heteroscedastic and autoregressive regressions and their mixtures, are allowed. New uniform Bahadur representations are established with nearly optimal approximation rates. A CUSUM-type test statistic based on the gradient vectors of the regression is considered. In this paper, a simple bootstrap method is proposed and is proved to be consistent for M-estimation structural change detection under both abrupt and smooth nonstationarity and temporal dependence. Our bootstrap procedure is shown to have certain asymptotically optimal properties in terms of accuracy and power. A public health time series dataset is used to illustrate our methodology, and asymmetry of structural changes in high and low quantiles is found.
</p>projecteuclid.org/euclid.aos/1525313080_20180502220435Wed, 02 May 2018 22:04 EDTModerate deviations and nonparametric inference for monotone functionshttps://projecteuclid.org/euclid.aos/1525313081<strong>Fuqing Gao</strong>, <strong>Jie Xiong</strong>, <strong>Xingqiu Zhao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1225--1254.</p><p><strong>Abstract:</strong><br/>
This paper considers self-normalized limits and moderate deviations of nonparametric maximum likelihood estimators for monotone functions. We obtain their self-normalized Cramér-type moderate deviations and limit distribution theorems for the nonparametric maximum likelihood estimator in the current status model and the Grenander-type estimator. As applications of the results, we present a new procedure to construct asymptotic confidence intervals and asymptotic rejection regions for hypothesis testing for monotone functions. The theoretical results can guarantee that the new test has the probability of type II error tending to 0 exponentially. Simulation studies also show that the new nonparametric test works well for the most commonly used parametric survival functions, such as the exponential and Weibull distributions.
</p>projecteuclid.org/euclid.aos/1525313081_20180502220435Wed, 02 May 2018 22:04 EDTUniform asymptotic inference and the bootstrap after model selectionhttps://projecteuclid.org/euclid.aos/1525313082<strong>Ryan J. Tibshirani</strong>, <strong>Alessandro Rinaldo</strong>, <strong>Rob Tibshirani</strong>, <strong>Larry Wasserman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1255--1287.</p><p><strong>Abstract:</strong><br/>
Recently, Tibshirani et al. [ J. Amer. Statist. Assoc. 111 (2016) 600–620] proposed a method for making inferences about parameters defined by model selection, in a typical regression setting with normally distributed errors. Here, we study the large sample properties of this method, without assuming normality. We prove that the test statistic of Tibshirani et al. (2016) is asymptotically valid, as the number of samples $n$ grows and the dimension $d$ of the regression problem stays fixed. Our asymptotic result holds uniformly over a wide class of nonnormal error distributions. We also propose an efficient bootstrap version of this test that is provably (asymptotically) conservative, and in practice, often delivers shorter intervals than those from the original normality-based approach. Finally, we prove that the test statistic of Tibshirani et al. (2016) does not enjoy uniform validity in a high-dimensional setting, when the dimension $d$ is allowed to grow.
</p>projecteuclid.org/euclid.aos/1525313082_20180502220435Wed, 02 May 2018 22:04 EDTDetection thresholds for the $\beta$-model on sparse graphshttps://projecteuclid.org/euclid.aos/1525313083<strong>Rajarshi Mukherjee</strong>, <strong>Sumit Mukherjee</strong>, <strong>Subhabrata Sen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1288--1317.</p><p><strong>Abstract:</strong><br/>
In this paper, we study sharp thresholds for detecting sparse signals in $\beta$-models for potentially sparse random graphs. The results demonstrate an interesting interplay between graph sparsity, signal sparsity and signal strength. In regimes of moderately dense signals, irrespective of graph sparsity, the detection thresholds mirror corresponding results in independent Gaussian sequence problems. For sparser signals, extreme graph sparsity implies that all tests are asymptotically powerless, irrespective of the signal strength. On the other hand, on denser graphs, sharp detection thresholds are obtained, up to matching constants. The phase transitions mentioned above are sharp. As a crucial ingredient, we study a version of the higher criticism test which is provably sharp up to optimal constants in the regime of sparse signals. The theoretical results are further verified by numerical simulations.
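For readers unfamiliar with the test family mentioned above, the following is a minimal sketch of the classical higher-criticism statistic of Donoho and Jin, computed from a vector of p-values. The paper studies a modified version adapted to the $\beta$-model, so this sketch is illustrative of the general idea only; the restriction to p-values at most $1/2$ is one standard convention, not the paper's.

```python
import numpy as np

def higher_criticism(pvals):
    """Classical higher-criticism statistic: the maximal standardized
    deviation between the empirical distribution of the p-values and
    the uniform distribution, over sorted p-values at most 1/2."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    i = np.arange(1, n + 1)
    mask = p <= 0.5            # standard restriction to avoid the tails
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    return hc[mask].max()
```

Under the null of uniform p-values the statistic stays moderate, while a few very small p-values (sparse signals) drive it up sharply.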
</p>projecteuclid.org/euclid.aos/1525313083_20180502220435Wed, 02 May 2018 22:04 EDTAdaptive sup-norm estimation of the Wigner function in noisy quantum homodyne tomographyhttps://projecteuclid.org/euclid.aos/1525313084<strong>Karim Lounici</strong>, <strong>Katia Meziani</strong>, <strong>Gabriel Peyré</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1318--1351.</p><p><strong>Abstract:</strong><br/>
In quantum optics, the quantum state of a light beam is represented through the Wigner function, a density on $\mathbb{R}^{2}$, which may take negative values but must respect intrinsic positivity constraints imposed by quantum physics. In the framework of noisy quantum homodyne tomography with efficiency parameter $1/2<\eta\leq1$, we study the theoretical performance of a kernel estimator of the Wigner function. We prove that it is minimax efficient, up to a logarithmic factor in the sample size, for the $\mathbb{L}_{\infty}$-risk over a class of infinitely differentiable functions. We also compute the lower bound for the $\mathbb{L}_{2}$-risk. We construct an adaptive estimator, that is, one that does not depend on the smoothness parameters, and prove that it attains the minimax rates for the corresponding smoothness class up to a logarithmic factor in the sample size. Finite-sample behaviour of our adaptive procedure is explored through numerical experiments.
</p>projecteuclid.org/euclid.aos/1525313084_20180502220435Wed, 02 May 2018 22:04 EDTDistributed testing and estimation under sparse high dimensional modelshttps://projecteuclid.org/euclid.aos/1525313085<strong>Heather Battey</strong>, <strong>Jianqing Fan</strong>, <strong>Han Liu</strong>, <strong>Junwei Lu</strong>, <strong>Ziwei Zhu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1352--1382.</p><p><strong>Abstract:</strong><br/>
This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large $k$ can be, as $n$ grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
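The aggregation idea underlying the divide-and-conquer approach can be sketched in a few lines: split the $n$ samples into $k$ subsamples, fit the model on each, and average the subsample estimates. The sketch below uses plain least squares in a low-dimensional setting as a stand-in; the paper's actual procedures, aggregated test statistics, and high-dimensional debiased estimators are more involved.

```python
import numpy as np

def dc_ols(X, y, k):
    """Divide-and-conquer least squares: partition the n rows into k
    subsamples, fit OLS on each, and average the k coefficient vectors.
    A toy illustration of the aggregation step, not the paper's method."""
    n = X.shape[0]
    subsets = np.array_split(np.arange(n), k)
    betas = [np.linalg.lstsq(X[s], y[s], rcond=None)[0] for s in subsets]
    return np.mean(betas, axis=0)
```

When $k$ grows too fast relative to $n$, each subsample estimate becomes noisy (or biased, in regularized settings) and the average loses the efficiency of the full-sample fit; quantifying how large $k$ may be is exactly the question the abstract raises.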
</p>projecteuclid.org/euclid.aos/1525313085_20180502220435Wed, 02 May 2018 22:04 EDT