The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Tue, 07 Jun 2011 09:09 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
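To make the automatic multiplicity correction concrete, here is a small numerical sketch (our illustration, not code from the paper): placing a uniform prior on the common inclusion probability $p$ and integrating it out assigns a specific model with $k$ of $m$ variables prior mass $k!\,(m-k)!/(m+1)!$, so the mass on any fixed model shrinks as spurious candidate variables are added.

```python
from math import factorial

def prior_model_prob(k, m):
    # Marginal prior probability of one specific model with k of m
    # variables included, after integrating p ~ Uniform(0,1):
    # integral of p^k (1-p)^(m-k) dp = k!(m-k)!/(m+1)!
    return factorial(k) * factorial(m - k) / factorial(m + 1)

# the same 2-variable model loses prior mass as candidates are added
probs = [prior_model_prob(2, m) for m in (5, 10, 20, 40)]
assert all(a > b for a, b in zip(probs, probs[1:]))
```

The penalty comes entirely from the prior on $p$, not from the likelihood, which is the sense in which the correction is "automatic."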
</p>
On the optimality of Bayesian change-point detection
http://projecteuclid.org/euclid.aos/1498636860
<strong>Dong Han</strong>, <strong>Fugee Tsung</strong>, <strong>Jinguo Xian</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1375--1402.</p><p><strong>Abstract:</strong><br/>
By introducing suitable loss random variables of detection, we obtain optimal tests in terms of the stopping time or alarm time for Bayesian change-point detection not only for a general prior distribution of change-points but also for observations being a Markov process. Moreover, the optimal (minimal) average detection delay is proved to be equal to $1$ for any (possibly large) average run length to false alarm if the number of possible change-points is finite.
</p>
Wed, 28 Jun 2017 04:01 EDT
Computational and statistical boundaries for submatrix localization in a large noisy matrix
http://projecteuclid.org/euclid.aos/1498636861
<strong>T. Tony Cai</strong>, <strong>Tengyuan Liang</strong>, <strong>Alexander Rakhlin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1403--1430.</p><p><strong>Abstract:</strong><br/>
We study in this paper computational and statistical boundaries for submatrix localization. Given one observation of a signal submatrix (or multiple nonoverlapping ones) of magnitude $\lambda$ and size $k_{m}\times k_{n}$ embedded in a large noise matrix (of size $m\times n$), the goal is to optimally identify the support of the signal submatrix, both computationally and statistically.
Two transition thresholds for the signal-to-noise ratio $\lambda/\sigma$ are established in terms of $m$, $n$, $k_{m}$ and $k_{n}$. The first threshold, $\sf SNR_{c}$, corresponds to the computational boundary. We introduce a new linear time spectral algorithm that identifies the submatrix with high probability when the signal strength is above the threshold $\sf SNR_{c}$. Below this threshold, it is shown that no polynomial time algorithm can succeed in identifying the submatrix, under the hidden clique hypothesis. The second threshold, $\sf SNR_{s}$, captures the statistical boundary, below which no method can succeed in localization with probability going to one in the minimax sense. The exhaustive search method successfully finds the submatrix above this threshold. In marked contrast to submatrix detection and sparse PCA, the results show an interesting phenomenon that $\sf SNR_{c}$ is always significantly larger than $\sf SNR_{s}$ under the sub-Gaussian error model, which implies an essential gap between statistical optimality and computational efficiency for submatrix localization.
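A bare-bones illustration of the spectral idea (a pure-Python sketch with hypothetical sizes and a deliberately strong signal, not the paper's linear time algorithm or thresholds): plant a mean-$\lambda$ submatrix in Gaussian noise and read the support off the top singular vectors.

```python
import random

random.seed(1)
m = n = 30
km = kn = 6
lam = 6.0  # signal strength, chosen well above the noise level
rows, cols = set(range(km)), set(range(kn))

# one observation: planted mean-lam submatrix plus N(0,1) noise
X = [[random.gauss(0.0, 1.0) + (lam if i in rows and j in cols else 0.0)
      for j in range(n)] for i in range(m)]
XT = [list(c) for c in zip(*X)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def normalize(v):
    s = sum(x * x for x in v) ** 0.5
    return [x / s for x in v]

# power iteration on X X^T (resp. X^T X) for the top singular vectors
u = [1.0] * m
v = [1.0] * n
for _ in range(100):
    u = normalize(matvec(X, matvec(XT, u)))
    v = normalize(matvec(XT, matvec(X, v)))

# estimated support: coordinates with the largest loadings
row_hat = set(sorted(range(m), key=lambda i: -abs(u[i]))[:km])
col_hat = set(sorted(range(n), key=lambda j: -abs(v[j]))[:kn])
```

With the signal this far above $\sf SNR_{c}$ the top singular vectors concentrate on the planted rows and columns; near the threshold this naive recipe fails, which is the regime the paper analyzes.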
</p>
Tests for separability in nonparametric covariance operators of random surfaces
http://projecteuclid.org/euclid.aos/1498636862
<strong>John A. D. Aston</strong>, <strong>Davide Pigoli</strong>, <strong>Shahin Tavakoli</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1431--1461.</p><p><strong>Abstract:</strong><br/>
The assumption of separability of the covariance operator for a random image or hypersurface can be of substantial use in applications, especially in situations where the accurate estimation of the full covariance structure is unfeasible, either for computational reasons, or due to a small sample size. However, inferential tools to verify this assumption are somewhat lacking in high-dimensional or functional data analysis settings, where this assumption is most relevant. We propose here to test separability by focusing on $K$-dimensional projections of the difference between the covariance operator and a nonparametric separable approximation. The subspace we project onto is one generated by the eigenfunctions of the covariance operator estimated under the separability hypothesis, obviating the need to ever estimate the full nonseparable covariance. We show that the rescaled difference of the sample covariance operator with its separable approximation is asymptotically Gaussian. As a by-product of this result, we derive asymptotically pivotal tests under Gaussian assumptions, and propose bootstrap methods for approximating the distribution of the test statistics. We probe the finite sample performance through simulation studies, and present an application to log-spectrogram images from a phonetic linguistics dataset.
</p>
Identification of universally optimal circular designs for the interference model
http://projecteuclid.org/euclid.aos/1498636863
<strong>Wei Zheng</strong>, <strong>Mingyao Ai</strong>, <strong>Kang Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1462--1487.</p><p><strong>Abstract:</strong><br/>
Many applications of block designs exhibit neighbor and edge effects. A popular remedy is to use the circular design coupled with the interference model. The search for optimal or efficient designs has been intensively studied in recent years. The circular neighbor balanced designs at distances 1 and 2 (CNBD2), including orthogonal arrays of type I ($\mathrm{OA}_{I}$) of strength $2$, are the two major designs proposed in the literature for the purpose of estimating the direct treatment effects. They are shown to be optimal within some reasonable subclasses of designs. By using benchmark designs in approximate design theory, we show that CNBD2 is highly efficient among all possible designs when the error terms are homoscedastic and uncorrelated. However, when the error terms are correlated, these designs are significantly outperformed by other designs. Note that CNBD2 fall into the special category of pseudo symmetric designs, and they exist only when the number of treatments is larger than the block size and the number of blocks is a multiple of some constants. In this paper, we establish equivalent conditions for any design, pseudo symmetric or not, to be universally optimal for any size of experiment and any covariance structure of the error terms. This result is novel for circular designs and sheds light on other similar models in the search for optimal or efficient asymmetric designs.
</p>
Co-clustering of nonsmooth graphons
http://projecteuclid.org/euclid.aos/1498636864
<strong>David Choi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1488--1515.</p><p><strong>Abstract:</strong><br/>
Performance bounds are given for exploratory co-clustering/blockmodeling of bipartite graph data, where we assume the rows and columns of the data matrix are samples from an arbitrary population. This is equivalent to assuming that the data is generated from a nonsmooth graphon. It is shown that co-clusters found by any method can be extended to the row and column populations, or equivalently that the estimated blockmodel approximates a blocked version of the generative graphon, with estimation error bounded by $O_{P}(n^{-1/2})$. Analogous performance bounds are also given for degree-corrected blockmodels and random dot product graphs, with error rates depending on the dimensionality of the latent variable space.
</p>
Minimax theory of estimation of linear functionals of the deconvolution density with or without sparsity
http://projecteuclid.org/euclid.aos/1498636865
<strong>Marianna Pensky</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1516--1541.</p><p><strong>Abstract:</strong><br/>
The present paper considers the problem of estimating a linear functional $\Phi=\int_{-\infty}^{\infty}\varphi(x)f(x)\,dx$ of an unknown deconvolution density $f$ on the basis of $n$ i.i.d. observations $Y_{1},\ldots,Y_{n}$ of $Y=\theta+\xi$, where $\xi$ has a known pdf $g$, and $f$ is the pdf of $\theta$. The objective of the present paper is to develop a general minimax theory of estimating $\Phi$, and to relate this problem to estimation of functionals $\Phi_{n}=n^{-1}\sum_{i=1}^{n}\varphi(\theta_{i})$ in indirect observations. In particular, we offer a general, Fourier transform based approach to estimation of $\Phi$ (and $\Phi_{n}$) and derive upper and minimax lower bounds for the risk for an arbitrary square integrable function $\varphi$. Furthermore, using the technique of inversion formulas, we extend the theory to a number of situations when the Fourier transform of $\varphi$ does not exist, but $\Phi$ can be presented as a functional of the Fourier transform of $f$ and its derivatives. The latter enables us to construct minimax estimators of functionals that have never been handled before, such as the odd absolute moments and the generalized moments of the deconvolution density. Finally, we generalize our results to the situation when the vector $\mathbf{\theta}$ is sparse and the objective is estimating $\Phi$ (or $\Phi_{n}$) over the nonzero components only. As a direct application of the proposed theory, we automatically recover multiple recent results and obtain a variety of new ones, such as estimation of the mixing probability density function with classical and Berkson errors and estimation of the $(2M+1)$-th absolute moment of the deconvolution density.
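A classical special case illustrates the Fourier approach (our toy example, not the paper's estimator, which covers far more general $\varphi$ and $g$): for Laplace$(1)$ noise, $\hat{g}(\omega)=1/(1+\omega^{2})$, so the kernel $\psi$ with $\hat{\psi}(\omega)=(1+\omega^{2})\hat{\varphi}(\omega)$ is simply $\psi=\varphi-\varphi''$, and $n^{-1}\sum_{i}\psi(Y_{i})$ is unbiased for $\Phi$.

```python
import math
import random

random.seed(2)

def laplace(scale=1.0):
    # Laplace(0, scale) as a difference of two independent exponentials
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

# target: Phi = E[cos(theta)] with theta ~ N(0,1), i.e. Phi = exp(-1/2)
n = 200_000
Y = [random.gauss(0.0, 1.0) + laplace() for _ in range(n)]

# For Laplace(1) noise, g_hat(w) = 1/(1+w^2), hence psi = phi - phi''.
# With phi = cos we get phi'' = -cos, so psi(y) = 2*cos(y).
phi_hat = sum(2.0 * math.cos(y) for y in Y) / n
# phi_hat should be close to exp(-0.5), about 0.6065
```

The noise-correcting kernel inflates $\varphi$ exactly enough to cancel the smoothing done by $g$, which is the mechanism behind the upper bounds in the paper.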
</p>
Nonparametric change-point analysis of volatility
http://projecteuclid.org/euclid.aos/1498636866
<strong>Markus Bibinger</strong>, <strong>Moritz Jirak</strong>, <strong>Mathias Vetter</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1542--1578.</p><p><strong>Abstract:</strong><br/>
In this work, we develop change-point methods for statistics of high-frequency data. The main interest is in the volatility of an Itô semimartingale, the latter being discretely observed over a fixed time horizon. We construct a minimax-optimal test to discriminate continuous paths from paths with volatility jumps, and it is shown that the test can be embedded into a more general theory to infer the smoothness of volatilities. In a high-frequency setting, we prove weak convergence of the test statistic under the hypothesis to an extreme value distribution. Moreover, we develop methods to infer changes in the Hurst parameters of fractional volatility processes. A simulation study is conducted to demonstrate the performance of our methods in finite-sample applications.
</p>
A new approach to optimal designs for correlated observations
http://projecteuclid.org/euclid.aos/1498636867
<strong>Holger Dette</strong>, <strong>Maria Konstantinou</strong>, <strong>Anatoly Zhigljavsky</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1579--1608.</p><p><strong>Abstract:</strong><br/>
This paper presents a new and efficient method for the construction of optimal designs for regression models with dependent error processes. In contrast to most of the work in this field, which starts with a model for a finite number of observations and considers the asymptotic properties of estimators and designs as the sample size converges to infinity, our approach is based on a continuous time model. We use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in this model. Based on the BLUE, we construct an efficient linear estimator and corresponding optimal designs in the model for finite sample size by minimizing the mean squared error between the optimal solution in the continuous time model and its discrete approximation with respect to the weights (of the linear estimator) and the optimal design points, in particular in the multiparameter case.
In contrast to previous work on the subject, the resulting estimators and corresponding optimal designs are very efficient and easy to implement. This means that they are practically not distinguishable from the weighted least squares estimator and the corresponding optimal designs, which have to be found numerically by nonconvex discrete optimization. The advantages of the new approach are illustrated in several numerical examples.
</p>
Rare-event analysis for extremal eigenvalues of white Wishart matrices
http://projecteuclid.org/euclid.aos/1498636868
<strong>Tiefeng Jiang</strong>, <strong>Kevin Leder</strong>, <strong>Gongjun Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1609--1637.</p><p><strong>Abstract:</strong><br/>
In this paper, we consider the extreme behavior of the extremal eigenvalues of white Wishart matrices, which plays an important role in multivariate analysis. In particular, we focus on the case when the dimension of the feature $p$ is much larger than or comparable to the number of observations $n$, a common situation in modern data analysis. We provide asymptotic approximations and bounds for the tail probabilities of the extremal eigenvalues. Moreover, we construct efficient Monte Carlo simulation algorithms to compute the tail probabilities. Simulation results show that our method has the best performance among known approximation approaches, and furthermore provides an efficient and accurate way for evaluating the tail probabilities in practice.
</p>
Robust discrimination designs over Hellinger neighbourhoods
http://projecteuclid.org/euclid.aos/1498636869
<strong>Rui Hu</strong>, <strong>Douglas P. Wiens</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1638--1663.</p><p><strong>Abstract:</strong><br/>
We study the construction of experimental designs to aid in the discrimination between two, possibly nonlinear, regression models. Considering that each of these two models might be only approximately specified, robust “maximin” designs are proposed. The rough idea is as follows. We impose neighbourhood structures on each regression response, to describe the uncertainty in the specifications of the true underlying models. We determine the least favourable—in terms of Kullback–Leibler divergence—members of these neighbourhoods. Optimal designs are those maximizing this minimum divergence. Sequential, adaptive approaches to this maximization are studied, and asymptotic optimality is established.
</p>
Nonparametric Bayesian posterior contraction rates for discretely observed scalar diffusions
http://projecteuclid.org/euclid.aos/1498636870
<strong>Richard Nickl</strong>, <strong>Jakob Söhl</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1664--1693.</p><p><strong>Abstract:</strong><br/>
We consider nonparametric Bayesian inference in a reflected diffusion model $dX_{t}=b(X_{t})\,dt+\sigma(X_{t})\,dW_{t}$, with discretely sampled observations $X_{0},X_{\Delta},\ldots,X_{n\Delta}$. We analyse the nonlinear inverse problem corresponding to the “low frequency sampling” regime where $\Delta>0$ is fixed and $n\to\infty$. A general theorem is proved that gives conditions for prior distributions $\Pi$ on the diffusion coefficient $\sigma$ and the drift function $b$ that ensure minimax optimal contraction rates of the posterior distribution over Hölder–Sobolev smoothness classes. These conditions are verified for natural examples of nonparametric random wavelet series priors. For the proofs, we derive new concentration inequalities for empirical processes arising from discretely observed diffusions that are of independent interest.
</p>
Asymptotic and finite-sample properties of estimators based on stochastic gradients
http://projecteuclid.org/euclid.aos/1498636871
<strong>Panos Toulis</strong>, <strong>Edoardo M. Airoldi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1694--1727.</p><p><strong>Abstract:</strong><br/>
Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood in theory, and in practice avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly defined. Intuitively, implicit updates shrink standard stochastic gradient descent updates. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be explicitly computed; thus, implicit procedures increase stability without increasing the computational burden. Our theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds. Importantly, analytical expressions for the variances of these stochastic gradient-based estimators reveal their exact loss of efficiency. We also develop new algorithms to compute implicit stochastic gradient descent-based estimators in practice for generalized linear models, Cox proportional hazards models and M-estimators, and perform extensive experiments. Our results suggest that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.
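For squared loss the implicit update $\theta_{i}=\theta_{i-1}+\gamma_{i}(y_{i}-x_{i}^{\top}\theta_{i})x_{i}$ has a closed form, obtained by substituting $\theta_{i}=\theta_{i-1}+c\,x_{i}$ and solving for $c$. A short sketch on a toy linear model (our example with hypothetical data, not the authors' software):

```python
import random

random.seed(3)
true_theta = [2.0, -1.0]
d = len(true_theta)

theta = [0.0] * d
for i in range(1, 5001):
    x = [random.gauss(0.0, 1.0) for _ in range(d)]
    y = sum(a * b for a, b in zip(true_theta, x)) + random.gauss(0.0, 0.5)
    gamma = 1.0 / i                      # Robbins-Monro learning rate
    resid = y - sum(a * b for a, b in zip(theta, x))
    # implicit update theta_i = theta_{i-1} + gamma*(y - x.theta_i)*x,
    # solved in closed form for squared loss:
    c = gamma * resid / (1.0 + gamma * sum(a * a for a in x))
    theta = [t + c * a for t, a in zip(theta, x)]
```

The shrinkage factor $1/(1+\gamma_{i}\|x_{i}\|^{2})$ is what tames the large early steps that can destabilize the explicit update.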
</p>
Functional central limit theorems for single-stage sampling designs
http://projecteuclid.org/euclid.aos/1498636872
<strong>Hélène Boistard</strong>, <strong>Hendrik P. Lopuhaä</strong>, <strong>Anne Ruiz-Gazen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1728--1758.</p><p><strong>Abstract:</strong><br/>
For a joint model-based and design-based inference, we establish functional central limit theorems for the Horvitz–Thompson empirical process and the Hájek empirical process centered by their finite population mean as well as by their super-population mean in a survey sampling framework. The results apply to single-stage unequal probability sampling designs and essentially only require conditions on higher order correlations. We apply our main results to a Hadamard differentiable statistical functional and illustrate its limit behavior by means of a computer simulation.
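The Horvitz–Thompson construction underlying the empirical process is easy to check in simulation (a sketch under Poisson sampling with numbers of our own choosing): $\hat{Y}_{\mathrm{HT}}=\sum_{i\in s}y_{i}/\pi_{i}$ is design-unbiased for the population total under unequal inclusion probabilities $\pi_{i}$.

```python
import random

random.seed(4)
N = 200
y = [random.uniform(0.0, 10.0) for _ in range(N)]
# unequal first-order inclusion probabilities, roughly size-proportional
pi = [min(0.9, 0.05 + y_i / 20.0) for y_i in y]
total = sum(y)

def horvitz_thompson(y, pi):
    # Poisson sampling: include unit i independently with probability pi[i]
    return sum(yi / pi_i for yi, pi_i in zip(y, pi)
               if random.random() < pi_i)

reps = 10_000
avg = sum(horvitz_thompson(y, pi) for _ in range(reps)) / reps
# avg should be close to the true population total
```

The functional CLTs in the paper describe the fluctuation of this kind of estimator uniformly over classes of functions, not just for the total.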
</p>
Asymptotic normality of scrambled geometric net quadrature
http://projecteuclid.org/euclid.aos/1498636873
<strong>Kinjal Basu</strong>, <strong>Rajarshi Mukherjee</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1759--1788.</p><p><strong>Abstract:</strong><br/>
In a very recent work, Basu and Owen [ Found. Comput. Math. 17 (2017) 467–496] propose the use of scrambled geometric nets in numerical integration when the domain is a product of $s$ arbitrary spaces of dimension $d$ having a certain partitioning constraint. It was shown that for a class of smooth functions, the integral estimate has variance $O(n^{-1-2/d}(\log n)^{s-1})$ for scrambled geometric nets compared to $O(n^{-1})$ for ordinary Monte Carlo. The main idea of this paper is to expand on the work by Loh [ Ann. Statist. 31 (2003) 1282–1324] to show that the scrambled geometric net estimate has an asymptotic normal distribution for certain smooth functions defined on products of suitable subsets of $\mathbb{R}^{d}$.
</p>
Yule’s “nonsense correlation” solved!
http://projecteuclid.org/euclid.aos/1498636874
<strong>Philip A. Ernst</strong>, <strong>Larry A. Shepp</strong>, <strong>Abraham J. Wyner</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1789--1809.</p><p><strong>Abstract:</strong><br/>
In this paper, we resolve a longstanding open statistical problem. The problem is to mathematically prove Yule’s 1926 empirical finding of “nonsense correlation” [ J. Roy. Statist. Soc. 89 (1926) 1–63], which we do by analytically determining the second moment of the empirical correlation coefficient \begin{eqnarray*}&&\theta:=\frac{\int_{0}^{1}W_{1}(t)W_{2}(t)\,dt-\int_{0}^{1}W_{1}(t)\,dt\int_{0}^{1}W_{2}(t)\,dt}{\sqrt{\int_{0}^{1}W^{2}_{1}(t)\,dt-(\int_{0}^{1}W_{1}(t)\,dt)^{2}}\sqrt{\int_{0}^{1}W^{2}_{2}(t)\,dt-(\int_{0}^{1}W_{2}(t)\,dt)^{2}}},\end{eqnarray*} of two independent Wiener processes, $W_{1},W_{2}$. Using tools from Fredholm integral equation theory, we successfully calculate the second moment of $\theta$ to obtain a value for the standard deviation of $\theta$ of nearly 0.5. The “nonsense” correlation, which we call “volatile” correlation, is volatile in the sense that its distribution is heavily dispersed and is frequently large in absolute value. It is induced because each Wiener process is “self-correlated” in time. This is because a Wiener process is an integral of pure noise, and thus its values at different time points are correlated. In addition to providing an explicit formula for the second moment of $\theta$, we offer implicit formulas for higher moments of $\theta$.
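The dispersion is easy to see in simulation (a discrete-time sketch with parameters of our choosing; the paper's computation is exact and analytic): correlate pairs of independent random walks and look at the spread of the empirical correlation.

```python
import random

random.seed(5)

def walk(T):
    # discretized Wiener path: cumulative sum of Gaussian increments
    s, path = 0.0, []
    for _ in range(T):
        s += random.gauss(0.0, 1.0)
        path.append(s)
    return path

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((x - mb) ** 2 for x in b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (va * vb) ** 0.5

T, reps = 200, 2000
thetas = [corr(walk(T), walk(T)) for _ in range(reps)]
mean = sum(thetas) / reps
sd = (sum((t - mean) ** 2 for t in thetas) / reps) ** 0.5
big = sum(abs(t) > 0.5 for t in thetas) / reps  # frequently large |corr|
```

Despite the walks being independent, the standard deviation of the sample correlation comes out near 0.5, matching the paper's analytic value.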
</p>
Sharp detection in PCA under correlations: All eigenvalues matter
http://projecteuclid.org/euclid.aos/1498636875
<strong>Edgar Dobriban</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1810--1833.</p><p><strong>Abstract:</strong><br/>
Principal component analysis (PCA) is a widely used method for dimension reduction. In high-dimensional data, the “signal” eigenvalues corresponding to weak principal components (PCs) do not necessarily separate from the bulk of the “noise” eigenvalues. Therefore, popular tests based on the largest eigenvalue have little power to detect weak PCs. In the special case of the spiked model, certain tests asymptotically equivalent to linear spectral statistics (LSS)—averaging effects over all eigenvalues—were recently shown to achieve some power.
We consider a “local alternatives” model for the spectrum of covariance matrices that allows a general correlation structure. We develop new tests to detect PCs in this model. While the top eigenvalue contains little information, due to the strong correlations between the eigenvalues we can detect weak PCs by averaging over all eigenvalues using LSS. We show that it is possible to find the optimal LSS, by solving a certain integral equation. To solve this equation, we develop efficient algorithms that build on our recent method for computing the limit empirical spectrum [Dobriban (2015)]. The solvability of this equation also presents a new perspective on phase transitions in spiked models.
</p>
“Local” vs. “global” parameters—breaking the Gaussian complexity barrier
https://projecteuclid.org/euclid.aos/1509436820
<strong>Shahar Mendelson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 1835--1862.</p><p><strong>Abstract:</strong><br/>
We show that if $F$ is a convex class of functions that is $L$-sub-Gaussian, the error rate of learning problems generated by independent noise is equivalent to a fixed point determined by “local” covering estimates of the class (i.e., the covering number at a specific level), rather than by the Gaussian average, which takes into account the structure of $F$ at an arbitrarily small scale. To that end, we establish new sharp upper and lower estimates on the error rate in such learning problems.
</p>
Tue, 31 Oct 2017 04:00 EDT
Confounder adjustment in multiple hypothesis testing
https://projecteuclid.org/euclid.aos/1509436821
<strong>Jingshu Wang</strong>, <strong>Qingyuan Zhao</strong>, <strong>Trevor Hastie</strong>, <strong>Art B. Owen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 1863--1894.</p><p><strong>Abstract:</strong><br/>
We consider large-scale studies in which thousands of significance tests are performed simultaneously. In some of these studies, the multiple testing procedure can be severely biased by latent confounding factors such as batch effects and unmeasured covariates that correlate with both primary variable(s) of interest (e.g., treatment variable, phenotype) and the outcome. Over the past decade, many statistical methods have been proposed to adjust for the confounders in hypothesis testing. We unify these methods in the same framework, generalize them to include multiple primary variables and multiple nuisance variables, and analyze their statistical properties. In particular, we provide theoretical guarantees for RUV-4 [Gagnon-Bartsch, Jacob and Speed (2013)] and LEAPP [ Ann. Appl. Stat. 6 (2012) 1664–1688], which correspond to two different identification conditions in the framework: the first requires a set of “negative controls” that are known a priori to follow the null distribution; the second requires the true nonnulls to be sparse. Two different estimators which are based on RUV-4 and LEAPP are then applied to these two scenarios. We show that if the confounding factors are strong, the resulting estimators can be asymptotically as powerful as the oracle estimator which observes the latent confounding factors. For hypothesis testing, we show the asymptotic $z$-tests based on the estimators can control the type I error. Numerical experiments show that the false discovery rate is also controlled by the Benjamini–Hochberg procedure when the sample size is reasonably large.
</p>
Gaussian approximation for high dimensional time series
https://projecteuclid.org/euclid.aos/1509436822
<strong>Danna Zhang</strong>, <strong>Wei Biao Wu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 1895--1919.</p><p><strong>Abstract:</strong><br/>
We consider the problem of approximating sums of high dimensional stationary time series by Gaussian vectors, using the framework of functional dependence measure. The validity of the Gaussian approximation depends on the sample size $n$, the dimension $p$, the moment condition and the dependence of the underlying processes. We also consider an estimator for long-run covariance matrices and study its convergence properties. Our results allow constructing simultaneous confidence intervals for mean vectors of high-dimensional time series with asymptotically correct coverage probabilities. As an application, we propose a Kolmogorov–Smirnov-type statistic for testing distributions of high-dimensional time series.
</p>
Detection and feature selection in sparse mixture models
https://projecteuclid.org/euclid.aos/1509436823
<strong>Nicolas Verzelen</strong>, <strong>Ery Arias-Castro</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 1920--1950.</p><p><strong>Abstract:</strong><br/>
We consider Gaussian mixture models in high dimensions, focusing on the twin tasks of detection and feature selection. Under sparsity assumptions on the difference in means, we derive minimax rates for the problems of testing and of variable selection. We find these rates to depend crucially on the knowledge of the covariance matrices and on whether the mixture is symmetric or not. We establish the performance of various procedures, including the top sparse eigenvalue of the sample covariance matrix (popular in the context of Sparse PCA), as well as new tests inspired by the normality tests of Malkovich and Afifi [ J. Amer. Statist. Assoc. 68 (1973) 176–179].
</p>
Minimax estimation of a functional on a structured high-dimensional model
https://projecteuclid.org/euclid.aos/1509436824
<strong>James M. Robins</strong>, <strong>Lingling Li</strong>, <strong>Rajarshi Mukherjee</strong>, <strong>Eric Tchetgen Tchetgen</strong>, <strong>Aad van der Vaart</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 1951--1987.</p><p><strong>Abstract:</strong><br/>
We introduce a new method of estimation of parameters in semiparametric and nonparametric models. The method employs $U$-statistics that are based on higher-order influence functions of the parameter of interest, which extend ordinary linear influence functions, and represent higher derivatives of this parameter. For parameters for which the representation cannot be perfect the method often leads to a bias-variance trade-off, and results in estimators that converge at a slower than $\sqrt{n}$-rate. In a number of examples, the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at $\sqrt{n}$-rate, but we also consider efficient $\sqrt{n}$-estimation using novel nonlinear estimators. The general approach is applied in detail to the example of estimating a mean response when the response is not always observed.
</p>
Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations
https://projecteuclid.org/euclid.aos/1509436825
<strong>Morten Overgaard</strong>, <strong>Erik Thorlund Parner</strong>, <strong>Jan Pedersen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 1988--2015.</p><p><strong>Abstract:</strong><br/>
A general asymptotic theory of estimates from estimating functions based on jack-knife pseudo-observations is established by requiring that the underlying estimator can be expressed as a smooth functional of the empirical distribution. Using results in $p$-variation norms, the theory is applied to important estimators from time-to-event analysis, namely the Kaplan–Meier estimator and the Aalen–Johansen estimator in a competing risks model, and the corresponding estimators of restricted mean survival and cause-specific lifetime lost. Under an assumption of completely independent censorings, this allows for estimating parameters in regression models of survival, cumulative incidences, restricted mean survival, and cause-specific lifetime lost. Considering estimators as functionals and applying results in $p$-variation norms is apparently an excellent way of studying the asymptotics of such estimators.
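The pseudo-observation construction itself is simple: $\hat{\theta}_{i}=n\hat{\theta}-(n-1)\hat{\theta}_{(-i)}$, where $\hat{\theta}_{(-i)}$ leaves out observation $i$, and the average of the pseudo-observations is the jackknife bias-corrected estimate. A quick check (our toy example with the plug-in variance, not the survival estimators treated in the paper): for the plug-in variance the jackknife correction recovers the unbiased sample variance exactly.

```python
def plugin_var(xs):
    # biased "plug-in" variance, divisor n
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

xs = [1.2, 0.7, 3.4, 2.2, 5.0, 0.1, 2.8]
n = len(xs)
theta = plugin_var(xs)

# jack-knife pseudo-observations: theta_i = n*theta - (n-1)*theta_{-i}
pseudo = [n * theta - (n - 1) * plugin_var(xs[:i] + xs[i + 1:])
          for i in range(n)]
jack = sum(pseudo) / n

# for this estimator the jackknife estimate equals the unbiased
# sample variance sum((x - mean)^2)/(n-1) identically
m = sum(xs) / n
s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
```

In the paper the same pseudo-observations, built from Kaplan–Meier or Aalen–Johansen estimates, serve as responses in regression models.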
</p>projecteuclid.org/euclid.aos/1509436825_20171031040040Tue, 31 Oct 2017 04:00 EDTBayesian Poisson calculus for latent feature modeling via generalized Indian Buffet Process priorshttps://projecteuclid.org/euclid.aos/1509436826<strong>Lancelot F. James</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2016--2045.</p><p><strong>Abstract:</strong><br/>
Statistical latent feature models, such as latent factor models, are models where each observation is associated with a vector of latent features. A general problem is how to select the number/types of features, and related quantities. In Bayesian statistical machine learning, one seeks (nonparametric) models where one can learn such quantities in the presence of observed data. The Indian Buffet Process (IBP), devised by Griffiths and Ghahramani (2005), generates a (sparse) latent binary matrix with columns representing a potentially unbounded number of features and where each row corresponds to an individual or object. Its generative scheme is cast in terms of customers sequentially entering an Indian Buffet restaurant and selecting previously sampled dishes as well as new dishes. Dishes correspond to latent features shared by individuals. The IBP has been applied to a wide range of statistical problems. Recent works have demonstrated the utility of generalizations to nonbinary matrices. The purpose of this work is to describe a unified mechanism for construction, Bayesian analysis, and practical sampling of broad generalizations of the IBP that generate (sparse) matrices with general entries. An adaptation of the Poisson partition calculus is employed to handle the complexities, including combinatorial aspects, of these models. Our work reveals a spike and slab characterization, and also presents a general framework for multivariate extensions. We close by highlighting a multivariate IBP with condiments, and the role of a stable-Beta Dirichlet multivariate prior.
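The buffet metaphor translates directly into a generative sampler. A minimal sketch of the standard one-parameter binary IBP scheme described above — customer $i$ takes a previously sampled dish $k$ with probability $m_k/i$ and then tries a Poisson($\alpha/i$) number of new dishes; the generalizations studied in the paper replace the binary entries with general ones:

```python
import numpy as np

def sample_ibp(n_customers, alpha, rng):
    """One draw of a latent binary feature matrix from the Indian Buffet Process."""
    dish_counts = []   # dish_counts[k] = number of customers who took dish k so far
    rows = []
    for i in range(1, n_customers + 1):
        row = []
        for k, m_k in enumerate(dish_counts):
            take = rng.random() < m_k / i      # existing dish: probability m_k / i
            if take:
                dish_counts[k] += 1
            row.append(int(take))
        n_new = rng.poisson(alpha / i)         # Poisson(alpha / i) brand-new dishes
        dish_counts.extend([1] * n_new)
        row.extend([1] * n_new)
        rows.append(row)
    K = len(dish_counts)                       # total number of realized features
    Z = np.zeros((n_customers, K), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(10, alpha=2.0, rng=np.random.default_rng(1))
```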
</p>projecteuclid.org/euclid.aos/1509436826_20171031040040Tue, 31 Oct 2017 04:00 EDTInformation-regret compromise in covariate-adaptive treatment allocationhttps://projecteuclid.org/euclid.aos/1509436827<strong>Asya Metelkina</strong>, <strong>Luc Pronzato</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2046--2073.</p><p><strong>Abstract:</strong><br/>
Covariate-adaptive treatment allocation is considered in the situation when a compromise must be made between information (about the dependency of the probability of success of each treatment upon influential covariates) and cost (in terms of the number of subjects receiving the poorest treatment). Information is measured through a design criterion for parameter estimation; the cost is additive and related to the success probabilities. Within the framework of approximate design theory, the determination of optimal allocations forms a compound design problem. We show that when the covariates are i.i.d. with a probability measure $\mu$, its solution possesses some similarities with the construction of optimal design measures bounded by $\mu$. We characterize optimal designs through an equivalence theorem and construct a covariate-adaptive sequential allocation strategy that converges to the optimum. Our new optimal designs can be used as benchmarks for other, more usual, allocation methods. A response-adaptive implementation is possible for practical applications with unknown model parameters. Several illustrative examples are provided.
</p>projecteuclid.org/euclid.aos/1509436827_20171031040040Tue, 31 Oct 2017 04:00 EDTSparse CCA: Adaptive estimation and computational barriershttps://projecteuclid.org/euclid.aos/1509436828<strong>Chao Gao</strong>, <strong>Zongming Ma</strong>, <strong>Harrison H. Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2074--2101.</p><p><strong>Abstract:</strong><br/>
Canonical correlation analysis is a classical technique for exploring the relationship between two sets of variables. It has important applications in analyzing high dimensional datasets originating from genomics, imaging and other fields. This paper considers adaptive minimax and computationally tractable estimation of leading sparse canonical coefficient vectors in high dimensions. Under a Gaussian canonical pair model, we first establish separate minimax estimation rates for canonical coefficient vectors of each set of random variables under no structural assumption on marginal covariance matrices. Second, we propose a computationally feasible estimator to attain the optimal rates adaptively under an additional sample size condition. Finally, we show that a sample size condition of this kind is needed for any randomized polynomial-time estimator to be consistent, assuming hardness of certain instances of the planted clique detection problem. As a byproduct, we obtain the first computational lower bounds for sparse PCA under the Gaussian single spiked covariance model.
</p>projecteuclid.org/euclid.aos/1509436828_20171031040040Tue, 31 Oct 2017 04:00 EDTOptimal designs for dose response curves with common parametershttps://projecteuclid.org/euclid.aos/1509436829<strong>Chrystel Feller</strong>, <strong>Kirsten Schorning</strong>, <strong>Holger Dette</strong>, <strong>Georgina Bermann</strong>, <strong>Björn Bornkamp</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2102--2132.</p><p><strong>Abstract:</strong><br/> A common problem in Phase II clinical trials is the comparison of dose response curves corresponding to different treatment groups. If the effect of the dose level is described by parametric regression models and the treatments differ in the administration frequency (but not in the sort of drug), a reasonable assumption is that the regression models for the different treatments share common parameters. This paper develops optimal design theory for the comparison of different regression models with common parameters. We derive upper bounds on the number of support points of admissible designs, and explicit expressions for $D$-optimal designs are derived for frequently used dose response models with a common location parameter. If the location and scale parameter in the different models coincide, minimally supported designs are determined and sufficient conditions for their optimality in the class of all designs derived. The results are illustrated in a dose-finding study comparing monthly and weekly administration. </p>projecteuclid.org/euclid.aos/1509436829_20171031040040Tue, 31 Oct 2017 04:00 EDTFalse discoveries occur early on the Lasso pathhttps://projecteuclid.org/euclid.aos/1509436830<strong>Weijie Su</strong>, <strong>Małgorzata Bogdan</strong>, <strong>Emmanuel Candès</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2133--2150.</p><p><strong>Abstract:</strong><br/>
In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity—meaning that the fraction of variables with a nonvanishing effect tends to a constant, however small—this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to achieve a type II error (false negative rate) below a critical value, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold: we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.
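The trade-off between type I and type II errors along the path can be traced in a small simulation. The design sizes and effect magnitude below are illustrative choices, not the paper's asymptotic regime, and scikit-learn's `lasso_path` stands in for the AMP analysis:

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p, k = 250, 500, 50                        # linear sparsity: k / p held fixed
X = rng.standard_normal((n, p)) / np.sqrt(n)  # independent Gaussian design
beta = np.zeros(p)
beta[:k] = 4.0                                # strong, equal-magnitude effects
y = X @ beta + rng.standard_normal(n)

alphas, coefs, _ = lasso_path(X, y, n_alphas=100)
support = coefs != 0                          # (p, n_alphas) active sets
false_pos = support[k:].sum(axis=0)           # null features selected
true_pos = support[:k].sum(axis=0)            # true features selected
fdp = false_pos / np.maximum(false_pos + true_pos, 1)  # false discovery proportion
tpp = true_pos / k                                     # true positive proportion
```

Plotting `tpp` against `fdp` along the path traces the empirical counterpart of the asymptotic trade-off curve.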
</p>projecteuclid.org/euclid.aos/1509436830_20171031040040Tue, 31 Oct 2017 04:00 EDTPhase transitions for high dimensional clustering and related problemshttps://projecteuclid.org/euclid.aos/1509436831<strong>Jiashun Jin</strong>, <strong>Zheng Tracy Ke</strong>, <strong>Wanjie Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2151--2189.</p><p><strong>Abstract:</strong><br/> Consider a two-class clustering problem where we observe $X_{i}=\ell_{i}\mu+Z_{i}$, $Z_{i}\stackrel{\mathit{i.i.d.}}{\sim}N(0,I_{p})$, $1\leq i\leq n$. The feature vector $\mu\in R^{p}$ is unknown but is presumably sparse. The class labels $\ell_{i}\in\{-1,1\}$ are also unknown and the main interest is to estimate them. We are interested in the statistical limits. In the two-dimensional phase space calibrating the rarity and strengths of useful features, we find the precise demarcation for the Region of Impossibility and Region of Possibility . In the former, useful features are too rare/weak for successful clustering. In the latter, useful features are strong enough to allow successful clustering. The results are extended to the case of colored noise using Le Cam’s idea on comparison of experiments. We also extend the study on statistical limits for clustering to that for signal recovery and that for global testing. We compare the statistical limits for three problems and expose some interesting insight. We propose classical PCA and Important Features PCA (IF-PCA) for clustering. For a threshold $t>0$, IF-PCA clusters by applying classical PCA to all columns of $X$ with an $L^{2}$-norm larger than $t$. We also propose two aggregation methods. For any parameter in the Region of Possibility, some of these methods yield successful clustering. We discover a phase transition for IF-PCA. For any threshold $t>0$, let $\xi^{(t)}$ be the first left singular vector of the post-selection data matrix. The phase space partitions into two different regions. 
In one region, there is a $t$ such that $\cos(\xi^{(t)},\ell)\rightarrow 1$ and IF-PCA yields successful clustering. In the other, $\cos(\xi^{(t)},\ell)\leq c_{0}<1$ for all $t>0$. Our results require delicate analysis, especially on post-selection random matrix theory and on lower bound arguments. </p>projecteuclid.org/euclid.aos/1509436831_20171031040040Tue, 31 Oct 2017 04:00 EDTBayesian detection of image boundarieshttps://projecteuclid.org/euclid.aos/1509436832<strong>Meng Li</strong>, <strong>Subhashis Ghosal</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2190--2217.</p><p><strong>Abstract:</strong><br/>
Detecting the boundary of an image based on noisy observations is a fundamental problem of image processing and image segmentation. For a $d$-dimensional image ($d=2,3,\ldots$), the boundary can often be described by a closed smooth $(d-1)$-dimensional manifold. In this paper, we propose a nonparametric Bayesian approach based on priors indexed by $\mathbb{S}^{d-1}$, the unit sphere in $\mathbb{R}^{d}$. We derive optimal posterior contraction rates for Gaussian processes or finite random series priors using basis functions such as trigonometric polynomials for 2-dimensional images and spherical harmonics for 3-dimensional images. For 2-dimensional images, we show a rescaled squared exponential Gaussian process on $\mathbb{S}^{1}$ achieves four goals of guaranteed geometric restriction, (nearly) minimax optimal rate adapting to the smoothness level, convenience for joint inference and computational efficiency. We conduct an extensive study of its reproducing kernel Hilbert space, which may be of interest in its own right and can also be used in other contexts. Several new estimates on modified Bessel functions of the first kind are given. Simulations confirm excellent performance and robustness of the proposed method.
</p>projecteuclid.org/euclid.aos/1509436832_20171031040040Tue, 31 Oct 2017 04:00 EDTSpectrum estimation from sampleshttps://projecteuclid.org/euclid.aos/1509436833<strong>Weihao Kong</strong>, <strong>Gregory Valiant</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2218--2247.</p><p><strong>Abstract:</strong><br/>
We consider the problem of approximating the set of eigenvalues of the covariance matrix of a multivariate distribution (equivalently, the problem of approximating the “population spectrum”), given access to samples drawn from the distribution. We consider this recovery problem in the regime where the sample size is comparable to, or even sublinear in the dimensionality of the distribution. First, we propose a theoretically optimal and computationally efficient algorithm for recovering the moments of the eigenvalues of the population covariance matrix. We then leverage this accurate moment recovery, via a Wasserstein distance argument, to accurately reconstruct the vector of eigenvalues. Together, this yields an eigenvalue reconstruction algorithm that is asymptotically consistent as the dimensionality of the distribution and sample size tend toward infinity, even in the sublinear sample regime where the ratio of the sample size to the dimensionality tends to zero. In addition to our theoretical results, we show that our approach performs well in practice for a broad range of distributions and sample sizes.
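The moment-recovery step can be illustrated for the first two spectral moments. This sketch relies on the elementary identities $\mathbb{E}\|x\|^{2}=\operatorname{tr}(\Sigma)$ and, for independent mean-zero samples $x_i \ne x_j$, $\mathbb{E}(x_i^{\top}x_j)^{2}=\operatorname{tr}(\Sigma^{2})$, which remain unbiased even when $n \ll p$; the paper's algorithm estimates many more moments and inverts them via a Wasserstein argument, which is omitted here:

```python
import numpy as np

def spectral_moments(X):
    """Unbiased estimates of tr(Sigma)/p and tr(Sigma^2)/p from mean-zero samples.

    Uses only cross-sample inner products for the second moment, so the
    estimate stays unbiased even in the sublinear regime n << p.
    """
    n, p = X.shape
    m1 = np.mean(np.sum(X**2, axis=1)) / p          # E|x|^2 = tr(Sigma)
    G = X @ X.T                                     # inner products x_i^T x_j
    off = G[np.triu_indices(n, k=1)]                # distinct pairs i < j
    m2 = np.mean(off**2) / p                        # E(x_i^T x_j)^2 = tr(Sigma^2)
    return m1, m2

rng = np.random.default_rng(0)
p, n = 100, 300
X = rng.standard_normal((n, p))   # Sigma = I_p: both normalized moments equal 1
m1, m2 = spectral_moments(X)
```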
</p>projecteuclid.org/euclid.aos/1509436833_20171031040040Tue, 31 Oct 2017 04:00 EDTOn the contraction properties of some high-dimensional quasi-posterior distributionshttps://projecteuclid.org/euclid.aos/1509436834<strong>Yves A. Atchadé</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2248--2273.</p><p><strong>Abstract:</strong><br/>
We study the contraction properties of a quasi-posterior distribution $\check{\Pi}_{n,d}$ obtained by combining a quasi-likelihood function and a sparsity inducing prior distribution on $\mathbb{R}^{d}$, as both $n$ (the sample size) and $d$ (the dimension of the parameter) increase. We derive some general results that highlight a set of sufficient conditions under which $\check{\Pi}_{n,d}$ puts increasingly high probability on sparse subsets of $\mathbb{R}^{d}$, and contracts toward the true value of the parameter. We apply these results to the analysis of logistic regression models, and binary graphical models, in high-dimensional settings. For the logistic regression model, we show that for well-behaved design matrices, the quasi-posterior distribution contracts at the rate $O(\sqrt{s_{\star}\log(d)/n})$, where $s_{\star}$ is the number of nonzero components of the parameter. For the binary graphical model, under some regularity conditions, we show that a quasi-posterior analog of the neighborhood selection of [ Ann. Statist. 34 (2006) 1436–1462] contracts in the Frobenius norm at the rate $O(\sqrt{(p+S)\log(p)/n})$, where $p$ is the number of nodes, and $S$ the number of edges of the true graph.
</p>projecteuclid.org/euclid.aos/1509436834_20171031040040Tue, 31 Oct 2017 04:00 EDTNonasymptotic analysis of semiparametric regression models with high-dimensional parametric coefficientshttps://projecteuclid.org/euclid.aos/1509436835<strong>Ying Zhu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 5, 2274--2298.</p><p><strong>Abstract:</strong><br/>
We consider a two-step projection-based Lasso procedure for estimating a partially linear regression model where the number of coefficients in the linear component can exceed the sample size and these coefficients belong to the $l_{q}$-“balls” for $q\in[0,1]$. Our theoretical results regarding the properties of the estimators are nonasymptotic. In particular, we establish a new nonasymptotic “oracle” result: Although the error of the nonparametric projection per se (with respect to the prediction norm) has the scaling $t_{n}$ in the first step, it only contributes a scaling $t_{n}^{2}$ in the $l_{2}$-error of the second-step estimator for the linear coefficients. This new “oracle” result holds for a large family of nonparametric least squares procedures and regularized nonparametric least squares procedures for the first-step estimation and the driver behind it lies in the projection strategy. We specialize our analysis to the estimation of a semiparametric sample selection model and provide a simple method with theoretical guarantees for choosing the regularization parameter in practice.
</p>projecteuclid.org/euclid.aos/1509436835_20171031040040Tue, 31 Oct 2017 04:00 EDTA likelihood ratio framework for high-dimensional semiparametric regressionhttps://projecteuclid.org/euclid.aos/1513328574<strong>Yang Ning</strong>, <strong>Tianqi Zhao</strong>, <strong>Han Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2299--2327.</p><p><strong>Abstract:</strong><br/>
We propose a new inferential framework for high-dimensional semiparametric generalized linear models. This framework addresses a variety of challenging problems in high-dimensional data analysis, including incomplete data, selection bias and heterogeneity. Our work has three main contributions: (i) We develop a regularized statistical chromatography approach to infer the parameter of interest under the proposed semiparametric generalized linear model without the need of estimating the unknown base measure function. (ii) We propose a new likelihood ratio based framework to construct post-regularization confidence regions and tests for the low dimensional components of high-dimensional parameters. Unlike existing post-regularization inferential methods, our approach is based on a novel directional likelihood. (iii) We develop new concentration inequalities and normal approximation results for U-statistics with unbounded kernels, which are of independent interest. We further extend the theoretical results to the problems of missing data and multiple datasets inference. Extensive simulation studies and real data analysis are provided to illustrate the proposed approach.
</p>projecteuclid.org/euclid.aos/1513328574_20171215040315Fri, 15 Dec 2017 04:03 ESTA new perspective on boosting in linear regression via subgradient optimization and relativeshttps://projecteuclid.org/euclid.aos/1513328575<strong>Robert M. Freund</strong>, <strong>Paul Grigas</strong>, <strong>Rahul Mazumder</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2328--2364.</p><p><strong>Abstract:</strong><br/>
We analyze boosting algorithms [ Ann. Statist. 29 (2001) 1189–1232; Ann. Statist. 28 (2000) 337–407; Ann. Statist. 32 (2004) 407–499] in linear regression from a new perspective: that of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm ($\text{FS}_{\varepsilon}$) and least squares boosting [LS-BOOST$(\varepsilon)$], can be viewed as subgradient descent to minimize the loss function defined as the maximum absolute correlation between the features and residuals. We also propose a minor modification of $\text{FS}_{\varepsilon}$ that yields an algorithm for the LASSO, and that may be easily extended to an algorithm that computes the LASSO path for different values of the regularization parameter. Furthermore, we show that these new algorithms for the LASSO may also be interpreted as the same master algorithm (subgradient descent), applied to a regularized version of the maximum absolute correlation loss function. We derive novel, comprehensive computational guarantees for several boosting algorithms in linear regression (including LS-BOOST$(\varepsilon)$ and $\text{FS}_{\varepsilon}$) by using techniques of first-order methods in convex optimization. Our computational guarantees inform us about the statistical properties of boosting algorithms. In particular, they provide, for the first time, a precise theoretical description of the amount of data-fidelity and regularization imparted by running a boosting algorithm with a prespecified learning rate for a fixed but arbitrary number of iterations, for any dataset.
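The incremental forward stagewise algorithm that the paper reinterprets as subgradient descent is short enough to state in full. A minimal sketch (the learning rate and iteration count below are illustrative): at each iteration, find the feature most correlated with the current residual and move its coefficient a tiny step $\varepsilon$ in the direction of that correlation.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_iter=2000):
    """Incremental forward stagewise (FS_eps): tiny steps on the feature
    maximally correlated with the residual — a subgradient step on the
    maximum absolute correlation loss."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()
    for _ in range(n_iter):
        corr = X.T @ r
        j = np.argmax(np.abs(corr))         # most correlated feature
        step = eps * np.sign(corr[j])
        beta[j] += step                      # coefficient update
        r -= step * X[:, j]                  # residual update
    return beta, r

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(n)   # signal on feature 0
beta, r = forward_stagewise(X, y)
```

With small $\varepsilon$, the coefficient path traced over iterations closely resembles a regularization path, which is the data-fidelity/regularization trade-off that the paper's computational guarantees quantify.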
</p>projecteuclid.org/euclid.aos/1513328575_20171215040315Fri, 15 Dec 2017 04:03 ESTOn the validity of resampling methods under long memoryhttps://projecteuclid.org/euclid.aos/1513328576<strong>Shuyang Bai</strong>, <strong>Murad S. Taqqu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2365--2399.</p><p><strong>Abstract:</strong><br/>
For long-memory time series, inference based on resampling is of crucial importance, since the asymptotic distribution can often be non-Gaussian and is difficult to determine statistically. However, due to the strong dependence, establishing the asymptotic validity of resampling methods is nontrivial. In this paper, we derive an efficient bound for the canonical correlation between two finite blocks of a long-memory time series. We show how this bound can be applied to establish the asymptotic consistency of subsampling procedures for general statistics under long memory. It allows the subsample size $b$ to be $o(n)$, where $n$ is the sample size, irrespective of the strength of the memory. We are then able to improve many results found in the literature. We also consider applications of subsampling procedures under long memory to the sample covariance, M-estimation and empirical processes.
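The mechanics of the subsampling procedure — recompute the statistic on every overlapping block of length $b = o(n)$ — can be sketched as follows. The centering and rate normalization needed for a valid long-memory confidence interval are omitted, and the i.i.d. series here is only a placeholder for long-memory data:

```python
import numpy as np

def subsample_distribution(x, b, statistic):
    """Values of a statistic over all overlapping blocks of length b."""
    n = len(x)
    return np.array([statistic(x[i:i + b]) for i in range(n - b + 1)])

rng = np.random.default_rng(0)
n = 1000
b = int(n ** 0.5)            # subsample size b = o(n), as the theory permits
x = rng.standard_normal(n)   # placeholder series; real use: long-memory data
vals = subsample_distribution(x, b, np.mean)
q_lo, q_hi = np.quantile(vals, [0.025, 0.975])  # empirical subsampling quantiles
```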
</p>projecteuclid.org/euclid.aos/1513328576_20171215040315Fri, 15 Dec 2017 04:03 ESTCoCoLasso for high-dimensional error-in-variables regressionhttps://projecteuclid.org/euclid.aos/1513328577<strong>Abhirup Datta</strong>, <strong>Hui Zou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2400--2426.</p><p><strong>Abstract:</strong><br/>
Much theoretical and applied work has been devoted to high-dimensional regression with clean data. However, we often face corrupted data in many applications where missing data and measurement errors cannot be ignored. Loh and Wainwright [ Ann. Statist. 40 (2012) 1637–1664] proposed a nonconvex modification of the Lasso for doing high-dimensional regression with noisy and missing data. It is generally agreed that the virtues of convexity contribute fundamentally to the success and popularity of the Lasso. In light of this, we propose a new method named CoCoLasso that is convex and can handle a general class of corrupted datasets. We establish the estimation error bounds of CoCoLasso and its asymptotic sign-consistent selection property. We further elucidate how the standard cross validation techniques can be misleading in the presence of measurement error and develop a novel calibrated cross-validation technique by using the basic idea in CoCoLasso. The calibrated cross-validation has its own importance. We demonstrate the superior performance of our method over the nonconvex approach by simulation studies.
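The convexity-restoring idea can be sketched as follows. With additive measurement error of known variance, the bias-corrected Gram matrix $W^{\top}W/n - \operatorname{Var}(E)$ is unbiased for the clean Gram matrix but can be indefinite when $n < p$, which is what breaks convexity; CoCoLasso replaces it with a nearest positive semidefinite surrogate. The eigenvalue-clipping projection below is a simpler stand-in for the paper's projection, and the dimensions and noise level are illustrative:

```python
import numpy as np

def psd_clip(S, floor=0.0):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues.
    (A simpler surrogate for CoCoLasso's nearest-PSD projection.)"""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return V @ np.diag(np.maximum(w, floor)) @ V.T

rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))               # clean (unobserved) design
E = 0.5 * rng.standard_normal((n, p))         # additive measurement error
W = X + E                                     # observed, corrupted design
S_corrected = W.T @ W / n - 0.25 * np.eye(p)  # subtract Var(E) = 0.25 I: unbiased
S_plus = psd_clip(S_corrected)                # indefinite when n < p -> project
```

`S_plus` can then be handed to any standard convex Lasso solver in place of the corrupted Gram matrix.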
</p>projecteuclid.org/euclid.aos/1513328577_20171215040315Fri, 15 Dec 2017 04:03 ESTConsistent parameter estimation for LASSO and approximate message passinghttps://projecteuclid.org/euclid.aos/1513328578<strong>Ali Mousavi</strong>, <strong>Arian Maleki</strong>, <strong>Richard G. Baraniuk</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2427--2454.</p><p><strong>Abstract:</strong><br/>
This paper studies the optimal tuning of the regularization parameter in LASSO or the threshold parameters in approximate message passing (AMP). Considering a model in which the design matrix and noise are zero-mean i.i.d. Gaussian, we propose a data-driven approach for estimating the regularization parameter of LASSO and the threshold parameters in AMP. Our estimates are consistent, that is, they converge to their asymptotically optimal values in probability as $n$, the number of observations, and $p$, the ambient dimension of the sparse vector, grow to infinity, while $n/p$ converges to a fixed number $\delta$. As a byproduct of our analysis, we will shed light on the asymptotic properties of the solution paths of LASSO and AMP.
</p>projecteuclid.org/euclid.aos/1513328578_20171215040315Fri, 15 Dec 2017 04:03 ESTSupport recovery without incoherence: A case for nonconvex regularizationhttps://projecteuclid.org/euclid.aos/1513328579<strong>Po-Ling Loh</strong>, <strong>Martin J. Wainwright</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2455--2482.</p><p><strong>Abstract:</strong><br/>
We develop a new primal-dual witness proof framework that may be used to establish variable selection consistency and $\ell_{\infty}$-bounds for sparse regression problems, even when the loss function and regularizer are nonconvex. We use this method to prove two theorems concerning support recovery and $\ell_{\infty}$-guarantees for a regression estimator in a general setting. Notably, our theory applies to all potential stationary points of the objective and certifies that the stationary point is unique under mild conditions. Our results provide a strong theoretical justification for the use of nonconvex regularization: For certain nonconvex regularizers with vanishing derivative away from the origin, any stationary point can be used to recover the support without requiring the typical incoherence conditions present in $\ell_{1}$-based methods. We also derive corollaries illustrating the implications of our theorems for composite objective functions involving losses such as least squares, nonconvex modified least squares for errors-in-variables linear regression, the negative log likelihood for generalized linear models and the graphical Lasso. We conclude with empirical studies that corroborate our theoretical predictions.
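A concrete example of a nonconvex regularizer with vanishing derivative away from the origin is the minimax concave penalty (MCP), which is in the family the theorem covers. The sketch below shows the penalty and its derivative, which is exactly zero for $|t| \ge \gamma\lambda$:

```python
import numpy as np

def mcp(t, lam=1.0, gamma=3.0):
    """Minimax concave penalty (MCP): quadratically tapered near the origin,
    constant (derivative zero) for |t| >= gamma * lam."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2 * gamma),   # concave ramp
                    gamma * lam**2 / 2)             # flat tail

def mcp_grad(t, lam=1.0, gamma=3.0):
    """Derivative of MCP; vanishes identically beyond gamma * lam."""
    return np.sign(t) * np.maximum(lam - np.abs(t) / gamma, 0.0)
```

Because the derivative is zero on the flat tail, large coefficients incur no shrinkage bias, which is the mechanism behind support recovery without incoherence.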
</p>projecteuclid.org/euclid.aos/1513328579_20171215040315Fri, 15 Dec 2017 04:03 ESTOptimal design of fMRI experiments using circulant (almost-)orthogonal arrayshttps://projecteuclid.org/euclid.aos/1513328580<strong>Yuan-Lung Lin</strong>, <strong>Frederick Kin Hing Phoa</strong>, <strong>Ming-Hung Kao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2483--2510.</p><p><strong>Abstract:</strong><br/>
Functional magnetic resonance imaging (fMRI) is a pioneering technology for studying brain activity in response to mental stimuli. Although efficient designs for these fMRI experiments are important for rendering precise statistical inference on brain functions, they are not systematically constructed. Designs with the circulant property are crucial for estimating a hemodynamic response function (HRF) and discussing fMRI experimental optimality. In this paper, we develop a theory that not only successfully explains the structure of a circulant design, but also provides a method of constructing efficient fMRI designs systematically. We further provide a class of two-level circulant designs with good performance (statistically optimal), and they can be used to estimate the HRF of a stimulus type and study the comparison of two HRFs. Some efficient three- and four-level circulant designs are also provided, and we prove the existence of a class of circulant orthogonal arrays.
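The circulant structure at the heart of these designs is easy to generate: every row of the design matrix is a cyclic shift of a single generating sequence. The two-level generating sequence below is purely illustrative, not a design from the paper:

```python
import numpy as np

def circulant(gen):
    """Circulant matrix: row i is the generating sequence cyclically shifted by i."""
    gen = np.asarray(gen)
    return np.array([np.roll(gen, i) for i in range(len(gen))])

D = circulant([1, 1, 0, 1, 0, 0, 0])  # illustrative two-level generating sequence
```

The cyclic symmetry means each level appears equally often in every column, which is what makes circulant designs convenient for HRF estimation.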
</p>projecteuclid.org/euclid.aos/1513328580_20171215040315Fri, 15 Dec 2017 04:03 ESTAdaptive Bernstein–von Mises theorems in Gaussian white noisehttps://projecteuclid.org/euclid.aos/1513328581<strong>Kolyan Ray</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2511--2536.</p><p><strong>Abstract:</strong><br/>
We investigate Bernstein–von Mises theorems for adaptive nonparametric Bayesian procedures in the canonical Gaussian white noise model. We consider both a Hilbert space and multiscale setting with applications in $L^{2}$ and $L^{\infty}$, respectively. This provides a theoretical justification for plug-in procedures, for example the use of certain credible sets for sufficiently smooth linear functionals. We use this general approach to construct optimal frequentist confidence sets based on the posterior distribution. We also provide simulations to numerically illustrate our approach and obtain a visual representation of the geometries involved.
</p>projecteuclid.org/euclid.aos/1513328581_20171215040315Fri, 15 Dec 2017 04:03 ESTTargeted sequential design for targeted learning inference of the optimal treatment rule and its mean rewardhttps://projecteuclid.org/euclid.aos/1513328582<strong>Antoine Chambaz</strong>, <strong>Wenjing Zheng</strong>, <strong>Mark J. van der Laan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2537--2564.</p><p><strong>Abstract:</strong><br/> This article studies the targeted sequential inference of an optimal treatment rule (TR) and its mean reward in the nonexceptional case, that is , assuming that there is no stratum of the baseline covariates where treatment is neither beneficial nor harmful, and under a companion margin assumption. Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually infers the mean reward under the current estimate of the optimal TR. This data-adaptive statistical parameter is worthy of interest on its own. Our main result is a central limit theorem which enables the construction of confidence intervals on both mean rewards under the current estimate of the optimal TR and under the optimal TR itself. The asymptotic variance of the estimator takes the form of the variance of an efficient influence curve at a limiting distribution, allowing to discuss the efficiency of inference. As a by product, we also derive confidence intervals on two cumulated pseudo-regrets, a key notion in the study of bandits problems. A simulation study illustrates the procedure. One of the cornerstones of the theoretical study is a new maximal inequality for martingales with respect to the uniform entropy integral. </p>projecteuclid.org/euclid.aos/1513328582_20171215040315Fri, 15 Dec 2017 04:03 ESTNonparametric goodness-of-fit tests for uniform stochastic orderinghttps://projecteuclid.org/euclid.aos/1513328583<strong>Chuan-Fa Tang</strong>, <strong>Dewei Wang</strong>, <strong>Joshua M. 
Tebbs</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2565--2589.</p><p><strong>Abstract:</strong><br/>
We propose $L^{p}$ distance-based goodness-of-fit (GOF) tests for uniform stochastic ordering with two continuous distributions $F$ and $G$, both of which are unknown. Our tests are motivated by the fact that when $F$ and $G$ are uniformly stochastically ordered, the ordinal dominance curve $R=FG^{-1}$ is star-shaped. We derive asymptotic distributions and prove that our testing procedure has a unique least favorable configuration of $F$ and $G$ for $p\in [1,\infty]$. We use simulation to assess finite-sample performance and demonstrate that a modified, one-sample version of our procedure (e.g., with $G$ known) is more powerful than the one-sample GOF test suggested by Arcones and Samaniego [ Ann. Statist. 28 (2000) 116–150]. We also discuss sample size determination. We illustrate our methods using data from a pharmacology study evaluating the effects of administering caffeine to prematurely born infants.
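The ordinal dominance curve that drives the test can be computed directly from the two samples. A minimal sketch of the empirical version $R_{m,n}(u)=F_m(G_n^{-1}(u))$, with the exponential samples below as illustrative data (the $L^p$ distance to the star-shaped envelope, the actual test statistic, is omitted):

```python
import numpy as np

def empirical_odc(x, y, grid):
    """Empirical ordinal dominance curve R(u) = F_m(G_n^{-1}(u)),
    where F_m and G_n are the ECDFs of the two samples."""
    xs = np.sort(x)
    # empirical quantile function G_n^{-1} (NumPy >= 1.22 for method=)
    g_inv = np.quantile(y, grid, method="inverted_cdf")
    return np.searchsorted(xs, g_inv, side="right") / len(xs)

rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=500)   # sample from F
y = rng.exponential(2.0, size=500)   # sample from G, stochastically larger
u = np.linspace(0.01, 0.99, 99)
R = empirical_odc(x, y, u)
```

Under uniform stochastic ordering, $R$ is star-shaped, so $R(u)/u$ is monotone; a plot of `R / u` against `u` gives a quick visual check before running a formal test.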
</p>projecteuclid.org/euclid.aos/1513328583_20171215040315Fri, 15 Dec 2017 04:03 ESTSelecting the number of principal components: Estimation of the true rank of a noisy matrixhttps://projecteuclid.org/euclid.aos/1513328584<strong>Yunjin Choi</strong>, <strong>Jonathan Taylor</strong>, <strong>Robert Tibshirani</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2590--2617.</p><p><strong>Abstract:</strong><br/>
Principal component analysis (PCA) is a well-known tool in multivariate statistics. One significant challenge in using PCA is the choice of the number of principal components. In order to address this challenge, we propose distribution-based methods with exact type 1 error controls for hypothesis testing and construction of confidence intervals for signals in a noisy matrix with finite samples. Assuming Gaussian noise, we derive exact type 1 error controls based on the conditional distribution of the singular values of a Gaussian matrix by utilizing a post-selection inference framework, and extending the approach of [Taylor, Loftus and Tibshirani (2013)] in a PCA setting. In simulation studies, we find that our proposed methods compare well to existing approaches.
</p>projecteuclid.org/euclid.aos/1513328584 (Fri, 15 Dec 2017 04:03 EST)

Extended conditional independence and applications in causal inference
https://projecteuclid.org/euclid.aos/1513328585
<strong>Panayiota Constantinou</strong>, <strong>A. Philip Dawid</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2618--2653.</p><p><strong>Abstract:</strong><br/>
The goal of this paper is to integrate the notions of stochastic conditional independence and variation conditional independence under a more general notion of extended conditional independence. We show that under appropriate assumptions the calculus that applies for the two cases separately (axioms of a separoid) still applies for the extended case. These results provide a rigorous basis for a wide range of statistical concepts, including ancillarity and sufficiency, and, in particular, the Decision Theoretic framework for statistical causality, which uses the language and calculus of conditional independence in order to express causal properties and make causal inferences.
</p>projecteuclid.org/euclid.aos/1513328585 (Fri, 15 Dec 2017 04:03 EST)

A weight-relaxed model averaging approach for high-dimensional generalized linear models
https://projecteuclid.org/euclid.aos/1513328586
<strong>Tomohiro Ando</strong>, <strong>Ker-chau Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2654--2679.</p><p><strong>Abstract:</strong><br/>
Model averaging has long been proposed as a powerful alternative to model selection in regression analysis. However, how well it performs in high-dimensional regression is still poorly understood. Recently, Ando and Li [ J. Amer. Statist. Assoc. 109 (2014) 254–265] introduced a new method of model averaging that allows the number of predictors to increase as the sample size increases. One notable feature of Ando and Li’s method is the relaxation on the total model weights so that weak signals can be efficiently combined from high-dimensional linear models. It is natural to ask if Ando and Li’s method and results can be extended to nonlinear models. Because all candidate models should be treated as working models, the existence of a theoretical target of the quasi-maximum likelihood estimator under model misspecification needs to be established first. In this paper, we consider generalized linear models as our candidate models. We establish a general result showing the existence of pseudo-true regression parameters under model misspecification. We derive proper conditions for the leave-one-out cross-validation weight selection to achieve asymptotic optimality. Technically, the pseudo-true target parameters of different working models are not linearly linked. To overcome these difficulties, we employ a novel strategy of decomposing and bounding the bias and variance terms in our proof. We conduct simulations to illustrate the merits of our model averaging procedure over several existing methods, including the lasso and group lasso methods, the Akaike and Bayesian information criterion model-averaging methods and some other state-of-the-art regularization methods.
</p>projecteuclid.org/euclid.aos/1513328586 (Fri, 15 Dec 2017 04:03 EST)

Structural similarity and difference testing on multiple sparse Gaussian graphical models
https://projecteuclid.org/euclid.aos/1513328587
<strong>Weidong Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2680--2707.</p><p><strong>Abstract:</strong><br/>
We present a new framework for inferring structural similarities and differences among multiple high-dimensional Gaussian graphical models (GGMs) corresponding to the same set of variables under distinct experimental conditions. The new framework adopts partial correlation coefficients to characterize potential changes of dependency strengths between two variables. We further develop a hierarchical method to recover edges with different or similar dependency strengths across multiple GGMs. In particular, we first construct two-sample test statistics for testing the equality of partial correlation coefficients and conduct large-scale multiple tests to estimate the substructure of differential dependencies. After removing the differential substructure from the original GGMs, a follow-up multiple testing procedure is used to detect the substructure of similar dependencies among GGMs. In each step, the false discovery rate is controlled asymptotically at a desired level. Power results are proved, which demonstrate that our method is more powerful at finding common edges than the common approach that estimates GGMs separately. The performance of the proposed hierarchical method is illustrated on simulated datasets.
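For orientation, the partial correlations that the framework tests can, in the classical low-dimensional setting, be read off the inverse sample covariance matrix. This is a minimal sketch of that identity only; the paper's high-dimensional procedure instead builds regularized two-sample test statistics, which are not shown here.

```python
import numpy as np

def partial_correlations(sample):
    """Sample partial correlation matrix from the inverse covariance.

    rho_{ij|rest} = -omega_ij / sqrt(omega_ii * omega_jj),
    where Omega = Sigma^{-1}. Valid only when n > p so that the sample
    covariance is invertible.
    """
    omega = np.linalg.inv(np.cov(sample, rowvar=False))
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho
```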
</p>projecteuclid.org/euclid.aos/1513328587 (Fri, 15 Dec 2017 04:03 EST)

Estimating a probability mass function with unknown labels
https://projecteuclid.org/euclid.aos/1513328588
<strong>Dragi Anevski</strong>, <strong>Richard D. Gill</strong>, <strong>Stefan Zohren</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2708--2735.</p><p><strong>Abstract:</strong><br/>
In the context of a species sampling problem, we discuss a nonparametric maximum likelihood estimator for the underlying probability mass function. The estimator is known in the computer science literature as the high profile estimator. We prove strong consistency and derive rates of convergence for an extended model version of the estimator. We also study a sieved estimator for which similar consistency results are derived. Numerical computation of the sieved estimator is of great interest for practical problems, such as forensic DNA analysis, and we present a computational algorithm based on the stochastic approximation of the expectation maximisation algorithm. As an interesting byproduct of the numerical analyses, we introduce an algorithm for bounded isotonic regression for which we also prove convergence.
</p>projecteuclid.org/euclid.aos/1513328588 (Fri, 15 Dec 2017 04:03 EST)

Optimal sequential detection in multi-stream data
https://projecteuclid.org/euclid.aos/1513328589
<strong>Hock Peng Chan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 6, 2736--2763.</p><p><strong>Abstract:</strong><br/>
Consider a large number of detectors, each generating a data stream. The task is to detect, online, distribution changes in a small fraction of the data streams. Previous approaches to this problem include the use of mixture likelihood ratios and sums of CUSUMs. We provide here extensions and modifications of these approaches that are optimal in detecting normal mean shifts. We show how the (optimal) detection delay depends on the fraction of data streams undergoing distribution changes as the number of detectors goes to infinity. There are three detection domains. In the first domain, for moderately large fractions, immediate detection is possible. In the second domain, for smaller fractions, the detection delay grows logarithmically with the number of detectors, with an asymptotic constant extending those in sparse normal mixture detection. In the third domain, for even smaller fractions, the detection delay lies in the framework of the classical detection delay formula of Lorden. We show that the optimal detection delay is achieved by the sum of detectability-score transformations of either the partial scores or the CUSUM scores of the data streams.
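The sum-of-CUSUMs idea referenced above can be sketched per stream as follows. This is a hedged illustration of the classical one-sided CUSUM recursion only; the paper's optimal rules apply a detectability-score transformation before summing, which is not implemented here, and the drift value is an illustrative choice.

```python
import numpy as np

def cusum_scores(streams, drift=0.5):
    """One-sided CUSUM statistic per stream for a positive normal mean shift.

    streams: (N, T) array, N data streams observed over T time points.
    Returns the (N, T) matrix W with W[k, t] = max(0, W[k, t-1] + x[k, t] - drift).
    """
    n, t = streams.shape
    w = np.zeros((n, t))
    for j in range(t):
        prev = w[:, j - 1] if j > 0 else 0.0
        w[:, j] = np.maximum(0.0, prev + streams[:, j] - drift)
    return w

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, (100, 200))
x[:3, 100:] += 1.0                           # mean shift in a small fraction of streams
global_stat = cusum_scores(x).sum(axis=0)    # naive sum over streams at each time
```

After the change point, the summed statistic grows roughly linearly because of the shifted streams, which is what a threshold rule would pick up.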
</p>projecteuclid.org/euclid.aos/1513328589 (Fri, 15 Dec 2017 04:03 EST)

Chernoff index for Cox test of separate parametric families
https://projecteuclid.org/euclid.aos/1519268422
<strong>Xiaoou Li</strong>, <strong>Jingchen Liu</strong>, <strong>Zhiliang Ying</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 1--29.</p><p><strong>Abstract:</strong><br/>
The asymptotic efficiency of a generalized likelihood ratio test proposed by Cox [In Proc. 4th Berkeley Sympos. Math. Statist. and Prob. (1961) 105–123; J. Roy. Statist. Soc. Ser. B 24 (1962) 406–424] is studied under the large deviations framework for error probabilities developed by Chernoff. In particular, two separate parametric families of hypotheses are considered. The significance level is set such that the maximal type I and type II error probabilities for the generalized likelihood ratio test decay exponentially fast with the same rate. We derive the analytic form of such a rate that is also known as the Chernoff index [ Ann. Math. Stat. 23 (1952) 493–507], a relative efficiency measure when there is no preference between the null and the alternative hypotheses. We further extend the analysis to approximate error probabilities when the two families are not completely separated. Discussions are provided concerning the implications of the present result on model selection.
</p>projecteuclid.org/euclid.aos/1519268422 (Wed, 21 Feb 2018 22:00 EST)

Optimal bounds for aggregation of affine estimators
https://projecteuclid.org/euclid.aos/1519268423
<strong>Pierre C. Bellec</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 30--59.</p><p><strong>Abstract:</strong><br/>
We study the problem of aggregation of estimators when the estimators are not independent of the data used for aggregation and no sample splitting is allowed. If the estimators are deterministic vectors, it is well known that the minimax rate of aggregation is of order $\log(M)$, where $M$ is the number of estimators to aggregate. It is proved that for affine estimators, the minimax rate of aggregation is unchanged: it is possible to handle the linear dependence between the affine estimators and the data used for aggregation at no extra cost. The minimax rate is not impacted either by the variance of the affine estimators, or any other measure of their statistical complexity. The minimax rate is attained with a penalized procedure over the convex hull of the estimators, for a penalty that is inspired from the $Q$-aggregation procedure. The results follow from the interplay between the penalty, strong convexity and concentration.
</p>projecteuclid.org/euclid.aos/1519268423 (Wed, 21 Feb 2018 22:00 EST)

Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics
https://projecteuclid.org/euclid.aos/1519268424
<strong>T. Tony Cai</strong>, <strong>Anru Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 60--89.</p><p><strong>Abstract:</strong><br/>
Perturbation bounds for singular spaces, in particular Wedin’s $\mathop{\mathrm{sin}}\nolimits \Theta$ theorem, are a fundamental tool in many fields including high-dimensional statistics, machine learning and applied mathematics. In this paper, we establish separate perturbation bounds, measured in both spectral and Frobenius $\mathop{\mathrm{sin}}\nolimits \Theta$ distances, for the left and right singular subspaces. Lower bounds, which show that the individual perturbation bounds are rate-optimal, are also given. The new perturbation bounds are applicable to a wide range of problems. In this paper, we consider in detail applications to low-rank matrix denoising and singular space estimation, high-dimensional clustering and canonical correlation analysis (CCA). In particular, separate matching upper and lower bounds are obtained for estimating the left and right singular spaces. To the best of our knowledge, this is the first result that gives different optimal rates for the left and right singular spaces under the same perturbation.
</p>projecteuclid.org/euclid.aos/1519268424 (Wed, 21 Feb 2018 22:00 EST)

Exact formulas for the normalizing constants of Wishart distributions for graphical models
https://projecteuclid.org/euclid.aos/1519268425
<strong>Caroline Uhler</strong>, <strong>Alex Lenkoski</strong>, <strong>Donald Richards</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 90--118.</p><p><strong>Abstract:</strong><br/>
Gaussian graphical models have received considerable attention during the past four decades from the statistical and machine learning communities. In Bayesian treatments of this model, the $G$-Wishart distribution serves as the conjugate prior for inverse covariance matrices satisfying graphical constraints. While it is straightforward to posit the unnormalized densities, the normalizing constants of these distributions have been known only for graphs that are chordal, or decomposable. Up until now, it was unknown whether the normalizing constant for a general graph could be represented explicitly, and a considerable body of computational literature emerged that attempted to avoid this apparent intractability. We close this question by providing an explicit representation of the $G$-Wishart normalizing constant for general graphs.
</p>projecteuclid.org/euclid.aos/1519268425 (Wed, 21 Feb 2018 22:00 EST)

Consistent parameter estimation for LASSO and approximate message passing
https://projecteuclid.org/euclid.aos/1519268426
<strong>Ali Mousavi</strong>, <strong>Arian Maleki</strong>, <strong>Richard G. Baraniuk</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 119--148.</p><p><strong>Abstract:</strong><br/>
This paper studies the optimal tuning of the regularization parameter in LASSO or the threshold parameters in approximate message passing (AMP). Considering a model in which the design matrix and noise are zero-mean i.i.d. Gaussian, we propose a data-driven approach for estimating the regularization parameter of LASSO and the threshold parameters in AMP. Our estimates are consistent, that is, they converge to their asymptotically optimal values in probability as $n$, the number of observations, and $p$, the ambient dimension of the sparse vector, grow to infinity, while $n/p$ converges to a fixed number $\delta$. As a byproduct of our analysis, we will shed light on the asymptotic properties of the solution paths of LASSO and AMP.
</p>projecteuclid.org/euclid.aos/1519268426 (Wed, 21 Feb 2018 22:00 EST)

On semidefinite relaxations for the block model
https://projecteuclid.org/euclid.aos/1519268427
<strong>Arash A. Amini</strong>, <strong>Elizaveta Levina</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 149--179.</p><p><strong>Abstract:</strong><br/>
The stochastic block model (SBM) is a popular tool for community detection in networks, but fitting it by maximum likelihood (MLE) involves a computationally infeasible optimization problem. We propose a new semidefinite programming (SDP) solution to the problem of fitting the SBM, derived as a relaxation of the MLE. We put our SDP and previously proposed ones in a unified framework, as relaxations of the MLE over various subclasses of the SBM, which also reveals a connection to the well-known problem of sparse PCA. Our main relaxation, which we call SDP-1, is tighter than other recently proposed SDP relaxations, and thus previously established theoretical guarantees carry over. However, we show that SDP-1 exactly recovers true communities over a wider class of SBMs than those covered by current results. In particular, the assumption of strong assortativity of the SBM, implicit in consistency conditions for previously proposed SDPs, can be relaxed to weak assortativity for our approach, thus significantly broadening the class of SBMs covered by the consistency results. We also show that strong assortativity is indeed a necessary condition for exact recovery for previously proposed SDP approaches and not an artifact of the proofs. Our analysis of SDPs is based on primal-dual witness constructions, which provides some insight into the nature of the solutions of various SDPs. In particular, we show how to combine features from SDP-1 and already available SDPs to achieve the most flexibility in terms of both assortativity and block-size constraints, as our relaxation has the tendency to produce communities of similar sizes. This tendency makes it the ideal tool for fitting network histograms, a method gaining popularity in the graphon estimation literature, as we illustrate on an example of a social network of dolphins. We also provide empirical evidence that SDPs outperform spectral methods for fitting SBMs with a large number of blocks.
</p>projecteuclid.org/euclid.aos/1519268427 (Wed, 21 Feb 2018 22:00 EST)

Pathwise coordinate optimization for sparse learning: Algorithm and theory
https://projecteuclid.org/euclid.aos/1519268428
<strong>Tuo Zhao</strong>, <strong>Han Liu</strong>, <strong>Tong Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 180--218.</p><p><strong>Abstract:</strong><br/>
Pathwise coordinate optimization is one of the most important computational frameworks for high-dimensional convex and nonconvex sparse learning problems. It differs from classical coordinate optimization algorithms in three salient features: warm-start initialization, active-set updating and a strong rule for coordinate preselection. Such a complex algorithmic structure grants superior empirical performance, but also poses significant challenges to theoretical analysis. To tackle this long-standing problem, we develop a new theory showing that these three features play pivotal roles in guaranteeing the outstanding statistical and computational performance of the pathwise coordinate optimization framework. In particular, we analyze the existing pathwise coordinate optimization algorithms and provide new theoretical insights into them. The obtained insights further motivate the development of several modifications to improve the pathwise coordinate optimization framework, which guarantee linear convergence to a unique sparse local optimum with optimal statistical properties in parameter estimation and support recovery. This is the first result on the computational and statistical guarantees of the pathwise coordinate optimization framework in high dimensions. Thorough numerical experiments are provided to support our theory.
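The warm-start feature can be illustrated with a bare-bones lasso solver run along a decreasing regularization path, where the solution at each level initializes the next. A hedged sketch only: it omits the active-set updating and strong-rule screening that the full framework adds, and the demo data are illustrative.

```python
import numpy as np

def lasso_path(x, y, lambdas, n_iter=200):
    """Cyclic coordinate descent for the lasso along a decreasing lambda path.

    Minimizes (1/2n)||y - X b||^2 + lam ||b||_1 for each lam, warm-starting
    each solve from the previous solution.
    """
    n, p = x.shape
    col_sq = (x ** 2).sum(axis=0)
    beta = np.zeros(p)
    path = []
    for lam in sorted(lambdas, reverse=True):        # large to small
        for _ in range(n_iter):
            for j in range(p):
                # Partial residual excluding coordinate j
                r_j = y - x @ beta + x[:, j] * beta[j]
                z = x[:, j] @ r_j
                # Soft-thresholding update
                beta[j] = np.sign(z) * max(abs(z) - n * lam, 0.0) / col_sq[j]
        path.append(beta.copy())
    return path

rng = np.random.default_rng(3)
x_demo = rng.normal(size=(60, 3))
y_demo = x_demo @ np.array([1.0, -2.0, 0.0]) + 0.1 * rng.normal(size=60)
path = lasso_path(x_demo, y_demo, [5.0, 1e-4])       # path[0] is fully sparse
```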
</p>projecteuclid.org/euclid.aos/1519268428 (Wed, 21 Feb 2018 22:00 EST)

Conditional mean and quantile dependence testing in high dimension
https://projecteuclid.org/euclid.aos/1519268429
<strong>Xianyang Zhang</strong>, <strong>Shun Yao</strong>, <strong>Xiaofeng Shao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 219--246.</p><p><strong>Abstract:</strong><br/>
Motivated by applications in biological science, we propose a novel test to assess the conditional mean dependence of a response variable on a large number of covariates. Our procedure is built on the martingale difference divergence recently proposed in Shao and Zhang [ J. Amer. Statist. Assoc. 109 (2014) 1302–1318], and it is able to detect certain types of departure from the null hypothesis of conditional mean independence without making any specific model assumptions. Theoretically, we establish the asymptotic normality of the proposed test statistic under suitable assumptions on the eigenvalues of a Hermitian operator, which is constructed based on the characteristic function of the covariates. These conditions can be simplified under a banded dependence structure on the covariates or under Gaussian design. To account for heterogeneity within the data, we further develop a testing procedure for conditional quantile independence at a given quantile level and provide an asymptotic justification. Empirically, our test of conditional mean independence delivers results comparable to those of the competitor, which was constructed under the linear model framework, when the underlying model is linear. It significantly outperforms the competitor when the conditional mean admits a nonlinear form.
</p>projecteuclid.org/euclid.aos/1519268429 (Wed, 21 Feb 2018 22:00 EST)

High-dimensional asymptotics of prediction: Ridge regression and classification
https://projecteuclid.org/euclid.aos/1519268430
<strong>Edgar Dobriban</strong>, <strong>Stefan Wager</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 247--279.</p><p><strong>Abstract:</strong><br/>
We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where $p,n\to\infty$ and $p/n\to\gamma>0$, and allow for arbitrary covariance among the features. For both methods, we provide an explicit and efficiently computable expression for the limiting predictive risk, which depends only on the spectrum of the feature-covariance matrix, the signal strength and the aspect ratio $\gamma$. Especially in the case of regularized discriminant analysis, we find that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix, suggesting that analyses based on the operator norm of the covariance matrix may not be sharp. Our results also uncover an exact inverse relation between the limiting predictive risk and the limiting estimation risk in high-dimensional linear models. The analysis builds on recent advances in random matrix theory.
</p>projecteuclid.org/euclid.aos/1519268430 (Wed, 21 Feb 2018 22:00 EST)

Testing independence in high dimensions with sums of rank correlations
https://projecteuclid.org/euclid.aos/1519268431
<strong>Dennis Leung</strong>, <strong>Mathias Drton</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 280--307.</p><p><strong>Abstract:</strong><br/>
We treat the problem of testing independence between $m$ continuous variables when $m$ can be larger than the available sample size $n$. We consider three types of test statistics that are constructed as sums or sums of squares of pairwise rank correlations. In the asymptotic regime where both $m$ and $n$ tend to infinity, a martingale central limit theorem is applied to show that the null distributions of these statistics converge to Gaussian limits, which are valid with no specific distributional or moment assumptions on the data. Using the framework of U-statistics, our result covers a variety of rank correlations including Kendall’s tau and a dominating term of Spearman’s rank correlation coefficient (rho), but also degenerate U-statistics such as Hoeffding’s $D$, or the $\tau^{*}$ of Bergsma and Dassios [ Bernoulli 20 (2014) 1006–1028]. As in the classical theory for U-statistics, the test statistics need to be scaled differently when the rank correlations used to construct them are degenerate U-statistics. The power of the considered tests is explored in rate-optimality theory under a Gaussian equicorrelation alternative as well as in numerical experiments for specific cases of more general alternatives.
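One of the statistic types described above, a sum of squared pairwise rank correlations, can be sketched directly with Kendall's tau. This is a hedged illustration of the raw statistic only; it omits the centering and scaling used in the paper to obtain the Gaussian limit, and the O(n^2) tau computation is for clarity, not speed.

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a via pairwise concordance signs (O(n^2) sketch)."""
    n = len(x)
    s = sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return 2.0 * s / (n * (n - 1))

def sum_sq_tau(data):
    """Sum of squared Kendall taus over all m(m-1)/2 pairs of columns."""
    m = data.shape[1]
    return sum(kendall_tau(data[:, a], data[:, b]) ** 2
               for a, b in combinations(range(m), 2))

rng = np.random.default_rng(4)
null_data = rng.normal(size=(30, 5))    # independent columns: statistic stays small
stat_null = sum_sq_tau(null_data)
```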
</p>projecteuclid.org/euclid.aos/1519268431 (Wed, 21 Feb 2018 22:00 EST)

High dimensional censored quantile regression
https://projecteuclid.org/euclid.aos/1519268432
<strong>Qi Zheng</strong>, <strong>Limin Peng</strong>, <strong>Xuming He</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 308--343.</p><p><strong>Abstract:</strong><br/>
Censored quantile regression (CQR) has emerged as a useful regression tool for survival analysis. Some commonly used CQR methods can be characterized by stochastic integral-based estimating equations in a sequential manner across quantile levels. In this paper, we analyze CQR in a high dimensional setting where the regression functions over a continuum of quantile levels are of interest. We propose a two-step penalization procedure, which accommodates stochastic integral-based estimating equations and addresses the challenges due to the recursive nature of the procedure. We establish the uniform convergence rates for the proposed estimators, and investigate weak convergence and variable selection properties. We conduct numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposals.
</p>projecteuclid.org/euclid.aos/1519268432 (Wed, 21 Feb 2018 22:00 EST)

Local M-estimation with discontinuous criterion for dependent and limited observations
https://projecteuclid.org/euclid.aos/1519268433
<strong>Myung Hwan Seo</strong>, <strong>Taisuke Otsu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 344--369.</p><p><strong>Abstract:</strong><br/>
We examine the asymptotic properties of local M-estimators under three sets of high-level conditions. These conditions are sufficiently general to cover the minimum volume predictive region, the conditional maximum score estimator for a panel data discrete choice model and many other widely used estimators in statistics and econometrics. Specifically, they allow for discontinuous criterion functions of weakly dependent observations which may be localized by kernel smoothing and contain nuisance parameters with growing dimension. Furthermore, the localization can occur around parameter values rather than around a fixed point and the observations may take limited values which lead to set estimators. Our theory produces three different nonparametric cube root rates for local M-estimators and enables valid inference building on novel maximal inequalities for weakly dependent observations. The standard cube root asymptotics is included as a special case. The results are illustrated by various examples such as the Hough transform estimator with diminishing bandwidth, the maximum score-type set estimator and many others.
</p>projecteuclid.org/euclid.aos/1519268433 (Wed, 21 Feb 2018 22:00 EST)

Mixture inner product spaces and their application to functional data analysis
https://projecteuclid.org/euclid.aos/1519268434
<strong>Zhenhua Lin</strong>, <strong>Hans-Georg Müller</strong>, <strong>Fang Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 370--400.</p><p><strong>Abstract:</strong><br/>
We introduce the concept of mixture inner product spaces associated with a given separable Hilbert space, which feature an infinite-dimensional mixture of finite-dimensional vector spaces and are dense in the underlying Hilbert space. Any Hilbert valued random element can be arbitrarily closely approximated by mixture inner product space valued random elements. While this concept can be applied to data in any infinite-dimensional Hilbert space, the case of functional data that are random elements in the $L^{2}$ space of square integrable functions is of special interest. For functional data, mixture inner product spaces provide a new perspective, where each realization of the underlying stochastic process falls into one of the component spaces and is represented by a finite number of basis functions, the number of which corresponds to the dimension of the component space. In the mixture representation of functional data, the number of included mixture components used to represent a given random element in $L^{2}$ is specifically adapted to each random trajectory and may be arbitrarily large. Key benefits of this novel approach are, first, that it provides a new perspective on the construction of a probability density in function space under mild regularity conditions, and second, that individual trajectories possess a trajectory-specific dimension that corresponds to a latent random variable, making it possible to use a larger number of components for less smooth and a smaller number for smoother trajectories. This enables flexible and parsimonious modeling of heterogeneous trajectory shapes. We establish estimation consistency of the functional mixture density and introduce an algorithm for fitting the functional mixture model based on a modified expectation-maximization algorithm. 
Simulations confirm that in comparison to traditional functional principal component analysis the proposed method achieves similar or better data recovery while using fewer components on average. Its practical merits are also demonstrated in an analysis of egg-laying trajectories for medflies.
</p>projecteuclid.org/euclid.aos/1519268434 (Wed, 21 Feb 2018 22:00 EST)

Bayesian estimation of sparse signals with a continuous spike-and-slab prior
https://projecteuclid.org/euclid.aos/1519268435
<strong>Veronika Ročková</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 401--437.</p><p><strong>Abstract:</strong><br/>
We introduce a new framework for estimation of sparse normal means, bridging the gap between popular frequentist strategies (LASSO) and popular Bayesian strategies (spike-and-slab). The main thrust of this paper is to introduce the family of Spike-and-Slab LASSO (SS-LASSO) priors, which form a continuum between the Laplace prior and the point-mass spike-and-slab prior. We establish several appealing frequentist properties of SS-LASSO priors, contrasting them with these two limiting cases. First, we adopt the penalized likelihood perspective on Bayesian modal estimation and introduce the framework of Bayesian penalty mixing with spike-and-slab priors. We show that the SS-LASSO global posterior mode is (near) minimax rate-optimal under squared error loss, similarly to the LASSO. Going further, we introduce an adaptive two-step estimator which can achieve provably sharper performance than the LASSO. Second, we show that the whole posterior keeps pace with the global mode and concentrates at the (near) minimax rate, a property that is known <em>not to hold</em> for the single Laplace prior. The minimax-rate optimality is obtained with a suitable class of independent product priors (for known levels of sparsity) as well as with dependent mixing priors (adapting to the unknown levels of sparsity). Up to now, the rate-optimal posterior concentration has been established only for spike-and-slab priors with a point mass at zero. Thus, the SS-LASSO priors, despite being continuous, possess similar optimality properties as the “theoretically ideal” point-mass mixtures. These results provide valuable theoretical justification for our proposed class of priors, underpinning their intuitive appeal and practical potential.
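The continuum between the Laplace prior and the point-mass spike-and-slab can be made concrete: a continuous spike-and-slab prior of this flavor is a two-component mixture of Laplace densities, with a sharply peaked spike and a diffuse slab. A hedged sketch with illustrative hyperparameter values (not the paper's defaults):

```python
import math

def ss_lasso_log_prior(beta, theta=0.5, lam_spike=20.0, lam_slab=0.1):
    """Log density at a scalar beta of a mixture of two Laplace densities:
    (1 - theta) * Laplace(lam_spike) + theta * Laplace(lam_slab).

    lam_spike >> lam_slab makes the first component a continuous "spike" at
    zero; as lam_spike -> lam_slab the prior reduces to a single Laplace.
    """
    laplace = lambda b, lam: 0.5 * lam * math.exp(-lam * abs(b))
    return math.log((1 - theta) * laplace(beta, lam_spike)
                    + theta * laplace(beta, lam_slab))
```

The implied penalty, minus this log density, is nonconvex, which is what lets the posterior mode adapt between heavy shrinkage near zero and light shrinkage for large signals.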
</p>projecteuclid.org/euclid.aos/1519268435 (Wed, 21 Feb 2018 22:00 EST)

On the asymptotic theory of new bootstrap confidence bounds
https://projecteuclid.org/euclid.aos/1519268436
<strong>Charl Pretorius</strong>, <strong>Jan W. H. Swanepoel</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 438--456.</p><p><strong>Abstract:</strong><br/>
We propose a new method, based on sample splitting, for constructing bootstrap confidence bounds for a parameter appearing in the regular smooth function model. It has been demonstrated in the literature, for example, by Hall [ Ann. Statist. 16 (1988) 927–985; The Bootstrap and Edgeworth Expansion (1992) Springer], that the well-known percentile-$t$ method for constructing bootstrap confidence bounds typically incurs a coverage error of order $O(n^{-1})$, with $n$ being the sample size. Our version of the percentile-$t$ bound reduces this coverage error to order $O(n^{-3/2})$ and in some cases to $O(n^{-2})$. Furthermore, whereas the standard percentile bounds typically incur coverage error of $O(n^{-1/2})$, the new bounds have reduced error of $O(n^{-1})$. In the case where the parameter of interest is the population mean, we derive for each confidence bound the exact coefficient of the leading term in an asymptotic expansion of the coverage error, although similar results may be obtained for other parameters such as the variance, the correlation coefficient, and the ratio of two means. We show that equal-tailed confidence intervals with coverage error at most $O(n^{-2})$ may be obtained from the newly proposed bounds, as opposed to the typical error $O(n^{-1})$ of the standard intervals. It is also shown that the good properties of the new percentile-$t$ method carry over to regression problems. Results of independent interest are derived, such as a generalisation of a delta method by Cramér [ Mathematical Methods of Statistics (1946) Princeton Univ. Press] and Hurt [ Apl. Mat. 21 (1976) 444–456], and an expression for a polynomial appearing in an Edgeworth expansion of the distribution of a Studentised statistic for the slope parameter in a regression model. A small simulation study illustrates the behavior of the confidence bounds for small to moderate sample sizes.
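For reference, the classical percentile-$t$ construction whose $O(n^{-1})$ coverage error the proposed sample-splitting method improves upon can be sketched for the population mean. This is the standard baseline only, not the paper's new bound, and the resampling size is an illustrative choice.

```python
import numpy as np

def percentile_t_upper_bound(x, alpha=0.05, b=2000, seed=0):
    """Standard percentile-t upper confidence bound for the mean.

    Bootstraps the Studentized statistic t* = (xbar* - xbar) / se* and
    returns xbar - q_alpha(t*) * se, the nominal (1 - alpha) upper bound.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    xbar, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    boot = rng.choice(x, size=(b, n), replace=True)
    t_star = (boot.mean(axis=1) - xbar) / (boot.std(axis=1, ddof=1) / np.sqrt(n))
    return xbar - np.quantile(t_star, alpha) * se

rng_data = np.random.default_rng(5)
sample = rng_data.normal(1.0, 2.0, size=100)
upper = percentile_t_upper_bound(sample)
```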
</p>projecteuclid.org/euclid.aos/1519268436 (Wed, 21 Feb 2018 22:00 EST)