The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Tue, 07 Jun 2011 09:09 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
</p>

Geometric inference for general high-dimensional linear inverse problems
http://projecteuclid.org/euclid.aos/1467894707
<strong>T. Tony Cai</strong>, <strong>Tengyuan Liang</strong>, <strong>Alexander Rakhlin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1536--1563.</p><p><strong>Abstract:</strong><br/>
This paper presents a unified geometric framework for the statistical analysis of a general ill-posed linear inverse model which includes as special cases noisy compressed sensing, sign vector recovery, trace regression, orthogonal matrix estimation and noisy matrix completion. We propose computationally feasible convex programs for statistical inference including estimation, confidence intervals and hypothesis testing. A theoretical framework is developed to characterize the local estimation rate of convergence and to provide statistical inference guarantees. Our results build on local conic geometry and duality. The difficulty of statistical inference is captured by the geometric characterization of the local tangent cone through the Gaussian width and Sudakov estimate.
</p>

The Tracy–Widom law for the largest eigenvalue of F type matrices
http://projecteuclid.org/euclid.aos/1467894708
<strong>Xiao Han</strong>, <strong>Guangming Pan</strong>, <strong>Bo Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1564--1592.</p><p><strong>Abstract:</strong><br/>
Let ${\mathbf{A}}_{p}=\frac{{\mathbf{Y}}{\mathbf{Y}}^{*}}{m}$ and ${\mathbf{B}}_{p}=\frac{{\mathbf{X}}{\mathbf{X}}^{*}}{n}$ be two independent random matrices where ${\mathbf{X}}=(X_{ij})_{p\times n}$ and ${\mathbf{Y}}=(Y_{ij})_{p\times m}$ respectively consist of real (or complex) independent random variables with $\mathbb{E}X_{ij}=\mathbb{E}Y_{ij}=0$, $\mathbb{E}|X_{ij}|^{2}=\mathbb{E}|Y_{ij}|^{2}=1$. Denote by $\lambda_{1}$ the largest root of the determinantal equation $\det(\lambda{\mathbf{A}}_{p}-{\mathbf{B}}_{p})=0$. We establish the Tracy–Widom type universality for $\lambda_{1}$ under some moment conditions on $X_{ij}$ and $Y_{ij}$ when $p/m$ and $p/n$ approach positive constants as $p\rightarrow\infty$.
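The statistic in question is easy to compute numerically: since $\lambda_1$ is the largest root of $\det(\lambda{\mathbf{A}}_{p}-{\mathbf{B}}_{p})=0$, it equals the largest eigenvalue of ${\mathbf{A}}_{p}^{-1}{\mathbf{B}}_{p}$ whenever ${\mathbf{A}}_{p}$ is invertible (here $p<m$ guarantees this almost surely). A minimal sketch, with illustrative dimensions not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 50, 200, 200

# Independent Gaussian data matrices with unit-variance entries.
Y = rng.standard_normal((p, m))
X = rng.standard_normal((p, n))
A = Y @ Y.T / m   # A_p = Y Y* / m
B = X @ X.T / n   # B_p = X X* / n

# Largest root of det(lambda*A - B) = 0, i.e. largest eigenvalue of A^{-1} B.
lam = np.linalg.eigvals(np.linalg.solve(A, B))
lam1 = float(np.max(lam.real))
print(lam1)
```

Repeating this over many draws and centering/scaling $\lambda_1$ appropriately would produce the Tracy–Widom fluctuation the paper establishes.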
</p>

Self-normalized Cramér-type moderate deviations under dependence
http://projecteuclid.org/euclid.aos/1467894709
<strong>Xiaohong Chen</strong>, <strong>Qi-Man Shao</strong>, <strong>Wei Biao Wu</strong>, <strong>Lihu Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1593--1617.</p><p><strong>Abstract:</strong><br/>
We establish a Cramér-type moderate deviation result for self-normalized sums of weakly dependent random variables, where the moment requirement is much weaker than in the non-self-normalized counterpart. The range of the moderate deviation is shown to depend on the moment condition and the degree of dependence of the underlying processes. We consider three types of self-normalization: the equal-block scheme, the big-block-small-block scheme and the interlacing scheme. A simulation study shows that the latter can have better finite-sample performance. Our result is applied to multiple testing and to the construction of simultaneous confidence intervals for ultra-high-dimensional time series mean vectors.
</p>

Estimation of semivarying coefficient time series models with ARMA errors
http://projecteuclid.org/euclid.aos/1467894710
<strong>Huang Lei</strong>, <strong>Yingcun Xia</strong>, <strong>Xu Qin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1618--1660.</p><p><strong>Abstract:</strong><br/>
Serial correlation in the residuals of time series models can bias both model estimation and prediction. However, models with serially correlated residuals are difficult to estimate, especially when the regression function is nonlinear. Existing estimation methods require strong assumptions on the relation between the residuals and the regressors, which exclude the autoregressive models commonly used in time series analysis. By extending Whittle likelihood estimation, this paper investigates in detail a semiparametric autoregressive model with an ARMA sequence of residuals. Asymptotic normality of the estimators is established, and a model selection procedure is proposed. Numerical examples illustrate the performance of the proposed estimation method and the necessity of incorporating the serial correlation in the residuals.
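The Whittle likelihood the authors extend can be illustrated in its classical form: for a stationary process with spectral density $f_\theta$, one minimizes $\sum_j[\log f_\theta(\omega_j)+I(\omega_j)/f_\theta(\omega_j)]$ over Fourier frequencies, where $I$ is the periodogram. A minimal sketch for an AR(1) model with known unit innovation variance (an illustrative classical case, not the paper's semiparametric procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi_true = 4000, 0.5

# Simulate AR(1): x_t = phi * x_{t-1} + e_t, e_t ~ N(0, 1).
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + e[t]

# Periodogram at nonzero Fourier frequencies omega_j = 2*pi*j/n.
j = np.arange(1, n // 2)
omega = 2 * np.pi * j / n
I = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2 * np.pi * n)

def whittle_objective(phi):
    # AR(1) spectral density with unit innovation variance.
    f = 1.0 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * omega)) ** 2)
    return np.sum(np.log(f) + I / f)

grid = np.linspace(-0.95, 0.95, 381)
phi_hat = grid[np.argmin([whittle_objective(ph) for ph in grid])]
print(phi_hat)
```

The grid minimizer lands close to the true autoregressive coefficient; the paper's contribution is extending this frequency-domain idea to semiparametric regression functions with ARMA errors.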
</p>

Discriminating quantum states: The multiple Chernoff distance
http://projecteuclid.org/euclid.aos/1467894711
<strong>Ke Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1661--1679.</p><p><strong>Abstract:</strong><br/>
We consider the problem of testing multiple quantum hypotheses $\{\rho_{1}^{\otimes n},\ldots,\rho_{r}^{\otimes n}\}$, where an arbitrary prior distribution is given and each of the $r$ hypotheses is $n$ copies of a quantum state. It is known that the minimal average error probability $P_{e}$ decays exponentially to zero, that is, $P_{e}=\exp\{-\xi n+o(n)\}$. However, this error exponent $\xi$ is generally unknown, except for the case that $r=2$.
In this paper, we solve the long-standing open problem of identifying the above error exponent, by proving Nussbaum and Szkoła’s conjecture that $\xi=\min_{i\neq j}C(\rho_{i},\rho_{j})$. The right-hand side of this equality is called the multiple quantum Chernoff distance, and $C(\rho_{i},\rho_{j}):=\max_{0\leq s\leq1}\{-\log\operatorname{Tr}\rho_{i}^{s}\rho_{j}^{1-s}\}$ has been previously identified as the optimal error exponent for testing two hypotheses, $\rho_{i}^{\otimes n}$ versus $\rho_{j}^{\otimes n}$.
The main ingredient of our proof is a new upper bound for the average error probability, for testing an ensemble of finite-dimensional, but otherwise general, quantum states. This upper bound, up to a states-dependent factor, matches the multiple-state generalization of Nussbaum and Szkoła’s lower bound. Specialized to the case $r=2$, we give an alternative proof to the achievability of the binary-hypothesis Chernoff distance, which was originally proved by Audenaert et al.
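The pairwise quantity $C(\rho_i,\rho_j)=\max_{0\le s\le1}\{-\log\operatorname{Tr}\rho_i^{s}\rho_j^{1-s}\}$ defined above is straightforward to evaluate numerically; a sketch for two qubit density matrices, using a grid over $s$ (the example states are illustrative):

```python
import numpy as np

def mat_power(rho, s):
    # Fractional power of a Hermitian PSD matrix via eigendecomposition.
    w, V = np.linalg.eigh(rho)
    w = np.clip(w, 0.0, None)
    return (V * w ** s) @ V.conj().T

def chernoff_distance(rho, sigma, num=1001):
    # C(rho, sigma) = max_{0 <= s <= 1} -log Tr rho^s sigma^(1-s)
    ss = np.linspace(0.0, 1.0, num)
    vals = [-np.log(np.real(np.trace(mat_power(rho, s) @ mat_power(sigma, 1 - s))))
            for s in ss]
    return max(vals)

# Two diagonal qubit density matrices (effectively classical coin biases).
rho = np.diag([0.9, 0.1])
sigma = np.diag([0.4, 0.6])
print(chernoff_distance(rho, sigma))   # positive for distinct states
print(chernoff_distance(rho, rho))     # zero when the states coincide
```

The theorem proved in the paper says the optimal error exponent for $r$ hypotheses is the minimum of this quantity over all pairs $i\neq j$.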
</p>

Higher order elicitability and Osband’s principle
http://projecteuclid.org/euclid.aos/1467894712
<strong>Tobias Fissler</strong>, <strong>Johanna F. Ziegel</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1680--1707.</p><p><strong>Abstract:</strong><br/>
A statistical functional, such as the mean or the median, is called elicitable if there is a scoring function or loss function such that the correct forecast of the functional is the unique minimizer of the expected score. Such scoring functions are called strictly consistent for the functional. The elicitability of a functional opens the possibility to compare competing forecasts and to rank them in terms of their realized scores. In this paper, we explore the notion of elicitability for multi-dimensional functionals and give both necessary and sufficient conditions for strictly consistent scoring functions. We cover the case of functionals with elicitable components, but we also show that one-dimensional functionals that are not elicitable can be a component of a higher order elicitable functional. In the case of the variance, this is a known result. However, an important result of this paper is that spectral risk measures with a spectral measure with finite support are jointly elicitable if one adds the “correct” quantiles. A direct consequence of applied interest is that the pair (Value at Risk, Expected Shortfall) is jointly elicitable under mild conditions that are usually fulfilled in risk management applications.
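Elicitability as defined above can be seen numerically with two classical strictly consistent scores: squared error elicits the mean, and the pinball loss at level $\alpha$ elicits the $\alpha$-quantile. A minimal sketch (the grid search is purely illustrative; it is not the paper's machinery for joint elicitation):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(scale=2.0, size=5000)   # a skewed sample
grid = np.linspace(0.0, 10.0, 401)          # candidate point forecasts

# Squared error: the forecast minimizing the average score is the sample mean.
sq_scores = ((grid[:, None] - y[None, :]) ** 2).mean(axis=1)
best_mean = grid[np.argmin(sq_scores)]

# Pinball loss at level alpha: the minimizer is the alpha-quantile.
alpha = 0.9
pin_scores = np.where(y[None, :] < grid[:, None],
                      (1 - alpha) * (grid[:, None] - y[None, :]),
                      alpha * (y[None, :] - grid[:, None])).mean(axis=1)
best_q = grid[np.argmin(pin_scores)]

print(best_mean, np.mean(y))           # agree up to grid resolution
print(best_q, np.quantile(y, alpha))   # agree up to grid resolution
```

The paper's point is that this one-dimensional picture extends to vectors of functionals, e.g. (Value at Risk, Expected Shortfall) jointly, even though Expected Shortfall alone is not elicitable.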
</p>

Optimal estimation for the functional Cox model
http://projecteuclid.org/euclid.aos/1467894713
<strong>Simeng Qu</strong>, <strong>Jane-Ling Wang</strong>, <strong>Xiao Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1708--1738.</p><p><strong>Abstract:</strong><br/>
Functional covariates are common in many medical, biodemographic and neuroimaging studies. The aim of this paper is to study functional Cox models with right-censored data in the presence of both functional and scalar covariates. We study the asymptotic properties of the maximum partial likelihood estimator and establish the asymptotic normality and efficiency of the estimator of the finite-dimensional parameter. Under the framework of reproducing kernel Hilbert space, the estimator of the coefficient function for a functional covariate achieves the minimax optimal rate of convergence under a weighted $L_{2}$-risk. This optimal rate is determined jointly by the censoring scheme, the reproducing kernel and the covariance kernel of the functional covariates. Implementation of the estimation approach and the selection of the smoothing parameter are discussed in detail. The finite sample performance is illustrated by simulated examples and a real application.
</p>

Solution of linear ill-posed problems using overcomplete dictionaries
http://projecteuclid.org/euclid.aos/1467894714
<strong>Marianna Pensky</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1739--1764.</p><p><strong>Abstract:</strong><br/>
In the present paper, we consider the application of overcomplete dictionaries to the solution of general ill-posed linear inverse problems. In the context of regression problems, there has been an enormous amount of effort to recover an unknown function using an overcomplete dictionary. One of the most popular methods, the Lasso and its variants, is based on penalized likelihood maximization and relies on stringent assumptions on the dictionary, the so-called compatibility conditions, for a proof of its convergence rates. While these conditions may be satisfied for the original dictionary functions, they usually do not hold for their images due to contraction properties imposed by the linear operator.
In what follows, we bypass this difficulty with a novel approach based on inverting each of the dictionary functions and matching the resulting expansion to the true function, thus avoiding unrealistic assumptions on the dictionary and using the Lasso in a predictive setting. We examine both the white noise and the observational model formulations, and also discuss how exact inverse images of the dictionary functions can be replaced by their approximate counterparts. Furthermore, we show how the suggested methodology can be extended to the problem of estimation of a mixing density in a continuous mixture. For all the situations listed above, we provide sharp oracle inequalities for the risk in a non-asymptotic setting.
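The Lasso contrasted above can be sketched generically via iterative soft thresholding (ISTA); this is a standard textbook illustration of sparse recovery, not the paper's inversion-and-matching procedure, and all dimensions and tuning values below are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=2000):
    # Minimize 0.5*||y - A x||^2 + lam*||x||_1 by iterative soft thresholding.
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x

rng = np.random.default_rng(3)
n, p = 100, 20
A = rng.standard_normal((n, p)) / np.sqrt(n)   # near-unit-norm columns
x_true = np.zeros(p)
x_true[[2, 7, 11]] = [3.0, -2.0, 4.0]          # sparse truth
y = A @ x_true + 0.05 * rng.standard_normal(n)

x_hat = lasso_ista(A, y, lam=0.02)
print(np.nonzero(np.abs(x_hat) > 0.5)[0])      # estimated support
```

In a well-conditioned setting like this the support is recovered; the paper's concern is precisely that such favorable conditions fail for the operator-transformed dictionary.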
</p>

Impact of regularization on spectral clustering
http://projecteuclid.org/euclid.aos/1467894715
<strong>Antony Joseph</strong>, <strong>Bin Yu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1765--1791.</p><p><strong>Abstract:</strong><br/>
The performance of spectral clustering can be considerably improved via regularization, as demonstrated empirically in Amini et al. [ Ann. Statist. 41 (2013) 2097–2122]. Here, we attempt to quantify this improvement through theoretical analysis. Under the stochastic block model (SBM) and its extensions, previous results on spectral clustering relied on the minimum degree of the graph being sufficiently large. By examining the scenario where the regularization parameter $\tau$ is large, we show that the minimum degree assumption can potentially be removed. As a special case, for an SBM with two blocks, the results require the maximum degree to be large (growing faster than $\log n$) as opposed to the minimum degree. More importantly, we show the usefulness of regularization in situations where not all nodes belong to well-defined clusters. Our results rely on a ‘bias-variance’-like trade-off that arises from understanding the concentration of the sample Laplacian and the eigengap as a function of the regularization parameter. As a byproduct of our bounds, we propose a data-driven technique, DKest (for estimated Davis–Kahan bounds), for choosing the regularization parameter. This technique is shown to work well through simulations and on a real data set.
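One common form of the regularization discussed above replaces the degree matrix $D$ by $D_\tau=D+\tau I$ in the symmetric Laplacian. A minimal sketch on a two-block SBM (the value of $\tau$ and the block probabilities are illustrative, and this is a simplified sign-split clustering rather than the paper's analysis):

```python
import numpy as np

rng = np.random.default_rng(4)
n, tau = 60, 5.0                      # tau: regularization parameter
labels = np.repeat([0, 1], n // 2)

# Two-block SBM adjacency: within-block prob 0.5, between-block 0.05.
P = np.where(labels[:, None] == labels[None, :], 0.5, 0.05)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                           # symmetric, no self-loops

# Regularized symmetric Laplacian: D_tau^{-1/2} A D_tau^{-1/2}.
d = A.sum(axis=1) + tau
L = A / np.sqrt(np.outer(d, d))

# Cluster by the sign of the eigenvector for the second-largest eigenvalue.
w, V = np.linalg.eigh(L)
cluster = (V[:, -2] > 0).astype(int)
acc = max(np.mean(cluster == labels), np.mean(cluster != labels))
print(acc)
```

With well-separated blocks the sign split recovers the partition; the paper's contribution is characterizing how $\tau$ trades off Laplacian concentration against the eigengap.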
</p>

Marginalization and conditioning for LWF chain graphs
http://projecteuclid.org/euclid.aos/1467894716
<strong>Kayvan Sadeghi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 4, 1792--1816.</p><p><strong>Abstract:</strong><br/>
In this paper, we deal with the problem of marginalization over and conditioning on two disjoint subsets of the node set of chain graphs (CGs) with the LWF Markov property. For this purpose, we define the class of chain mixed graphs (CMGs) with three types of edges and, for this class, provide a separation criterion under which the class of CMGs is stable under marginalization and conditioning and contains the class of LWF CGs as its subclass. We provide a method for generating such graphs after marginalization and conditioning for a given CMG or a given LWF CG. We then define and study the class of anterial graphs, which is also stable under marginalization and conditioning and contains LWF CGs, but has a simpler structure than CMGs.
</p>

Editorial
http://projecteuclid.org/euclid.aos/1473685257
<strong>Runze Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1817--1820.</p>

Peter Hall’s contributions to the bootstrap
http://projecteuclid.org/euclid.aos/1473685258
<strong>Song Xi Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1821--1836.</p>

Peter Hall’s contributions to nonparametric function estimation and modeling
http://projecteuclid.org/euclid.aos/1473685259
<strong>Ming-Yen Cheng</strong>, <strong>Jianqing Fan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1837--1853.</p><p><strong>Abstract:</strong><br/>
Peter Hall made wide-ranging and far-reaching contributions to nonparametric modeling. He was one of the leading figures in the developments of nonparametric techniques with over 300 published papers in the field alone. This article gives a selective overview on the contributions of Peter Hall to nonparametric function estimation and modeling. The focuses are on density estimation, nonparametric regression, bandwidth selection, boundary corrections, inference under shape constraints, estimation of residual variances, analysis of wavelet estimators, multivariate regression and applications of nonparametric methods.
</p>

Peter Hall’s main contributions to deconvolution
http://projecteuclid.org/euclid.aos/1473685260
<strong>Aurore Delaigle</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1854--1866.</p><p><strong>Abstract:</strong><br/>
Peter Hall died in Melbourne on January 9, 2016. He was an extremely prolific researcher and contributed to many different areas of statistics. In this paper, I talk about my experience with Peter and I summarise his main contributions to deconvolution, which include measurement error problems and problems in image analysis.
</p>

Peter Hall, functional data analysis and random objects
http://projecteuclid.org/euclid.aos/1473685261
<strong>Hans-Georg Müller</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1867--1887.</p><p><strong>Abstract:</strong><br/>
Functional data analysis has become a major branch of nonparametric statistics and is a fast evolving field. Peter Hall has made fundamental contributions to this area and its theoretical underpinnings. He wrote more than 25 papers in functional data analysis between 1998 and 2016 and from 2005 on was a tenured faculty member with a 25% appointment in the Department of Statistics at the University of California, Davis. This article describes aspects of his appointment and academic life in Davis and also some of his major results in functional data analysis, along with a brief history of this area. It concludes with an outlook on new types of functional data and an emerging field of “random objects” that subsumes functional data analysis as it deals with more complex data structures.
</p>

Peter Hall’s work on high-dimensional data and classification
http://projecteuclid.org/euclid.aos/1473685262
<strong>Richard J. Samworth</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1888--1895.</p><p><strong>Abstract:</strong><br/>
In this article, I summarise Peter Hall’s contributions to high-dimensional data, including their geometric representations and variable selection methods based on ranking. I also discuss his work on classification problems, concluding with some personal reflections on my own interactions with him. This article complements [ Ann. Statist. 44 (2016) 1821–1836; Ann. Statist. 44 (2016) 1837–1853; Ann. Statist. 44 (2016) 1854–1866 and Ann. Statist. 44 (2016) 1867–1887], which focus on other aspects of Peter’s research.
</p>

Statistical and computational trade-offs in estimation of sparse principal components
http://projecteuclid.org/euclid.aos/1473685263
<strong>Tengyao Wang</strong>, <strong>Quentin Berthet</strong>, <strong>Richard J. Samworth</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1896--1930.</p><p><strong>Abstract:</strong><br/>
In recent years, sparse principal component analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators have been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper, we show that, under a widely-believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a restricted covariance concentration condition, we show that there is an effective sample size regime in which no randomised polynomial time algorithm can achieve the minimax optimal rate. We also study the theoretical performance of a (polynomial time) variant of the well-known semidefinite relaxation estimator, revealing a subtle interplay between statistical and computational efficiency.
</p>

Cramér-type moderate deviations for Studentized two-sample $U$-statistics with applications
http://projecteuclid.org/euclid.aos/1473685264
<strong>Jinyuan Chang</strong>, <strong>Qi-Man Shao</strong>, <strong>Wen-Xin Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1931--1956.</p><p><strong>Abstract:</strong><br/>
Two-sample $U$-statistics are widely used in a broad range of applications, including those in the fields of biostatistics and econometrics. In this paper, we establish sharp Cramér-type moderate deviation theorems for Studentized two-sample $U$-statistics in a general framework, including the two-sample $t$-statistic and Studentized Mann–Whitney test statistic as prototypical examples. In particular, a refined moderate deviation theorem with second-order accuracy is established for the two-sample $t$-statistic. These results extend the applicability of the existing statistical methodologies from the one-sample $t$-statistic to more general nonlinear statistics. Applications to two-sample large-scale multiple testing problems with false discovery rate control and the regularized bootstrap method are also discussed.
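One of the prototypical statistics named above can be sketched directly: the Mann–Whitney $U$-statistic $U=(n_1n_2)^{-1}\sum_i\sum_j\mathbf{1}\{X_i<Y_j\}$, Studentized with a projection-based variance estimate. The studentization below is a standard textbook choice, assumed for illustration; the paper's exact framework is more general:

```python
import numpy as np

def studentized_mann_whitney(x, y):
    # U-statistic with kernel 1{x < y}, Studentized via its Hajek projections.
    n1, n2 = len(x), len(y)
    ind = (x[:, None] < y[None, :]).astype(float)
    U = ind.mean()
    g1 = ind.mean(axis=1)      # projection onto the first sample
    g2 = ind.mean(axis=0)      # projection onto the second sample
    var_hat = g1.var(ddof=1) / n1 + g2.var(ddof=1) / n2
    return (U - 0.5) / np.sqrt(var_hat)

rng = np.random.default_rng(5)
x = rng.standard_normal(200)
y_null = rng.standard_normal(300)          # same distribution as x
y_shift = 1.0 + rng.standard_normal(300)   # shifted alternative

t_null = studentized_mann_whitney(x, y_null)
t_shift = studentized_mann_whitney(x, y_shift)
print(t_null, t_shift)
```

Under the null the statistic is approximately standard normal; the paper quantifies how far into the tails that normal approximation remains accurate.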
</p>

Estimation in nonlinear regression with Harris recurrent Markov chains
http://projecteuclid.org/euclid.aos/1473685265
<strong>Degui Li</strong>, <strong>Dag Tjøstheim</strong>, <strong>Jiti Gao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1957--1987.</p><p><strong>Abstract:</strong><br/>
In this paper, we study parametric nonlinear regression under the Harris recurrent Markov chain framework. We first consider the nonlinear least squares estimators of the parameters in the homoskedastic case, and establish asymptotic theory for the proposed estimators. Our results show that the convergence rates for the estimators rely not only on the properties of the nonlinear regression function, but also on the number of regenerations for the Harris recurrent Markov chain. Furthermore, we discuss the estimation of the parameter vector in a conditional volatility function, and apply our results to the nonlinear regression with $I(1)$ processes and derive an asymptotic distribution theory which is comparable to that obtained by Park and Phillips [ Econometrica 69 (2001) 117–161]. Some numerical studies including simulation and empirical application are provided to examine the finite sample performance of the proposed approaches and results.
</p>

Efficient estimation in semivarying coefficient models for longitudinal/clustered data
http://projecteuclid.org/euclid.aos/1473685266
<strong>Ming-Yen Cheng</strong>, <strong>Toshio Honda</strong>, <strong>Jialiang Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 1988--2017.</p><p><strong>Abstract:</strong><br/>
In semivarying coefficient modeling of longitudinal/clustered data, primary interest usually lies in the parametric component, which involves unknown constant coefficients. First, we study the semiparametric efficiency bound for estimation of the constant coefficients in a general setup. It can be achieved by spline regression using the true within-subject covariance matrices, which are often unavailable in practice. Thus, we propose an estimator for the case where the covariance matrices are unknown and depend only on the index variable. We first estimate the covariance matrices using residuals obtained from a preliminary estimation based on working independence, using both spline and local linear regression. Then, using the covariance matrix estimates, we employ spline regression again to obtain our final estimator. It achieves the semiparametric efficiency bound under the normality assumption and has the smallest asymptotic covariance matrix among a class of estimators even when normality is violated. Our theoretical results hold either when the number of within-subject observations diverges or when it is uniformly bounded. In addition, using the local linear estimator of the nonparametric component is superior to using the spline estimator in terms of numerical performance. The proposed method is compared with the working independence estimator and some existing methods via simulations and an application to a real data example.
</p>

Partial correlation screening for estimating large precision matrices, with applications to classification
http://projecteuclid.org/euclid.aos/1473685267
<strong>Shiqiong Huang</strong>, <strong>Jiashun Jin</strong>, <strong>Zhigang Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2018--2057.</p><p><strong>Abstract:</strong><br/>
Given $n$ samples $X_{1},X_{2},\ldots,X_{n}$ from $N(0,\Sigma)$, we are interested in estimating the $p\times p$ precision matrix $\Omega=\Sigma^{-1}$; we assume $\Omega$ is sparse in that each row has relatively few nonzeros.
We propose Partial Correlation Screening (PCS) as a new row-by-row approach. To estimate the $i$th row of $\Omega$, $1\leq i\leq p$, PCS uses a Screen step and a Clean step. In the Screen step, PCS recruits a (small) subset of indices using a stage-wise algorithm, where in each stage, the algorithm updates the set of recruited indices by adding the index $j$ that has the largest empirical partial correlation (in magnitude) with $i$, given the set of indices recruited so far. In the Clean step, PCS reinvestigates all recruited indices, removes false positives and uses the resultant set of indices to reconstruct the $i$th row.
PCS is computationally efficient and modest in memory use: to estimate a row of $\Omega$, it needs only a few rows (determined sequentially) of the empirical covariance matrix. PCS can estimate a large $\Omega$ (e.g., $p=10K$) in a few minutes.
Higher Criticism Thresholding (HCT) is a recent classifier that enjoys optimality, but to exploit its full potential, we need a good estimate of $\Omega$. Note that given an estimate of $\Omega$, we can always combine it with HCT to build a classifier (e.g., HCT-PCS, HCT-glasso).
We have applied HCT-PCS to two microarray data sets ($p=8K$ and $10K$) for classification, where it not only significantly outperforms HCT-glasso but is also competitive with the Support Vector Machine (SVM) and Random Forest (RF). These results suggest that PCS gives more useful estimates of $\Omega$ than the glasso; we study this carefully and gain some interesting insight.
We show that in a broad context, PCS fully recovers the support of $\Omega$ and HCT-PCS is optimal in classification. Our theoretical study sheds interesting light on the behavior of stage-wise procedures.
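The Screen step described above can be sketched as a greedy partial-correlation search. This is a simplified illustration with a fixed stopping threshold and population-level covariance; the paper's actual stopping rule and Clean step are more refined:

```python
import numpy as np

def partial_corr(S, i, j, given):
    # Partial correlation of i and j given the index set `given`, read off
    # the inverse of the covariance submatrix over {i, j} union given.
    idx = [i, j] + list(given)
    M = np.linalg.inv(S[np.ix_(idx, idx)])
    return -M[0, 1] / np.sqrt(M[0, 0] * M[1, 1])

def screen_row(S, i, threshold=0.1, max_steps=10):
    recruited = []
    for _ in range(max_steps):
        rest = [j for j in range(S.shape[0]) if j != i and j not in recruited]
        pcs = [abs(partial_corr(S, i, j, recruited)) for j in rest]
        best = int(np.argmax(pcs))
        if pcs[best] < threshold:
            break                      # no remaining strong partial correlation
        recruited.append(rest[best])
    return recruited

# Toy example: tridiagonal precision matrix (a Markov chain), so the only
# neighbor of node 0 in the graphical model is node 1.
Omega = np.array([[2., -1, 0, 0],
                  [-1, 2, -1, 0],
                  [0, -1, 2, -1],
                  [0, 0, -1, 2]])
Sigma = np.linalg.inv(Omega)
print(screen_row(Sigma, 0))
```

Here the search recruits node 1 and then stops, because node 1 separates node 0 from the rest of the chain, making the remaining partial correlations vanish.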
</p>

Local intrinsic stationarity and its inference
http://projecteuclid.org/euclid.aos/1473685268
<strong>Tailen Hsing</strong>, <strong>Thomas Brown</strong>, <strong>Brian Thelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2058--2088.</p><p><strong>Abstract:</strong><br/>
Dense spatial data are commonplace nowadays, and they provide the impetus for addressing nonstationarity in a general way. This paper extends the notion of intrinsic random function by allowing the stationary component of the covariance to vary with spatial location. A nonparametric estimation procedure based on gridded data is introduced for the case where the covariance function is regularly varying at any location. An asymptotic theory is developed for the procedure on a fixed domain by letting the grid size tend to zero.
</p>

An unexpected encounter with Cauchy and Lévy
http://projecteuclid.org/euclid.aos/1473685269
<strong>Natesh S. Pillai</strong>, <strong>Xiao-Li Meng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2089--2097.</p><p><strong>Abstract:</strong><br/>
The Cauchy distribution is usually presented as a mathematical curiosity, an exception to the Law of Large Numbers, or even as an “Evil” distribution in some introductory courses. It therefore surprised us when Drton and Xiao [ Bernoulli 22 (2016) 38–59] proved the following result for $m=2$ and conjectured it for $m\ge3$. Let $X=(X_{1},\ldots,X_{m})$ and $Y=(Y_{1},\ldots,Y_{m})$ be i.i.d. $\mathrm{N}(0,\Sigma)$, where $\Sigma=\{\sigma_{ij}\}\ge0$ is an arbitrary $m\times m$ covariance matrix with $\sigma_{jj}>0$ for all $1\leq j\leq m$. Then
\[Z=\sum_{j=1}^{m}w_{j}{\frac{X_{j}}{Y_{j}}}\sim\operatorname{Cauchy}(0,1),\] as long as $\vec{w}=(w_{1},\ldots,w_{m})$ is independent of $(X,Y)$, $w_{j}\ge0,j=1,\ldots,m$, and $\sum_{j=1}^{m}w_{j}=1$. In this note, we present an elementary proof of this conjecture for any $m\geq2$ by linking $Z$ to a geometric characterization of $\operatorname{Cauchy}(0,1)$ given in Williams [ Ann. Math. Stat. 40 (1969) 1083–1085]. This general result is essential to the large sample behavior of Wald tests in many applications such as factor models and contingency tables. It also leads to other unexpected results such as
\[\sum_{i=1}^{m}\sum_{j=1}^{m}\frac{w_{i}w_{j}\sigma_{ij}}{X_{i}X_{j}}\sim \mbox{Lévy}(0,1).\] This generalizes the “super Cauchy phenomenon” that the average of $m$ i.i.d. standard Lévy variables (i.e., inverse chi-squared variables with one degree of freedom) has the same distribution as that of a single standard Lévy variable multiplied by $m$ (which is obtained by taking $w_{j}=1/m$ and $\Sigma$ to be the identity matrix).
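The distributional identity for $Z$ is easy to check by simulation; a sketch with one illustrative choice of $m$, $\Sigma$ and weights (the theorem holds for any valid $\Sigma\ge0$ and nonnegative weights summing to one):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 400_000, 3
w = np.array([0.2, 0.3, 0.5])          # nonnegative weights summing to 1
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])    # a valid covariance matrix
C = np.linalg.cholesky(Sigma)

X = rng.standard_normal((n, m)) @ C.T  # rows are N(0, Sigma)
Y = rng.standard_normal((n, m)) @ C.T  # independent copy
Z = (w * X / Y).sum(axis=1)

# Cauchy(0, 1) has median 0 and quartiles at -1 and +1 (IQR = 2).
q1, med, q3 = np.quantile(Z, [0.25, 0.5, 0.75])
print(med, q3 - q1)
```

The empirical median and interquartile range match the standard Cauchy values, even though the summands $X_j/Y_j$ are heavily dependent.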
</p>

Innovated scalable efficient estimation in ultra-large Gaussian graphical models
http://projecteuclid.org/euclid.aos/1473685270
<strong>Yingying Fan</strong>, <strong>Jinchi Lv</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2098--2126.</p><p><strong>Abstract:</strong><br/>
Large-scale precision matrix estimation is of fundamental importance yet challenging in many contemporary applications for recovering Gaussian graphical models. In this paper, we suggest a new approach of innovated scalable efficient estimation (ISEE) for estimating large precision matrix. Motivated by the innovated transformation, we convert the original problem into that of large covariance matrix estimation. The suggested method combines the strengths of recent advances in high-dimensional sparse modeling and large covariance matrix estimation. Compared to existing approaches, our method is scalable and can deal with much larger precision matrices with simple tuning. Under mild regularity conditions, we establish that this procedure can recover the underlying graphical structure with significant probability and provide efficient estimation of link strengths. Both computational and theoretical advantages of the procedure are evidenced through simulation and real data examples.
</p>projecteuclid.org/euclid.aos/1473685270_20160912090107Mon, 12 Sep 2016 09:01 EDTOn high-dimensional misspecified mixed model analysis in genome-wide association studyhttp://projecteuclid.org/euclid.aos/1473685271<strong>Jiming Jiang</strong>, <strong>Cong Li</strong>, <strong>Debashis Paul</strong>, <strong>Can Yang</strong>, <strong>Hongyu Zhao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2127--2160.</p><p><strong>Abstract:</strong><br/>
We study the behavior of the restricted maximum likelihood (REML) estimator under a misspecified linear mixed model (LMM) that has received much attention in recent genome-wide association studies. The asymptotic analysis establishes consistency of the REML estimator of the variance of the errors in the LMM, and convergence in probability of the REML estimator of the variance of the random effects in the LMM to a certain limit, which is equal to the true variance of the random effects multiplied by the limiting proportion of the nonzero random effects present in the LMM. The asymptotic results also establish convergence rates (in probability) of the REML estimators as well as a result regarding convergence of the asymptotic conditional variance of the REML estimator. The asymptotic results are fully supported by the results of empirical studies, which include extensive simulation studies that compare the performance of the REML estimator (under the misspecified LMM) with other existing methods, and real data applications (only one example is presented) that have important genetic implications.
</p>projecteuclid.org/euclid.aos/1473685271_20160912090107Mon, 12 Sep 2016 09:01 EDTAsymptotic theory for the first projective directionhttp://projecteuclid.org/euclid.aos/1473685272<strong>Michael G. Akritas</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2161--2189.</p><p><strong>Abstract:</strong><br/>
For a response variable $Y$ and a $d$-dimensional vector of covariates $\mathbf{X}$, the first projective direction, $\mathbf{\vartheta}$, is defined as the direction that accounts for the most variability in $Y$. The asymptotic distribution of an estimator of a trimmed version of $\mathbf{\vartheta}$ has been characterized only under the assumption of the single index model (SIM). This paper proposes the use of a flexible trimming function in the objective function, which results in the consistent estimation of $\mathbf{\vartheta}$. It also derives the asymptotic normality of the proposed estimator, and characterizes the components of the asymptotic variance which vanish when the SIM holds.
</p>projecteuclid.org/euclid.aos/1473685272_20160912090107Mon, 12 Sep 2016 09:01 EDTNonparametric covariate-adjusted regressionhttp://projecteuclid.org/euclid.aos/1473685273<strong>Aurore Delaigle</strong>, <strong>Peter Hall</strong>, <strong>Wen-Xin Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2190--2220.</p><p><strong>Abstract:</strong><br/>
We consider nonparametric estimation of a regression curve when the data are observed with multiplicative distortion which depends on an observed confounding variable. We suggest several estimators, ranging from a relatively simple one that relies on restrictive assumptions usually made in the literature, to a sophisticated piecewise approach that involves reconstructing a smooth curve from an estimator of a constant multiple of its absolute value, and which can be applied in much more general scenarios. We show that, although our nonparametric estimators are constructed from predictors of the unobserved undistorted data, they have the same first-order asymptotic properties as the standard estimators that could be computed if the undistorted data were available. We illustrate the good numerical performance of our methods on both simulated and real datasets.
</p>projecteuclid.org/euclid.aos/1473685273_20160912090107Mon, 12 Sep 2016 09:01 EDTOptimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flowhttp://projecteuclid.org/euclid.aos/1473685274<strong>T. Tony Cai</strong>, <strong>Xiaodong Li</strong>, <strong>Zongming Ma</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2221--2251.</p><p><strong>Abstract:</strong><br/>
This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $\mathbf{x}\in\mathbb{R}^{p}$ from noisy quadratic measurements $y_{j}=(\mathbf{a}_{j}'\mathbf{x})^{2}+\varepsilon_{j}$, $j=1,\ldots,m$, with independent sub-exponential noise $\varepsilon_{j}$. The goals are to understand the effect of the sparsity of $\mathbf{x}$ on the estimation precision and to construct a computationally feasible estimator to achieve the optimal rates adaptively. Inspired by the Wirtinger Flow [ IEEE Trans. Inform. Theory 61 (2015) 1985–2007] proposed for non-sparse and noiseless phase retrieval, a novel thresholded gradient descent algorithm is proposed and it is shown to adaptively achieve the minimax optimal rates of convergence over a wide range of sparsity levels when the $\mathbf{a}_{j}$’s are independent standard Gaussian random vectors, provided that the sample size is sufficiently large compared to the sparsity of $\mathbf{x}$.
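The thresholded-gradient idea can be caricatured in a few lines: alternate a gradient step on the quadratic-measurement loss with hard-thresholding to the $k$ largest entries. The sketch below is an assumption-laden toy (noiseless measurements, known sparsity $k$, and a warm start in place of the paper's spectral-style initialization), not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, m = 50, 3, 200
x_true = np.zeros(p)
x_true[:k] = [1.0, -0.8, 0.6]
A = rng.standard_normal((m, p))
y = (A @ x_true) ** 2                    # noiseless quadratic measurements

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# Warm start near the truth (a stand-in for the real initialization step).
x = hard_threshold(x_true + 0.1 * rng.standard_normal(p), k)
eta = 0.05
for _ in range(1000):
    r = (A @ x) ** 2 - y                 # residuals of the quadratic fit
    grad = (A * (r * (A @ x))[:, None]).mean(axis=0)
    x = hard_threshold(x - eta * grad, k)

err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
print(err)                               # global sign is unidentifiable from (a'x)^2
```

The `min` over $\pm x$ reflects the inherent sign ambiguity of phase retrieval: $\mathbf{x}$ and $-\mathbf{x}$ produce identical measurements.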
</p>projecteuclid.org/euclid.aos/1473685274_20160912090107Mon, 12 Sep 2016 09:01 EDTMinimax rates of community detection in stochastic block modelshttp://projecteuclid.org/euclid.aos/1473685275<strong>Anderson Y. Zhang</strong>, <strong>Harrison H. Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2252--2280.</p><p><strong>Abstract:</strong><br/>
Recently, network analysis has gained more and more attention in statistics, as well as in computer science, probability and applied mathematics. Community detection for the stochastic block model (SBM) is probably the most studied topic in network analysis. Many methodologies have been proposed, and some beautiful and significant phase transition results have been obtained in various settings. In this paper, we provide a general minimax theory for community detection. It gives minimax rates of the mis-match ratio for a wide range of settings, including homogeneous and inhomogeneous SBMs, dense and sparse networks, and finite and growing numbers of communities. The minimax rates are exponential, different from the polynomial rates we often see in the statistical literature. An immediate consequence of the result is to establish the threshold phenomenon for strong consistency (exact recovery) as well as weak consistency (partial recovery). We obtain the upper bound by a range of penalized likelihood-type approaches. The lower bound is achieved by a novel reduction from a global mis-match ratio to a local clustering problem for one node through an exchangeability property.
</p>projecteuclid.org/euclid.aos/1473685275_20160912090107Mon, 12 Sep 2016 09:01 EDTFrom sparse to dense functional data and beyondhttp://projecteuclid.org/euclid.aos/1473685276<strong>Xiaoke Zhang</strong>, <strong>Jane-Ling Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 5, 2281--2321.</p><p><strong>Abstract:</strong><br/>
Nonparametric estimation of mean and covariance functions is important in functional data analysis. We investigate the performance of local linear smoothers for both mean and covariance functions with a general weighing scheme, which includes two commonly used schemes, equal weight per observation (OBS), and equal weight per subject (SUBJ), as two special cases. We provide a comprehensive analysis of their asymptotic properties on a unified platform for all types of sampling plan, be it dense, sparse or neither. Three types of asymptotic properties are investigated in this paper: asymptotic normality, $L^{2}$ convergence and uniform convergence. The asymptotic theories are unified on two aspects: (1) the weighing scheme is very general; (2) the magnitude of the number $N_{i}$ of measurements for the $i$th subject relative to the sample size $n$ can vary freely. Based on the relative order of $N_{i}$ to $n$, functional data are partitioned into three types: non-dense, dense and ultra-dense functional data for the OBS and SUBJ schemes. These two weighing schemes are compared both theoretically and numerically. We also propose a new class of weighing schemes in terms of a mixture of the OBS and SUBJ weights, of which theoretical and numerical performances are examined and compared.
</p>projecteuclid.org/euclid.aos/1473685276_20160912090107Mon, 12 Sep 2016 09:01 EDTInfluential features PCA for high dimensional clusteringhttp://projecteuclid.org/euclid.aos/1479891617<strong>Jiashun Jin</strong>, <strong>Wanjie Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2323--2359.</p><p><strong>Abstract:</strong><br/>
We consider a clustering problem where we observe feature vectors $X_{i}\in\mathbb{R}^{p}$, $i=1,2,\ldots,n$, from $K$ possible classes. The class labels are unknown and the main interest is to estimate them. We are primarily interested in the modern regime of $p\gg n$, where classical clustering methods face challenges.
We propose Influential Features PCA (IF-PCA) as a new clustering procedure. In IF-PCA, we select a small fraction of features with the largest Kolmogorov–Smirnov (KS) scores, obtain the first $(K-1)$ left singular vectors of the post-selection normalized data matrix, and then estimate the labels by applying the classical $k$-means procedure to these singular vectors. In this procedure, the only tuning parameter is the threshold in the feature selection step. We set the threshold in a data-driven fashion by adapting the recent notion of Higher Criticism. As a result, IF-PCA is a tuning-free clustering method.
We apply IF-PCA to $10$ gene microarray data sets. The method has competitive performance in clustering. In particular, in three of the data sets, the error rates of IF-PCA are only $29\%$ or less of the error rates of other methods. We have also rediscovered a phenomenon on the empirical null by Efron [ J. Amer. Statist. Assoc. 99 (2004) 96–104] on microarray data.
With delicate analysis, especially post-selection eigen-analysis, we derive tight probability bounds on the Kolmogorov–Smirnov statistics and show that IF-PCA yields clustering consistency in a broad context. The clustering problem is connected to the problems of sparse PCA and low-rank matrix recovery, but it is different in important ways. We reveal an interesting phase transition phenomenon associated with these problems and identify the range of interest for each.
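The three-step recipe (KS scores, feature selection, spectral clustering via $k$-means) can be sketched on synthetic data. Here, the Higher Criticism threshold is replaced by a fixed number of retained features $s$, an assumption made purely for brevity:

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.stats import kstest

rng = np.random.default_rng(2)
n, p, K, s = 200, 500, 2, 10
truth = rng.integers(0, K, size=n)
X = rng.standard_normal((n, p))
X[:, :s] += np.where(truth == 1, 3.0, -3.0)[:, None]   # only s features are informative

# Step 1: KS score of each standardized feature against N(0,1).
Z = (X - X.mean(0)) / X.std(0)
ks = np.array([kstest(Z[:, j], "norm").statistic for j in range(p)])

# Step 2: keep the s highest-scoring features (fixed cut instead of Higher Criticism).
keep = np.argsort(ks)[-s:]

# Step 3: first K-1 left singular vectors of the selected columns, then k-means.
U, _, _ = np.linalg.svd(Z[:, keep], full_matrices=False)
centroids, labels = kmeans2(U[:, : K - 1], K, minit="++", seed=3)

acc = max(np.mean(labels == truth), np.mean(labels != truth))  # label names are arbitrary
print(acc)
```

Features carrying a mean shift look strongly non-normal after standardization, so their KS scores dominate those of the null features, which is what makes the selection step work.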
</p>projecteuclid.org/euclid.aos/1479891617_20161123040048Wed, 23 Nov 2016 04:00 ESTDiscussion of “Influential features PCA for high dimensional clustering”http://projecteuclid.org/euclid.aos/1479891618<strong>Ery Arias-Castro</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2360--2365.</p>projecteuclid.org/euclid.aos/1479891618_20161123040048Wed, 23 Nov 2016 04:00 ESTDiscussion of “Influential features PCA for high dimensional clustering”http://projecteuclid.org/euclid.aos/1479891619<strong>Boaz Nadler</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2366--2371.</p>projecteuclid.org/euclid.aos/1479891619_20161123040048Wed, 23 Nov 2016 04:00 ESTDiscussion of “Influential feature PCA for high dimensional clustering”http://projecteuclid.org/euclid.aos/1479891620<strong>T. Tony Cai</strong>, <strong>Linjun Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2372--2381.</p>projecteuclid.org/euclid.aos/1479891620_20161123040048Wed, 23 Nov 2016 04:00 ESTDiscussion of “Influential features PCA for high dimensional clustering”http://projecteuclid.org/euclid.aos/1479891621<strong>Natalia A. Stepanova</strong>, <strong>Alexandre B. Tsybakov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2382--2386.</p>projecteuclid.org/euclid.aos/1479891621_20161123040048Wed, 23 Nov 2016 04:00 ESTRejoinder: “Influential features PCA for high dimensional clustering”http://projecteuclid.org/euclid.aos/1479891622<strong>Jiashun Jin</strong>, <strong>Wanjie Wang</strong>. 
<p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2387--2400.</p>projecteuclid.org/euclid.aos/1479891622_20161123040048Wed, 23 Nov 2016 04:00 ESTNonparametric estimation of dynamics of monotone trajectorieshttp://projecteuclid.org/euclid.aos/1479891623<strong>Debashis Paul</strong>, <strong>Jie Peng</strong>, <strong>Prabir Burman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2401--2432.</p><p><strong>Abstract:</strong><br/>
We study a class of nonlinear nonparametric inverse problems. Specifically, we propose a nonparametric estimator of the dynamics of a monotonically increasing trajectory defined on a finite time interval. Under suitable regularity conditions, we show that in terms of $L^{2}$-loss, the optimal rate of convergence for the proposed estimator is the same as that for the estimation of the derivative of a function. We conduct simulation studies to examine the finite sample behavior of the proposed estimator and apply it to the Berkeley growth data.
</p>projecteuclid.org/euclid.aos/1479891623_20161123040048Wed, 23 Nov 2016 04:00 ESTCausal inference with a graphical hierarchy of interventionshttp://projecteuclid.org/euclid.aos/1479891624<strong>Ilya Shpitser</strong>, <strong>Eric Tchetgen Tchetgen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2433--2466.</p><p><strong>Abstract:</strong><br/>
Identifying causal parameters from observational data is fraught with subtleties due to the issues of selection bias and confounding. In addition, more complex questions of interest, such as effects of treatment on the treated and mediated effects, may not always be identified even in data where treatment assignment is known and under investigator control, or may be identified under one causal model but not another.
Increasingly complex effects of interest, coupled with a diversity of causal models in use, have resulted in a fragmented view of identification. This fragmentation makes it unnecessarily difficult to determine if a given parameter is identified (and in what model), and what assumptions must hold for this to be the case. This, in turn, complicates the development of estimation theory and sensitivity analysis procedures.
In this paper, we give a unifying view of a large class of causal effects of interest, including novel effects not previously considered, in terms of a hierarchy of interventions, and show that identification theory for this large class reduces to an identification theory of random variables under interventions from this hierarchy. Moreover, we show that one type of intervention in the hierarchy is naturally associated with queries identified under the Finest Fully Randomized Causally Interpretable Structure Tree Graph (FFRCISTG) model of Robins (via the extended g-formula), and another is naturally associated with queries identified under the Non-Parametric Structural Equation Model with Independent Errors (NPSEM-IE) of Pearl, via a more general functional we call the edge g-formula.
Our results motivate the study of estimation theory for the edge g-formula, since we show it arises both in mediation analysis, and in settings where treatment assignment has unobserved causes, such as models associated with Pearl’s front-door criterion.
</p>projecteuclid.org/euclid.aos/1479891624_20161123040048Wed, 23 Nov 2016 04:00 ESTConsistent model selection criteria for quadratically supported riskshttp://projecteuclid.org/euclid.aos/1479891625<strong>Yongdai Kim</strong>, <strong>Jong-June Jeon</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2467--2496.</p><p><strong>Abstract:</strong><br/>
In this paper, we study asymptotic properties of model selection criteria for high-dimensional regression models where the number of covariates is much larger than the sample size. In particular, we consider a class of loss functions called the class of quadratically supported risks which is large enough to include the quadratic loss, Huber loss, quantile loss and logistic loss. We provide sufficient conditions for the model selection criteria, which are applicable to the class of quadratically supported risks. Our results extend most previous sufficient conditions for model selection consistency. In addition, sufficient conditions for pathconsistency of the Lasso and nonconvex penalized estimators are presented. Here, pathconsistency means that the probability that the solution path includes the true model converges to 1. Pathconsistency makes it practically feasible to apply consistent model selection criteria to high-dimensional data. A data-adaptive model selection procedure is proposed that is selection consistent and performs well for finite samples. Results of simulation studies as well as real data analysis are presented to compare the finite sample performances of the proposed data-adaptive model selection criterion with other competitors.
</p>projecteuclid.org/euclid.aos/1479891625_20161123040048Wed, 23 Nov 2016 04:00 ESTOn the computational complexity of high-dimensional Bayesian variable selectionhttp://projecteuclid.org/euclid.aos/1479891626<strong>Yun Yang</strong>, <strong>Martin J. Wainwright</strong>, <strong>Michael I. Jordan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2497--2532.</p><p><strong>Abstract:</strong><br/>
We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix. We then demonstrate that the statistical criterion of posterior concentration need not imply the computational desideratum of rapid mixing of the MCMC algorithm. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee both variable-selection consistency and rapid mixing of a particular Metropolis–Hastings algorithm. The mixing time is linear in the number of covariates up to a logarithmic factor. Our proof controls the spectral gap of the Markov chain by constructing a canonical path ensemble that is inspired by the steps taken by greedy algorithms for variable selection.
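A minimal caricature of such a sampler: a Metropolis–Hastings walk over inclusion vectors $\gamma$ with single-coordinate flips, truncated to $|\gamma|\le s_{\max}$. The BIC-based score below is a stand-in for the actual marginal likelihood under the paper's sparsity prior (an assumption of this sketch, not the authors' posterior):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(4)
n, p, smax = 100, 10, 3
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] + rng.standard_normal(n)

def log_score(gamma):
    """BIC-based stand-in for the log posterior of a model (hypothetical scoring)."""
    idx = np.flatnonzero(gamma)
    if idx.size == 0:
        rss = y @ y
    else:
        beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        resid = y - X[:, idx] @ beta
        rss = resid @ resid
    return -0.5 * (n * np.log(rss / n) + idx.size * np.log(n))

gamma = np.zeros(p, dtype=int)
cur = log_score(gamma)
visits = Counter()
for _ in range(3000):
    j = rng.integers(p)                        # propose flipping one coordinate
    prop = gamma.copy()
    prop[j] ^= 1
    if prop.sum() <= smax:                     # truncated sparsity constraint
        new = log_score(prop)
        if np.log(rng.random()) < new - cur:   # Metropolis-Hastings acceptance
            gamma, cur = prop, new
    visits[tuple(gamma)] += 1

best = max(visits, key=visits.get)             # most-visited model
print(np.flatnonzero(best))
```

With a strong signal, the chain concentrates quickly on models containing the true support, illustrating (but of course not proving) the rapid-mixing behavior the paper establishes.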
</p>projecteuclid.org/euclid.aos/1479891626_20161123040048Wed, 23 Nov 2016 04:00 ESTFamily-Wise Separation Rates for multiple testinghttp://projecteuclid.org/euclid.aos/1479891627<strong>Magalie Fromont</strong>, <strong>Matthieu Lerasle</strong>, <strong>Patricia Reynaud-Bouret</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2533--2563.</p><p><strong>Abstract:</strong><br/>
Starting from a parallel between some minimax adaptive tests of a single null hypothesis, based on aggregation approaches, and some tests of multiple hypotheses, we propose a new second kind error-related evaluation criterion, as the core of an emergent minimax theory for multiple tests. Aggregation-based tests, proposed for instance by Baraud [ Bernoulli 8 (2002) 577–606], Baraud, Huet and Laurent [ Ann. Statist. 31 (2003) 225–251] or Fromont and Laurent [ Ann. Statist. 34 (2006) 680–720], are justified through their first kind error rate, which is controlled by the prescribed level on the one hand, and through their separation rates over various classes of alternatives, rates which are minimax on the other hand. We show that some of these tests can be viewed as the first steps of classical step-down multiple testing procedures, and accordingly be evaluated from the multiple testing point of view also, through a control of their Family-Wise Error Rate (FWER). Conversely, many multiple testing procedures, from the historical ones of Bonferroni and Holm, to more recent ones like min-$p$ procedures or randomized procedures such as the ones proposed by Romano and Wolf [ J. Amer. Statist. Assoc. 100 (2005) 94–108], can be investigated from the minimax adaptive testing point of view. To this end, we extend the notion of separation rate to the multiple testing field, by defining the weak Family-Wise Separation Rate and its stronger counterpart, the Family-Wise Separation Rate (FWSR). As for nonparametric tests of a single null hypothesis, we prove that these new concepts allow an accurate analysis of the second kind error of a multiple testing procedure, leading to clear definitions of minimax and minimax adaptive multiple tests. Some illustrations in classical Gaussian frameworks corroborate several expected results under particular conditions on the tested hypotheses, but also lead to new questions and perspectives.
</p>projecteuclid.org/euclid.aos/1479891627_20161123040048Wed, 23 Nov 2016 04:00 ESTMinimax optimal rates of estimation in high dimensional additive modelshttp://projecteuclid.org/euclid.aos/1479891628<strong>Ming Yuan</strong>, <strong>Ding-Xuan Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2564--2593.</p><p><strong>Abstract:</strong><br/>
We establish minimax optimal rates of convergence for estimation in a high dimensional additive model assuming that it is approximately sparse. Our results reveal a behavior universal to this class of high dimensional problems. In the sparse regime when the components are sufficiently smooth or the dimensionality is sufficiently large, the optimal rates are identical to those for high dimensional linear regression and, therefore, there is no additional cost to entertain a nonparametric model. Otherwise, in the so-called smooth regime, the rates coincide with the optimal rates for estimating a univariate function and, therefore, they are immune to the “curse of dimensionality.”
</p>projecteuclid.org/euclid.aos/1479891628_20161123040048Wed, 23 Nov 2016 04:00 ESTOn marginal sliced inverse regression for ultrahigh dimensional model-free feature selectionhttp://projecteuclid.org/euclid.aos/1479891629<strong>Zhou Yu</strong>, <strong>Yuexiao Dong</strong>, <strong>Jun Shao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2594--2623.</p><p><strong>Abstract:</strong><br/>
Model-free variable selection has been implemented under the sufficient dimension reduction framework since the seminal paper of Cook [ Ann. Statist. 32 (2004) 1062–1092]. In this paper, we extend the marginal coordinate test for sliced inverse regression (SIR) in Cook (2004) and propose a novel marginal SIR utility for the purpose of ultrahigh dimensional feature selection. Two distinct procedures, Dantzig selector and sparse precision matrix estimation, are incorporated to get two versions of sample level marginal SIR utilities. Both procedures lead to model-free variable selection consistency with predictor dimensionality $p$ diverging at an exponential rate of the sample size $n$. As a special case of marginal SIR, we ignore the correlation among the predictors and propose marginal independence SIR. Marginal independence SIR is closely related to many existing independence screening procedures in the literature, and achieves model-free screening consistency in the ultrahigh dimensional setting. The finite sample performances of the proposed procedures are studied through synthetic examples and an application to the small round blue cell tumors data.
</p>projecteuclid.org/euclid.aos/1479891629_20161123040048Wed, 23 Nov 2016 04:00 ESTFaithful variable screening for high-dimensional convex regressionhttp://projecteuclid.org/euclid.aos/1479891630<strong>Min Xu</strong>, <strong>Minhua Chen</strong>, <strong>John Lafferty</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2624--2660.</p><p><strong>Abstract:</strong><br/>
We study the problem of variable selection in convex nonparametric regression. Under the assumption that the true regression function is convex and sparse, we develop a screening procedure to select a subset of variables that contains the relevant variables. Our approach is a two-stage quadratic programming method that estimates a sum of one-dimensional convex functions, followed by one-dimensional concave regression fits on the residuals. In contrast to previous methods for sparse additive models, the optimization is finite dimensional and requires no tuning parameters for smoothness. Under appropriate assumptions, we prove that the procedure is faithful in the population setting, yielding no false negatives. We give a finite sample statistical analysis, and introduce algorithms for efficiently carrying out the required quadratic programs. The approach leads to computational and statistical advantages over fitting a full model, and provides an effective, practical approach to variable screening in convex regression.
</p>projecteuclid.org/euclid.aos/1479891630_20161123040048Wed, 23 Nov 2016 04:00 ESTHigh-dimensional generalizations of asymmetric least squares regression and their applicationshttp://projecteuclid.org/euclid.aos/1479891631<strong>Yuwen Gu</strong>, <strong>Hui Zou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2661--2694.</p><p><strong>Abstract:</strong><br/>
Asymmetric least squares regression is an important method that has wide applications in statistics, econometrics and finance. The existing work on asymmetric least squares only considers the traditional low dimension and large sample setting. In this paper, we systematically study the Sparse Asymmetric LEast Squares (SALES) regression under high dimensions where the penalty functions include the Lasso and nonconvex penalties. We develop a unified efficient algorithm for fitting SALES and establish its theoretical properties. As an important application, SALES is used to detect heteroscedasticity in high-dimensional data. Another method for detecting heteroscedasticity is the sparse quantile regression. However, both SALES and the sparse quantile regression may fail to tell which variables are important for the conditional mean and which variables are important for the conditional scale/variance, especially when there are variables that are important for both the mean and the scale. To that end, we further propose a COupled Sparse Asymmetric LEast Squares (COSALES) regression which can be efficiently solved by an algorithm similar to that for solving SALES. We establish theoretical properties of COSALES. In particular, COSALES using the SCAD penalty or MCP is shown to consistently identify the two important subsets for the mean and scale simultaneously, even when the two subsets overlap. We demonstrate the empirical performance of SALES and COSALES by simulated and real data.
</p>projecteuclid.org/euclid.aos/1479891631_20161123040048Wed, 23 Nov 2016 04:00 ESTSub-Gaussian mean estimatorshttp://projecteuclid.org/euclid.aos/1479891632<strong>Luc Devroye</strong>, <strong>Matthieu Lerasle</strong>, <strong>Gabor Lugosi</strong>, <strong>Roberto I. Oliveira</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2695--2725.</p><p><strong>Abstract:</strong><br/>
We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a nonasymptotic point of view. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.
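The canonical example of such an estimator is median-of-means: split the sample into $k$ equal blocks, average each block, and return the median of the block means. A minimal sketch (the block count $k$ is a tuning choice made here; the paper studies such constructions and their fundamental limits):

```python
import numpy as np

def median_of_means(x, k):
    """Split x into k equal blocks, average each, take the median of block means."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // k) * k                # drop the remainder for simplicity
    blocks = x[:n].reshape(k, -1)
    return float(np.median(blocks.mean(axis=1)))

# Heavy-tailed sample around a true mean of 0 (t with 2.5 degrees of freedom).
rng = np.random.default_rng(0)
sample = rng.standard_t(df=2.5, size=10_000)
print(median_of_means(sample, k=25))
```

The median step damps the influence of the few blocks corrupted by extreme draws, which is what yields sub-Gaussian-style deviations for heavy-tailed distributions.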
</p>projecteuclid.org/euclid.aos/1479891632_20161123040048Wed, 23 Nov 2016 04:00 ESTConvergence rates of parameter estimation for some weakly identifiable finite mixtureshttp://projecteuclid.org/euclid.aos/1479891633<strong>Nhat Ho</strong>, <strong>XuanLong Nguyen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2726--2755.</p><p><strong>Abstract:</strong><br/>
We establish minimax lower bounds and maximum likelihood convergence rates of parameter estimation for mean-covariance multivariate Gaussian mixtures, shape-rate Gamma mixtures and some variants of finite mixture models, including the setting where the number of mixing components is bounded but unknown. These models belong to what we call “weakly identifiable” classes, which exhibit specific interactions among mixing parameters driven by the algebraic structures of the class of kernel densities and their partial derivatives. Accordingly, both the minimax bounds and the maximum likelihood parameter estimation rates in these models, obtained under some compactness conditions on the parameter space, are shown to be typically much slower than the usual $n^{-1/2}$ or $n^{-1/4}$ rates of convergence.
</p>projecteuclid.org/euclid.aos/1479891633_20161123040048Wed, 23 Nov 2016 04:00 ESTGlobal rates of convergence in log-concave density estimationhttp://projecteuclid.org/euclid.aos/1479891634<strong>Arlene K. H. Kim</strong>, <strong>Richard J. Samworth</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2756--2779.</p><p><strong>Abstract:</strong><br/>
The estimation of a log-concave density on $\mathbb{R}^{d}$ represents a central problem in the area of nonparametric inference under shape constraints. In this paper, we study the performance of log-concave density estimators with respect to global loss functions, and adopt a minimax approach. We first show that no statistical procedure based on a sample of size $n$ can estimate a log-concave density with respect to the squared Hellinger loss function with supremum risk smaller than order $n^{-4/5}$, when $d=1$, and order $n^{-2/(d+1)}$ when $d\geq2$. In particular, this reveals a sense in which, when $d\geq3$, log-concave density estimation is fundamentally more challenging than the estimation of a density with two bounded derivatives (a problem to which it has been compared). Second, we show that for $d\leq3$, the Hellinger $\varepsilon$-bracketing entropy of a class of log-concave densities with small mean and covariance matrix close to the identity grows like $\max\{\varepsilon^{-d/2},\varepsilon^{-(d-1)}\}$ (up to a logarithmic factor when $d=2$). This enables us to prove that when $d\leq3$ the log-concave maximum likelihood estimator achieves the minimax optimal rate (up to logarithmic factors when $d=2,3$) with respect to squared Hellinger loss.
</p>projecteuclid.org/euclid.aos/1479891634_20161123040048Wed, 23 Nov 2016 04:00 ESTTensor decompositions and sparse log-linear modelshttp://projecteuclid.org/euclid.aos/1487667616<strong>James E. Johndrow</strong>, <strong>Anirban Bhattacharya</strong>, <strong>David B. Dunson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 1--38.</p><p><strong>Abstract:</strong><br/>
Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
</p>projecteuclid.org/euclid.aos/1487667616_20170221040055Tue, 21 Feb 2017 04:00 ESTA lava attack on the recovery of sums of dense and sparse signalshttp://projecteuclid.org/euclid.aos/1487667617<strong>Victor Chernozhukov</strong>, <strong>Christian Hansen</strong>, <strong>Yuan Liao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 39--76.</p><p><strong>Abstract:</strong><br/>
Common high-dimensional methods for prediction rely on having either a sparse signal model, a model in which most parameters are zero and there are a small number of nonzero parameters that are large in magnitude, or a dense signal model, a model with no large parameters and very many small nonzero parameters. We consider a generalization of these two basic models, termed here a “sparse $+$ dense” model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein’s unbiased estimator for lava’s prediction risk. A simulation example compares the performance of lava to lasso, ridge and elastic net in a regression example using data-dependent penalty parameters and illustrates lava’s improved performance relative to these benchmarks.
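In the Gaussian sequence model described above, the lava estimate has a simple closed form: profiling out the dense part turns the problem into a lasso with a shrunken quadratic term. A minimal numpy sketch (function name and penalty scaling are our own convention, not necessarily the paper's):

```python
import numpy as np

def lava_sequence(y, lam1, lam2):
    """Lava estimate in the Gaussian sequence model.

    Minimizes, coordinatewise, (y - b - d)^2 + lam1*|b| + lam2*d^2.
    Profiling out the dense part d gives the lasso-type problem
    k*(y - b)^2 + lam1*|b| with k = lam2 / (1 + lam2), solved by
    soft-thresholding.
    """
    k = lam2 / (1.0 + lam2)
    thresh = lam1 / (2.0 * k)
    b = np.sign(y) * np.maximum(np.abs(y) - thresh, 0.0)  # sparse part
    d = (y - b) / (1.0 + lam2)                            # dense part
    return b, d

y = np.array([5.0, 0.3, -4.0, 0.1])
b, d = lava_sequence(y, lam1=1.0, lam2=1.0)
fit = b + d  # the lava fit is the sum of the sparse and dense parts
```

As lam2 grows, k tends to 1 and lava reduces to the lasso with penalty lam1; as lam1 grows, the sparse part vanishes and the dense part is the ridge shrinkage of y, matching the two limiting cases named in the abstract.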
</p>projecteuclid.org/euclid.aos/1487667617_20170221040055Tue, 21 Feb 2017 04:00 ESTStatistical guarantees for the EM algorithm: From population to sample-based analysishttp://projecteuclid.org/euclid.aos/1487667618<strong>Sivaraman Balakrishnan</strong>, <strong>Martin J. Wainwright</strong>, <strong>Bin Yu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 77--120.</p><p><strong>Abstract:</strong><br/>
The EM algorithm is a widely used tool in maximum-likelihood estimation in incomplete data problems. Existing theoretical work has focused on conditions under which the iterates or likelihood values converge, and the associated rates of convergence. Such guarantees do not distinguish whether the ultimate fixed point is a near global optimum or a bad local optimum of the sample likelihood, nor do they relate the obtained fixed point to the global optima of the idealized population likelihood (obtained in the limit of infinite data). This paper develops a theoretical framework for quantifying when and how quickly EM-type iterates converge to a small neighborhood of a given global optimum of the population likelihood. For correctly specified models, such a characterization yields rigorous guarantees on the performance of certain two-stage estimators in which a suitable initial pilot estimator is refined with iterations of the EM algorithm. Our analysis is divided into two parts: a treatment of the EM and first-order EM algorithms at the population level, followed by results that apply to these algorithms on a finite set of samples. Our conditions allow for a characterization of the region of convergence of EM-type iterates to a given population fixed point, that is, the region of the parameter space over which convergence is guaranteed to a point within a small neighborhood of the specified population fixed point. We verify our conditions and give tight characterizations of the region of convergence for three canonical problems of interest: symmetric mixture of two Gaussians, symmetric mixture of two regressions and linear regression with covariates missing completely at random.
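For the first canonical problem above, the symmetric mixture of two Gaussians $0.5\,N(\theta,\sigma^{2})+0.5\,N(-\theta,\sigma^{2})$, the sample EM update is a one-line map. A hedged sketch of the two-stage recipe (pilot estimate refined by EM iterations; all variable names are ours):

```python
import numpy as np

def em_update(theta, x, sigma=1.0):
    """One sample EM iteration for the symmetric two-component
    Gaussian mixture 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2).

    E-step: the posterior weight difference between the two
    components at x_i is tanh(theta * x_i / sigma^2).
    M-step: the weighted average of the observations.
    """
    return np.mean(np.tanh(theta * x / sigma**2) * x)

rng = np.random.default_rng(0)
theta_star = 2.0
# symmetric mixture sample: N(theta_star, 1) with a random sign flip
x = rng.normal(theta_star, 1.0, 5000) * rng.choice([1.0, -1.0], 5000)

theta = 0.5  # crude pilot estimate inside the basin of attraction
for _ in range(50):
    theta = em_update(theta, x)
# theta is now close to +theta_star or -theta_star
```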
</p>projecteuclid.org/euclid.aos/1487667618_20170221040055Tue, 21 Feb 2017 04:00 ESTNormal approximation and concentration of spectral projectors of sample covariancehttp://projecteuclid.org/euclid.aos/1487667619<strong>Vladimir Koltchinskii</strong>, <strong>Karim Lounici</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 121--157.</p><p><strong>Abstract:</strong><br/>
Let $X,X_{1},\dots,X_{n}$ be i.i.d. Gaussian random variables in a separable Hilbert space $\mathbb{H}$ with zero mean and covariance operator $\Sigma=\mathbb{E}(X\otimes X)$, and let $\hat{\Sigma}:=n^{-1}\sum_{j=1}^{n}(X_{j}\otimes X_{j})$ be the sample (empirical) covariance operator based on $(X_{1},\dots,X_{n})$. Denote by $P_{r}$ the spectral projector of $\Sigma$ corresponding to its $r$th eigenvalue $\mu_{r}$ and by $\hat{P}_{r}$ the empirical counterpart of $P_{r}$. The main goal of the paper is to obtain tight bounds on
\[\sup_{x\in\mathbb{R}}\vert\mathbb{P} \{\frac{\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}-\mathbb{E}\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}}{\operatorname{Var}^{1/2}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})}\leq x\}-\Phi (x)\vert ,\] where $\Vert \cdot \Vert_{2}$ denotes the Hilbert–Schmidt norm and $\Phi$ is the standard normal distribution function. Such accuracy of normal approximation of the distribution of squared Hilbert–Schmidt error is characterized in terms of so-called effective rank of $\Sigma$ defined as ${\mathbf{r}}(\Sigma)=\frac{\operatorname{tr}(\Sigma)}{\Vert \Sigma \Vert_{\infty}}$, where $\operatorname{tr}(\Sigma)$ is the trace of $\Sigma$ and $\Vert \Sigma \Vert_{\infty}$ is its operator norm, as well as another parameter characterizing the size of $\operatorname{Var}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})$. Other results include nonasymptotic bounds and asymptotic representations for the mean squared Hilbert–Schmidt norm error $\mathbb{E}\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}$ and the variance $\operatorname{Var}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})$, and concentration inequalities for $\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}$ around its expectation.
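The effective rank $\mathbf{r}(\Sigma)=\operatorname{tr}(\Sigma)/\Vert\Sigma\Vert_{\infty}$ that governs the normal-approximation accuracy is elementary to compute; a small numpy sketch (function name is ours):

```python
import numpy as np

def effective_rank(sigma):
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_op for a
    symmetric positive semi-definite matrix: the trace divided by
    the largest eigenvalue (the operator norm)."""
    eig = np.linalg.eigvalsh(sigma)
    return eig.sum() / eig.max()

# one dominant direction pulls the effective rank well below
# the ambient dimension: tr = 6, operator norm = 4, so r = 1.5
sigma = np.diag([4.0, 1.0, 1.0])
r = effective_rank(sigma)
```

For the identity in dimension $d$ the effective rank equals $d$; for a spiked covariance it can stay bounded even as $d$ grows, which is the regime where such dimension-free bounds are informative.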
</p>projecteuclid.org/euclid.aos/1487667619_20170221040055Tue, 21 Feb 2017 04:00 ESTA general theory of hypothesis tests and confidence regions for sparse high dimensional modelshttp://projecteuclid.org/euclid.aos/1487667620<strong>Yang Ning</strong>, <strong>Han Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 158--195.</p><p><strong>Abstract:</strong><br/>
We consider the problem of uncertainty assessment for low dimensional components in high dimensional models. Specifically, we propose a novel decorrelated score function to handle the impact of high dimensional nuisance parameters. We consider both hypothesis tests and confidence regions for generic penalized M-estimators. Unlike most existing inferential methods which are tailored for individual models, our method provides a general framework for high dimensional inference and is applicable to a wide variety of applications. In particular, we apply this general framework to study five illustrative examples: linear regression, logistic regression, Poisson regression, Gaussian graphical model and additive hazards model. For hypothesis testing, we develop general theorems to characterize the limiting distributions of the decorrelated score test statistic under both null hypothesis and local alternatives. These results provide asymptotic guarantees on the type I errors and local powers. For confidence region construction, we show that the decorrelated score function can be used to construct point estimators that are asymptotically normal and semiparametrically efficient. We further generalize this framework to handle the settings of misspecified models. Thorough numerical results are provided to back up the developed theory.
</p>projecteuclid.org/euclid.aos/1487667620_20170221040055Tue, 21 Feb 2017 04:00 ESTA Bayesian approach for envelope modelshttp://projecteuclid.org/euclid.aos/1487667621<strong>Kshitij Khare</strong>, <strong>Subhadip Pal</strong>, <strong>Zhihua Su</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 196--222.</p><p><strong>Abstract:</strong><br/>
The envelope model is a new paradigm to address estimation and prediction in multivariate analysis. Using sufficient dimension reduction techniques, it has the potential to achieve substantial efficiency gains compared to standard models. This model was first introduced by [Statist. Sinica 20 (2010) 927–960] for multivariate linear regression, and has since been adapted to many other contexts. However, a Bayesian approach for analyzing envelope models has not yet been investigated in the literature. In this paper, we develop a comprehensive Bayesian framework for estimation and model selection in envelope models in the context of multivariate linear regression. Our framework has the following attractive features. First, we use the matrix Bingham distribution to construct a prior on the orthogonal basis matrix of the envelope subspace. This prior respects the manifold structure of the envelope model, and can directly incorporate prior information about the envelope subspace through the specification of hyperparameters. This feature has potential applications in the broader Bayesian sufficient dimension reduction area. Second, sampling from the resulting posterior distribution can be achieved by using a block Gibbs sampler with standard associated conditionals. This in turn facilitates computationally efficient estimation and model selection. Third, unlike the current frequentist approach, our approach can accommodate situations where the sample size is smaller than the number of responses. Lastly, the Bayesian approach inherently offers comprehensive uncertainty characterization through the posterior distribution. We illustrate the utility of our approach on simulated and real datasets.
</p>projecteuclid.org/euclid.aos/1487667621_20170221040055Tue, 21 Feb 2017 04:00 ESTMonge–Kantorovich depth, quantiles, ranks and signshttp://projecteuclid.org/euclid.aos/1487667622<strong>Victor Chernozhukov</strong>, <strong>Alfred Galichon</strong>, <strong>Marc Hallin</strong>, <strong>Marc Henry</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 223--256.</p><p><strong>Abstract:</strong><br/>
We propose new concepts of statistical depth, multivariate quantiles, vector quantiles and ranks, ranks and signs, based on canonical transportation maps between a distribution of interest on $\mathbb{R}^{d}$ and a reference distribution on the $d$-dimensional unit ball. The new depth concept, called Monge–Kantorovich depth, specializes to halfspace depth for $d=1$ and in the case of spherical distributions, but for more general distributions, differs from the latter in the ability for its contours to account for non-convex features of the distribution of interest. We propose empirical counterparts to the population versions of those Monge–Kantorovich depth contours, quantiles, ranks, signs and vector quantiles and ranks, and show their consistency by establishing a uniform convergence property for empirical (forward and reverse) transport maps, which is the main theoretical result of this paper.
</p>projecteuclid.org/euclid.aos/1487667622_20170221040055Tue, 21 Feb 2017 04:00 ESTIdentifying the number of factors from singular values of a large sample auto-covariance matrixhttp://projecteuclid.org/euclid.aos/1487667623<strong>Zeng Li</strong>, <strong>Qinwen Wang</strong>, <strong>Jianfeng Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 257--288.</p><p><strong>Abstract:</strong><br/>
Identifying the number of factors in a high-dimensional factor model has attracted much attention in recent years, and a general solution to the problem is still lacking. A promising ratio estimator based on singular values of lagged sample auto-covariance matrices has recently been proposed in the literature, with reasonably good performance under a specific assumption on the strength of the factors. Inspired by this ratio estimator and as a first main contribution, this paper develops a complete theory of such sample singular values for both the factor part and the noise part under the large-dimensional scheme where the dimension and the sample size proportionally grow to infinity. In particular, we provide an exact description of the phase transition phenomenon that determines whether a factor is strong enough to be detected with the observed sample singular values. Based on these findings and as a second main contribution of the paper, we propose a new estimator of the number of factors which is strongly consistent for the detection of all significant factors (which are the only theoretically detectable ones). In particular, factors are assumed to have the minimum strength above the phase transition boundary, which is of the order of a constant; they are thus not required to grow to infinity together with the dimension (as assumed in most of the existing papers on high-dimensional factor models). An empirical Monte Carlo study, as well as an analysis of stock returns data, attests to the very good performance of the proposed estimator. In all the tested cases, the new estimator largely outperforms the existing estimator using the same ratios of singular values.
</p>projecteuclid.org/euclid.aos/1487667623_20170221040055Tue, 21 Feb 2017 04:00 ESTConsistency of spectral hypergraph partitioning under planted partition modelhttp://projecteuclid.org/euclid.aos/1487667624<strong>Debarghya Ghoshdastidar</strong>, <strong>Ambedkar Dukkipati</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 289--315.</p><p><strong>Abstract:</strong><br/>
Hypergraph partitioning lies at the heart of a number of problems in machine learning and network sciences. Many algorithms for hypergraph partitioning have been proposed that extend standard approaches for graph partitioning to the case of hypergraphs. However, theoretical aspects of such methods have seldom received attention in the literature as compared to the extensive studies on the guarantees of graph partitioning. For instance, consistency results of spectral graph partitioning under the stochastic block model are well known. In this paper, we present a planted partition model for sparse random nonuniform hypergraphs that generalizes the stochastic block model. We derive an error bound for a spectral hypergraph partitioning algorithm under this model using matrix concentration inequalities. To the best of our knowledge, this is the first consistency result related to partitioning nonuniform hypergraphs.
</p>projecteuclid.org/euclid.aos/1487667624_20170221040055Tue, 21 Feb 2017 04:00 ESTOracle inequalities for network models and sparse graphon estimationhttp://projecteuclid.org/euclid.aos/1487667625<strong>Olga Klopp</strong>, <strong>Alexandre B. Tsybakov</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 316--354.</p><p><strong>Abstract:</strong><br/>
Inhomogeneous random graph models encompass many network models such as stochastic block models and latent position models. We consider the problem of statistical estimation of the matrix of connection probabilities based on the observations of the adjacency matrix of the network. Taking the stochastic block model as an approximation, we construct estimators of network connection probabilities—the ordinary block constant least squares estimator, and its restricted version. We show that they satisfy oracle inequalities with respect to the block constant oracle. As a consequence, we derive optimal rates of estimation of the probability matrix. Our results cover the important setting of sparse networks. Another consequence consists in establishing upper bounds on the minimax risks for graphon estimation in the $L_{2}$ norm when the probability matrix is sampled according to a graphon model. These bounds include an additional term accounting for the “agnostic” error induced by the variability of the latent unobserved variables of the graphon model. In this setting, the optimal rates are influenced not only by the bias and variance components as in usual nonparametric problems but also include the third component, which is the agnostic error. The results shed light on the differences between estimation under the empirical loss (the probability matrix estimation) and under the integrated loss (the graphon estimation).
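A block-constant least squares fit simply averages adjacency entries over block pairs. The sketch below uses oracle block labels for illustration (the estimators in the paper also optimize over labelings, which is the hard part); names are ours:

```python
import numpy as np

def block_constant_ls(A, z, k):
    """Block-constant least squares estimate of the connection
    probability matrix, given adjacency matrix A and a fixed block
    assignment z with values in {0, ..., k-1} (oracle labels here;
    the full estimator also searches over assignments)."""
    Q = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            rows = np.where(z == a)[0]
            cols = np.where(z == b)[0]
            block = A[np.ix_(rows, cols)]
            if a == b:
                # exclude the diagonal (no self-loops) within a block
                m = len(rows)
                Q[a, b] = (block.sum() - np.trace(block)) / max(m * (m - 1), 1)
            else:
                Q[a, b] = block.mean()
    return Q[np.ix_(z, z)]  # lift back to an n x n probability matrix

# two blocks, all within-block edges present, no cross-block edges
z = np.array([0, 0, 1, 1])
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = block_constant_ls(A, z, 2)
```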
</p>projecteuclid.org/euclid.aos/1487667625_20170221040055Tue, 21 Feb 2017 04:00 ESTApproximate group context treehttp://projecteuclid.org/euclid.aos/1487667626<strong>Alexandre Belloni</strong>, <strong>Roberto I. Oliveira</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 355--385.</p><p><strong>Abstract:</strong><br/>
We study a variable length Markov chain model associated with a group of stationary processes that share the same context tree but each process has potentially different conditional probabilities. We propose a new model selection and estimation method which is computationally efficient. We develop oracle and adaptivity inequalities, as well as model selection properties, that hold under continuity of the transition probabilities and polynomial $\beta$-mixing. In particular, model misspecification is allowed.
These results are applied to interesting families of processes. For Markov processes, we obtain uniform rates of convergence for the estimation error of transition probabilities as well as perfect model selection results. For chains of infinite order with complete connections, we obtain explicit uniform rates of convergence for the estimation of conditional probabilities, with an explicit dependence on the processes’ continuity rates. Similar guarantees are also derived for renewal processes.
Our results are shown to be applicable to discrete stochastic dynamic programming problems and to dynamic discrete choice models. We also apply our estimator to a linguistic study, based on recent work by Galves et al. [Ann. Appl. Stat. 6 (2012) 186–209], of the rhythmic differences between Brazilian and European Portuguese.
</p>projecteuclid.org/euclid.aos/1487667626_20170221040055Tue, 21 Feb 2017 04:00 ESTFlexible results for quadratic forms with applications to variance components estimationhttp://projecteuclid.org/euclid.aos/1487667627<strong>Lee H. Dicker</strong>, <strong>Murat A. Erdogdu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 386--414.</p><p><strong>Abstract:</strong><br/>
We derive convenient uniform concentration bounds and finite sample multivariate normal approximation results for quadratic forms, then describe some applications involving variance components estimation in linear random-effects models. Random-effects models and variance components estimation are classical topics in statistics, with a corresponding well-established asymptotic theory. However, our finite sample results for quadratic forms provide additional flexibility for easily analyzing random-effects models in nonstandard settings, which are becoming more important in modern applications (e.g., genomics). For instance, in addition to deriving novel non-asymptotic bounds for variance components estimators in classical linear random-effects models, we provide a concentration bound for variance components estimators in linear models with correlated random-effects and discuss an application involving sparse random-effects models. Our general concentration bound is a uniform version of the Hanson–Wright inequality. The main normal approximation result in the paper is derived using Reinert and Röllin’s [Ann. Probab. 37 (2009) 2150–2173] embedding technique for Stein’s method of exchangeable pairs.
</p>projecteuclid.org/euclid.aos/1487667627_20170221040055Tue, 21 Feb 2017 04:00 ESTExtreme eigenvalues of large-dimensional spiked Fisher matrices with applicationhttp://projecteuclid.org/euclid.aos/1487667628<strong>Qinwen Wang</strong>, <strong>Jianfeng Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 415--460.</p><p><strong>Abstract:</strong><br/>
Consider two $p$-variate populations, not necessarily Gaussian, with covariance matrices $\Sigma_{1}$ and $\Sigma_{2}$, respectively. Let $S_{1}$ and $S_{2}$ be the corresponding sample covariance matrices with degrees of freedom $m$ and $n$. When the difference $\Delta$ between $\Sigma_{1}$ and $\Sigma_{2}$ is of small rank compared to $p,m$ and $n$, the Fisher matrix $S:=S_{2}^{-1}S_{1}$ is called a spiked Fisher matrix. When $p,m$ and $n$ grow to infinity proportionally, we establish a phase transition for the extreme eigenvalues of the Fisher matrix: a displacement formula showing that when the eigenvalues of $\Delta$ (spikes) are above (or under) a critical value, the associated extreme eigenvalues of $S$ will converge to some point outside the support of the global limit (LSD) of the other eigenvalues (become outliers); otherwise, they will converge to the edge points of the LSD. Furthermore, we derive central limit theorems for those outlier eigenvalues of $S$. The limiting distributions are found to be Gaussian if and only if the corresponding population spike eigenvalues in $\Delta$ are simple. Two applications are introduced. The first application uses the largest eigenvalue of the Fisher matrix to test the equality between two high-dimensional covariance matrices, and an explicit power function is found under the spiked alternative. The second application is in the field of signal detection, where an estimator for the number of signals is proposed while the covariance structure of the noise is arbitrary.
</p>projecteuclid.org/euclid.aos/1487667628_20170221040055Tue, 21 Feb 2017 04:00 EST