The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Tue, 07 Jun 2011 09:09 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
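The automatic multiplicity correction described here can be illustrated with a small toy sketch (the function names and the choice of pool sizes are ours, not the paper's): with a fixed prior inclusion probability, the prior on an individual model ignores how many candidate variables compete, whereas integrating a uniform prior over the common inclusion probability makes each model's prior probability shrink as the candidate pool grows.

```python
from math import comb

def model_prior_fixed(k, m, p=0.5):
    # Prior probability of one particular model including k of m candidate
    # variables, each entering independently with probability p.
    # No multiplicity correction: the value ignores the size of the model space.
    return p ** k * (1 - p) ** (m - k)

def model_prior_fullbayes(k, m):
    # Same model, after integrating a uniform prior over the common inclusion
    # probability p: P(model) = 1 / ((m + 1) * C(m, k)).
    return 1.0 / ((m + 1) * comb(m, k))

# As the number m of candidates grows, the fully Bayes prior on a fixed
# one-variable model shrinks: this is the multiplicity penalty.
penalty_small_pool = model_prior_fullbayes(1, 2)    # pool of 2 candidates
penalty_large_pool = model_prior_fullbayes(1, 10)   # pool of 10 candidates
```

Summing either prior over all $2^m$ models gives 1, so the correction operates purely through the relative weights placed on models of each size.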
</p>
Thu, 05 Aug 2010 15:41 EDT

On marginal sliced inverse regression for ultrahigh dimensional model-free feature selection
http://projecteuclid.org/euclid.aos/1479891629
<strong>Zhou Yu</strong>, <strong>Yuexiao Dong</strong>, <strong>Jun Shao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2594--2623.</p><p><strong>Abstract:</strong><br/>
Model-free variable selection has been implemented under the sufficient dimension reduction framework since the seminal paper of Cook [Ann. Statist. 32 (2004) 1062–1092]. In this paper, we extend the marginal coordinate test for sliced inverse regression (SIR) in Cook (2004) and propose a novel marginal SIR utility for the purpose of ultrahigh dimensional feature selection. Two distinct procedures, the Dantzig selector and sparse precision matrix estimation, are incorporated to obtain two versions of the sample-level marginal SIR utility. Both procedures lead to model-free variable selection consistency with predictor dimensionality $p$ diverging at an exponential rate of the sample size $n$. As a special case of marginal SIR, we ignore the correlation among the predictors and propose marginal independence SIR. Marginal independence SIR is closely related to many existing independence screening procedures in the literature, and achieves model-free screening consistency in the ultrahigh dimensional setting. The finite sample performances of the proposed procedures are studied through synthetic examples and an application to the small round blue cell tumors data.
</p>
Wed, 23 Nov 2016 04:00 EST

Faithful variable screening for high-dimensional convex regression
http://projecteuclid.org/euclid.aos/1479891630
<strong>Min Xu</strong>, <strong>Minhua Chen</strong>, <strong>John Lafferty</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2624--2660.</p><p><strong>Abstract:</strong><br/>
We study the problem of variable selection in convex nonparametric regression. Under the assumption that the true regression function is convex and sparse, we develop a screening procedure to select a subset of variables that contains the relevant variables. Our approach is a two-stage quadratic programming method that estimates a sum of one-dimensional convex functions, followed by one-dimensional concave regression fits on the residuals. In contrast to previous methods for sparse additive models, the optimization is finite dimensional and requires no tuning parameters for smoothness. Under appropriate assumptions, we prove that the procedure is faithful in the population setting, yielding no false negatives. We give a finite sample statistical analysis, and introduce algorithms for efficiently carrying out the required quadratic programs. The approach leads to computational and statistical advantages over fitting a full model, and provides an effective, practical approach to variable screening in convex regression.
</p>
Wed, 23 Nov 2016 04:00 EST

High-dimensional generalizations of asymmetric least squares regression and their applications
http://projecteuclid.org/euclid.aos/1479891631
<strong>Yuwen Gu</strong>, <strong>Hui Zou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2661--2694.</p><p><strong>Abstract:</strong><br/>
Asymmetric least squares regression is an important method that has wide applications in statistics, econometrics and finance. The existing work on asymmetric least squares only considers the traditional low dimension and large sample setting. In this paper, we systematically study the Sparse Asymmetric LEast Squares (SALES) regression under high dimensions where the penalty functions include the Lasso and nonconvex penalties. We develop a unified efficient algorithm for fitting SALES and establish its theoretical properties. As an important application, SALES is used to detect heteroscedasticity in high-dimensional data. Another method for detecting heteroscedasticity is the sparse quantile regression. However, both SALES and the sparse quantile regression may fail to tell which variables are important for the conditional mean and which variables are important for the conditional scale/variance, especially when there are variables that are important for both the mean and the scale. To that end, we further propose a COupled Sparse Asymmetric LEast Squares (COSALES) regression which can be efficiently solved by an algorithm similar to that for solving SALES. We establish theoretical properties of COSALES. In particular, COSALES using the SCAD penalty or MCP is shown to consistently identify the two important subsets for the mean and scale simultaneously, even when the two subsets overlap. We demonstrate the empirical performance of SALES and COSALES by simulated and real data.
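The unpenalized criterion underlying SALES is the asymmetric squared (expectile) loss; the sketch below shows only that building block, omitting the Lasso/SCAD/MCP penalties that are the paper's actual subject, and the fixed-point solver is our own illustrative choice.

```python
def expectile(xs, tau, iters=200):
    # tau-expectile: minimizer of sum_i |tau - 1{x_i < mu}| * (x_i - mu)^2,
    # computed by iteratively reweighted averaging: observations above the
    # current estimate get weight tau, the rest get weight 1 - tau.
    mu = sum(xs) / len(xs)
    for _ in range(iters):
        w = [tau if x > mu else 1.0 - tau for x in xs]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    return mu

data = [1.0, 2.0, 3.0, 10.0]
mid = expectile(data, 0.5)   # tau = 0.5 recovers the ordinary sample mean
high = expectile(data, 0.9)  # upper expectiles respond to the right tail
```

The dependence of upper expectiles on the tail is what makes the loss useful for detecting heteroscedasticity: conditional scale changes move the high-tau fits even when the conditional mean is unchanged.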
</p>
Wed, 23 Nov 2016 04:00 EST

Sub-Gaussian mean estimators
http://projecteuclid.org/euclid.aos/1479891632
<strong>Luc Devroye</strong>, <strong>Matthieu Lerasle</strong>, <strong>Gabor Lugosi</strong>, <strong>Roberto I. Oliveira</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2695--2725.</p><p><strong>Abstract:</strong><br/>
We discuss the possibilities and limitations of estimating the mean of a real-valued random variable from independent and identically distributed observations from a nonasymptotic point of view. In particular, we define estimators with a sub-Gaussian behavior even for certain heavy-tailed distributions. We also prove various impossibility results for mean estimators.
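One classical construction in this line of work is the median-of-means estimator, which attains sub-Gaussian-type deviations assuming only finite variance; the version below is schematic (the block count k is a tuning choice, and refinements in the literature differ in how blocks are formed).

```python
import random
from statistics import median

def median_of_means(sample, k):
    # Split the sample into k equal-size blocks, average within each block,
    # and return the median of the block means. A single wild observation
    # can corrupt at most one block, so it cannot move the median much.
    xs = list(sample)
    random.shuffle(xs)  # blocks should not follow the original ordering
    b = len(xs) // k
    means = [sum(xs[i * b:(i + 1) * b]) / b for i in range(k)]
    return median(means)

# One gross outlier among ten points barely affects the estimate.
est = median_of_means([1.0] * 9 + [1000.0], k=5)
```

By contrast, the empirical mean of the same sample is pulled to roughly 100 by the single outlier, which is the heavy-tail failure mode the abstract alludes to.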
</p>
Wed, 23 Nov 2016 04:00 EST

Convergence rates of parameter estimation for some weakly identifiable finite mixtures
http://projecteuclid.org/euclid.aos/1479891633
<strong>Nhat Ho</strong>, <strong>XuanLong Nguyen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2726--2755.</p><p><strong>Abstract:</strong><br/>
We establish minimax lower bounds and maximum likelihood convergence rates of parameter estimation for mean-covariance multivariate Gaussian mixtures, shape-rate Gamma mixtures and some variants of finite mixture models, including the setting where the number of mixing components is bounded but unknown. These models belong to what we call “weakly identifiable” classes, which exhibit specific interactions among mixing parameters driven by the algebraic structures of the class of kernel densities and their partial derivatives. Accordingly, both the minimax bounds and the maximum likelihood parameter estimation rates in these models, obtained under some compactness conditions on the parameter space, are shown to be typically much slower than the usual $n^{-1/2}$ or $n^{-1/4}$ rates of convergence.
</p>
Wed, 23 Nov 2016 04:00 EST

Global rates of convergence in log-concave density estimation
http://projecteuclid.org/euclid.aos/1479891634
<strong>Arlene K. H. Kim</strong>, <strong>Richard J. Samworth</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 44, Number 6, 2756--2779.</p><p><strong>Abstract:</strong><br/>
The estimation of a log-concave density on $\mathbb{R}^{d}$ represents a central problem in the area of nonparametric inference under shape constraints. In this paper, we study the performance of log-concave density estimators with respect to global loss functions, and adopt a minimax approach. We first show that no statistical procedure based on a sample of size $n$ can estimate a log-concave density with respect to the squared Hellinger loss function with supremum risk smaller than order $n^{-4/5}$, when $d=1$, and order $n^{-2/(d+1)}$ when $d\geq2$. In particular, this reveals a sense in which, when $d\geq3$, log-concave density estimation is fundamentally more challenging than the estimation of a density with two bounded derivatives (a problem to which it has been compared). Second, we show that for $d\leq3$, the Hellinger $\varepsilon$-bracketing entropy of a class of log-concave densities with small mean and covariance matrix close to the identity grows like $\max\{\varepsilon^{-d/2},\varepsilon^{-(d-1)}\}$ (up to a logarithmic factor when $d=2$). This enables us to prove that when $d\leq3$ the log-concave maximum likelihood estimator achieves the minimax optimal rate (up to logarithmic factors when $d=2,3$) with respect to squared Hellinger loss.
</p>
Wed, 23 Nov 2016 04:00 EST

Tensor decompositions and sparse log-linear models
http://projecteuclid.org/euclid.aos/1487667616
<strong>James E. Johndrow</strong>, <strong>Anirban Bhattacharya</strong>, <strong>David B. Dunson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 1--38.</p><p><strong>Abstract:</strong><br/>
Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. We derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
</p>
Tue, 21 Feb 2017 04:00 EST

A lava attack on the recovery of sums of dense and sparse signals
http://projecteuclid.org/euclid.aos/1487667617
<strong>Victor Chernozhukov</strong>, <strong>Christian Hansen</strong>, <strong>Yuan Liao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 39--76.</p><p><strong>Abstract:</strong><br/>
Common high-dimensional methods for prediction rely on having either a sparse signal model, a model in which most parameters are zero and there are a small number of nonzero parameters that are large in magnitude, or a dense signal model, a model with no large parameters and very many small nonzero parameters. We consider a generalization of these two basic models, termed here a “sparse $+$ dense” model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein’s unbiased estimator for lava’s prediction risk. A simulation example compares the performance of lava to lasso, ridge and elastic net in a regression example using data-dependent penalty parameters and illustrates lava’s improved performance relative to these benchmarks.
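In the Gaussian sequence model the lava fit has a simple closed form per coordinate, obtained by profiling out the ridge part. The penalty scaling below is one convention; constants differ across write-ups, so treat this as an illustrative sketch rather than the paper's exact estimator.

```python
def soft(y, t):
    # Soft-thresholding operator, the proximal map of t * |.|.
    if abs(y) <= t:
        return 0.0
    return y - t if y > 0 else y + t

def lava_1d(y, lam1, lam2):
    # Minimize (y - theta - delta)^2 + lam2 * theta^2 + lam1 * |delta|.
    # Profiling out theta leaves k * (y - delta)^2 + lam1 * |delta| with
    # k = lam2 / (1 + lam2), so delta is a soft-thresholded y and theta is
    # a ridge-shrunken remainder: the fit theta + delta is "dense + sparse".
    k = lam2 / (1.0 + lam2)
    delta = soft(y, lam1 / (2.0 * k))
    theta = (y - delta) / (1.0 + lam2)
    return theta, delta

theta, delta = lava_1d(3.0, 1.0, 1.0)  # fitted signal is theta + delta
```

Setting lam1 very large recovers pure ridge (delta = 0), while lam2 -> infinity recovers the lasso, which is how lava interpolates between the two estimators it dominates.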
</p>
Tue, 21 Feb 2017 04:00 EST

Statistical guarantees for the EM algorithm: From population to sample-based analysis
http://projecteuclid.org/euclid.aos/1487667618
<strong>Sivaraman Balakrishnan</strong>, <strong>Martin J. Wainwright</strong>, <strong>Bin Yu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 77--120.</p><p><strong>Abstract:</strong><br/>
The EM algorithm is a widely used tool in maximum-likelihood estimation in incomplete data problems. Existing theoretical work has focused on conditions under which the iterates or likelihood values converge, and the associated rates of convergence. Such guarantees do not distinguish whether the ultimate fixed point is a near global optimum or a bad local optimum of the sample likelihood, nor do they relate the obtained fixed point to the global optima of the idealized population likelihood (obtained in the limit of infinite data). This paper develops a theoretical framework for quantifying when and how quickly EM-type iterates converge to a small neighborhood of a given global optimum of the population likelihood. For correctly specified models, such a characterization yields rigorous guarantees on the performance of certain two-stage estimators in which a suitable initial pilot estimator is refined with iterations of the EM algorithm. Our analysis is divided into two parts: a treatment of the EM and first-order EM algorithms at the population level, followed by results that apply to these algorithms on a finite set of samples. Our conditions allow for a characterization of the region of convergence of EM-type iterates to a given population fixed point, that is, the region of the parameter space over which convergence is guaranteed to a point within a small neighborhood of the specified population fixed point. We verify our conditions and give tight characterizations of the region of convergence for three canonical problems of interest: symmetric mixture of two Gaussians, symmetric mixture of two regressions and linear regression with covariates missing completely at random.
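For the first canonical problem, a symmetric mixture of two unit-variance Gaussians, the E- and M-steps collapse into a single closed-form update, which makes the "pilot estimate, then iterate EM" recipe easy to sketch. The sample size, seed and tolerance below are our own illustrative choices.

```python
import math
import random

def em_symmetric_mixture(xs, theta0, iters=50):
    # EM for X ~ 0.5 * N(theta, 1) + 0.5 * N(-theta, 1). The posterior
    # weight of the "+" component at x is sigmoid(2 * theta * x), and the
    # M-step reduces to: theta <- mean_i tanh(theta * x_i) * x_i.
    theta = theta0
    for _ in range(iters):
        theta = sum(math.tanh(theta * x) * x for x in xs) / len(xs)
    return theta

random.seed(0)
true_theta = 2.0
xs = [random.gauss(random.choice([-1.0, 1.0]) * true_theta, 1.0)
      for _ in range(5000)]
est = em_symmetric_mixture(xs, theta0=0.5)  # crude pilot start
```

Started from a rough pilot value inside the basin of attraction, the iterates settle near one of the two symmetric population optima, mirroring the region-of-convergence statements in the abstract.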
</p>
Tue, 21 Feb 2017 04:00 EST

Normal approximation and concentration of spectral projectors of sample covariance
http://projecteuclid.org/euclid.aos/1487667619
<strong>Vladimir Koltchinskii</strong>, <strong>Karim Lounici</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 121--157.</p><p><strong>Abstract:</strong><br/>
Let $X,X_{1},\dots,X_{n}$ be i.i.d. Gaussian random variables in a separable Hilbert space $\mathbb{H}$ with zero mean and covariance operator $\Sigma=\mathbb{E}(X\otimes X)$, and let $\hat{\Sigma}:=n^{-1}\sum_{j=1}^{n}(X_{j}\otimes X_{j})$ be the sample (empirical) covariance operator based on $(X_{1},\dots,X_{n})$. Denote by $P_{r}$ the spectral projector of $\Sigma$ corresponding to its $r$th eigenvalue $\mu_{r}$ and by $\hat{P}_{r}$ the empirical counterpart of $P_{r}$. The main goal of the paper is to obtain tight bounds on
\[\sup_{x\in\mathbb{R}}\Bigl\vert\mathbb{P}\Bigl\{\frac{\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}-\mathbb{E}\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}}{\operatorname{Var}^{1/2}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})}\leq x\Bigr\}-\Phi(x)\Bigr\vert,\] where $\Vert \cdot \Vert_{2}$ denotes the Hilbert–Schmidt norm and $\Phi$ is the standard normal distribution function. Such accuracy of normal approximation of the distribution of squared Hilbert–Schmidt error is characterized in terms of so-called effective rank of $\Sigma$ defined as ${\mathbf{r}}(\Sigma)=\frac{\operatorname{tr}(\Sigma)}{\Vert \Sigma \Vert_{\infty}}$, where $\operatorname{tr}(\Sigma)$ is the trace of $\Sigma$ and $\Vert \Sigma \Vert_{\infty}$ is its operator norm, as well as another parameter characterizing the size of $\operatorname{Var}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})$. Other results include nonasymptotic bounds and asymptotic representations for the mean squared Hilbert–Schmidt norm error $\mathbb{E}\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}$ and the variance $\operatorname{Var}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})$, and concentration inequalities for $\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}$ around its expectation.
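The effective rank that calibrates these bounds depends only on the spectrum, so it is trivial to compute once the eigenvalues of $\Sigma$ are known; the helper below takes the eigenvalues directly and is purely illustrative.

```python
def effective_rank(eigvals):
    # r(Sigma) = tr(Sigma) / ||Sigma||_inf: the trace divided by the
    # operator norm, which for a covariance (PSD) matrix equals the largest
    # eigenvalue. Lies between 1 and the number of nonzero eigenvalues.
    return sum(eigvals) / max(eigvals)

isotropic = effective_rank([1.0] * 10)         # identity-like spectrum
spiked = effective_rank([5.0, 1.0, 1.0, 1.0])  # one dominant direction
```

An isotropic spectrum gives effective rank equal to the dimension, while a spiked spectrum gives a much smaller value, which is why this quantity can serve as a dimension-free measure of complexity in infinite-dimensional Hilbert spaces.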
</p>
Tue, 21 Feb 2017 04:00 EST

A general theory of hypothesis tests and confidence regions for sparse high dimensional models
http://projecteuclid.org/euclid.aos/1487667620
<strong>Yang Ning</strong>, <strong>Han Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 158--195.</p><p><strong>Abstract:</strong><br/>
We consider the problem of uncertainty assessment for low dimensional components in high dimensional models. Specifically, we propose a novel decorrelated score function to handle the impact of high dimensional nuisance parameters. We consider both hypothesis tests and confidence regions for generic penalized M-estimators. Unlike most existing inferential methods which are tailored for individual models, our method provides a general framework for high dimensional inference and is applicable to a wide variety of applications. In particular, we apply this general framework to study five illustrative examples: linear regression, logistic regression, Poisson regression, Gaussian graphical model and additive hazards model. For hypothesis testing, we develop general theorems to characterize the limiting distributions of the decorrelated score test statistic under both null hypothesis and local alternatives. These results provide asymptotic guarantees on the type I errors and local powers. For confidence region construction, we show that the decorrelated score function can be used to construct point estimators that are asymptotically normal and semiparametrically efficient. We further generalize this framework to handle the settings of misspecified models. Thorough numerical results are provided to back up the developed theory.
</p>
Tue, 21 Feb 2017 04:00 EST

A Bayesian approach for envelope models
http://projecteuclid.org/euclid.aos/1487667621
<strong>Kshitij Khare</strong>, <strong>Subhadip Pal</strong>, <strong>Zhihua Su</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 196--222.</p><p><strong>Abstract:</strong><br/>
The envelope model is a new paradigm to address estimation and prediction in multivariate analysis. Using sufficient dimension reduction techniques, it has the potential to achieve substantial efficiency gains compared to standard models. This model was first introduced by [Statist. Sinica 20 (2010) 927–960] for multivariate linear regression, and has since been adapted to many other contexts. However, a Bayesian approach for analyzing envelope models has not yet been investigated in the literature. In this paper, we develop a comprehensive Bayesian framework for estimation and model selection in envelope models in the context of multivariate linear regression. Our framework has the following attractive features. First, we use the matrix Bingham distribution to construct a prior on the orthogonal basis matrix of the envelope subspace. This prior respects the manifold structure of the envelope model, and can directly incorporate prior information about the envelope subspace through the specification of hyperparameters. This feature has potential applications in the broader Bayesian sufficient dimension reduction area. Second, sampling from the resulting posterior distribution can be achieved by using a block Gibbs sampler with standard associated conditionals. This in turn facilitates computationally efficient estimation and model selection. Third, unlike the current frequentist approach, our approach can accommodate situations where the sample size is smaller than the number of responses. Lastly, the Bayesian approach inherently offers comprehensive uncertainty characterization through the posterior distribution. We illustrate the utility of our approach on simulated and real datasets.
</p>
Tue, 21 Feb 2017 04:00 EST

Monge–Kantorovich depth, quantiles, ranks and signs
http://projecteuclid.org/euclid.aos/1487667622
<strong>Victor Chernozhukov</strong>, <strong>Alfred Galichon</strong>, <strong>Marc Hallin</strong>, <strong>Marc Henry</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 223--256.</p><p><strong>Abstract:</strong><br/>
We propose new concepts of statistical depth, multivariate quantiles, vector quantiles and ranks, ranks and signs, based on canonical transportation maps between a distribution of interest on $\mathbb{R}^{d}$ and a reference distribution on the $d$-dimensional unit ball. The new depth concept, called Monge–Kantorovich depth, specializes to halfspace depth for $d=1$ and in the case of spherical distributions, but for more general distributions, differs from the latter in the ability for its contours to account for non-convex features of the distribution of interest. We propose empirical counterparts to the population versions of those Monge–Kantorovich depth contours, quantiles, ranks, signs and vector quantiles and ranks, and show their consistency by establishing a uniform convergence property for empirical (forward and reverse) transport maps, which is the main theoretical result of this paper.
</p>
Tue, 21 Feb 2017 04:00 EST

Identifying the number of factors from singular values of a large sample auto-covariance matrix
http://projecteuclid.org/euclid.aos/1487667623
<strong>Zeng Li</strong>, <strong>Qinwen Wang</strong>, <strong>Jianfeng Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 257--288.</p><p><strong>Abstract:</strong><br/>
Identifying the number of factors in a high-dimensional factor model has attracted much attention in recent years and a general solution to the problem is still lacking. A promising ratio estimator based on singular values of lagged sample auto-covariance matrices has been recently proposed in the literature with a reasonably good performance under some specific assumption on the strength of the factors. Inspired by this ratio estimator and as a first main contribution, this paper proposes a complete theory of such sample singular values for both the factor part and the noise part under the large-dimensional scheme where the dimension and the sample size proportionally grow to infinity. In particular, we provide an exact description of the phase transition phenomenon that determines whether a factor is strong enough to be detected with the observed sample singular values. Based on these findings and as a second main contribution of the paper, we propose a new estimator of the number of factors which is strongly consistent for the detection of all significant factors (which are the only theoretically detectable ones). In particular, factors are assumed to have the minimum strength above the phase transition boundary, which is of the order of a constant; they are thus not required to grow to infinity together with the dimension (as assumed in most of the existing papers on high-dimensional factor models). An empirical Monte Carlo study, as well as an analysis of stock returns data, attests to the very good performance of the proposed estimator. In all the tested cases, the new estimator largely outperforms the existing estimator using the same ratios of singular values.
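The basic ratio idea this work starts from can be sketched in a few lines. The published estimator is a refined version with proven strong consistency; this toy simply picks the sharpest relative drop in the ordered singular values, and the helper name and example values are ours.

```python
def ratio_estimate_factors(singvals, kmax=None):
    # Given positive singular values of a lagged sample auto-covariance
    # matrix, sort them in decreasing order and return the k minimizing
    # sigma_{k+1} / sigma_k: the sharpest relative drop marks the split
    # between the factor part and the noise part of the spectrum.
    s = sorted(singvals, reverse=True)
    kmax = kmax if kmax is not None else len(s) - 1
    return min(range(1, kmax + 1), key=lambda k: s[k] / s[k - 1])

# Two strong singular values followed by a noise bulk.
n_factors = ratio_estimate_factors([10.0, 9.0, 0.1, 0.09, 0.08, 0.07])
```

The phase transition result in the paper says precisely when a factor's singular value separates from the noise bulk at all, i.e., when such a drop is detectable in principle.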
</p>
Tue, 21 Feb 2017 04:00 EST

Consistency of spectral hypergraph partitioning under planted partition model
http://projecteuclid.org/euclid.aos/1487667624
<strong>Debarghya Ghoshdastidar</strong>, <strong>Ambedkar Dukkipati</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 289--315.</p><p><strong>Abstract:</strong><br/>
Hypergraph partitioning lies at the heart of a number of problems in machine learning and network sciences. Many algorithms for hypergraph partitioning have been proposed that extend standard approaches for graph partitioning to the case of hypergraphs. However, theoretical aspects of such methods have seldom received attention in the literature as compared to the extensive studies on the guarantees of graph partitioning. For instance, consistency results of spectral graph partitioning under the stochastic block model are well known. In this paper, we present a planted partition model for sparse random nonuniform hypergraphs that generalizes the stochastic block model. We derive an error bound for a spectral hypergraph partitioning algorithm under this model using matrix concentration inequalities. To the best of our knowledge, this is the first consistency result related to partitioning nonuniform hypergraphs.
</p>
Tue, 21 Feb 2017 04:00 EST

Oracle inequalities for network models and sparse graphon estimation
http://projecteuclid.org/euclid.aos/1487667625
<strong>Olga Klopp</strong>, <strong>Alexandre B. Tsybakov</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 316--354.</p><p><strong>Abstract:</strong><br/>
Inhomogeneous random graph models encompass many network models such as stochastic block models and latent position models. We consider the problem of statistical estimation of the matrix of connection probabilities based on the observations of the adjacency matrix of the network. Taking the stochastic block model as an approximation, we construct estimators of network connection probabilities—the ordinary block constant least squares estimator, and its restricted version. We show that they satisfy oracle inequalities with respect to the block constant oracle. As a consequence, we derive optimal rates of estimation of the probability matrix. Our results cover the important setting of sparse networks. Another consequence consists in establishing upper bounds on the minimax risks for graphon estimation in the $L_{2}$ norm when the probability matrix is sampled according to a graphon model. These bounds include an additional term accounting for the “agnostic” error induced by the variability of the latent unobserved variables of the graphon model. In this setting, the optimal rates are influenced not only by the bias and variance components as in usual nonparametric problems but also include the third component, which is the agnostic error. The results shed light on the differences between estimation under the empirical loss (the probability matrix estimation) and under the integrated loss (the graphon estimation).
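For a fixed partition, the block constant least squares fit referred to here is just block-wise averaging of the adjacency matrix; the full estimator also optimizes over partitions. The function name and toy graph below are our own illustration.

```python
def block_constant_fit(A, z, B):
    # Least-squares block-constant approximation of adjacency matrix A for
    # a fixed assignment z of nodes to B blocks: each off-diagonal entry is
    # replaced by the average of A over its block pair, which minimizes
    # sum_{i != j} (A[i][j] - Theta[i][j])^2 over block-constant Theta.
    n = len(A)
    tot = [[0.0] * B for _ in range(B)]
    cnt = [[0] * B for _ in range(B)]
    for i in range(n):
        for j in range(n):
            if i != j:
                tot[z[i]][z[j]] += A[i][j]
                cnt[z[i]][z[j]] += 1
    Q = [[tot[a][b] / cnt[a][b] if cnt[a][b] else 0.0 for b in range(B)]
         for a in range(B)]
    return [[Q[z[i]][z[j]] for j in range(n)] for i in range(n)]

# Two clean communities of two nodes each.
A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
Theta = block_constant_fit(A, [0, 0, 1, 1], B=2)
```

The oracle inequalities in the paper control how far such a fit (with an estimated partition) is from the best block constant approximation of the true probability matrix.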
</p>
Tue, 21 Feb 2017 04:00 EST

Approximate group context tree
http://projecteuclid.org/euclid.aos/1487667626
<strong>Alexandre Belloni</strong>, <strong>Roberto I. Oliveira</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 355--385.</p><p><strong>Abstract:</strong><br/>
We study a variable length Markov chain model associated with a group of stationary processes that share the same context tree but each process has potentially different conditional probabilities. We propose a new model selection and estimation method which is computationally efficient. We develop oracle and adaptivity inequalities, as well as model selection properties, that hold under continuity of the transition probabilities and polynomial $\beta$-mixing. In particular, model misspecification is allowed.
These results are applied to interesting families of processes. For Markov processes, we obtain a uniform rate of convergence for the estimation error of transition probabilities as well as perfect model selection results. For chains of infinite order with complete connections, we obtain explicit uniform rates of convergence on the estimation of conditional probabilities, which have an explicit dependence on the processes’ continuity rates. Similar guarantees are also derived for renewal processes.
Our results are shown to be applicable to discrete stochastic dynamic programming problems and to dynamic discrete choice models. We also apply our estimator to a linguistic study, based on recent work by Galves et al. [Ann. Appl. Stat. 6 (2012) 186–209], of the rhythmic differences between Brazilian and European Portuguese.
</p>
Tue, 21 Feb 2017 04:00 EST

Flexible results for quadratic forms with applications to variance components estimation
http://projecteuclid.org/euclid.aos/1487667627
<strong>Lee H. Dicker</strong>, <strong>Murat A. Erdogdu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 386--414.</p><p><strong>Abstract:</strong><br/>
We derive convenient uniform concentration bounds and finite sample multivariate normal approximation results for quadratic forms, then describe some applications involving variance components estimation in linear random-effects models. Random-effects models and variance components estimation are classical topics in statistics, with a corresponding well-established asymptotic theory. However, our finite sample results for quadratic forms provide additional flexibility for easily analyzing random-effects models in nonstandard settings, which are becoming more important in modern applications (e.g., genomics). For instance, in addition to deriving novel non-asymptotic bounds for variance components estimators in classical linear random-effects models, we provide a concentration bound for variance components estimators in linear models with correlated random-effects and discuss an application involving sparse random-effects models. Our general concentration bound is a uniform version of the Hanson–Wright inequality. The main normal approximation result in the paper is derived using Reinert and Röllin’s [Ann. Probab. 37 (2009) 2150–2173] embedding technique for Stein’s method of exchangeable pairs.
</p>
Tue, 21 Feb 2017 04:00 EST

Extreme eigenvalues of large-dimensional spiked Fisher matrices with application
http://projecteuclid.org/euclid.aos/1487667628
<strong>Qinwen Wang</strong>, <strong>Jianfeng Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 1, 415--460.</p><p><strong>Abstract:</strong><br/>
Consider two $p$-variate populations, not necessarily Gaussian, with covariance matrices $\Sigma_{1}$ and $\Sigma_{2}$, respectively. Let $S_{1}$ and $S_{2}$ be the corresponding sample covariance matrices with degrees of freedom $m$ and $n$. When the difference $\Delta$ between $\Sigma_{1}$ and $\Sigma_{2}$ is of small rank compared to $p,m$ and $n$, the Fisher matrix $S:=S_{2}^{-1}S_{1}$ is called a spiked Fisher matrix. When $p,m$ and $n$ grow to infinity proportionally, we establish a phase transition for the extreme eigenvalues of the Fisher matrix: a displacement formula showing that when the eigenvalues of $\Delta$ (the spikes) are above (or under) a critical value, the associated extreme eigenvalues of $S$ will converge to some point outside the support of the global limit (LSD) of the other eigenvalues (become outliers); otherwise, they will converge to the edge points of the LSD. Furthermore, we derive central limit theorems for those outlier eigenvalues of $S$. The limiting distributions are found to be Gaussian if and only if the corresponding population spike eigenvalues in $\Delta$ are simple. Two applications are introduced. The first application uses the largest eigenvalue of the Fisher matrix to test the equality between two high-dimensional covariance matrices, and an explicit power function is found under the spiked alternative. The second application is in the field of signal detection, where an estimator for the number of signals is proposed while the covariance structure of the noise is arbitrary.
</p>projecteuclid.org/euclid.aos/1487667628_20170221040055Tue, 21 Feb 2017 04:00 ESTMimicking counterfactual outcomes to estimate causal effectshttp://projecteuclid.org/euclid.aos/1494921947<strong>Judith J. Lok</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 461--499.</p><p><strong>Abstract:</strong><br/>
In observational studies, treatment may be adapted to covariates at several times without a fixed protocol, in continuous time. Treatment influences covariates, which influence treatment, which influences covariates and so on. Then even time-dependent Cox models cannot be used to estimate the net treatment effect. Structural nested models have been applied in this setting. Structural nested models are based on counterfactuals: the outcome a person would have had had treatment been withheld after a certain time. Previous work on continuous-time structural nested models assumes that counterfactuals depend deterministically on observed data, while conjecturing that this assumption can be relaxed. This article proves that one can mimic counterfactuals by constructing random variables, solutions to a differential equation, that have the same distribution as the counterfactuals, even given past observed data. These “mimicking” variables can be used to estimate the parameters of structural nested models without assuming the treatment effect to be deterministic.
</p>projecteuclid.org/euclid.aos/1494921947_20170516040557Tue, 16 May 2017 04:05 EDTLikelihood-based model selection for stochastic block modelshttp://projecteuclid.org/euclid.aos/1494921948<strong>Y. X. Rachel Wang</strong>, <strong>Peter J. Bickel</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 500--528.</p><p><strong>Abstract:</strong><br/>
The stochastic block model (SBM) provides a popular framework for modeling community structures in networks. However, more attention has been devoted to estimating the latent node labels and the model parameters than to the issue of choosing the number of blocks. We consider an approach based on the log likelihood ratio statistic and analyze its asymptotic properties under model misspecification. We show that the limiting distribution of the statistic in the case of underfitting is normal and obtain its convergence rate in the case of overfitting. These conclusions remain valid when the average degree grows at a polylog rate. The results enable us to derive the correct order of the penalty term for model complexity and arrive at a likelihood-based model selection criterion that is asymptotically consistent. Our analysis can also be extended to a degree-corrected block model (DCSBM). In practice, the likelihood function can be estimated using more computationally efficient variational methods or consistent label estimation algorithms, allowing the criterion to be applied to large networks.
</p>projecteuclid.org/euclid.aos/1494921948_20170516040557Tue, 16 May 2017 04:05 EDTMultiple testing of local maxima for detection of peaks in random fieldshttp://projecteuclid.org/euclid.aos/1494921949<strong>Dan Cheng</strong>, <strong>Armin Schwartzman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 529--556.</p><p><strong>Abstract:</strong><br/>
A topological multiple testing scheme is presented for detecting peaks in images under stationary ergodic Gaussian noise, where tests are performed at local maxima of the smoothed observed signals. The procedure generalizes the one-dimensional scheme of Schwartzman, Gavrilov and Adler [Ann. Statist. 39 (2011) 3290–3319] to Euclidean domains of arbitrary dimension. Two methods are developed according to two different ways of computing p-values: (i) using the exact distribution of the height of local maxima, available explicitly when the noise field is isotropic [Extremes 18 (2015) 213–240; Expected number and height distribution of critical points of smooth isotropic Gaussian random fields (2015) Preprint]; (ii) using an approximation to the overshoot distribution of local maxima above a pre-threshold, applicable when the exact distribution is unknown, such as when the stationary noise field is nonisotropic [Extremes 18 (2015) 213–240]. The algorithms, combined with the Benjamini–Hochberg procedure for thresholding p-values, provide asymptotic strong control of the False Discovery Rate (FDR) and power consistency, with specific rates, as the search space and signal strength get large. The optimal smoothing bandwidth and optimal pre-threshold are obtained to achieve maximum power. Simulations show that FDR levels are maintained in nonasymptotic conditions. The methods are illustrated in the analysis of functional magnetic resonance images of the brain.
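The pipeline described above (smooth, take local maxima, compute p-values for their heights, threshold with Benjamini–Hochberg) can be sketched in one dimension. This is a minimal illustration, not the paper's method: the p-values below use a standard normal upper tail as a stand-in for the exact height distribution of local maxima, and the toy signal, noise bandwidth and FDR level are invented for the demo.

```python
import numpy as np
from math import erfc

def benjamini_hochberg(pvals, q=0.05):
    """Step-up BH procedure: boolean mask of rejections at FDR level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])   # largest rank passing the step-up rule
        rejected[order[:k + 1]] = True
    return rejected

def local_maxima(x):
    """Indices of strict interior local maxima of a 1-D array."""
    return np.nonzero((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2000)
# Two bumps of height 4 plus smooth stationary Gaussian noise.
signal = 4 * np.exp(-((t - 0.3) ** 2) / 1e-4) + 4 * np.exp(-((t - 0.7) ** 2) / 1e-4)
kernel = np.exp(-0.5 * (np.arange(-25, 26) / 8.0) ** 2)
kernel /= kernel.sum()
y = signal + np.convolve(rng.standard_normal(t.size), kernel, mode="same")

peaks = local_maxima(y)
# Stand-in p-values: standard normal upper tail of the observed peak height.
pvals = np.array([0.5 * erfc(y[i] / np.sqrt(2.0)) for i in peaks])
keep = peaks[benjamini_hochberg(pvals, q=0.05)]   # detected peak locations: t[keep]
```

Here `keep` indexes the retained local maxima; in the paper's methods, the exact height distribution or the overshoot distribution above a pre-threshold replaces the stand-in tail probability.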
</p>projecteuclid.org/euclid.aos/1494921949_20170516040557Tue, 16 May 2017 04:05 EDTA rate optimal procedure for recovering sparse differences between high-dimensional means under dependencehttp://projecteuclid.org/euclid.aos/1494921950<strong>Jun Li</strong>, <strong>Ping-Shou Zhong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 557--590.</p><p><strong>Abstract:</strong><br/>
The paper considers the problem of recovering the sparse components that differ between two high-dimensional means of column-wise dependent random vectors. We show that dependence can be utilized to lower the identification boundary for signal recovery. Moreover, an optimal convergence rate for the marginal false nondiscovery rate (mFNR) is established under dependence. The convergence rate is faster than the optimal rate without dependence. To recover the sparse signal bearing dimensions, we propose a Dependence-Assisted Thresholding and Excising (DATE) procedure, which is shown to be rate optimal for the mFNR with the marginal false discovery rate (mFDR) controlled at a pre-specified level. Extensions of the DATE to recover the differences in contrasts among multiple population means and differences between two covariance matrices are also provided. Simulation studies and a case study are given to demonstrate the performance of the proposed signal identification procedure.
</p>projecteuclid.org/euclid.aos/1494921950_20170516040557Tue, 16 May 2017 04:05 EDTOnline estimation of the geometric median in Hilbert spaces: Nonasymptotic confidence ballshttp://projecteuclid.org/euclid.aos/1494921951<strong>Hervé Cardot</strong>, <strong>Peggy Cénac</strong>, <strong>Antoine Godichon-Baggioni</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 591--614.</p><p><strong>Abstract:</strong><br/>
Estimation procedures based on recursive algorithms are interesting and powerful techniques that are able to deal rapidly with very large samples of high dimensional data. The collected data may be contaminated by noise so that robust location indicators, such as the geometric median, may be preferred to the mean. In this context, an estimator of the geometric median based on a fast and efficient averaged nonlinear stochastic gradient algorithm has been developed by [Bernoulli 19 (2013) 18–43]. This work aims at studying more precisely the nonasymptotic behavior of this nonlinear algorithm by giving nonasymptotic confidence balls in general separable Hilbert spaces. This new result is based on the derivation of improved $L^{2}$ rates of convergence as well as an exponential inequality for the nearly martingale terms of the recursive nonlinear Robbins–Monro algorithm.
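A minimal sketch of an averaged nonlinear stochastic gradient (Robbins–Monro) recursion for the geometric median, in the spirit of the algorithm discussed above, here in $\mathbb{R}^{2}$. The step-size constants and the contaminated toy data are assumptions for illustration, not the tuning analyzed in the paper:

```python
import numpy as np

def averaged_sgd_geometric_median(X, c=2.0, alpha=0.6):
    """Averaged nonlinear Robbins-Monro recursion for the geometric median.

    Step sizes gamma_n = c / n**alpha are an illustrative choice; the
    Polyak-Ruppert average zbar of the iterates is the final estimate.
    """
    z = X[0].astype(float)            # current (non-averaged) iterate
    zbar = z.copy()                   # running average of the iterates
    for n, x in enumerate(X[1:], start=1):
        diff = x - z
        norm = np.linalg.norm(diff)
        if norm > 0.0:                # unit-length "gradient" of the median loss
            z = z + (c / n ** alpha) * diff / norm
        zbar += (z - zbar) / (n + 1)
    return zbar

rng = np.random.default_rng(1)
# 90% clean Gaussian data around the origin, 10% outliers near (10, 10).
clean = rng.standard_normal((9000, 2))
outliers = rng.standard_normal((1000, 2)) + 10.0
X = rng.permutation(np.vstack([clean, outliers]))
med = averaged_sgd_geometric_median(X)   # stays near the origin, unlike the mean
```

Each observation is touched once, so the cost is linear in the sample size; the robustness shows up in `med` remaining close to the bulk of the data while the sample mean is dragged toward the outliers.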
</p>projecteuclid.org/euclid.aos/1494921951_20170516040557Tue, 16 May 2017 04:05 EDTConfidence intervals for high-dimensional linear regression: Minimax rates and adaptivityhttp://projecteuclid.org/euclid.aos/1494921952<strong>T. Tony Cai</strong>, <strong>Zijian Guo</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 615--646.</p><p><strong>Abstract:</strong><br/>
Confidence sets play a fundamental role in statistical inference. In this paper, we consider confidence intervals for high-dimensional linear regression with random design. We first establish the convergence rates of the minimax expected length for confidence intervals in the oracle setting where the sparsity parameter is given. The focus is then on the problem of adaptation to sparsity for the construction of confidence intervals. Ideally, an adaptive confidence interval should have its length automatically adjusted to the sparsity of the unknown regression vector, while maintaining a pre-specified coverage probability. It is shown that such a goal is in general not attainable, except when the sparsity parameter is restricted to a small region over which the confidence intervals have the optimal length of the usual parametric rate. It is further demonstrated that the lack of adaptivity is not due to the conservativeness of the minimax framework, but is fundamentally caused by the difficulty of learning the bias accurately.
</p>projecteuclid.org/euclid.aos/1494921952_20170516040557Tue, 16 May 2017 04:05 EDTEstimating the effect of joint interventions from observational data in sparse high-dimensional settingshttp://projecteuclid.org/euclid.aos/1494921953<strong>Preetam Nandy</strong>, <strong>Marloes H. Maathuis</strong>, <strong>Thomas S. Richardson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 647--674.</p><p><strong>Abstract:</strong><br/>
We consider the estimation of joint causal effects from observational data. In particular, we propose new methods to estimate the effect of multiple simultaneous interventions (e.g., multiple gene knockouts), under the assumption that the observational data come from an unknown linear structural equation model with independent errors. We derive asymptotic variances of our estimators when the underlying causal structure is partly known, as well as high-dimensional consistency when the causal structure is fully unknown and the joint distribution is multivariate Gaussian. We also propose a generalization of our methodology to the class of nonparanormal distributions. We evaluate the estimators in simulation studies and also illustrate them on data from the DREAM4 challenge.
</p>projecteuclid.org/euclid.aos/1494921953_20170516040557Tue, 16 May 2017 04:05 EDTIdentifiability of restricted latent class models with binary responseshttp://projecteuclid.org/euclid.aos/1494921954<strong>Gongjun Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 675--707.</p><p><strong>Abstract:</strong><br/>
Statistical latent class models are widely used in social and psychological research, yet it is often difficult to establish the identifiability of the model parameters. In this paper, we consider the identifiability issue of a family of restricted latent class models, where the restriction structures are needed to reflect pre-specified assumptions on the related assessment. We establish the identifiability results in the strict sense and specify which types of restriction structure would give the identifiability of the model parameters. The results not only guarantee the validity of many of the popularly used models, but also provide a guideline for the related experimental design, where in the current applications the design is usually experience based and identifiability is not guaranteed. Theoretically, we develop a new technique to establish the identifiability result, which may be extended to other restricted latent class models.
</p>projecteuclid.org/euclid.aos/1494921954_20170516040557Tue, 16 May 2017 04:05 EDTA Bernstein-type inequality for some mixing processes and dynamical systems with an application to learninghttp://projecteuclid.org/euclid.aos/1494921955<strong>Hanyuan Hang</strong>, <strong>Ingo Steinwart</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 708--743.</p><p><strong>Abstract:</strong><br/>
We establish a Bernstein-type inequality for a class of stochastic processes that includes the classical geometrically $\phi$-mixing processes, Rio’s generalization of these processes and many time-discrete dynamical systems. Modulo a logarithmic factor and some constants, our Bernstein-type inequality coincides with the classical Bernstein inequality for i.i.d. data. We further use this new Bernstein-type inequality to derive an oracle inequality for generic regularized empirical risk minimization algorithms and data generated by such processes. Applying this oracle inequality to support vector machines using Gaussian kernels for binary classification, we obtain essentially the same rate as for i.i.d. processes; for least squares and quantile regression, it turns out that the resulting learning rates match, up to some arbitrarily small extra term in the exponent, the optimal rates for i.i.d. processes.
</p>projecteuclid.org/euclid.aos/1494921955_20170516040557Tue, 16 May 2017 04:05 EDTConsistency of likelihood estimation for Gibbs point processeshttp://projecteuclid.org/euclid.aos/1494921956<strong>David Dereudre</strong>, <strong>Frédéric Lavancier</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 744--770.</p><p><strong>Abstract:</strong><br/>
Strong consistency of the maximum likelihood estimator (MLE) for parametric Gibbs point process models is established. The setting is very general. It includes pair potentials, finite and infinite multibody interactions and geometrical interactions, where the range can be finite or infinite. The Gibbs interaction may depend linearly or nonlinearly on the parameters, particular cases being hardcore parameters and interaction range parameters. As important examples, we deduce the consistency of the MLE for all parameters of the Strauss model, the hardcore Strauss model, the Lennard–Jones model and the area-interaction model.
</p>projecteuclid.org/euclid.aos/1494921956_20170516040557Tue, 16 May 2017 04:05 EDTTests for high-dimensional data based on means, spatial signs and spatial rankshttp://projecteuclid.org/euclid.aos/1494921957<strong>Anirvan Chakraborty</strong>, <strong>Probal Chaudhuri</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 771--799.</p><p><strong>Abstract:</strong><br/>
Tests based on mean vectors and spatial signs and ranks for a zero mean in one-sample problems and for the equality of means in two-sample problems have been studied in the recent literature for high-dimensional data with the dimension larger than the sample size. For the above testing problems, we show that under suitable sequences of alternatives, the powers of the mean-based tests and the tests based on spatial signs and ranks tend to be the same as the data dimension tends to infinity for any sample size when the coordinate variables satisfy appropriate mixing conditions. Further, their limiting powers do not depend on the heaviness of the tails of the distributions. This is in striking contrast to the asymptotic results obtained in the classical multivariate setting. On the other hand, we show that in the presence of stronger dependence among the coordinate variables, the spatial-sign- and rank-based tests for high-dimensional data can be asymptotically more powerful than the mean-based tests if, in addition to the data dimension, the sample size also tends to infinity. The sizes of some mean-based tests for high-dimensional data studied in the recent literature are observed to be significantly different from their nominal levels. This is due to the inadequacy of the asymptotic approximations used for the distributions of those test statistics. However, our asymptotic approximations for the tests based on spatial signs and ranks are observed to work well when the tests are applied on a variety of simulated and real datasets.
</p>projecteuclid.org/euclid.aos/1494921957_20170516040557Tue, 16 May 2017 04:05 EDTInference on the mode of weak directional signals: A Le Cam perspective on hypothesis testing near singularitieshttp://projecteuclid.org/euclid.aos/1494921958<strong>Davy Paindaveine</strong>, <strong>Thomas Verdebout</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 800--832.</p><p><strong>Abstract:</strong><br/>
We revisit, in an original and challenging perspective, the problem of testing the null hypothesis that the mode of a directional signal is equal to a given value. Motivated by a real data example where the signal is weak, we consider this problem under asymptotic scenarios for which the signal strength goes to zero at an arbitrary rate $\eta_{n}$. Both under the null and the alternative, we focus on rotationally symmetric distributions. We show that, while they are asymptotically equivalent under fixed signal strength, the classical Wald and Watson tests exhibit very different (null and nonnull) behaviours when the signal becomes arbitrarily weak. To fully characterize how challenging the problem is as a function of $\eta_{n}$, we adopt a Le Cam, convergence-of-statistical-experiments, point of view and show that the resulting limiting experiments crucially depend on $\eta_{n}$. In the light of these results, the Watson test is shown to be adaptively rate-consistent and essentially adaptively Le Cam optimal. Throughout, our theoretical findings are illustrated via Monte-Carlo simulations. The practical relevance of our results is also shown on the real data example that motivated the present work.
</p>projecteuclid.org/euclid.aos/1494921958_20170516040557Tue, 16 May 2017 04:05 EDTAsymptotic behaviour of the empirical Bayes posteriors associated to maximum marginal likelihood estimatorhttp://projecteuclid.org/euclid.aos/1494921959<strong>Judith Rousseau</strong>, <strong>Botond Szabo</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 833--865.</p><p><strong>Abstract:</strong><br/>
We consider the asymptotic behaviour of the marginal maximum likelihood empirical Bayes posterior distribution in a general setting. First, we characterize the set where the maximum marginal likelihood estimator is located with high probability. Then we provide oracle-type upper and lower bounds for the contraction rates of the empirical Bayes posterior. We also show that the hierarchical Bayes posterior achieves the same contraction rate as the maximum marginal likelihood empirical Bayes posterior. We demonstrate the applicability of our general results for various models and prior distributions by deriving upper and lower bounds for the contraction rates of the corresponding empirical and hierarchical Bayes posterior distributions.
</p>projecteuclid.org/euclid.aos/1494921959_20170516040557Tue, 16 May 2017 04:05 EDTStatistical consistency and asymptotic normality for high-dimensional robust $M$-estimatorshttp://projecteuclid.org/euclid.aos/1494921960<strong>Po-Ling Loh</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 866--896.</p><p><strong>Abstract:</strong><br/>
We study theoretical properties of regularized robust $M$-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: When the derivative of the loss function is bounded and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an $\ell_{1}$-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support; hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex $M$-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex $M$-estimator to achieve consistency and a nonconvex $M$-estimator to increase efficiency. We conclude with simulation results that corroborate our theoretical findings.
</p>projecteuclid.org/euclid.aos/1494921960_20170516040557Tue, 16 May 2017 04:05 EDTInteraction pursuit in high-dimensional multi-response regression via distance correlationhttp://projecteuclid.org/euclid.aos/1494921961<strong>Yinfei Kong</strong>, <strong>Daoji Li</strong>, <strong>Yingying Fan</strong>, <strong>Jinchi Lv</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 2, 897--922.</p><p><strong>Abstract:</strong><br/>
Feature interactions can contribute to a large proportion of variation in many prediction models. In the era of big data, the coexistence of high dimensionality in both responses and covariates poses unprecedented challenges in identifying important interactions. In this paper, we suggest a two-stage interaction identification method, called the interaction pursuit via distance correlation (IPDC), in the setting of high-dimensional multi-response interaction models that exploits feature screening applied to transformed variables with distance correlation followed by feature selection. Such a procedure is computationally efficient, generally applicable beyond the heredity assumption, and effective even when the number of responses diverges with the sample size. Under mild regularity conditions, we show that this method enjoys nice theoretical properties including the sure screening property, support union recovery and oracle inequalities in prediction and estimation for both interactions and main effects. The advantages of our method are supported by several simulation studies and real data analysis.
</p>projecteuclid.org/euclid.aos/1494921961_20170516040557Tue, 16 May 2017 04:05 EDTMinimax estimation of linear and quadratic functionals on sparsity classeshttp://projecteuclid.org/euclid.aos/1497319684<strong>Olivier Collier</strong>, <strong>Laëtitia Comminges</strong>, <strong>Alexandre B. Tsybakov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 923--958.</p><p><strong>Abstract:</strong><br/>
For the Gaussian sequence model, we obtain nonasymptotic minimax rates of estimation of the linear, quadratic and the $\ell_{2}$-norm functionals on classes of sparse vectors and construct optimal estimators that attain these rates. The main object of interest is the class $B_{0}(s)$ of $s$-sparse vectors $\theta=(\theta_{1},\dots,\theta_{d})$, for which we also provide completely adaptive estimators (independent of $s$ and of the noise variance $\sigma $) having logarithmically slower rates than the minimax ones. Furthermore, we obtain the minimax rates on the $\ell_{q}$-balls $B_{q}(r)=\{\theta\in\mathbb{R}^{d}:\|\theta\|_{q}\le r\}$ where $0<q\le2$, and $\|\theta\|_{q}=(\sum_{i=1}^{d}|\theta_{i}|^{q})^{1/q}$. This analysis shows that there are, in general, three zones in the rates of convergence that we call the sparse zone, the dense zone and the degenerate zone, while a fourth zone appears for estimation of the quadratic functional. We show that, as opposed to estimation of $\theta$, the correct logarithmic terms in the optimal rates for the sparse zone scale as $\log(d/s^{2})$ and not as $\log(d/s)$. For the class $B_{0}(s)$, the rates of estimation of the linear functional and of the $\ell_{2}$-norm have a simple elbow at $s=\sqrt{d}$ (boundary between the sparse and the dense zones) and exhibit similar performances, whereas the estimation of the quadratic functional $Q(\theta)$ reveals more complex effects: the minimax risk on $B_{0}(s)$ is infinite and the sparseness assumption needs to be combined with a bound on the $\ell_{2}$-norm. Finally, we apply our results on estimation of the $\ell_{2}$-norm to the problem of testing against sparse alternatives. In particular, we obtain a nonasymptotic analog of the Ingster–Donoho–Jin theory revealing some effects that were not captured by the previous asymptotic analysis.
</p>projecteuclid.org/euclid.aos/1497319684_20170612220820Mon, 12 Jun 2017 22:08 EDTSupport consistency of direct sparse-change learning in Markov networkshttp://projecteuclid.org/euclid.aos/1497319685<strong>Song Liu</strong>, <strong>Taiji Suzuki</strong>, <strong>Raissa Relator</strong>, <strong>Jun Sese</strong>, <strong>Masashi Sugiyama</strong>, <strong>Kenji Fukumizu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 959--990.</p><p><strong>Abstract:</strong><br/>
We study the problem of learning sparse structure changes between two Markov networks $P$ and $Q$. Rather than fitting two Markov networks separately to two sets of data and figuring out their differences, a recent work proposed to learn changes directly via estimating the ratio between two Markov network models. In this paper, we give sufficient conditions for successful change detection with respect to the sample size $n_{p},n_{q}$, the dimension of data $m$ and the number of changed edges $d$. When using an unbounded density ratio model, we prove that the true sparse changes can be consistently identified for $n_{p}=\Omega(d^{2}\log\frac{m^{2}+m}{2})$ and $n_{q}=\Omega({n_{p}^{2}})$, with an exponentially decaying upper-bound on learning error. Such sample complexity can be improved to $\min(n_{p},n_{q})=\Omega(d^{2}\log\frac{m^{2}+m}{2})$ when the boundedness of the density ratio model is assumed. Our theoretical guarantee can be applied to a wide range of discrete/continuous Markov networks.
</p>projecteuclid.org/euclid.aos/1497319685_20170612220820Mon, 12 Jun 2017 22:08 EDTRandomized sketches for kernels: Fast and optimal nonparametric regressionhttp://projecteuclid.org/euclid.aos/1497319686<strong>Yun Yang</strong>, <strong>Mert Pilanci</strong>, <strong>Martin J. Wainwright</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 991--1023.</p><p><strong>Abstract:</strong><br/>
Kernel ridge regression (KRR) is a standard method for performing nonparametric regression over reproducing kernel Hilbert spaces. Given $n$ samples, the time and space complexity of computing the KRR estimate scale as $\mathcal{O}(n^{3})$ and $\mathcal{O}(n^{2})$, respectively, and so are prohibitive in many cases. We propose approximations of KRR based on $m$-dimensional randomized sketches of the kernel matrix, and study how small the projection dimension $m$ can be chosen while still preserving minimax optimality of the approximate KRR estimate. For various classes of randomized sketches, including those based on Gaussian and randomized Hadamard matrices, we prove that it suffices to choose the sketch dimension $m$ proportional to the statistical dimension (modulo logarithmic factors). Thus, we obtain fast and minimax optimal approximations to the KRR estimate for nonparametric regression. In doing so, we prove a novel lower bound on the minimax risk of kernel regression in terms of the localized Rademacher complexity.
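The sketching idea can be illustrated in a few lines: replace the $n$-dimensional KRR problem by an $m$-dimensional one obtained by multiplying with a random sketch matrix $S$. The sketch below is Gaussian and the toy data, kernel bandwidth, $\lambda$ and $m$ are arbitrary choices for the demo, not the theoretically calibrated statistical dimension:

```python
import numpy as np

def sketched_krr(K, y, lam, m, rng):
    """Gaussian-sketch approximation to kernel ridge regression.

    Solves the m-dimensional sketched problem
        min_theta (1/2n)||y - K S^T theta||^2 + (lam/2) theta^T S K S^T theta
    and returns alpha = S^T theta, approximating the full KRR weight vector.
    """
    n = K.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch matrix
    KS = K @ S.T                                   # n x m
    A = KS.T @ KS / n + lam * (S @ KS)             # m x m linear system
    b = KS.T @ y / n
    theta = np.linalg.solve(A, b)
    return S.T @ theta

rng = np.random.default_rng(2)
n = 400
x = np.sort(rng.uniform(-1.0, 1.0, n))
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)        # Gaussian kernel matrix
alpha_full = np.linalg.solve(K + n * 1e-3 * np.eye(n), y)  # exact KRR, lam = 1e-3
alpha_sk = sketched_krr(K, y, lam=1e-3, m=60, rng=rng)
fit_full = K @ alpha_full
fit_sk = K @ alpha_sk
```

The sketched solve costs $\mathcal{O}(m^{3})$ plus matrix products instead of $\mathcal{O}(n^{3})$; the paper's contribution is characterizing how small $m$ can be (the statistical dimension, up to logarithmic factors) while `fit_sk` remains minimax optimal.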
</p>projecteuclid.org/euclid.aos/1497319686_20170612220820Mon, 12 Jun 2017 22:08 EDTTesting uniformity on high-dimensional spheres against monotone rotationally symmetric alternativeshttp://projecteuclid.org/euclid.aos/1497319687<strong>Christine Cutting</strong>, <strong>Davy Paindaveine</strong>, <strong>Thomas Verdebout</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1024--1058.</p><p><strong>Abstract:</strong><br/>
We consider the problem of testing uniformity on high-dimensional unit spheres. We are primarily interested in nonnull issues. We show that rotationally symmetric alternatives lead to two Local Asymptotic Normality (LAN) structures. The first one is for fixed modal location $\mathbf{{\theta}}$ and allows one to derive locally asymptotically most powerful tests under specified $\mathbf{{\theta}}$. The second one, which addresses the Fisher–von Mises–Langevin (FvML) case, relates to the unspecified-$\mathbf{{\theta}}$ problem and shows that the high-dimensional Rayleigh test is locally asymptotically most powerful invariant. Under mild assumptions, we derive the asymptotic nonnull distribution of this test, which allows one to extend away from the FvML case the asymptotic powers obtained there from Le Cam’s third lemma. Throughout, we allow the dimension $p$ to go to infinity in an arbitrary way as a function of the sample size $n$. Some of our results also strengthen the local optimality properties of the Rayleigh test in low dimensions. We perform a Monte Carlo study to illustrate our asymptotic results. Finally, we treat an application related to testing for sphericity in high dimensions.
</p>projecteuclid.org/euclid.aos/1497319687_20170612220820Mon, 12 Jun 2017 22:08 EDTNonlinear sufficient dimension reduction for functional datahttp://projecteuclid.org/euclid.aos/1497319688<strong>Bing Li</strong>, <strong>Jun Song</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1059--1095.</p><p><strong>Abstract:</strong><br/>
We propose a general theory and the estimation procedures for nonlinear sufficient dimension reduction where both the predictor and the response may be random functions. The relation between the response and predictor can be arbitrary and the sets of observed time points can vary from subject to subject. The functional and nonlinear nature of the problem leads to construction of two functional spaces: the first representing the functional data, assumed to be a Hilbert space, and the second characterizing nonlinearity, assumed to be a reproducing kernel Hilbert space. A particularly attractive feature of our construction is that the two spaces are nested, in the sense that the kernel for the second space is determined by the inner product of the first. We propose two estimators for this general dimension reduction problem, and establish the consistency and convergence rate for one of them. These asymptotic results are flexible enough to accommodate both fully and partially observed functional data. We investigate the performance of our estimators by simulations, and apply them to data sets about speech recognition and handwritten symbols.
</p>projecteuclid.org/euclid.aos/1497319688_20170612220820Mon, 12 Jun 2017 22:08 EDTNetwork vector autoregressionhttp://projecteuclid.org/euclid.aos/1497319689<strong>Xuening Zhu</strong>, <strong>Rui Pan</strong>, <strong>Guodong Li</strong>, <strong>Yuewen Liu</strong>, <strong>Hansheng Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1096--1123.</p><p><strong>Abstract:</strong><br/>
We consider here a large-scale social network with a continuous response observed for each node at equally spaced time points. The responses from different nodes constitute an ultra-high dimensional vector, whose time series dynamics are to be investigated. In addition, the network structure is also taken into consideration, for which we propose a network vector autoregressive (NAR) model. The NAR model expresses each node’s response at a given time point as a linear combination of (a) its previous value, (b) the average of its connected neighbors, (c) a set of node-specific covariates and (d) an independent noise. The corresponding coefficients are referred to as the momentum effect, the network effect and the nodal effect, respectively. Conditions for strict stationarity of the NAR models are obtained. In order to estimate the NAR model, an ordinary least squares type estimator is developed, and its asymptotic properties are investigated. We further illustrate the usefulness of the NAR model through a number of interesting potential applications. Simulation studies and an empirical example are presented.
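A toy simulation of the NAR recursion and its least squares estimator, following the description above (node-specific covariates are omitted for brevity). The network, sample sizes and coefficient values are invented for illustration:

```python
import numpy as np

def simulate_nar(W, beta0, beta_net, beta_mom, T, rng):
    """Simulate Y_t = beta0 + beta_net * W Y_{t-1} + beta_mom * Y_{t-1} + eps_t,
    where W is the row-normalized adjacency matrix (neighbor averaging)."""
    n = W.shape[0]
    Y = np.zeros((T + 1, n))
    for t in range(1, T + 1):
        Y[t] = (beta0 + beta_net * W @ Y[t - 1] + beta_mom * Y[t - 1]
                + rng.standard_normal(n))
    return Y

rng = np.random.default_rng(3)
n, T = 200, 100
A = (rng.uniform(size=(n, n)) < 0.05).astype(float)   # Erdos-Renyi adjacency
np.fill_diagonal(A, 0.0)
deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
W = A / deg                                           # row-normalized adjacency
Y = simulate_nar(W, beta0=0.3, beta_net=0.2, beta_mom=0.4, T=T, rng=rng)

# OLS: stack (intercept, neighbor average, own lag) across all nodes and times.
X = np.column_stack([
    np.ones(n * T),
    (W @ Y[:-1].T).T.reshape(-1),   # network effect regressor
    Y[:-1].reshape(-1),             # momentum effect regressor
])
coef, *_ = np.linalg.lstsq(X, Y[1:].reshape(-1), rcond=None)
```

With $|\beta_{\text{net}}| + |\beta_{\text{mom}}| < 1$ the recursion is stationary, and pooling $nT$ node-time observations makes the OLS estimates `coef` close to the simulated coefficients.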
</p>projecteuclid.org/euclid.aos/1497319689_20170612220820Mon, 12 Jun 2017 22:08 EDTOn coverage and local radial rates of credible setshttp://projecteuclid.org/euclid.aos/1497319690<strong>Eduard Belitser</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1124--1151.</p><p><strong>Abstract:</strong><br/>
In the mildly ill-posed inverse signal-in-white-noise model, we construct confidence sets as credible balls with respect to the empirical Bayes posterior resulting from a certain two-level hierarchical prior. The quality of the posterior is characterized by the contraction rate, which we allow to be local, that is, depending on the parameter. The issue of optimality of the constructed confidence sets is addressed via a trade-off between their “size” (the local radial rate) and their coverage probability. We introduce the excessive bias restriction (EBR), which is more general than the self-similarity and polished-tail conditions recently studied in the literature. Under EBR, we establish the confidence optimality of our credible set with some local (oracle) radial rate. We also derive the oracle estimation inequality and the oracle posterior contraction rate. The obtained local results are more powerful than global ones: adaptive minimax results for a number of smoothness scales follow as a consequence, in particular the ones considered by Szabó et al. [Ann. Statist. 43 (2015) 1391–1428].
</p>projecteuclid.org/euclid.aos/1497319690_20170612220820Mon, 12 Jun 2017 22:08 EDTTotal positivity in Markov structureshttp://projecteuclid.org/euclid.aos/1497319691<strong>Shaun Fallat</strong>, <strong>Steffen Lauritzen</strong>, <strong>Kayvan Sadeghi</strong>, <strong>Caroline Uhler</strong>, <strong>Nanny Wermuth</strong>, <strong>Piotr Zwiernik</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1152--1184.</p><p><strong>Abstract:</strong><br/>
We discuss properties of distributions that are multivariate totally positive of order two ($\mathrm{MTP}_{2}$) related to conditional independence. In particular, we show that any independence model generated by an $\mathrm{MTP}_{2}$ distribution is a compositional semi-graphoid which is upward-stable and singleton-transitive. In addition, we prove that any $\mathrm{MTP}_{2}$ distribution satisfying an appropriate support condition is faithful to its concentration graph. Finally, we analyze factorization properties of $\mathrm{MTP}_{2}$ distributions and discuss ways of constructing $\mathrm{MTP}_{2}$ distributions; in particular, we give conditions on the log-linear parameters of a discrete distribution which ensure $\mathrm{MTP}_{2}$ and characterize conditional Gaussian distributions which satisfy $\mathrm{MTP}_{2}$.
</p>projecteuclid.org/euclid.aos/1497319691_20170612220820Mon, 12 Jun 2017 22:08 EDTTests for covariance structures with high-dimensional repeated measurementshttp://projecteuclid.org/euclid.aos/1497319692<strong>Ping-Shou Zhong</strong>, <strong>Wei Lan</strong>, <strong>Peter X. K. Song</strong>, <strong>Chih-Ling Tsai</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1185--1213.</p><p><strong>Abstract:</strong><br/>
In regression analysis with repeated measurements, such as longitudinal data and panel data, structured covariance matrices characterized by a small number of parameters have been widely used and play an important role in parameter estimation and statistical inference. To assess the adequacy of a specified covariance structure, one often adopts the classical likelihood-ratio test when the dimension of the repeated measurements ($p$) is smaller than the sample size ($n$). However, this assessment becomes quite challenging when $p$ is bigger than $n$, since the classical likelihood-ratio test is no longer applicable. This paper proposes an adjusted goodness-of-fit test to examine a broad range of covariance structures under the scenario of “large $p$, small $n$.” Analytical examples are presented to illustrate the effectiveness of the adjustment. In addition, large sample properties of the proposed test are established. Moreover, simulation studies and a real data example are provided to demonstrate the finite sample performance and the practical utility of the test.
</p>projecteuclid.org/euclid.aos/1497319692_20170612220820Mon, 12 Jun 2017 22:08 EDTWeak signal identification and inference in penalized model selectionhttp://projecteuclid.org/euclid.aos/1497319693<strong>Peibei Shi</strong>, <strong>Annie Qu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1214--1253.</p><p><strong>Abstract:</strong><br/>
Weak signal identification and inference are very important in the area of penalized model selection, yet they are underdeveloped and not well studied. Existing inference procedures for penalized estimators are mainly focused on strong signals. In this paper, we propose an identification procedure for weak signals in finite samples, and provide a transition phase between noise and strong signal strengths. We also introduce a new two-step inferential method to construct better confidence intervals for the identified weak signals. Our theory development assumes that variables are orthogonally designed. Both theory and numerical studies indicate that the proposed method leads to better confidence coverage for weak signals, compared with those using asymptotic inference. In addition, the proposed method outperforms the perturbation and bootstrap resampling approaches. We illustrate our method on HIV antiretroviral drug susceptibility data to identify genetic mutations associated with HIV drug resistance.
</p>projecteuclid.org/euclid.aos/1497319693_20170612220820Mon, 12 Jun 2017 22:08 EDTSemimartingale detection and goodness-of-fit testshttp://projecteuclid.org/euclid.aos/1497319694<strong>Adam D. Bull</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1254--1283.</p><p><strong>Abstract:</strong><br/>
In quantitative finance, we often fit a parametric semimartingale model to asset prices. To ensure our model is correct, we must then perform goodness-of-fit tests. In this paper, we give a new goodness-of-fit test for volatility-like processes, which is easily applied to a variety of semimartingale models. In each case, we reduce the problem to the detection of a semimartingale observed under noise. In this setting, we then describe a wavelet-thresholding test, which obtains adaptive and near-optimal detection rates.
</p>projecteuclid.org/euclid.aos/1497319694_20170612220820Mon, 12 Jun 2017 22:08 EDTTesting for time-varying jump activity for pure jump semimartingaleshttp://projecteuclid.org/euclid.aos/1497319695<strong>Viktor Todorov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1284--1311.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a test for deciding whether the jump activity index of a discretely observed Itô semimartingale of pure-jump type (i.e., one without a diffusion) varies over a fixed interval of time. The asymptotic setting is based on observations within a fixed time interval with mesh of the observation grid shrinking to zero. The test is derived for semimartingales whose “spot” jump compensator around zero is like that of a stable process, but importantly the stability index can vary over the time interval. The test is based on forming a sequence of local estimators of the jump activity over blocks of shrinking time span and contrasting their variability around a global activity estimator based on the whole data set. The local and global jump activity estimates are constructed from the real part of the empirical characteristic function of the increments of the process scaled by local power variations. We derive the asymptotic distribution of the test statistic under the null hypothesis of constant jump activity and show that the test has asymptotic power of one against fixed alternatives of processes with time-varying jump activity.
</p>projecteuclid.org/euclid.aos/1497319695_20170612220820Mon, 12 Jun 2017 22:08 EDTOperational time and in-sample density forecastinghttp://projecteuclid.org/euclid.aos/1497319696<strong>Young K. Lee</strong>, <strong>Enno Mammen</strong>, <strong>Jens P. Nielsen</strong>, <strong>Byeong U. Park</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1312--1341.</p><p><strong>Abstract:</strong><br/>
In this paper, we consider a new structural model for in-sample density forecasting, which estimates a structured density on a region where data are observed and then reuses the estimated structured density on some region where data are not observed. Our structural assumption is that the density is a product of one-dimensional functions, with one function sitting on the scale of a transformed space of observations. The transformation involves another unknown one-dimensional function, so that our model is formulated via a known smooth function of three underlying unknown one-dimensional functions. We present an innovative way of estimating the one-dimensional functions and show that the estimators of all three components achieve the optimal one-dimensional rate of convergence. We illustrate how one can use our approach by analyzing a real dataset, and also verify the tractable finite-sample performance of the method via a simulation study.
</p>projecteuclid.org/euclid.aos/1497319696_20170612220820Mon, 12 Jun 2017 22:08 EDTAsymptotics of empirical eigenstructure for high dimensional spiked covariancehttp://projecteuclid.org/euclid.aos/1497319697<strong>Weichen Wang</strong>, <strong>Jianqing Fan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 3, 1342--1374.</p><p><strong>Abstract:</strong><br/>
We derive the asymptotic distributions of the spiked eigenvalues and eigenvectors under a generalized and unified asymptotic regime, which takes into account the magnitude of spiked eigenvalues, sample size and dimensionality. This regime allows high dimensionality and diverging eigenvalues and provides new insights into the roles that the leading eigenvalues, sample size and dimensionality play in principal component analysis. Our results are a natural extension of those in [Statist. Sinica 17 (2007) 1617–1642] to a more general setting and solve the rates of convergence problems in [Statist. Sinica 26 (2016) 1747–1770]. They also reveal the biases of estimating leading eigenvalues and eigenvectors by using principal component analysis, and lead to a new covariance estimator for the approximate factor model, called Shrinkage Principal Orthogonal complEment Thresholding (S-POET), that corrects the biases. Our results are successfully applied to outstanding problems in estimation of risks for large portfolios and false discovery proportions for dependent test statistics and are illustrated by simulation studies.
</p>projecteuclid.org/euclid.aos/1497319697_20170612220820Mon, 12 Jun 2017 22:08 EDTOn the optimality of Bayesian change-point detectionhttp://projecteuclid.org/euclid.aos/1498636860<strong>Dong Han</strong>, <strong>Fugee Tsung</strong>, <strong>Jinguo Xian</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1375--1402.</p><p><strong>Abstract:</strong><br/>
By introducing suitable loss random variables of detection, we obtain optimal tests in terms of the stopping time or alarm time for Bayesian change-point detection not only for a general prior distribution of change-points but also for observations being a Markov process. Moreover, the optimal (minimal) average detection delay is proved to be equal to $1$ for any (possibly large) average run length to false alarm if the number of possible change-points is finite.
</p>projecteuclid.org/euclid.aos/1498636860_20170628040134Wed, 28 Jun 2017 04:01 EDTComputational and statistical boundaries for submatrix localization in a large noisy matrixhttp://projecteuclid.org/euclid.aos/1498636861<strong>T. Tony Cai</strong>, <strong>Tengyuan Liang</strong>, <strong>Alexander Rakhlin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1403--1430.</p><p><strong>Abstract:</strong><br/>
We study in this paper the computational and statistical boundaries for submatrix localization. Given one observation of a signal submatrix (or multiple nonoverlapping signal submatrices) of magnitude $\lambda$ and size $k_{m}\times k_{n}$, embedded in a large noise matrix (of size $m\times n$), the goal is to optimally identify the support of the signal submatrix, both computationally and statistically.
Two transition thresholds for the signal-to-noise ratio $\lambda/\sigma$ are established in terms of $m$, $n$, $k_{m}$ and $k_{n}$. The first threshold, $\mathsf{SNR}_{c}$, corresponds to the computational boundary. We introduce a new linear time spectral algorithm that identifies the submatrix with high probability when the signal strength is above the threshold $\mathsf{SNR}_{c}$. Below this threshold, it is shown that no polynomial time algorithm can succeed in identifying the submatrix, under the hidden clique hypothesis. The second threshold, $\mathsf{SNR}_{s}$, captures the statistical boundary, below which no method can succeed in localization with probability going to one in the minimax sense. The exhaustive search method successfully finds the submatrix above this threshold. In marked contrast to submatrix detection and sparse PCA, the results show an interesting phenomenon that $\mathsf{SNR}_{c}$ is always significantly larger than $\mathsf{SNR}_{s}$ under the sub-Gaussian error model, which implies an essential gap between statistical optimality and computational efficiency for submatrix localization.
</p>projecteuclid.org/euclid.aos/1498636861_20170628040134Wed, 28 Jun 2017 04:01 EDTTests for separability in nonparametric covariance operators of random surfaceshttp://projecteuclid.org/euclid.aos/1498636862<strong>John A. D. Aston</strong>, <strong>Davide Pigoli</strong>, <strong>Shahin Tavakoli</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1431--1461.</p><p><strong>Abstract:</strong><br/>
The assumption of separability of the covariance operator for a random image or hypersurface can be of substantial use in applications, especially in situations where the accurate estimation of the full covariance structure is infeasible, either for computational reasons, or due to a small sample size. However, inferential tools to verify this assumption are somewhat lacking in high-dimensional or functional data analysis settings, where this assumption is most relevant. We propose here to test separability by focusing on $K$-dimensional projections of the difference between the covariance operator and a nonparametric separable approximation. The subspace we project onto is one generated by the eigenfunctions of the covariance operator estimated under the separability hypothesis, negating the need to ever estimate the full nonseparable covariance. We show that the rescaled difference of the sample covariance operator with its separable approximation is asymptotically Gaussian. As a by-product of this result, we derive asymptotically pivotal tests under Gaussian assumptions, and propose bootstrap methods for approximating the distribution of the test statistics. We probe the finite sample performance through simulation studies, and present an application to log-spectrogram images from a phonetic linguistics dataset.
</p>projecteuclid.org/euclid.aos/1498636862_20170628040134Wed, 28 Jun 2017 04:01 EDTIdentification of universally optimal circular designs for the interference modelhttp://projecteuclid.org/euclid.aos/1498636863<strong>Wei Zheng</strong>, <strong>Mingyao Ai</strong>, <strong>Kang Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1462--1487.</p><p><strong>Abstract:</strong><br/>
Many applications of block designs exhibit neighbor and edge effects. A popular remedy is to use the circular design coupled with the interference model. The search for optimal or efficient designs has been intensively studied in recent years. The circular neighbor-balanced designs at distances 1 and 2 (CNBD2), including orthogonal arrays of type I ($\mathrm{OA}_{I}$) of strength $2$, are the major designs proposed in the literature for the purpose of estimating the direct treatment effects. They are shown to be optimal within some reasonable subclasses of designs. By using benchmark designs in approximate design theory, we show that CNBD2 designs are highly efficient among all possible designs when the error terms are homoscedastic and uncorrelated. However, when the error terms are correlated, these designs are outperformed significantly by other designs. Note that CNBD2 designs fall into the special category of pseudo-symmetric designs, and they exist only when the number of treatments is larger than the block size and the number of blocks is a multiple of some constants. In this paper, we elaborate equivalent conditions for any design, pseudo-symmetric or not, to be universally optimal for any size of experiment and any covariance structure of the error terms. This result is novel for circular designs and sheds light on other similar models in the search for optimal or efficient asymmetric designs.
</p>projecteuclid.org/euclid.aos/1498636863_20170628040134Wed, 28 Jun 2017 04:01 EDTCo-clustering of nonsmooth graphonshttp://projecteuclid.org/euclid.aos/1498636864<strong>David Choi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1488--1515.</p><p><strong>Abstract:</strong><br/>
Performance bounds are given for exploratory co-clustering/blockmodeling of bipartite graph data, where we assume the rows and columns of the data matrix are samples from an arbitrary population. This is equivalent to assuming that the data is generated from a nonsmooth graphon. It is shown that co-clusters found by any method can be extended to the row and column populations, or equivalently that the estimated blockmodel approximates a blocked version of the generative graphon, with estimation error bounded by $O_{P}(n^{-1/2})$. Analogous performance bounds are also given for degree-corrected blockmodels and random dot product graphs, with error rates depending on the dimensionality of the latent variable space.
</p>projecteuclid.org/euclid.aos/1498636864_20170628040134Wed, 28 Jun 2017 04:01 EDTMinimax theory of estimation of linear functionals of the deconvolution density with or without sparsityhttp://projecteuclid.org/euclid.aos/1498636865<strong>Marianna Pensky</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1516--1541.</p><p><strong>Abstract:</strong><br/>
The present paper considers the problem of estimating a linear functional $\Phi=\int_{-\infty}^{\infty}\varphi(x)f(x)\,dx$ of an unknown deconvolution density $f$ on the basis of $n$ i.i.d. observations $Y_{1},\ldots,Y_{n}$ of $Y=\theta+\xi$, where $\xi$ has a known pdf $g$, and $f$ is the pdf of $\theta$. The objective of the present paper is to develop a general minimax theory of estimating $\Phi$, and to relate this problem to estimation of functionals $\Phi_{n}=n^{-1}\sum_{i=1}^{n}\varphi(\theta_{i})$ in indirect observations. In particular, we offer a general, Fourier transform based approach to estimation of $\Phi$ (and $\Phi_{n}$) and derive upper and minimax lower bounds for the risk for an arbitrary square integrable function $\varphi$. Furthermore, using the technique of inversion formulas, we extend the theory to a number of situations when the Fourier transform of $\varphi$ does not exist, but $\Phi$ can be presented as a functional of the Fourier transform of $f$ and its derivatives. The latter enables us to construct minimax estimators of functionals that have never been handled before, such as the odd absolute moments and the generalized moments of the deconvolution density. Finally, we generalize our results to the situation when the vector $\mathbf{\theta}$ is sparse and the objective is estimating $\Phi$ (or $\Phi_{n}$) over the nonzero components only. As a direct application of the proposed theory, we automatically recover multiple recent results and obtain a variety of new ones such as, for example, estimation of the mixing probability density function with classical and Berkson errors and estimation of the $(2M+1)$-th absolute moment of the deconvolution density.
</p>projecteuclid.org/euclid.aos/1498636865_20170628040134Wed, 28 Jun 2017 04:01 EDTNonparametric change-point analysis of volatilityhttp://projecteuclid.org/euclid.aos/1498636866<strong>Markus Bibinger</strong>, <strong>Moritz Jirak</strong>, <strong>Mathias Vetter</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1542--1578.</p><p><strong>Abstract:</strong><br/>
In this work, we develop change-point methods for statistics of high-frequency data. The main interest is in the volatility of an Itô semimartingale, the latter being discretely observed over a fixed time horizon. We construct a minimax-optimal test to discriminate continuous paths from paths with volatility jumps, and it is shown that the test can be embedded into a more general theory to infer the smoothness of volatilities. In a high-frequency setting, we prove weak convergence of the test statistic under the hypothesis to an extreme value distribution. Moreover, we develop methods to infer changes in the Hurst parameters of fractional volatility processes. A simulation study is conducted to demonstrate the performance of our methods in finite-sample applications.
</p>projecteuclid.org/euclid.aos/1498636866_20170628040134Wed, 28 Jun 2017 04:01 EDTA new approach to optimal designs for correlated observationshttp://projecteuclid.org/euclid.aos/1498636867<strong>Holger Dette</strong>, <strong>Maria Konstantinou</strong>, <strong>Anatoly Zhigljavsky</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1579--1608.</p><p><strong>Abstract:</strong><br/>
This paper presents a new and efficient method for the construction of optimal designs for regression models with dependent error processes. In contrast to most of the work in this field, which starts with a model for a finite number of observations and considers the asymptotic properties of estimators and designs as the sample size converges to infinity, our approach is based on a continuous time model. We use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in this model. Based on the BLUE, we construct an efficient linear estimator and corresponding optimal designs in the model for finite sample size by minimizing the mean squared error between the optimal solution in the continuous time model and its discrete approximation with respect to the weights (of the linear estimator) and the optimal design points, in particular in the multiparameter case.
In contrast to previous work on the subject, the resulting estimators and corresponding optimal designs are very efficient and easy to implement. This means that they are practically not distinguishable from the weighted least squares estimator and the corresponding optimal designs, which have to be found numerically by nonconvex discrete optimization. The advantages of the new approach are illustrated in several numerical examples.
</p>projecteuclid.org/euclid.aos/1498636867_20170628040134Wed, 28 Jun 2017 04:01 EDTRare-event analysis for extremal eigenvalues of white Wishart matriceshttp://projecteuclid.org/euclid.aos/1498636868<strong>Tiefeng Jiang</strong>, <strong>Kevin Leder</strong>, <strong>Gongjun Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1609--1637.</p><p><strong>Abstract:</strong><br/>
In this paper, we consider the extreme behavior of the extremal eigenvalues of white Wishart matrices, which play an important role in multivariate analysis. In particular, we focus on the case when the dimension of the features $p$ is much larger than or comparable to the number of observations $n$, a common situation in modern data analysis. We provide asymptotic approximations and bounds for the tail probabilities of the extremal eigenvalues. Moreover, we construct efficient Monte Carlo simulation algorithms to compute the tail probabilities. Simulation results show that our method has the best performance among known approximation approaches, and furthermore provides an efficient and accurate way for evaluating the tail probabilities in practice.
</p>projecteuclid.org/euclid.aos/1498636868_20170628040134Wed, 28 Jun 2017 04:01 EDTRobust discrimination designs over Hellinger neighbourhoodshttp://projecteuclid.org/euclid.aos/1498636869<strong>Rui Hu</strong>, <strong>Douglas P. Wiens</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1638--1663.</p><p><strong>Abstract:</strong><br/>
To aid in the discrimination between two, possibly nonlinear, regression models, we study the construction of experimental designs. Considering that each of these two models might be only approximately specified, robust “maximin” designs are proposed. The rough idea is as follows. We impose neighbourhood structures on each regression response, to describe the uncertainty in the specifications of the true underlying models. We determine the least favourable—in terms of Kullback–Leibler divergence—members of these neighbourhoods. Optimal designs are those maximizing this minimum divergence. Sequential, adaptive approaches to this maximization are studied. Asymptotic optimality is established.
</p>projecteuclid.org/euclid.aos/1498636869_20170628040134Wed, 28 Jun 2017 04:01 EDTNonparametric Bayesian posterior contraction rates for discretely observed scalar diffusionshttp://projecteuclid.org/euclid.aos/1498636870<strong>Richard Nickl</strong>, <strong>Jakob Söhl</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1664--1693.</p><p><strong>Abstract:</strong><br/>
We consider nonparametric Bayesian inference in a reflected diffusion model $dX_{t}=b(X_{t})\,dt+\sigma(X_{t})\,dW_{t}$, with discretely sampled observations $X_{0},X_{\Delta},\ldots,X_{n\Delta}$. We analyse the nonlinear inverse problem corresponding to the “low frequency sampling” regime where $\Delta>0$ is fixed and $n\to\infty$. A general theorem is proved that gives conditions for prior distributions $\Pi$ on the diffusion coefficient $\sigma$ and the drift function $b$ that ensure minimax optimal contraction rates of the posterior distribution over Hölder–Sobolev smoothness classes. These conditions are verified for natural examples of nonparametric random wavelet series priors. For the proofs, we derive new concentration inequalities for empirical processes arising from discretely observed diffusions that are of independent interest.
</p>projecteuclid.org/euclid.aos/1498636870_20170628040134Wed, 28 Jun 2017 04:01 EDTAsymptotic and finite-sample properties of estimators based on stochastic gradientshttp://projecteuclid.org/euclid.aos/1498636871<strong>Panos Toulis</strong>, <strong>Edoardo M. Airoldi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1694--1727.</p><p><strong>Abstract:</strong><br/>
Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood in theory, and in practice avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly defined. Intuitively, implicit updates shrink standard stochastic gradient descent updates. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be explicitly computed; thus, implicit procedures increase stability without increasing the computational burden. Our theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds. Importantly, analytical expressions for the variances of these stochastic gradient-based estimators reveal their exact loss of efficiency. We also develop new algorithms to compute implicit stochastic gradient descent-based estimators in practice for generalized linear models, Cox proportional hazards models and M-estimators, and perform extensive experiments. Our results suggest that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.
</p>projecteuclid.org/euclid.aos/1498636871_20170628040134Wed, 28 Jun 2017 04:01 EDTFunctional central limit theorems for single-stage sampling designshttp://projecteuclid.org/euclid.aos/1498636872<strong>Hélène Boistard</strong>, <strong>Hendrik P. Lopuhaä</strong>, <strong>Anne Ruiz-Gazen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1728--1758.</p><p><strong>Abstract:</strong><br/>
For a joint model-based and design-based inference, we establish functional central limit theorems for the Horvitz–Thompson empirical process and the Hájek empirical process centered by their finite population mean as well as by their super-population mean in a survey sampling framework. The results apply to single-stage unequal probability sampling designs and essentially only require conditions on higher order correlations. We apply our main results to a Hadamard differentiable statistical functional and illustrate its limit behavior by means of a computer simulation.
</p>projecteuclid.org/euclid.aos/1498636872_20170628040134Wed, 28 Jun 2017 04:01 EDTAsymptotic normality of scrambled geometric net quadraturehttp://projecteuclid.org/euclid.aos/1498636873<strong>Kinjal Basu</strong>, <strong>Rajarshi Mukherjee</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1759--1788.</p><p><strong>Abstract:</strong><br/>
In a very recent work, Basu and Owen [Found. Comput. Math. 17 (2017) 467–496] propose the use of scrambled geometric nets in numerical integration when the domain is a product of $s$ arbitrary spaces of dimension $d$ having a certain partitioning constraint. It was shown that for a class of smooth functions, the integral estimate has variance $O(n^{-1-2/d}(\log n)^{s-1})$ for scrambled geometric nets, compared to $O(n^{-1})$ for ordinary Monte Carlo. The main idea of this paper is to expand on the work of Loh [Ann. Statist. 31 (2003) 1282–1324] to show that the scrambled geometric net estimate has an asymptotic normal distribution for certain smooth functions defined on products of suitable subsets of $\mathbb{R}^{d}$.
</p>projecteuclid.org/euclid.aos/1498636873_20170628040134Wed, 28 Jun 2017 04:01 EDTYule’s “nonsense correlation” solved!http://projecteuclid.org/euclid.aos/1498636874<strong>Philip A. Ernst</strong>, <strong>Larry A. Shepp</strong>, <strong>Abraham J. Wyner</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1789--1809.</p><p><strong>Abstract:</strong><br/>
In this paper, we resolve a longstanding open statistical problem. The problem is to mathematically prove Yule’s 1926 empirical finding of “nonsense correlation” [J. Roy. Statist. Soc. 89 (1926) 1–63], which we do by analytically determining the second moment of the empirical correlation coefficient \begin{eqnarray*}&&\theta:=\frac{\int_{0}^{1}W_{1}(t)W_{2}(t)\,dt-\int_{0}^{1}W_{1}(t)\,dt\int_{0}^{1}W_{2}(t)\,dt}{\sqrt{\int_{0}^{1}W^{2}_{1}(t)\,dt-(\int_{0}^{1}W_{1}(t)\,dt)^{2}}\sqrt{\int_{0}^{1}W^{2}_{2}(t)\,dt-(\int_{0}^{1}W_{2}(t)\,dt)^{2}}},\end{eqnarray*} of two independent Wiener processes, $W_{1},W_{2}$. Using tools from Fredholm integral equation theory, we successfully calculate the second moment of $\theta$ to obtain a value for the standard deviation of $\theta$ of nearly 0.5. The “nonsense” correlation, which we call “volatile” correlation, is volatile in the sense that its distribution is heavily dispersed and is frequently large in absolute value. It is induced because each Wiener process is “self-correlated” in time. This is because a Wiener process is an integral of pure noise, and thus its values at different time points are correlated. In addition to providing an explicit formula for the second moment of $\theta$, we offer implicit formulas for higher moments of $\theta$.
</p>projecteuclid.org/euclid.aos/1498636874_20170628040134Wed, 28 Jun 2017 04:01 EDTSharp detection in PCA under correlations: All eigenvalues matterhttp://projecteuclid.org/euclid.aos/1498636875<strong>Edgar Dobriban</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 45, Number 4, 1810--1833.</p><p><strong>Abstract:</strong><br/>
Principal component analysis (PCA) is a widely used method for dimension reduction. In high-dimensional data, the “signal” eigenvalues corresponding to weak principal components (PCs) do not necessarily separate from the bulk of the “noise” eigenvalues. Therefore, popular tests based on the largest eigenvalue have little power to detect weak PCs. In the special case of the spiked model, certain tests asymptotically equivalent to linear spectral statistics (LSS)—averaging effects over all eigenvalues—were recently shown to achieve some power.
We consider a “local alternatives” model for the spectrum of covariance matrices that allows a general correlation structure. We develop new tests to detect PCs in this model. While the top eigenvalue contains little information, due to the strong correlations between the eigenvalues, we can detect weak PCs by averaging over all eigenvalues using LSS. We show that it is possible to find the optimal LSS by solving a certain integral equation. To solve this equation, we develop efficient algorithms that build on our recent method for computing the limit empirical spectrum [Dobriban (2015)]. The solvability of this equation also presents a new perspective on phase transitions in spiked models.
</p>projecteuclid.org/euclid.aos/1498636875_20170628040134Wed, 28 Jun 2017 04:01 EDT