The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
Language: en-us
Copyright 2010 Cornell University Library
Contact: Euclid-L@cornell.edu (Project Euclid Team)
Published: Thu, 05 Aug 2010 15:41 EDT; last build: Tue, 07 Jun 2011 09:09 EDT
Logo: http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif (Project Euclid)
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
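The automatic multiplicity correction discussed above can be made concrete with a small sketch (mine, not the paper's code): under a fully Bayes treatment with a uniform Beta(1, 1) prior on the common inclusion probability $w$, the marginal prior probability of any particular model with $k$ of $p$ variables integrates to a closed form, and the prior odds of adding a variable shrink as $p$ grows.

```python
from math import comb

def model_prior(k: int, p: int) -> float:
    """Prior probability of one particular model that includes k of p
    candidate variables, after integrating out a Beta(1, 1) prior on
    the common inclusion probability w:
        p(M) = Integral_0^1 w^k (1 - w)^(p - k) dw = 1 / ((p + 1) * C(p, k)).
    """
    return 1.0 / ((p + 1) * comb(p, k))

# Prior odds of a two-variable model versus a one-variable model fall
# as the number of candidate variables p grows -- an automatic penalty
# against multiplicity that a fixed w = 1/2 prior would not impose.
for p in (10, 100, 1000):
    print(p, model_prior(2, p) / model_prior(1, p))
```

The odds come out to $2/(p-1)$, so each additional candidate variable makes every individual larger model cheaper to reject a priori.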
</p>
Published: Thu, 05 Aug 2010 15:41 EDT

Bayesian estimation of sparse signals with a continuous spike-and-slab prior
https://projecteuclid.org/euclid.aos/1519268435
<strong>Veronika Ročková</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 401--437.</p><p><strong>Abstract:</strong><br/>
We introduce a new framework for estimation of sparse normal means, bridging the gap between popular frequentist strategies (LASSO) and popular Bayesian strategies (spike-and-slab). The main thrust of this paper is to introduce the family of Spike-and-Slab LASSO (SS-LASSO) priors, which form a continuum between the Laplace prior and the point-mass spike-and-slab prior. We establish several appealing frequentist properties of SS-LASSO priors, contrasting them with these two limiting cases. First, we adopt the penalized likelihood perspective on Bayesian modal estimation and introduce the framework of Bayesian penalty mixing with spike-and-slab priors. We show that the SS-LASSO global posterior mode is (near) minimax rate-optimal under squared error loss, just as the LASSO is. Going further, we introduce an adaptive two-step estimator which can achieve provably sharper performance than the LASSO. Second, we show that the whole posterior keeps pace with the global mode and concentrates at the (near) minimax rate, a property that is known <em>not to hold</em> for the single Laplace prior. The minimax-rate optimality is obtained with a suitable class of independent product priors (for known levels of sparsity) as well as with dependent mixing priors (adapting to the unknown levels of sparsity). Up to now, the rate-optimal posterior concentration has been established only for spike-and-slab priors with a point mass at zero. Thus, the SS-LASSO priors, despite being continuous, possess similar optimality properties as the “theoretically ideal” point-mass mixtures. These results provide valuable theoretical justification for our proposed class of priors, underpinning their intuitive appeal and practical potential.
</p>
Published: Wed, 21 Feb 2018 22:00 EST

On the asymptotic theory of new bootstrap confidence bounds
https://projecteuclid.org/euclid.aos/1519268436
<strong>Charl Pretorius</strong>, <strong>Jan W. H. Swanepoel</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 1, 438--456.</p><p><strong>Abstract:</strong><br/>
We propose a new method, based on sample splitting, for constructing bootstrap confidence bounds for a parameter appearing in the regular smooth function model. It has been demonstrated in the literature, for example, by Hall [ Ann. Statist. 16 (1988) 927–985; The Bootstrap and Edgeworth Expansion (1992) Springer], that the well-known percentile-$t$ method for constructing bootstrap confidence bounds typically incurs a coverage error of order $O(n^{-1})$, with $n$ being the sample size. Our version of the percentile-$t$ bound reduces this coverage error to order $O(n^{-3/2})$ and in some cases to $O(n^{-2})$. Furthermore, whereas the standard percentile bounds typically incur coverage error of $O(n^{-1/2})$, the new bounds have reduced error of $O(n^{-1})$. In the case where the parameter of interest is the population mean, we derive for each confidence bound the exact coefficient of the leading term in an asymptotic expansion of the coverage error, although similar results may be obtained for other parameters such as the variance, the correlation coefficient, and the ratio of two means. We show that equal-tailed confidence intervals with coverage error at most $O(n^{-2})$ may be obtained from the newly proposed bounds, as opposed to the typical error $O(n^{-1})$ of the standard intervals. It is also shown that the good properties of the new percentile-$t$ method carry over to regression problems. Results of independent interest are derived, such as a generalisation of a delta method by Cramér [ Mathematical Methods of Statistics (1946) Princeton Univ. Press] and Hurt [ Apl. Mat. 21 (1976) 444–456], and an expression for a polynomial appearing in an Edgeworth expansion of the distribution of a Studentised statistic for the slope parameter in a regression model. A small simulation study illustrates the behavior of the confidence bounds for small to moderate sample sizes.
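For reference, the classical one-sided percentile-$t$ bound that the paper improves on can be sketched in a few lines. This is the standard Hall construction with coverage error $O(n^{-1})$; the paper's sample-splitting refinement is not implemented here, and the exponential data and seeds are illustrative choices of mine.

```python
import numpy as np

def percentile_t_lower_bound(x, alpha=0.05, B=2000, rng=None):
    """Classical one-sided percentile-t bootstrap lower confidence
    bound for the population mean. The bootstrap distribution of the
    Studentised statistic t* = (mean* - mean) / se* supplies the
    critical value in place of a normal or t quantile.
    """
    rng = np.random.default_rng(rng)
    n = x.size
    mean, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    t_star = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)
        t_star[b] = (xb.mean() - mean) / (xb.std(ddof=1) / np.sqrt(n))
    # A lower bound uses the upper bootstrap quantile of t*.
    return mean - np.quantile(t_star, 1 - alpha) * se

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)  # true mean 2.0, skewed data
lb = percentile_t_lower_bound(x, rng=1)
```

Skewed data such as the exponential is exactly where the bootstrap critical value differs visibly from the normal quantile.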
</p>
Published: Wed, 21 Feb 2018 22:00 EST

Strong orthogonal arrays of strength two plus
https://projecteuclid.org/euclid.aos/1522742425
<strong>Yuanzhen He</strong>, <strong>Ching-Shui Cheng</strong>, <strong>Boxin Tang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 457--468.</p><p><strong>Abstract:</strong><br/>
Strong orthogonal arrays were recently introduced and studied in He and Tang [ Biometrika 100 (2013) 254–260] as a class of space-filling designs for computer experiments. To enjoy the benefits of better space-filling properties, when compared to ordinary orthogonal arrays, strong orthogonal arrays need to have strength three or higher, which may require run sizes that are too large for experimenters to afford. To address this problem, we introduce a new class of arrays, called strong orthogonal arrays of strength two plus. These arrays, while being more economical than strong orthogonal arrays of strength three, still enjoy the better two-dimensional space-filling property of the latter. Among the many results we have obtained on the characterizations and constructions of strong orthogonal arrays of strength two plus, worth special mention is their intimate connection with second-order saturated designs.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Statistical inference for spatial statistics defined in the Fourier domain
https://projecteuclid.org/euclid.aos/1522742426
<strong>Suhasini Subba Rao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 469--499.</p><p><strong>Abstract:</strong><br/>
A class of Fourier based statistics for irregularly spaced spatial data is introduced. Examples include the Whittle likelihood, a parametric estimator of the covariance function based on the $L_{2}$-contrast function and a simple nonparametric estimator of the spatial autocovariance which is a nonnegative function. The Fourier based statistic is a quadratic form of a discrete Fourier-type transform of the spatial data. Evaluation of the statistic is computationally tractable, requiring $O(nb)$ operations, where $b$ is the number of Fourier frequencies used in the definition of the statistic and $n$ is the sample size. The asymptotic sampling properties of the statistic are derived using both increasing domain and fixed-domain spatial asymptotics. These results are used to construct a statistic which is asymptotically pivotal.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

On the inference about the spectral distribution of high-dimensional covariance matrix based on high-frequency noisy observations
https://projecteuclid.org/euclid.aos/1522742427
<strong>Ningning Xia</strong>, <strong>Xinghua Zheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 500--525.</p><p><strong>Abstract:</strong><br/>
In practice, observations are often contaminated by noise, making the resulting sample covariance matrix a signal-plus-noise sample covariance matrix. Aiming to make inferences about the spectral distribution of the population covariance matrix under such a situation, we establish an asymptotic relationship that describes how the limiting spectral distribution of (signal) sample covariance matrices depends on that of signal-plus-noise-type sample covariance matrices. As an application, we consider inferences about the spectral distribution of integrated covolatility (ICV) matrices of high-dimensional diffusion processes based on high-frequency data with microstructure noise. The (slightly modified) pre-averaging estimator is a signal-plus-noise sample covariance matrix, and the aforementioned result, together with a (generalized) connection between the spectral distribution of signal sample covariance matrices and that of the population covariance matrix, enables us to propose a two-step procedure to consistently estimate the spectral distribution of ICV for a class of diffusion processes. An alternative approach is further proposed, which possesses several desirable properties: it is more robust, it eliminates the effects of microstructure noise, and the asymptotic relationship that enables consistent estimation of the spectral distribution of ICV is the standard Marčenko–Pastur equation. The performance of the two approaches is examined via simulation studies under both synchronous and asynchronous observation settings.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Online rules for control of false discovery rate and false discovery exceedance
https://projecteuclid.org/euclid.aos/1522742428
<strong>Adel Javanmard</strong>, <strong>Andrea Montanari</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 526--554.</p><p><strong>Abstract:</strong><br/>
Multiple hypothesis testing is a core problem in statistical inference and arises in almost every scientific field. Given a set of null hypotheses $\mathcal{H}(n)=(H_{1},\ldots,H_{n})$, Benjamini and Hochberg [ J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 (1995) 289–300] introduced the false discovery rate ($\mathrm{FDR}$), which is the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls $\mathrm{FDR}$ below a pre-assigned significance level. Nowadays $\mathrm{FDR}$ is the criterion of choice for large-scale multiple hypothesis testing.
In this paper we consider the problem of controlling $\mathrm{FDR}$ in an online manner. Concretely, we consider an ordered—possibly infinite—sequence of null hypotheses $\mathcal{H}=(H_{1},H_{2},H_{3},\ldots)$ where, at each step $i$, the statistician must decide whether to reject hypothesis $H_{i}$ having access only to the previous decisions. This model was introduced by Foster and Stine [ J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 (2008) 429–444].
We study a class of generalized alpha investing procedures, first introduced by Aharoni and Rosset [ J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 (2014) 771–794]. We prove that any rule in this class controls online $\mathrm{FDR}$, provided $p$-values corresponding to true nulls are independent of the other $p$-values. Earlier work only established $\mathrm{mFDR}$ control. Next, we obtain conditions under which generalized alpha investing controls $\mathrm{FDR}$ in the presence of general $p$-value dependencies. We also develop a modified set of procedures that allow control of the false discovery exceedance (the tail of the proportion of false discoveries). Finally, we evaluate the performance of online procedures on both synthetic and real data, comparing them with offline approaches, such as adaptive Benjamini–Hochberg.
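The alpha-investing idea of Foster and Stine can be sketched as follows: the statistician starts with a budget of alpha-wealth, spends a little on each test, and earns a payout on each rejection, so discoveries fund further testing. The spending schedule below (a fraction of current wealth that resets after each rejection) is one simple choice of mine, not the optimized rules studied in the paper.

```python
def alpha_investing(pvalues, w0=0.05, payout=0.05):
    """Sketch of an alpha-investing online testing rule in the spirit
    of Foster and Stine (2008). Wealth rises by `payout` on each
    rejection and falls by alpha_j / (1 - alpha_j) on each failure,
    so it can never go negative with alpha_j < wealth.
    """
    wealth = w0
    last_rej = 0
    decisions = []
    for j, p in enumerate(pvalues, start=1):
        # Spend a fraction of current wealth; bets shrink the longer
        # we go without a discovery.
        alpha_j = wealth / (1 + j - last_rej)
        reject = p <= alpha_j
        if reject:
            wealth += payout
            last_rej = j
        else:
            wealth -= alpha_j / (1 - alpha_j)
        decisions.append(reject)
    return decisions

pvals = [0.001, 0.8, 0.6, 0.0001, 0.3, 0.9]
dec = alpha_investing(pvals)
```

Note how the very small $p$-values are rejected even though later thresholds have shrunk, because earlier discoveries replenished the wealth.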
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Frequency domain minimum distance inference for possibly noninvertible and noncausal ARMA models
https://projecteuclid.org/euclid.aos/1522742429
<strong>Carlos Velasco</strong>, <strong>Ignacio N. Lobato</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 555--579.</p><p><strong>Abstract:</strong><br/>
This article introduces frequency domain minimum distance procedures for performing inference in general, possibly noncausal and/or noninvertible, autoregressive moving average (ARMA) models. We use information from higher-order moments to achieve identification of the location of the roots of the AR and MA polynomials for non-Gaussian time series. We propose a minimum distance estimator that optimally combines the information contained in second, third and fourth moments. Contrary to existing estimators, the proposed one is consistent under general assumptions, and may improve on the efficiency of estimators based only on second-order moments. Our procedures are also applicable to processes for which either the third- or the fourth-order spectral density is the zero function.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

On consistency and sparsity for sliced inverse regression in high dimensions
https://projecteuclid.org/euclid.aos/1522742430
<strong>Qian Lin</strong>, <strong>Zhigen Zhao</strong>, <strong>Jun S. Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 580--610.</p><p><strong>Abstract:</strong><br/>
We provide here a framework to analyze the phase transition phenomenon of sliced inverse regression (SIR), a supervised dimension reduction technique introduced by Li [ J. Amer. Statist. Assoc. 86 (1991) 316–342]. Under mild conditions, the asymptotic ratio $\rho=\lim p/n$ is the phase transition parameter and the SIR estimator is consistent if and only if $\rho=0$. When the dimension $p$ is greater than $n$, we propose a diagonal thresholding screening SIR (DT-SIR) algorithm. This method provides us with an estimate of the eigenspace of $\operatorname{var}(\mathbb{E}[\boldsymbol{x}|y])$, the covariance matrix of the conditional expectation. The desired dimension reduction space is then obtained by multiplying the inverse of the covariance matrix on the eigenspace. Under certain sparsity assumptions on both the covariance matrix of predictors and the loadings of the directions, we prove the consistency of DT-SIR in estimating the dimension reduction space in high-dimensional data analysis. Extensive numerical experiments demonstrate the superior performance of the proposed method in comparison to its competitors.
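The SIR device of Li, which underlies the abstract above, is easy to sketch: slice the data by the order of $y$, and the weighted covariance of the slice means of $x$ estimates $\operatorname{var}(\mathbb{E}[x|y])$. The diagonal-screening step below keeps the coordinates with the largest diagonal entries of that matrix; this is a toy sketch in the spirit of DT-SIR, not the paper's full algorithm, and the simulated model is my own choice.

```python
import numpy as np

def sir_slice_cov(x, y, n_slices=10):
    """Estimate var(E[x|y]) by sliced inverse regression: sort by y,
    split into slices, and form the weighted covariance of the slice
    means of the centered predictors.
    """
    n, p = x.shape
    xc = x[np.argsort(y)] - x.mean(axis=0)
    cov = np.zeros((p, p))
    for sl in np.array_split(np.arange(n), n_slices):
        m = xc[sl].mean(axis=0)
        cov += (len(sl) / n) * np.outer(m, m)
    return cov

def dt_screen(x, y, n_keep, n_slices=10):
    """Diagonal-thresholding screening: keep the n_keep coordinates
    with the largest diagonal entries of the estimated var(E[x|y])."""
    d = np.diag(sir_slice_cov(x, y, n_slices))
    return np.sort(np.argsort(d)[-n_keep:])

rng = np.random.default_rng(0)
n, p = 500, 50
x = rng.standard_normal((n, p))
y = x[:, 0] + 0.5 * x[:, 1] + 0.1 * rng.standard_normal(n)
keep = dt_screen(x, y, n_keep=2)
```

For an inactive coordinate the diagonal entry is of order (number of slices)/$n$, while active coordinates contribute $\operatorname{var}(\mathbb{E}[x_j|y])$, which is bounded away from zero, so the screening separates them.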
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Regularization and the small-ball method I: Sparse recovery
https://projecteuclid.org/euclid.aos/1522742431
<strong>Guillaume Lecué</strong>, <strong>Shahar Mendelson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 611--641.</p><p><strong>Abstract:</strong><br/>
We obtain bounds on estimation error rates for regularization procedures of the form \begin{equation*}\hat{f}\in\mathop{\operatorname{argmin}}_{f\in F}(\frac{1}{N}\sum_{i=1}^{N}(Y_{i}-f(X_{i}))^{2}+\lambda \Psi(f))\end{equation*} when $\Psi$ is a norm and $F$ is convex.
Our approach gives a common framework that may be used in the analysis of learning problems and regularization problems alike. In particular, it sheds some light on the role various notions of sparsity have in regularization and on their connection with the size of subdifferentials of $\Psi$ in a neighborhood of the true minimizer.
As “proof of concept” we extend the known estimates for the LASSO, SLOPE and trace norm regularization.
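The displayed estimator specializes to the LASSO when $F$ is the class of linear functions and $\Psi$ is the $\ell_1$ norm. A minimal sketch of that special case via proximal gradient descent (ISTA), with simulated data and tuning of my own choosing:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Proximal-gradient (ISTA) sketch of
        min_beta (1/N) * ||y - X beta||^2 + lam * ||beta||_1,
    the LASSO instance of the displayed regularization procedure.
    """
    n, p = X.shape
    beta = np.zeros(p)
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = -2.0 / n * X.T @ (y - X @ beta)
        z = beta - step * grad
        # Soft thresholding: the proximal map of the l1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = lasso_ista(X, y, lam=0.1)
```

The soft-thresholding step is where the subdifferential of $\Psi$ enters: coordinates whose gradient falls inside the threshold are set exactly to zero, which is the sparsity mechanism the abstract alludes to.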
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications
https://projecteuclid.org/euclid.aos/1522742432
<strong>Xiaohui Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 642--678.</p><p><strong>Abstract:</strong><br/>
This paper studies the Gaussian and bootstrap approximations for the probabilities of a nondegenerate U-statistic belonging to the hyperrectangles in $\mathbb{R}^{d}$ when the dimension $d$ is large. A two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution is proposed. Subject to mild moment conditions on the kernel, we establish the explicit rate of convergence uniformly in the class of all hyperrectangles in $\mathbb{R}^{d}$ that decays polynomially in sample size for a high-dimensional scaling limit, where the dimension can be much larger than the sample size. We also provide computable approximation methods for the quantiles of the maxima of centered U-statistics. Specifically, we provide a unified perspective for the empirical bootstrap, the randomly reweighted bootstrap and the Gaussian multiplier bootstrap with the jackknife estimator of covariance matrix as randomly reweighted quadratic forms and we establish their validity. We show that all three methods are inferentially first-order equivalent for high-dimensional U-statistics in the sense that they achieve the same uniform rate of convergence over all $d$-dimensional hyperrectangles. In particular, they are asymptotically valid when the dimension $d$ can be as large as $O(e^{n^{c}})$ for some constant $c\in(0,1/7)$.
The bootstrap methods are applied to statistical applications for high-dimensional non-Gaussian data including: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap selected tuning parameter, we show that for a class of sub-Gaussian data, error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold. In addition, we also show that the Gaussian-like convergence rates can be achieved for heavy-tailed data, which are less conservative than those obtained by the Bonferroni technique that ignores the dependency in the underlying data distribution.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Selective inference with a randomized response
https://projecteuclid.org/euclid.aos/1522742433
<strong>Xiaoying Tian</strong>, <strong>Jonathan Taylor</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 679--710.</p><p><strong>Abstract:</strong><br/>
Inspired by sample splitting and the reusable holdout introduced in the field of differential privacy, we consider selective inference with a randomized response. We discuss two major advantages of using a randomized response for model selection. First, the selectively valid tests are more powerful after randomized selection. Second, it allows consistent estimation and weak convergence of selective inference procedures. Under independent sampling, we prove a selective (or privatized) central limit theorem that transfers procedures valid under asymptotic normality without selection to their corresponding selective counterparts. This allows selective inference in nonparametric settings. Finally, we propose a framework of inference after combining multiple randomized selection procedures. We focus on the classical asymptotic setting, leaving the interesting high-dimensional asymptotic questions for future work.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Multiscale blind source separation
https://projecteuclid.org/euclid.aos/1522742434
<strong>Merle Behr</strong>, <strong>Chris Holmes</strong>, <strong>Axel Munk</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 711--744.</p><p><strong>Abstract:</strong><br/>
We provide a new methodology for statistical recovery of single linear mixtures of piecewise constant signals (sources) with unknown mixing weights and change points in a multiscale fashion. We show exact recovery within an $\varepsilon$-neighborhood of the mixture when the sources take only values in a known finite alphabet. Based on this we provide the SLAM (Separates Linear Alphabet Mixtures) estimators for the mixing weights and sources. For Gaussian error, we obtain uniform confidence sets and optimal rates (up to log-factors) for all quantities. SLAM is efficiently computed as a nonconvex optimization problem by a dynamic program tailored to the finite alphabet assumption. Its performance is investigated in a simulation study. Finally, it is applied to assign copy-number aberrations from genetic sequencing data to different clones and to estimate their proportions.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Sharp oracle inequalities for Least Squares estimators in shape restricted regression
https://projecteuclid.org/euclid.aos/1522742435
<strong>Pierre C. Bellec</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 745--780.</p><p><strong>Abstract:</strong><br/>
The performance of Least Squares (LS) estimators is studied in shape-constrained regression models under Gaussian and sub-Gaussian noise. General bounds on the performance of LS estimators over closed convex sets are provided. These results have the form of sharp oracle inequalities that account for the model misspecification error. In the presence of misspecification, these bounds imply that the LS estimator estimates the projection of the true parameter at the same rate as in the well-specified case.
In isotonic and unimodal regression, the LS estimator achieves the nonparametric rate $n^{-2/3}$ as well as a parametric rate of order $k/n$ up to logarithmic factors, where $k$ is the number of constant pieces of the true parameter. In univariate convex regression, the LS estimator satisfies an adaptive risk bound of order $q/n$ up to logarithmic factors, where $q$ is the number of affine pieces of the true regression function. This adaptive risk bound holds for any collection of design points. While Guntuboyina and Sen [ Probab. Theory Related Fields 163 (2015) 379–411] established that the nonparametric rate of convex regression is of order $n^{-4/5}$ for equispaced design points, we show that the nonparametric rate of convex regression can be as slow as $n^{-2/3}$ for some worst-case design points. This phenomenon can be explained as follows: Although convexity brings more structure than unimodality, for some worst-case design points this extra structure is uninformative and the nonparametric rates of unimodal regression and convex regression are both $n^{-2/3}$. Higher order cones, such as the cone of $\beta $-monotone sequences, are also studied.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space
https://projecteuclid.org/euclid.aos/1522742436
<strong>Shaogao Lv</strong>, <strong>Huazhen Lin</strong>, <strong>Heng Lian</strong>, <strong>Jian Huang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 781--813.</p><p><strong>Abstract:</strong><br/>
This paper considers the estimation of the sparse additive quantile regression (SAQR) in high-dimensional settings. Given the nonsmooth nature of the quantile loss function and the nonparametric complexities of the component function estimation, it is challenging to analyze the theoretical properties of ultrahigh-dimensional SAQR. We propose a regularized learning approach with a two-fold Lasso-type regularization in a reproducing kernel Hilbert space (RKHS) for SAQR. We establish nonasymptotic oracle inequalities for the excess risk of the proposed estimator without any coherent conditions. If additional assumptions including an extension of the restricted eigenvalue condition are satisfied, the proposed method enjoys sharp oracle rates without the light tail requirement. In particular, the proposed estimator achieves the minimax lower bounds established for sparse additive mean regression. As a by-product, we also establish the concentration inequality for estimating the population mean when the general Lipschitz loss is involved. The practical effectiveness of the new method is demonstrated by competitive numerical results.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error
https://projecteuclid.org/euclid.aos/1522742437
<strong>Jianqing Fan</strong>, <strong>Han Liu</strong>, <strong>Qiang Sun</strong>, <strong>Tong Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 814--841.</p><p><strong>Abstract:</strong><br/>
We propose a computational framework named iterative local adaptive majorize-minimization (I-LAMM) to simultaneously control algorithmic complexity and statistical error when fitting high-dimensional models. I-LAMM is a two-stage algorithmic implementation of the local linear approximation to a family of folded concave penalized quasi-likelihood. The first stage solves a convex program with a crude precision tolerance to obtain a coarse initial estimator, which is further refined in the second stage by iteratively solving a sequence of convex programs with smaller precision tolerances. Theoretically, we establish a phase transition: the first stage has a sublinear iteration complexity, while the second stage achieves an improved linear rate of convergence. Though this framework is completely algorithmic, it provides solutions with optimal statistical performances and controlled algorithmic complexity for a large family of nonconvex optimization problems. The iteration effects on statistical errors are clearly demonstrated via a contraction property. Our theory relies on a localized version of the sparse/restricted eigenvalue condition, which allows us to analyze a large family of loss and penalty functions and provide optimality guarantees under very weak assumptions (e.g., I-LAMM requires much weaker minimal signal strength than other procedures). Thorough numerical results are provided to support the obtained theory.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

On Bayesian index policies for sequential resource allocation
https://projecteuclid.org/euclid.aos/1522742438
<strong>Emilie Kaufmann</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 842--865.</p><p><strong>Abstract:</strong><br/>
This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on quantiles of posterior distributions, is asymptotically optimal when the reward distributions belong to a one-dimensional exponential family, for a large class of prior distributions. We also show that the Bayesian literature gives new insight on what kind of exploration rates could be used in frequentist, UCB-type algorithms. Indeed, approximations of the Bayesian optimal solution or the Finite-Horizon Gittins indices provide a justification for the kl-UCB$^{+}$ and kl-UCB-H$^{+}$ algorithms, whose asymptotic optimality is also established.
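The Bayes-UCB index is simple to sketch for Bernoulli rewards: each arm keeps a Beta posterior, and the policy pulls the arm with the largest posterior quantile at a level that rises with time. The quantile level $1 - 1/t$ and the Monte Carlo approximation of the Beta quantile below are simplifications of mine (the paper uses exact quantiles and a slightly different exploration rate).

```python
import numpy as np

def bayes_ucb_bernoulli(rewards_fn, n_arms, horizon, rng=None):
    """Sketch of the Bayes-UCB index policy for Bernoulli bandits:
    keep a Beta(1, 1) prior per arm and pull the arm with the largest
    posterior quantile of level 1 - 1/t. Quantiles are approximated
    here by Monte Carlo draws from each Beta posterior.
    """
    rng = np.random.default_rng(rng)
    succ = np.ones(n_arms)   # Beta(1, 1) prior counts
    fail = np.ones(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    for t in range(1, horizon + 1):
        level = 1.0 - 1.0 / t
        draws = rng.beta(succ[:, None], fail[:, None], size=(n_arms, 500))
        index = np.quantile(draws, level, axis=1)  # per-arm posterior quantile
        a = int(np.argmax(index))
        r = rewards_fn(a, rng)
        succ[a] += r
        fail[a] += 1 - r
        pulls[a] += 1
    return pulls

# Two arms with success probabilities 0.8 and 0.2 (toy example).
probs = [0.8, 0.2]
pulls = bayes_ucb_bernoulli(lambda a, rng: rng.binomial(1, probs[a]),
                            n_arms=2, horizon=300, rng=0)
```

As the posterior of the inferior arm concentrates below that of the better arm, its quantile index falls and it is pulled only rarely, which is the mechanism behind the logarithmic regret discussed above.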
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Testing independence with high-dimensional correlated samples
https://projecteuclid.org/euclid.aos/1522742439
<strong>Xi Chen</strong>, <strong>Weidong Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 866--894.</p><p><strong>Abstract:</strong><br/>
Testing independence among a number of (ultra) high-dimensional random samples is a fundamental and challenging problem. By arranging $n$ identically distributed $p$-dimensional random vectors into a $p\times n$ data matrix, we investigate the problem of testing independence among columns under the matrix-variate normal modeling of data. We propose a computationally simple and tuning-free test statistic, characterize its limiting null distribution, analyze the statistical power and prove its minimax optimality. As an important by-product of the test statistic, a ratio-consistent estimator for the quadratic functional of a covariance matrix from correlated samples is developed. We further study the effect of correlation among samples on an important high-dimensional inference problem—large-scale multiple testing of Pearson’s correlation coefficients. Indeed, blindly using classical inference results based on the assumed independence of samples will lead to many false discoveries, which suggests the need for conducting independence testing before applying existing methods. To address the challenge arising from correlation among samples, we propose a “sandwich estimator” of Pearson’s correlation coefficient by de-correlating the samples. Based on this approach, the resulting multiple testing procedure asymptotically controls the overall false discovery rate at the nominal level while maintaining good statistical power. Both simulated and real data experiments are carried out to demonstrate the advantages of the proposed methods.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

Detecting rare and faint signals via thresholding maximum likelihood estimators
https://projecteuclid.org/euclid.aos/1522742440
<strong>Yumou Qiu</strong>, <strong>Song Xi Chen</strong>, <strong>Dan Nettleton</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 2, 895--923.</p><p><strong>Abstract:</strong><br/>
Motivated by the analysis of RNA sequencing (RNA-seq) data for genes differentially expressed across multiple conditions, we consider detecting rare and faint signals in high-dimensional response variables. We address the signal detection problem under a general framework, which includes generalized linear models for count-valued responses as special cases. We propose a test statistic that carries out a multi-level thresholding on maximum likelihood estimators (MLEs) of the signals, based on a new Cramér-type moderate deviation result for multidimensional MLEs. Based on the multi-level thresholding test, a multiple testing procedure is proposed for signal identification. Numerical simulations and a case study on maize RNA-seq data are conducted to demonstrate the effectiveness of the proposed approaches on signal detection and identification.
</p>
Published: Tue, 03 Apr 2018 22:02 EDT

High-dimensional $A$-learning for optimal dynamic treatment regimes
https://projecteuclid.org/euclid.aos/1525313071
<strong>Chengchun Shi</strong>, <strong>Ailin Fan</strong>, <strong>Rui Song</strong>, <strong>Wenbin Lu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 925--957.</p><p><strong>Abstract:</strong><br/>
Precision medicine is a medical paradigm that focuses on finding the most effective treatment decision based on individual patient information. For many complex diseases, such as cancer, treatment decisions need to be tailored over time according to patients’ responses to previous treatments. Such an adaptive strategy is referred to as a dynamic treatment regime. A major challenge in deriving an optimal dynamic treatment regime arises when an extraordinarily large number of prognostic factors, such as a patient’s genetic information, demographic characteristics, medical history and clinical measurements over time, are available, but not all of them are necessary for making treatment decisions. This makes variable selection an emerging need in precision medicine.
In this paper, we propose a penalized multi-stage $A$-learning for deriving the optimal dynamic treatment regime when the number of covariates is of the nonpolynomial (NP) order of the sample size. To preserve the double robustness property of the $A$-learning method, we adopt the Dantzig selector, which directly penalizes the $A$-learning estimating equations. Oracle inequalities of the proposed estimators for the parameters in the optimal dynamic treatment regime and error bounds on the difference between the value functions of the estimated optimal dynamic treatment regime and the true optimal dynamic treatment regime are established. Empirical performance of the proposed approach is evaluated by simulations and illustrated with an application to data from the STAR∗D study.
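The Dantzig selector adopted above admits a linear-programming formulation, which the following sketch solves for an ordinary linear regression (a single stage, not the multi-stage $A$-learning estimating equations of the paper); the tuning constant is an assumption for the example.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Solve min ||beta||_1 subject to ||X^T (y - X beta)||_inf <= lam
    as a linear program, writing beta = u - v with u, v >= 0."""
    n, p = X.shape
    A = X.T @ X
    b = X.T @ y
    c = np.ones(2 * p)                                  # objective: sum(u) + sum(v)
    # |b - A(u - v)| <= lam, split into two one-sided inequalities
    A_ub = np.vstack([np.hstack([A, -A]), np.hstack([-A, A])])
    b_ub = np.concatenate([b + lam, lam - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v

rng = np.random.default_rng(1)
n, p = 50, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]                             # sparse truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)
lam = 0.1 * np.sqrt(2 * n * np.log(p))                  # ~ sigma * sqrt(2 n log p)
beta_hat = dantzig_selector(X, y, lam)
```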
</p>projecteuclid.org/euclid.aos/1525313071_20180502220435Wed, 02 May 2018 22:04 EDTTest for high-dimensional regression coefficients using refitted cross-validation variance estimationhttps://projecteuclid.org/euclid.aos/1525313072<strong>Hengjian Cui</strong>, <strong>Wenwen Guo</strong>, <strong>Wei Zhong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 958--988.</p><p><strong>Abstract:</strong><br/>
Testing a hypothesis for high-dimensional regression coefficients is of fundamental importance in statistical theory and applications. In this paper, we develop a new test for the overall significance of coefficients in high-dimensional linear regression models based on an estimated U-statistic of order two. With the aid of the martingale central limit theorem, we prove that the asymptotic distributions of the proposed test are normal under two different distribution assumptions. Refitted cross-validation (RCV) variance estimation is utilized to avoid the overestimation of the variance and enhance the empirical power. We examine the finite-sample performances of the proposed test via Monte Carlo simulations, which show that the new test based on the RCV estimator achieves higher power, especially for sparse cases. We also demonstrate an application by an empirical analysis of a microarray data set on Yorkshire gilts.
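Refitted cross-validation variance estimation can be sketched in a few lines: select a small model on one half of the data, estimate the noise variance by refitting on the other half, then swap the roles and average. The marginal-correlation screening rule and the choice of $k$ below are assumptions for illustration, not the paper’s exact selection step.

```python
import numpy as np

def rcv_variance(X, y, k=5):
    """Refitted cross-validation (RCV) noise-variance estimate (a sketch):
    select k covariates on one half by marginal correlation, estimate
    sigma^2 from an OLS refit on the *other* half, and average both splits."""
    n = len(y)
    half = n // 2
    idx = [np.arange(half), np.arange(half, n)]
    est = []
    for a, b in [(0, 1), (1, 0)]:
        Xa, ya = X[idx[a]], y[idx[a]]
        Xb, yb = X[idx[b]], y[idx[b]]
        corr = np.abs(Xa.T @ ya)                  # marginal screening on half a
        S = np.argsort(corr)[-k:]
        beta, *_ = np.linalg.lstsq(Xb[:, S], yb, rcond=None)
        resid = yb - Xb[:, S] @ beta              # refit residuals on half b
        est.append(resid @ resid / (len(yb) - k))
    return 0.5 * (est[0] + est[1])

rng = np.random.default_rng(2)
n, p = 200, 100
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - X[:, 1] + rng.standard_normal(n)   # true sigma^2 = 1
sigma2_hat = rcv_variance(X, y)
```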
</p>projecteuclid.org/euclid.aos/1525313072_20180502220435Wed, 02 May 2018 22:04 EDTAre discoveries spurious? Distributions of maximum spurious correlations and their applicationshttps://projecteuclid.org/euclid.aos/1525313073<strong>Jianqing Fan</strong>, <strong>Qi-Man Shao</strong>, <strong>Wen-Xin Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 989--1017.</p><p><strong>Abstract:</strong><br/>
Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries from these data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions about the exogeneity of the covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable $Y$ with the best $s$ linear combinations of $p$ covariates $\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent. When the covariance matrix of $\mathbf{X}$ possesses the restricted eigenvalue property, we derive such distributions for both a finite $s$ and a diverging $s$, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of $\mathbf{X}$. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where the residuals are from regularized fits. Our approach is then used to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of the covariates. The former provides a baseline for guarding against false discoveries and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated with both numerical examples and real data analysis.
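For the simplest case $s=1$, the maximum spurious correlation is the largest absolute marginal correlation between $y$ and the $p$ covariates, and the multiplier-bootstrap calibration can be sketched as below. This is an illustrative simplification (standardized data, Gaussian multipliers, $s=1$), not the full procedure of the paper; the function name is an assumption.

```python
import numpy as np

def max_spurious_corr_quantile(X, y, B=500, alpha=0.05, rng=None):
    """Multiplier-bootstrap upper quantile of the maximum absolute marginal
    correlation between y and p covariates (the s = 1 case); X columns and
    y are assumed standardized."""
    if rng is None:
        rng = np.random.default_rng()
    n, p = X.shape
    scores = X * y[:, None]                   # n-by-p array of x_ij * y_i
    centered = scores - scores.mean(axis=0)
    T_boot = np.empty(B)
    for b in range(B):
        g = rng.standard_normal(n)            # Gaussian multipliers
        T_boot[b] = np.abs(centered.T @ g).max() / n
    return np.quantile(T_boot, 1 - alpha)

rng = np.random.default_rng(3)
n, p = 200, 50
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)
y = rng.standard_normal(n)
y = (y - y.mean()) / y.std()                  # independent of X by construction
T_obs = np.abs(X.T @ y).max() / n             # observed max spurious correlation
cutoff = max_spurious_corr_quantile(X, y, rng=rng)
```

Exceeding the cutoff would indicate a correlation too large to be explained by pure spuriousness at level $\alpha$.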
</p>projecteuclid.org/euclid.aos/1525313073_20180502220435Wed, 02 May 2018 22:04 EDTAdaptive estimation of planar convex setshttps://projecteuclid.org/euclid.aos/1525313074<strong>T. Tony Cai</strong>, <strong>Adityanand Guntuboyina</strong>, <strong>Yuting Wei</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1018--1049.</p><p><strong>Abstract:</strong><br/>
In this paper, we consider adaptive estimation of an unknown planar compact, convex set from noisy measurements of its support function. Both the problem of estimating the support function at a point and that of estimating the whole convex set are studied. For pointwise estimation, we consider the problem in a general nonasymptotic framework, which evaluates the performance of a procedure at each individual set, instead of the worst case performance over a large parameter space as in conventional minimax theory. A data-driven adaptive estimator is proposed and is shown to be optimally adaptive to every compact, convex set. For estimating the whole convex set, we propose estimators that are shown to adaptively achieve the optimal rate of convergence. In both of these problems, our analysis makes no smoothness assumptions on the boundary of the unknown convex set.
</p>projecteuclid.org/euclid.aos/1525313074_20180502220435Wed, 02 May 2018 22:04 EDTConsistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysishttps://projecteuclid.org/euclid.aos/1525313075<strong>Zhidong Bai</strong>, <strong>Kwok Pui Choi</strong>, <strong>Yasunori Fujikoshi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1050--1076.</p><p><strong>Abstract:</strong><br/>
In this paper, we study the problem of estimating the number of significant components in principal component analysis (PCA), which corresponds to the number of dominant eigenvalues of the covariance matrix of $p$ variables. Our purpose is to examine the consistency of the estimation criteria AIC and BIC, based on the model selection criteria of Akaike [In 2nd International Symposium on Information Theory (1973) 267–281, Akadémiai Kiadó] and Schwarz [Estimating the dimension of a model. Ann. Statist. 6 (1978) 461–464], under a high-dimensional asymptotic framework. Using random matrix theory techniques, we derive sufficient conditions for each criterion to be strongly consistent both when the dominant population eigenvalues are bounded and when the dominant eigenvalues tend to infinity. Moreover, the asymptotic results are obtained without a normality assumption on the population distribution. Simulation studies are also conducted, and results show that the sufficient conditions in our theorems are essential.
</p>projecteuclid.org/euclid.aos/1525313075_20180502220435Wed, 02 May 2018 22:04 EDTOn the systematic and idiosyncratic volatility with large panel high-frequency datahttps://projecteuclid.org/euclid.aos/1525313076<strong>Xin-Bing Kong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1077--1108.</p><p><strong>Abstract:</strong><br/>
In this paper, we separate the integrated (spot) volatility of an individual Itô process into integrated (spot) systematic and idiosyncratic volatilities, and estimate them by aggregation of local factor analysis (localization) with large-dimensional high-frequency data. We show that, when both the sampling frequency $n$ and the dimensionality $p$ go to infinity and $p\geq C\sqrt{n}$ for some constant $C$, our estimators of the integrated systematic and idiosyncratic volatilities are $\sqrt{n}$-consistent ($n^{1/4}$-consistent for spot estimates), the best rate achieved in estimating the integrated (spot) volatility, which is readily identified even with univariate high-frequency data. However, when $Cn^{1/4}\leq p<C\sqrt{n}$, aggregation of $n^{1/4}$-consistent local estimates of systematic and idiosyncratic volatilities results in $p$-consistent (not $\sqrt{n}$-consistent) estimates of integrated systematic and idiosyncratic volatilities. Even more interestingly, when $p<Cn^{1/4}$, the integrated estimate has the same convergence rate as the spot estimate, both being $p$-consistent. This reveals a distinctive feature from aggregating local estimates in the low-dimensional high-frequency data setting. We also present estimators of the integrated (spot) idiosyncratic volatility matrices as well as their inverse matrices under some sparsity assumption. We finally present a factor-based estimator of the inverse of the spot volatility matrix. Numerical studies, including Monte Carlo experiments and real data analysis, justify the performance of our estimators.
</p>projecteuclid.org/euclid.aos/1525313076_20180502220435Wed, 02 May 2018 22:04 EDTBall Divergence: Nonparametric two sample testhttps://projecteuclid.org/euclid.aos/1525313077<strong>Wenliang Pan</strong>, <strong>Yuan Tian</strong>, <strong>Xueqin Wang</strong>, <strong>Heping Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1109--1137.</p><p><strong>Abstract:</strong><br/>
In this paper, we first introduce Ball Divergence, a novel measure of the difference between two probability measures in separable Banach spaces, and show that the Ball Divergence of two probability measures is zero if and only if these two probability measures are identical, without any moment assumption. Using Ball Divergence, we present a metric rank test procedure to detect the equality of distribution measures underlying independent samples. It is therefore robust to outliers or heavy-tailed data. We show that this multivariate two-sample test statistic is consistent with the Ball Divergence, and that it converges to a mixture of $\chi^{2}$ distributions under the null hypothesis and a normal distribution under the alternative hypothesis. Importantly, we prove its consistency against a general alternative hypothesis. Moreover, this result does not depend on the ratio of the two imbalanced sample sizes, ensuring that it can be applied to imbalanced data. Numerical studies confirm that our test is superior to several existing tests in terms of Type I error and power. We conclude our paper with two applications of our method: one is for virtual screening in the drug development process and the other is for genome-wide expression analysis in hormone replacement therapy.
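An empirical version of the Ball Divergence for samples in $\mathbb{R}^d$ can be sketched directly from the definition: for each pair of points from one sample, compare the fractions of the two samples falling in the closed ball centred at the first point with radius equal to the pair’s distance, and sum the squared differences over both samples. This is a naive $O(n^3)$ sketch for intuition, not an optimized or definitive implementation.

```python
import numpy as np

def ball_divergence(X, Y):
    """Empirical Ball Divergence between two samples in R^d (a sketch)."""
    def term(A, B):
        n = len(A)
        D_AA = np.linalg.norm(A[:, None] - A[None, :], axis=-1)  # pairwise radii
        D_AB = np.linalg.norm(A[:, None] - B[None, :], axis=-1)
        total = 0.0
        for i in range(n):
            for j in range(n):
                r = D_AA[i, j]
                in_A = np.mean(D_AA[i] <= r)      # fraction of A in the ball
                in_B = np.mean(D_AB[i] <= r)      # fraction of B in the ball
                total += (in_A - in_B) ** 2
        return total / n ** 2
    return term(X, Y) + term(Y, X)

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 2))
Y_same = rng.standard_normal((40, 2))             # same distribution as X
Y_shift = rng.standard_normal((40, 2)) + 2.0      # location-shifted sample
bd_null = ball_divergence(X, Y_same)
bd_alt = ball_divergence(X, Y_shift)
```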
</p>projecteuclid.org/euclid.aos/1525313077_20180502220435Wed, 02 May 2018 22:04 EDTA smooth block bootstrap for quantile regression with time serieshttps://projecteuclid.org/euclid.aos/1525313078<strong>Karl B. Gregory</strong>, <strong>Soumendra N. Lahiri</strong>, <strong>Daniel J. Nordman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1138--1166.</p><p><strong>Abstract:</strong><br/>
Quantile regression allows for broad (conditional) characterizations of a response distribution beyond conditional means and is of increasing interest in economic and financial applications. Because quantile regression estimators have complex limiting distributions, several bootstrap methods for the independent data setting have been proposed, many of which involve smoothing steps to improve bootstrap approximations. Currently, no similar advances in smoothed bootstraps exist for quantile regression with dependent data. To this end, we establish a smooth tapered block bootstrap procedure for approximating the distribution of quantile regression estimators for time series. This bootstrap involves two rounds of smoothing in resampling: individual observations are resampled via kernel smoothing techniques and resampled data blocks are smoothed by tapering. The smooth bootstrap results in performance improvements over previous unsmoothed versions of the block bootstrap as well as normal approximations based on Powell’s kernel variance estimator, which are common in application. Our theoretical results correct errors in proofs for earlier and simpler versions of the (unsmoothed) moving blocks bootstrap for quantile regression and broaden the validity of block bootstraps for this problem under weak conditions. We illustrate the smooth bootstrap through numerical studies and examples.
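The baseline resampling scheme the paper improves upon, the (unsmoothed) moving-blocks bootstrap, can be sketched in a few lines: concatenate randomly chosen overlapping blocks of the series to form each resample. The paper’s procedure additionally kernel-smooths individual observations and tapers the blocks; the example below bootstraps a sample median of an AR(1) series purely for illustration.

```python
import numpy as np

def moving_blocks_resample(x, block_len, rng):
    """One (unsmoothed) moving-blocks bootstrap resample of a time series:
    concatenate randomly chosen overlapping blocks, truncated to length n."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])[:n]

rng = np.random.default_rng(9)
n, phi = 500, 0.5
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):                      # AR(1) series with dependence
    x[t] = phi * x[t - 1] + eps[t]
meds = [np.median(moving_blocks_resample(x, block_len=20, rng=rng))
        for _ in range(200)]
se_med = np.std(meds)                      # bootstrap standard error of the median
```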
</p>projecteuclid.org/euclid.aos/1525313078_20180502220435Wed, 02 May 2018 22:04 EDTAsymptotic distribution-free tests for semiparametric regressions with dependent datahttps://projecteuclid.org/euclid.aos/1525313079<strong>Juan Carlos Escanciano</strong>, <strong>Juan Carlos Pardo-Fernández</strong>, <strong>Ingrid Van Keilegom</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1167--1196.</p><p><strong>Abstract:</strong><br/>
This article proposes a new general methodology for constructing nonparametric and semiparametric Asymptotically Distribution-Free (ADF) tests for semiparametric hypotheses in regression models for possibly dependent data coming from a strictly stationary process. Classical tests based on the difference between the estimated distributions of the restricted and unrestricted regression errors are not ADF. In this article, we introduce a novel transformation of this difference that leads to ADF tests with well-known critical values. The general methodology is illustrated with applications to testing for parametric models against nonparametric or semiparametric alternatives, and semiparametric constrained mean–variance models. Several Monte Carlo studies and an empirical application show that the finite sample performance of the proposed tests is satisfactory in moderate sample sizes.
</p>projecteuclid.org/euclid.aos/1525313079_20180502220435Wed, 02 May 2018 22:04 EDTGradient-based structural change detection for nonstationary time series M-estimationhttps://projecteuclid.org/euclid.aos/1525313080<strong>Weichi Wu</strong>, <strong>Zhou Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1197--1224.</p><p><strong>Abstract:</strong><br/>
We consider structural change testing for a wide class of time series M-estimation with nonstationary predictors and errors. Flexible predictor-error relationships, including exogenous, state-heteroscedastic and autoregressive regressions and their mixtures, are allowed. New uniform Bahadur representations are established with nearly optimal approximation rates. A CUSUM-type test statistic based on the gradient vectors of the regression is considered. In this paper, a simple bootstrap method is proposed and is proved to be consistent for M-estimation structural change detection under both abrupt and smooth nonstationarity and temporal dependence. Our bootstrap procedure is shown to have certain asymptotically optimal properties in terms of accuracy and power. A public health time series dataset is used to illustrate our methodology, and asymmetry of structural changes in high and low quantiles is found.
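The generic construction of a CUSUM-type statistic on gradient (score) vectors can be sketched as the largest norm of the centred partial sums of the gradients, scaled by $\sqrt{n}$. The example below uses squared-loss gradients for a mean change; the paper pairs this construction with general M-estimation gradients and a bootstrap for critical values, both omitted here.

```python
import numpy as np

def cusum_statistic(G):
    """CUSUM-type statistic on a sequence of gradient vectors G (n-by-d):
    the maximal norm of the centred partial sums, scaled by sqrt(n)."""
    n = len(G)
    S = np.cumsum(G, axis=0)
    drift = np.outer(np.arange(1, n + 1) / n, S[-1])   # linear drift to subtract
    return np.linalg.norm(S - drift, axis=1).max() / np.sqrt(n)

rng = np.random.default_rng(8)
n = 1000
x_stable = rng.standard_normal((n, 1))                 # no structural change
x_break = rng.standard_normal((n, 1))
x_break[n // 2:] += 1.0                                # mean shift at midpoint
grad = lambda x: x - x.mean(axis=0)                    # squared-loss gradients
t_stable = cusum_statistic(grad(x_stable))
t_break = cusum_statistic(grad(x_break))
```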
</p>projecteuclid.org/euclid.aos/1525313080_20180502220435Wed, 02 May 2018 22:04 EDTModerate deviations and nonparametric inference for monotone functionshttps://projecteuclid.org/euclid.aos/1525313081<strong>Fuqing Gao</strong>, <strong>Jie Xiong</strong>, <strong>Xingqiu Zhao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1225--1254.</p><p><strong>Abstract:</strong><br/>
This paper considers self-normalized limits and moderate deviations of nonparametric maximum likelihood estimators for monotone functions. We obtain self-normalized Cramér-type moderate deviations and limit distribution theorems for the nonparametric maximum likelihood estimator in the current status model and for the Grenander-type estimator. As applications of the results, we present a new procedure to construct asymptotic confidence intervals and asymptotic rejection regions for hypothesis testing for monotone functions. The theoretical results guarantee that the new test has a probability of type II error tending to 0 exponentially. Simulation studies also show that the new nonparametric test works well for the most commonly used parametric survival functions, such as exponential and Weibull survival distributions.
</p>projecteuclid.org/euclid.aos/1525313081_20180502220435Wed, 02 May 2018 22:04 EDTUniform asymptotic inference and the bootstrap after model selectionhttps://projecteuclid.org/euclid.aos/1525313082<strong>Ryan J. Tibshirani</strong>, <strong>Alessandro Rinaldo</strong>, <strong>Rob Tibshirani</strong>, <strong>Larry Wasserman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1255--1287.</p><p><strong>Abstract:</strong><br/>
Recently, Tibshirani et al. [J. Amer. Statist. Assoc. 111 (2016) 600–620] proposed a method for making inferences about parameters defined by model selection, in a typical regression setting with normally distributed errors. Here, we study the large sample properties of this method, without assuming normality. We prove that the test statistic of Tibshirani et al. (2016) is asymptotically valid, as the number of samples $n$ grows and the dimension $d$ of the regression problem stays fixed. Our asymptotic result holds uniformly over a wide class of nonnormal error distributions. We also propose an efficient bootstrap version of this test that is provably (asymptotically) conservative, and in practice, often delivers shorter intervals than those from the original normality-based approach. Finally, we prove that the test statistic of Tibshirani et al. (2016) does not enjoy uniform validity in a high-dimensional setting, when the dimension $d$ is allowed to grow.
</p>projecteuclid.org/euclid.aos/1525313082_20180502220435Wed, 02 May 2018 22:04 EDTDetection thresholds for the $\beta$-model on sparse graphshttps://projecteuclid.org/euclid.aos/1525313083<strong>Rajarshi Mukherjee</strong>, <strong>Sumit Mukherjee</strong>, <strong>Subhabrata Sen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1288--1317.</p><p><strong>Abstract:</strong><br/>
In this paper, we study sharp thresholds for detecting sparse signals in $\beta$-models for potentially sparse random graphs. The results demonstrate interesting interplay between graph sparsity, signal sparsity and signal strength. In regimes of moderately dense signals, irrespective of graph sparsity, the detection thresholds mirror corresponding results in independent Gaussian sequence problems. For sparser signals, extreme graph sparsity implies that all tests are asymptotically powerless, irrespective of the signal strength. On the other hand, sharp detection thresholds are obtained, up to matching constants, on denser graphs. The phase transitions mentioned above are sharp. As a crucial ingredient, we study a version of the higher criticism test which is provably sharp up to optimal constants in the regime of sparse signals. The theoretical results are further verified by numerical simulations.
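The higher criticism test used as an ingredient above is, in its classical Donoho–Jin form, the largest standardized discrepancy between the empirical and uniform distributions of the smallest p-values; the paper’s version is adapted to the $\beta$-model, so the sketch below is the generic construction only.

```python
import numpy as np
from scipy.stats import norm

def higher_criticism(pvals, alpha0=0.5):
    """Classical higher criticism statistic, maximized over the smallest
    alpha0-fraction of the sorted p-values."""
    n = len(pvals)
    p_sorted = np.sort(pvals)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p_sorted) / np.sqrt(p_sorted * (1 - p_sorted))
    return hc[: max(1, int(alpha0 * n))].max()

rng = np.random.default_rng(5)
n = 1000
p_null = rng.uniform(size=n)                  # global null: uniform p-values
z = rng.standard_normal(n)
z[:20] += 4.0                                 # rare, strong signals
p_sparse = 2 * norm.sf(np.abs(z))             # two-sided normal p-values
hc_null = higher_criticism(p_null)
hc_alt = higher_criticism(p_sparse)
```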
</p>projecteuclid.org/euclid.aos/1525313083_20180502220435Wed, 02 May 2018 22:04 EDTAdaptive sup-norm estimation of the Wigner function in noisy quantum homodyne tomographyhttps://projecteuclid.org/euclid.aos/1525313084<strong>Karim Lounici</strong>, <strong>Katia Meziani</strong>, <strong>Gabriel Peyré</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1318--1351.</p><p><strong>Abstract:</strong><br/>
In quantum optics, the quantum state of a light beam is represented through the Wigner function, a density on $\mathbb{R}^{2}$, which may take negative values but must respect intrinsic positivity constraints imposed by quantum physics. In the framework of noisy quantum homodyne tomography with efficiency parameter $1/2<\eta\leq1$, we study the theoretical performance of a kernel estimator of the Wigner function. We prove that it is minimax efficient, up to a logarithmic factor in the sample size, for the $\mathbb{L}_{\infty}$-risk over a class of infinitely differentiable functions. We also compute the lower bound for the $\mathbb{L}_{2}$-risk. We construct an adaptive estimator, that is, which does not depend on the smoothness parameters, and prove that it attains the minimax rates for the corresponding smoothness of the class of functions up to a logarithmic factor in the sample size. Finite sample behaviour of our adaptive procedure is explored through numerical experiments.
</p>projecteuclid.org/euclid.aos/1525313084_20180502220435Wed, 02 May 2018 22:04 EDTDistributed testing and estimation under sparse high dimensional modelshttps://projecteuclid.org/euclid.aos/1525313085<strong>Heather Battey</strong>, <strong>Jianqing Fan</strong>, <strong>Han Liu</strong>, <strong>Junwei Lu</strong>, <strong>Ziwei Zhu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 3, 1352--1382.</p><p><strong>Abstract:</strong><br/>
This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large $k$ can be, as $n$ grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
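The simplest instance of the divide-and-conquer aggregation studied above is averaging OLS estimates computed on $k$ subsamples; the paper analyzes more refined test statistics and estimators and how large $k$ may grow, but the basic scheme can be sketched as follows.

```python
import numpy as np

def distributed_ols(X, y, k):
    """Divide-and-conquer OLS: fit on k subsamples of size ~n/k and
    average the resulting estimates (the simplest aggregation rule)."""
    betas = [np.linalg.lstsq(Xs, ys, rcond=None)[0]
             for Xs, ys in zip(np.array_split(X, k), np.array_split(y, k))]
    return np.mean(betas, axis=0)

rng = np.random.default_rng(6)
n, d = 10000, 3
X = rng.standard_normal((n, d))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.standard_normal(n)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]   # oracle: full-sample OLS
beta_dc = distributed_ols(X, y, k=20)              # aggregated estimate
```

With $k$ small relative to $n$, the aggregated estimate is close to the full-sample oracle, which is the regime the paper quantifies.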
</p>projecteuclid.org/euclid.aos/1525313085_20180502220435Wed, 02 May 2018 22:04 EDTLarge covariance estimation through elliptical factor modelshttps://projecteuclid.org/euclid.aos/1530086420<strong>Jianqing Fan</strong>, <strong>Han Liu</strong>, <strong>Weichen Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1383--1414.</p><p><strong>Abstract:</strong><br/>
We propose a general Principal Orthogonal complEment Thresholding (POET) framework for large-scale covariance matrix estimation based on the approximate factor model. A set of high-level sufficient conditions for the procedure to achieve optimal rates of convergence under different matrix norms is established to better understand how POET works. Such a framework allows us to recover existing results for sub-Gaussian data in a more transparent way that only depends on the concentration properties of the sample covariance matrix. As a new theoretical contribution, for the first time, such a framework allows us to exploit a conditional sparsity covariance structure for heavy-tailed data. In particular, for the elliptical distribution, we propose a robust estimator based on the marginal and spatial Kendall’s tau to satisfy these conditions. In addition, we study the conditional graphical model under the same framework. The technical tools developed in this paper are of general interest to high-dimensional principal component analysis. Thorough numerical results are also provided to back up the developed theory.
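The marginal Kendall’s tau ingredient rests on a classical fact for elliptical distributions: the Pearson correlation $r$ and Kendall’s tau $\tau$ satisfy $r=\sin(\pi\tau/2)$, so a correlation matrix can be estimated robustly from pairwise taus even under heavy tails. The sketch below illustrates only this ingredient on a bivariate $t_3$ sample; it is not the full POET procedure.

```python
import numpy as np
from scipy.stats import kendalltau

def tau_correlation(X):
    """Robust correlation-matrix estimate for elliptical data via the
    relation r = sin(pi * tau / 2) between Pearson correlation and
    Kendall's tau, applied to each pair of columns."""
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            tau, _ = kendalltau(X[:, i], X[:, j])
            R[i, j] = R[j, i] = np.sin(0.5 * np.pi * tau)
    return R

rng = np.random.default_rng(7)
n, true_r = 2000, 0.6
L = np.linalg.cholesky(np.array([[1.0, true_r], [true_r, 1.0]]))
Z = rng.standard_normal((n, 2)) @ L.T
chi = rng.chisquare(3, size=n)
X = Z / np.sqrt(chi / 3)[:, None]      # heavy-tailed bivariate t_3 sample
R_hat = tau_correlation(X)
```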
</p>projecteuclid.org/euclid.aos/1530086420_20180627040039Wed, 27 Jun 2018 04:00 EDTCurrent status linear regressionhttps://projecteuclid.org/euclid.aos/1530086421<strong>Piet Groeneboom</strong>, <strong>Kim Hendrickx</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1415--1444.</p><p><strong>Abstract:</strong><br/>
We construct $\sqrt{n}$-consistent and asymptotically normal estimates for the finite dimensional regression parameter in the current status linear regression model, which do not require any smoothing device and are based on maximum likelihood estimates (MLEs) of the infinite dimensional parameter. We also construct estimates, again only based on these MLEs, which are arbitrarily close to efficient estimates, if the generalized Fisher information is finite. This type of efficiency is also derived under minimal conditions for estimates based on smooth nonmonotone plug-in estimates of the distribution function. Algorithms for computing the estimates and for selecting the bandwidth of the smooth estimates with a bootstrap method are provided. The connection with results in the econometric literature is also pointed out.
</p>projecteuclid.org/euclid.aos/1530086421_20180627040039Wed, 27 Jun 2018 04:00 EDTJump filtering and efficient drift estimation for Lévy-driven SDEshttps://projecteuclid.org/euclid.aos/1530086422<strong>Arnaud Gloter</strong>, <strong>Dasha Loukianova</strong>, <strong>Hilmar Mai</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1445--1480.</p><p><strong>Abstract:</strong><br/>
The problem of drift estimation for the solution $X$ of a stochastic differential equation with Lévy-type jumps is considered under discrete high-frequency observations with a growing observation window. An efficient and asymptotically normal estimator for the drift parameter is constructed under minimal conditions on the jump behavior and the sampling scheme. In the case of a bounded jump measure density, these conditions reduce to $n\Delta_{n}^{3-\varepsilon}\rightarrow 0$, where $n$ is the number of observations and $\Delta_{n}$ is the maximal sampling step. This result relaxes the condition $n\Delta_{n}^{2}\rightarrow 0$ usually required for joint estimation of drift and diffusion coefficient for SDEs with jumps. The main challenge in this estimation problem stems from the appearance of the unobserved continuous part $X^{c}$ in the likelihood function. In order to construct the drift estimator, we recover this continuous part from discrete observations. More precisely, we estimate, in a nonparametric way, stochastic integrals with respect to $X^{c}$. Convergence results of independent interest are proved for these nonparametric estimators.
</p>projecteuclid.org/euclid.aos/1530086422_20180627040039Wed, 27 Jun 2018 04:00 EDTConsistency and convergence rate of phylogenetic inference via regularizationhttps://projecteuclid.org/euclid.aos/1530086423<strong>Vu Dinh</strong>, <strong>Lam Si Tung Ho</strong>, <strong>Marc A. Suchard</strong>, <strong>Frederick A. Matsen IV</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1481--1512.</p><p><strong>Abstract:</strong><br/>
It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct “gene tree.” Although the gene tree may deviate from the “species tree” due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the consistency of such an approach has not yet been established, nor have convergence rates been bounded. Because phylogenetics is a nonstandard inference problem, the standard theory does not apply. In this paper, we propose a penalized maximum likelihood estimator for gene tree reconstruction, where the penalty is the square of the Billera–Holmes–Vogtmann geodesic distance from the gene tree to the species tree. We prove that this method is consistent, and derive its convergence rate for estimating the discrete gene tree structure and continuous edge lengths (representing the amount of evolution that has occurred on that branch) simultaneously. We find that the regularized estimator is “adaptive fast converging,” meaning that it can reconstruct all edges of length greater than any given threshold from gene sequences of polynomial length. Our method does not require the species tree to be known exactly; in fact, our asymptotic theory holds for any such guide tree.
</p>projecteuclid.org/euclid.aos/1530086423_20180627040039Wed, 27 Jun 2018 04:00 EDTPareto quantiles of unlabeled tree objectshttps://projecteuclid.org/euclid.aos/1530086424<strong>Ela Sienkiewicz</strong>, <strong>Haonan Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1513--1540.</p><p><strong>Abstract:</strong><br/>
In this paper, we consider a set of unlabeled tree objects with topological and geometric properties. For each data object, two curve representations are developed to characterize its topological and geometric aspects. We further define the notions of topological and geometric medians as well as quantiles based on both representations. In addition, we take a novel approach to define the Pareto medians and quantiles through a multi-objective optimization problem. In particular, we study two different objective functions which measure the topological variation and geometric variation, respectively. Analytical solutions are provided for topological and geometric medians and quantiles, and in general, for Pareto medians and quantiles, the genetic algorithm is implemented. The proposed methods are applied to analyze a data set of pyramidal neurons.
</p>projecteuclid.org/euclid.aos/1530086424_20180627040039Wed, 27 Jun 2018 04:00 EDTEfficient and adaptive linear regression in semi-supervised settingshttps://projecteuclid.org/euclid.aos/1530086425<strong>Abhishek Chakrabortty</strong>, <strong>Tianxi Cai</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1541--1572.</p><p><strong>Abstract:</strong><br/>
We consider the linear regression problem under semi-supervised settings wherein the available data typically consists of: (i) a small or moderate sized “labeled” data, and (ii) a much larger sized “unlabeled” data. Such data arises naturally from settings where the outcome, unlike the covariates, is expensive to obtain, a frequent scenario in modern studies involving large databases like electronic medical records (EMR). Supervised estimators like the ordinary least squares (OLS) estimator utilize only the labeled data. It is often of interest to investigate if and when the unlabeled data can be exploited to improve estimation of the regression parameter in the adopted linear model.
In this paper, we propose a class of “Efficient and Adaptive Semi-Supervised Estimators” (EASE) to improve estimation efficiency. The EASE are two-step estimators adaptive to model mis-specification, leading to improved (optimal in some cases) efficiency under model mis-specification, and equal (optimal) efficiency under a linear model. This adaptive property, often unaddressed in the existing literature, is crucial for advocating “safe” use of the unlabeled data. The construction of EASE primarily involves a flexible “semi-nonparametric” imputation, including a smoothing step that works well even when the number of covariates is not small, and a follow-up “refitting” step along with a cross-validation (CV) strategy, both of which have useful practical as well as theoretical implications towards addressing two important issues: under-smoothing and over-fitting. We establish asymptotic results including consistency, asymptotic normality and the adaptive properties of EASE. We also provide influence function expansions and a “double” CV strategy for inference. The results are further validated through extensive simulations, followed by application to an EMR study on auto-immunity.
</p>projecteuclid.org/euclid.aos/1530086425_20180627040039Wed, 27 Jun 2018 04:00 EDTConvexified modularity maximization for degree-corrected stochastic block modelshttps://projecteuclid.org/euclid.aos/1530086426<strong>Yudong Chen</strong>, <strong>Xiaodong Li</strong>, <strong>Jiaming Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1573--1602.</p><p><strong>Abstract:</strong><br/>
The stochastic block model (SBM), a popular framework for studying community detection in networks, is limited by the assumption that all nodes in the same community are statistically equivalent and have equal expected degrees. The degree-corrected stochastic block model (DCSBM) is a natural extension of SBM that allows for degree heterogeneity within communities. To find the communities under DCSBM, this paper proposes a convexified modularity maximization approach, which is based on a convex programming relaxation of the classical (generalized) modularity maximization formulation, followed by a novel doubly-weighted $\ell_{1}$-norm $k$-medoids procedure. We establish nonasymptotic theoretical guarantees for approximate and perfect clustering, both of which build on a new degree-corrected density gap condition. Our approximate clustering results are insensitive to the minimum degree, and hold even in the sparse regime with bounded average degrees. In the special case of SBM, our theoretical guarantees match the best-known results of computationally feasible algorithms. Numerically, we provide an efficient implementation of our algorithm, which is applied to both synthetic and real-world networks. Experimental results show that our method enjoys competitive performance compared to the state of the art in the literature.
</p>projecteuclid.org/euclid.aos/1530086426_20180627040039Wed, 27 Jun 2018 04:00 EDTNear-optimality of linear recovery in Gaussian observation scheme under $\Vert \cdot \Vert_{2}^{2}$-losshttps://projecteuclid.org/euclid.aos/1530086427<strong>Anatoli Juditsky</strong>, <strong>Arkadi Nemirovski</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1603--1629.</p><p><strong>Abstract:</strong><br/>
We consider the problem of recovering the linear image $Bx$ of a signal $x$, known to belong to a given convex compact set $\mathcal{X}$, from an indirect observation $\omega=Ax+\sigma\xi$ corrupted by Gaussian noise $\xi$. It is shown that under some assumptions on $\mathcal{X}$ (satisfied, e.g., when $\mathcal{X}$ is the intersection of $K$ concentric ellipsoids/elliptic cylinders), an easy-to-compute linear estimate is near-optimal in terms of its worst case, over $x\in\mathcal{X}$, expected $\Vert \cdot \Vert_{2}^{2}$-loss. The main novelty here is that the result imposes no restrictions on $A$ and $B$. To the best of our knowledge, preceding results on optimality of linear estimates dealt either with one-dimensional $Bx$ (estimation of linear forms) or with the “diagonal case” where $A$, $B$ are diagonal and $\mathcal{X}$ is given by a “separable” constraint like $\mathcal{X}=\{x:\sum_{i}a_{i}^{2}x_{i}^{2}\leq1\}$ or $\mathcal{X}=\{x:\max_{i}|a_{i}x_{i}|\leq1\}$.
</p>projecteuclid.org/euclid.aos/1530086427_20180627040039Wed, 27 Jun 2018 04:00 EDTAn MCMC approach to empirical Bayes inference and Bayesian sensitivity analysis via empirical processeshttps://projecteuclid.org/euclid.aos/1530086428<strong>Hani Doss</strong>, <strong>Yeonhee Park</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1630--1663.</p><p><strong>Abstract:</strong><br/>
Consider a Bayesian situation in which we observe $Y\sim p_{\theta}$, where $\theta\in\Theta$, and we have a family $\{\nu_{h},h\in\mathcal{H}\}$ of potential prior distributions on $\Theta$. Let $g$ be a real-valued function of $\theta$, and let $I_{g}(h)$ be the posterior expectation of $g(\theta)$ when the prior is $\nu_{h}$. We are interested in two problems: (i) selecting a particular value of $h$, and (ii) estimating the family of posterior expectations $\{I_{g}(h),h\in\mathcal{H}\}$. Let $m_{y}(h)$ be the marginal likelihood of the hyperparameter $h$: $m_{y}(h)=\int p_{\theta}(y)\nu_{h}(d\theta)$. The empirical Bayes estimate of $h$ is, by definition, the value of $h$ that maximizes $m_{y}(h)$. It turns out that it is typically possible to use Markov chain Monte Carlo to form point estimates for $m_{y}(h)$ and $I_{g}(h)$ for each individual $h$ in a continuum, and also confidence intervals for $m_{y}(h)$ and $I_{g}(h)$ that are valid pointwise. However, we are interested in forming estimates, with confidence statements, of the entire families of integrals $\{m_{y}(h),h\in\mathcal{H}\}$ and $\{I_{g}(h),h\in\mathcal{H}\}$: we need estimates of the first family in order to carry out empirical Bayes inference, and we need estimates of the second family in order to do Bayesian sensitivity analysis. We establish strong consistency and functional central limit theorems for estimates of these families by using tools from empirical process theory. We give two applications, one to latent Dirichlet allocation, which is used in topic modeling, and the other is to a model for Bayesian variable selection in linear regression.
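For intuition, the marginal likelihood $m_{y}(h)$ and its empirical Bayes maximizer can be computed directly in a toy conjugate model. The sketch below assumes $Y\sim N(\theta,1)$ and $\nu_{h}=N(0,h)$, and uses plain Monte Carlo rather than MCMC; it only illustrates the quantities involved, not the empirical-process machinery of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
y = 1.5  # a single observation from Y ~ N(theta, 1)

def m_mc(h, ndraws=200_000):
    """Monte Carlo estimate of m_y(h) = integral of p_theta(y) nu_h(dtheta),
    with nu_h = N(0, h) and p_theta = N(theta, 1) (toy choices)."""
    theta = rng.normal(0.0, np.sqrt(h), size=ndraws)
    return np.mean(np.exp(-0.5 * (y - theta) ** 2) / np.sqrt(2 * np.pi))

def m_exact(h):
    # In this conjugate toy model, Y ~ N(0, 1 + h) marginally.
    return np.exp(-0.5 * y**2 / (1 + h)) / np.sqrt(2 * np.pi * (1 + h))

# Empirical Bayes: choose h maximizing m_y(h) over a grid of hyperparameters.
grid = np.linspace(0.1, 5.0, 50)
h_hat = grid[np.argmax([m_exact(h) for h in grid])]
print(h_hat)
```

In this toy model the maximizer is available in closed form ($\hat{h}=y^{2}-1$), which makes the grid search easy to check; the paper's point is how to do this, with valid uncertainty statements, over a whole continuum of $h$ when only MCMC output is available.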
</p>projecteuclid.org/euclid.aos/1530086428_20180627040039Wed, 27 Jun 2018 04:00 EDTCurvature and inference for maximum likelihood estimateshttps://projecteuclid.org/euclid.aos/1530086429<strong>Bradley Efron</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1664--1692.</p><p><strong>Abstract:</strong><br/>
Maximum likelihood estimates are sufficient statistics in exponential families, but not in general. The theory of statistical curvature was introduced to measure the effects of MLE insufficiency in one-parameter families. Here, we analyze curvature in the more realistic venue of multiparameter families—more exactly, curved exponential families, a broad class of smoothly defined nonexponential family models. We show that within the set of observations giving the same value for the MLE, there is a “region of stability” outside of which the MLE is no longer even a local maximum. Accuracy of the MLE is affected by the location of the observation vector within the region of stability. Our motivating example involves “$g$-modeling,” an empirical Bayes estimation procedure.
</p>projecteuclid.org/euclid.aos/1530086429_20180627040039Wed, 27 Jun 2018 04:00 EDTEmpirical Bayes estimates for a two-way cross-classified modelhttps://projecteuclid.org/euclid.aos/1530086430<strong>Lawrence D. Brown</strong>, <strong>Gourab Mukherjee</strong>, <strong>Asaf Weinstein</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1693--1720.</p><p><strong>Abstract:</strong><br/>
We develop an empirical Bayes procedure for estimating the cell means in an unbalanced, two-way additive model with fixed effects. We employ a hierarchical model, which reflects exchangeability of the effects within treatment and within block but not necessarily between them, as suggested before by Lindley and Smith [ J. R. Stat. Soc., B 34 (1972) 1–41]. The hyperparameters of this hierarchical model, instead of being considered fixed, are to be substituted with data-dependent values in such a way that the point risk of the empirical Bayes estimator is small. Our method chooses the hyperparameters by minimizing an unbiased risk estimate and is shown to be asymptotically optimal for the estimation problem defined above, under suitable conditions. The usual empirical Best Linear Unbiased Predictor (BLUP) is shown to be substantially different from the proposed method in the unbalanced case and, therefore, performs suboptimally. Our estimator is implemented through a computationally tractable algorithm that is scalable to work under large designs. The case of missing cell observations is treated as well.
</p>projecteuclid.org/euclid.aos/1530086430_20180627040039Wed, 27 Jun 2018 04:00 EDTEstimating variance of random effects to solve multiple problems simultaneouslyhttps://projecteuclid.org/euclid.aos/1530086431<strong>Masayo Yoshimori Hirose</strong>, <strong>Partha Lahiri</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1721--1741.</p><p><strong>Abstract:</strong><br/>
The two-level normal hierarchical model (NHM) has played a critical role in statistical theory for the last several decades. In this paper, we propose a random effects variance estimator that simultaneously (i) improves on the estimation of the related shrinkage factors, (ii) protects empirical best linear unbiased predictors (EBLUP) [same as empirical Bayes (EB)] of the random effects from the common overshrinkage problem, and (iii) avoids complex bias correction in generating a strictly positive second-order unbiased mean squared error (MSE) (same as integrated Bayes risk) estimator by either the Taylor series or the single parametric bootstrap method. The idea of achieving multiple desirable properties in an EBLUP or EB method through a suitably devised random effects variance estimator is the first of its kind and holds promise in providing good inferences for random effects under the EBLUP or EB framework. The proposed methodology is also evaluated using a Monte Carlo simulation study and real data analysis.
</p>projecteuclid.org/euclid.aos/1530086431_20180627040039Wed, 27 Jun 2018 04:00 EDTOptimal shrinkage of eigenvalues in the spiked covariance modelhttps://projecteuclid.org/euclid.aos/1530086432<strong>David Donoho</strong>, <strong>Matan Gavish</strong>, <strong>Iain Johnstone</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1742--1778.</p><p><strong>Abstract:</strong><br/>
We show that in a common high-dimensional covariance model, the choice of loss function has a profound effect on optimal estimation.
In an asymptotic framework based on the spiked covariance model and use of orthogonally invariant estimators, we show that optimal estimation of the population covariance matrix boils down to design of an optimal shrinker $\eta$ that acts elementwise on the sample eigenvalues. Indeed, to each loss function there corresponds a unique admissible eigenvalue shrinker $\eta^{*}$ dominating all other shrinkers. The shape of the optimal shrinker is determined by the choice of loss function and, crucially, by inconsistency of both eigenvalues and eigenvectors of the sample covariance matrix.
Details of these phenomena and closed form formulas for the optimal eigenvalue shrinkers are worked out for a menagerie of 26 loss functions for covariance estimation found in the literature, including the Stein, Entropy, Divergence, Fréchet, Bhattacharya/Matusita, Frobenius Norm, Operator Norm, Nuclear Norm and Condition Number losses.
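One ingredient underlying such shrinkers is the classical spiked-model bias map between population and sample eigenvalues. The sketch below implements only that map and its inverse (a standard formula, not any of the paper's 26 loss-specific shrinkers); `gamma` denotes the aspect ratio $p/n$.

```python
import numpy as np

def spike_forward(ell, gamma):
    """Asymptotic sample eigenvalue corresponding to a population spike
    ell > 1 + sqrt(gamma) in the spiked model with aspect ratio gamma = p/n."""
    return ell + gamma * ell / (ell - 1.0)

def spike_inverse(lam, gamma):
    """Debias a sample eigenvalue lam > (1 + sqrt(gamma))**2 back to the
    population spike, inverting the quadratic ell**2 - (lam+1-gamma)*ell + lam = 0."""
    b = lam + 1.0 - gamma
    return 0.5 * (b + np.sqrt(b * b - 4.0 * lam))

gamma, ell = 0.5, 4.0
lam = spike_forward(ell, gamma)   # biased sample eigenvalue
print(lam, spike_inverse(lam, gamma))
```

An optimal shrinker for a given loss is generally not this plain inverse map: the eigenvector inconsistency mentioned above means the best elementwise rule depends on the loss, which is exactly what the paper works out.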
</p>projecteuclid.org/euclid.aos/1530086432_20180627040039Wed, 27 Jun 2018 04:00 EDTA Bayesian approach to the selection of two-level multi-stratum factorial designshttps://projecteuclid.org/euclid.aos/1530086433<strong>Ming-Chung Chang</strong>, <strong>Ching-Shui Cheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1779--1806.</p><p><strong>Abstract:</strong><br/>
In a multi-stratum factorial experiment, there are multiple error terms (strata) with different variances that arise from complicated structures of the experimental units. For unstructured experimental units, minimum aberration is a popular criterion for choosing regular fractional factorial designs. One difficulty in extending this criterion to multi-stratum factorial designs is that the formulation of a word length pattern based on which minimum aberration is defined requires an order of desirability among the relevant words, but a natural order is often lacking. Furthermore, a criterion based only on word length patterns does not account for the different stratum variances. Mitchell, Morris and Ylvisaker [ Statist. Sinica 5 (1995) 559–573] proposed a framework for Bayesian factorial designs. A Gaussian process is used as the prior for the treatment effects, from which a prior distribution of the factorial effects is induced. This approach is applied to study optimal and efficient multi-stratum factorial designs. Good surrogates for the Bayesian criteria that can be related to word length and generalized word length patterns for regular and nonregular designs, respectively, are derived. A tool is developed for eliminating inferior designs and reducing the designs that need to be considered without requiring any knowledge of stratum variances. Numerical examples are used to illustrate the theory in several settings.
</p>projecteuclid.org/euclid.aos/1530086433_20180627040039Wed, 27 Jun 2018 04:00 EDTAccuracy assessment for high-dimensional linear regressionhttps://projecteuclid.org/euclid.aos/1530086434<strong>T. Tony Cai</strong>, <strong>Zijian Guo</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1807--1836.</p><p><strong>Abstract:</strong><br/>
This paper considers point and interval estimation of the $\ell_{q}$ loss of an estimator in high-dimensional linear regression with random design. We establish the minimax rate for estimating the $\ell_{q}$ loss and the minimax expected length of confidence intervals for the $\ell_{q}$ loss of rate-optimal estimators of the regression vector, including commonly used estimators such as Lasso, scaled Lasso, square-root Lasso and Dantzig Selector. Adaptivity of confidence intervals for the $\ell_{q}$ loss is also studied. We study both the setting of a known identity design covariance matrix with known noise level and the setting of an unknown design covariance matrix with unknown noise level. The results reveal interesting and significant differences between estimating the $\ell_{2}$ loss and the $\ell_{q}$ loss with $1\le q<2$, as well as between the two settings.
New technical tools are developed to establish rate sharp lower bounds for the minimax estimation error and the expected length of minimax and adaptive confidence intervals for the $\ell_{q}$ loss. A significant difference between loss estimation and the traditional parameter estimation is that for loss estimation the constraint is on the performance of the estimator of the regression vector, but the lower bounds are on the difficulty of estimating its $\ell_{q}$ loss. The technical tools developed in this paper can also be of independent interest.
</p>projecteuclid.org/euclid.aos/1530086434_20180627040039Wed, 27 Jun 2018 04:00 EDTVariable selection with Hamming losshttps://projecteuclid.org/euclid.aos/1534492821<strong>Cristina Butucea</strong>, <strong>Mohamed Ndaoud</strong>, <strong>Natalia A. Stepanova</strong>, <strong>Alexandre B. Tsybakov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1837--1875.</p><p><strong>Abstract:</strong><br/>
We derive nonasymptotic bounds for the minimax risk of variable selection under expected Hamming loss in the Gaussian mean model in $\mathbb{R}^{d}$ for classes of at most $s$-sparse vectors separated from 0 by a constant $a>0$. In some cases, we get exact expressions for the nonasymptotic minimax risk as a function of $d,s,a$ and find explicitly the minimax selectors. These results are extended to dependent or non-Gaussian observations and to the problem of crowdsourcing. Analogous conclusions are obtained for the probability of wrong recovery of the sparsity pattern. As corollaries, we derive necessary and sufficient conditions for such asymptotic properties as almost full recovery and exact recovery. Moreover, we propose data-driven selectors that provide almost full and exact recovery adaptively to the parameters of the classes.
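A minimal simulation of the setting, using a simple one-sided thresholding selector. This is illustrative only: the paper derives the exact minimax and adaptive selectors, and the threshold below is a rough choice, not the optimal one.

```python
import numpy as np

rng = np.random.default_rng(2)

d, s, a = 1000, 10, 4.0
theta = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
theta[support] = a                      # s-sparse mean, nonzero entries equal to a > 0

y = theta + rng.normal(size=d)          # Gaussian mean model in R^d

# One-sided thresholding selector: balance misses against false selections.
t = a / 2 + np.log((d - s) / s) / a     # rough threshold (assumed form, not the paper's)
eta = (y > t).astype(int)

# Hamming loss: number of coordinates where the selected pattern is wrong.
hamming = int(np.sum(eta != (theta != 0)))
print(hamming)
```

With $d=1000$, $s=10$ and separation $a=4$, this crude selector already recovers the sparsity pattern up to a handful of mistakes; the paper quantifies exactly how small the minimax Hamming risk can be as a function of $d,s,a$.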
</p>projecteuclid.org/euclid.aos/1534492821_20180817040040Fri, 17 Aug 2018 04:00 EDTRandomization-based causal inference from split-plot designshttps://projecteuclid.org/euclid.aos/1534492822<strong>Anqi Zhao</strong>, <strong>Peng Ding</strong>, <strong>Rahul Mukerjee</strong>, <strong>Tirthankar Dasgupta</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1876--1903.</p><p><strong>Abstract:</strong><br/>
Under the potential outcomes framework, we propose a randomization-based estimation procedure for causal inference from split-plot designs, with special emphasis on $2^{2}$ designs that naturally arise in many social, behavioral and biomedical experiments. Point estimators of factorial effects are obtained and their sampling variances are derived in closed form as linear combinations of the between- and within-group covariances of the potential outcomes. Results are compared to those under complete randomization as measures of design efficiency. Conservative estimators of these sampling variances are proposed. Connection of the randomization-based approach to inference based on the linear mixed effects model is explored. Results on sampling variances of point estimators and their estimators are extended to general split-plot designs. The superiority over existing model-based alternatives in frequency coverage properties is reported under a variety of simulation settings for both binary and continuous outcomes.
</p>projecteuclid.org/euclid.aos/1534492822_20180817040040Fri, 17 Aug 2018 04:00 EDTA new perspective on robust $M$-estimation: Finite sample theory and applications to dependence-adjusted multiple testinghttps://projecteuclid.org/euclid.aos/1534492823<strong>Wen-Xin Zhou</strong>, <strong>Koushiki Bose</strong>, <strong>Jianqing Fan</strong>, <strong>Han Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1904--1931.</p><p><strong>Abstract:</strong><br/>
Heavy-tailed errors impair the accuracy of the least squares estimate, which can be spoiled by a single grossly outlying observation. As argued in the seminal work of Peter Huber in 1973 [ Ann. Statist. 1 (1973) 799–821], robust alternatives to the method of least squares are sorely needed. To achieve robustness against heavy-tailed sampling distributions, we revisit the Huber estimator from a new perspective by letting the tuning parameter involved diverge with the sample size. In this paper, we develop nonasymptotic concentration results for such an adaptive Huber estimator, namely, the Huber estimator with the tuning parameter adapted to sample size, dimension and the variance of the noise. Specifically, we obtain a sub-Gaussian-type deviation inequality and a nonasymptotic Bahadur representation when noise variables only have finite second moments. The nonasymptotic results further yield two conventional normal approximation results that are of independent interest, the Berry–Esseen inequality and Cramér-type moderate deviation. As an important application to large-scale simultaneous inference, we apply these robust normal approximation results to analyze a dependence-adjusted multiple testing procedure for moderately heavy-tailed data. It is shown that the robust dependence-adjusted procedure asymptotically controls the overall false discovery proportion at the nominal level under mild moment conditions. Thorough numerical results on both simulated and real datasets are also provided to back up our theory.
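A minimal sketch of a Huber-type location estimator with a tuning parameter that diverges with the sample size. The scaling $\tau=c\,\hat{\sigma}\sqrt{n/\log n}$ and the MAD scale estimate are common choices in this literature, assumed here for illustration rather than taken verbatim from the paper.

```python
import numpy as np

def adaptive_huber_mean(x, c=1.0, n_iter=200):
    """Huber location estimate whose tuning parameter tau grows with the
    sample size: tau = c * sigma_hat * sqrt(n / log n). Here c is a
    user-chosen constant and sigma_hat a robust (MAD-based) scale estimate."""
    x = np.asarray(x, dtype=float)
    n = x.size
    med = np.median(x)
    sigma_hat = 1.4826 * np.median(np.abs(x - med))   # MAD scale estimate
    tau = c * sigma_hat * np.sqrt(n / np.log(n))
    mu = med                                          # robust starting point
    for _ in range(n_iter):
        # Fixed point: the Huber estimate solves mean(clip(x - mu, -tau, tau)) = 0.
        step = np.mean(np.clip(x - mu, -tau, tau))
        mu += step
        if abs(step) < 1e-12:
            break
    return mu

rng = np.random.default_rng(3)
x = rng.standard_t(df=3, size=2000)   # heavy-tailed sample with finite variance
x[0] = 500.0                          # one grossly outlying observation
print(adaptive_huber_mean(x), np.mean(x))
```

The single outlier shifts the sample mean by about $500/2000=0.25$, while the clipped estimate is barely affected, which is the robustness phenomenon the nonasymptotic theory quantifies.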
</p>projecteuclid.org/euclid.aos/1534492823_20180817040040Fri, 17 Aug 2018 04:00 EDTRobust covariance and scatter matrix estimation under Huber’s contamination modelhttps://projecteuclid.org/euclid.aos/1534492824<strong>Mengjie Chen</strong>, <strong>Chao Gao</strong>, <strong>Zhao Ren</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1932--1960.</p><p><strong>Abstract:</strong><br/>
Covariance matrix estimation is one of the most important problems in statistics. To accommodate the complexity of modern datasets, it is desired to have estimation procedures that not only can incorporate the structural assumptions of covariance matrices, but are also robust to outliers from arbitrary sources. In this paper, we define a new concept called matrix depth and then propose a robust covariance matrix estimator by maximizing the empirical depth function. The proposed estimator is shown to achieve minimax optimal rate under Huber’s $\varepsilon$-contamination model for estimating covariance/scatter matrices with various structures including bandedness and sparsity.
</p>projecteuclid.org/euclid.aos/1534492824_20180817040040Fri, 17 Aug 2018 04:00 EDTEmpirical best prediction under a nested error model with log transformationhttps://projecteuclid.org/euclid.aos/1534492825<strong>Isabel Molina</strong>, <strong>Nirian Martín</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1961--1993.</p><p><strong>Abstract:</strong><br/>
In regression models involving economic variables such as income, log transformation is typically taken to achieve approximate normality and stabilize the variance. However, often the interest is predicting individual values or means of the variable in the original scale. Under a nested error model for the log transformation of the target variable, we show that the usual approach of back transforming the predicted values may introduce a substantial bias. We obtain the optimal (or “best”) predictors of individual values of the original variable and of small area means under that model. Empirical best predictors are defined by estimating the unknown model parameters in the best predictors. When estimation is desired for subpopulations with small sample sizes (small areas), nested error models are widely used to “borrow strength” from the other areas and obtain estimators with greater efficiency than direct estimators based on the scarce area-specific data. We show that naive predictors of small area means obtained by back-transformation under the mentioned model may even underperform direct estimators. Moreover, assessing the uncertainty of the considered predictor is not straightforward. Exact mean squared errors of the best predictors and second-order approximations to the mean squared errors of the empirical best predictors are derived. Estimators of the mean squared errors that are second-order correct are also obtained. Simulation studies and an example with Mexican data on living conditions illustrate the procedures.
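The source of the bias is elementary: if $Z\sim N(\mu,\sigma^{2})$ then $E[e^{Z}]=e^{\mu+\sigma^{2}/2}$, so back-transforming a predictor of the log-scale mean underestimates the original-scale mean. A toy numerical illustration of this fact (not the paper's empirical best predictor under the nested error model):

```python
import numpy as np

rng = np.random.default_rng(4)

mu, sigma = 2.0, 1.0
z = rng.normal(mu, sigma, size=200_000)      # model holds on the log scale
y = np.exp(z)                                # target variable on the original scale

naive = np.exp(np.mean(z))                   # naive back-transformed predictor
best = np.exp(np.mean(z) + 0.5 * np.var(z))  # lognormal-mean correction
print(naive, best, np.mean(y))
```

With $\sigma=1$ the naive predictor is biased downward by a factor $e^{-1/2}\approx0.61$, a substantial bias of the kind the abstract warns about.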
</p>projecteuclid.org/euclid.aos/1534492825_20180817040040Fri, 17 Aug 2018 04:00 EDTBackward nested descriptors asymptotics with inference on stem cell differentiationhttps://projecteuclid.org/euclid.aos/1534492826<strong>Stephan F. Huckemann</strong>, <strong>Benjamin Eltzner</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1994--2019.</p><p><strong>Abstract:</strong><br/>
For sequences of random backward nested subspaces as occur, say, in dimension reduction for manifold or stratified space valued data, asymptotic results are derived. In fact, we formulate our results more generally for backward nested families of descriptors (BNFD). Under rather general conditions, asymptotic strong consistency holds. Under additional, still rather general hypotheses, among them existence of a.s. local twice differentiable charts, asymptotic joint normality of a BNFD can be shown. If charts factor suitably, this leads to individual asymptotic normality for the last element, a principal nested mean or a principal nested geodesic, say. It turns out that these results pertain to principal nested spheres (PNS) and principal nested great subsphere (PNGS) analysis by Jung, Dryden and Marron [ Biometrika 99 (2012) 551–568] as well as to the intrinsic mean on a first geodesic principal component (IMo1GPC) for manifolds and Kendall’s shape spaces. A nested bootstrap two-sample test is derived and illustrated with simulations. In a study on real data, PNGS is applied to track early human mesenchymal stem cell differentiation over a coarse time grid and, among others, to locate a change point with direct consequences for the design of further studies.
</p>projecteuclid.org/euclid.aos/1534492826_20180817040040Fri, 17 Aug 2018 04:00 EDTChange-point detection in multinomial data with a large number of categorieshttps://projecteuclid.org/euclid.aos/1534492827<strong>Guanghui Wang</strong>, <strong>Changliang Zou</strong>, <strong>Guosheng Yin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2020--2044.</p><p><strong>Abstract:</strong><br/>
We consider a sequence of multinomial data for which the probabilities associated with the categories are subject to abrupt changes of unknown magnitudes at unknown locations. When the number of categories is comparable to or even larger than the number of subjects allocated to these categories, conventional methods such as the classical Pearson’s chi-squared test and the deviance test may not work well. Motivated by high-dimensional homogeneity tests, we propose a novel change-point detection procedure that allows the number of categories to tend to infinity. The null distribution of our test statistic is asymptotically normal and the test performs well with finite samples. The number of change-points is determined by minimizing a penalized objective function based on segmentation, and the locations of the change-points are estimated by minimizing the objective function with the dynamic programming algorithm. Under some mild conditions, the consistency of the estimators of multiple change-points is established. Simulation studies show that the proposed method performs satisfactorily for identifying change-points in terms of power and estimation accuracy, and it is illustrated with an analysis of a real data set.
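As a toy version of the idea, a single change point can be located by scanning split points and maximizing a Pearson homogeneity statistic between the pooled counts of the two segments. This is a simplification: the paper handles multiple change points via a penalized objective with dynamic programming, and uses a high-dimensional test statistic rather than the classical Pearson one.

```python
import numpy as np

rng = np.random.default_rng(5)

# A sequence of multinomial counts with an abrupt change at t = 60.
k, n_per = 20, 200
p1 = np.full(k, 1 / k)
p2 = p1.copy(); p2[:5] *= 2.5; p2 /= p2.sum()
counts = np.vstack([rng.multinomial(n_per, p1, size=60),
                    rng.multinomial(n_per, p2, size=40)])

def pearson_stat(a, b):
    """Pearson homogeneity statistic between two pooled count vectors."""
    na, nb = a.sum(), b.sum()
    e = (a + b) / (na + nb)          # pooled category proportions
    stat = 0.0
    for obs, n in ((a, na), (b, nb)):
        exp = n * e
        stat += np.sum((obs - exp) ** 2 / exp)
    return stat

# Scan candidate split points and pick the one maximizing the statistic.
stats = [pearson_stat(counts[:t].sum(0), counts[t:].sum(0))
         for t in range(10, 91)]
t_hat = 10 + int(np.argmax(stats))
print(t_hat)
```

Here the category dimension ($k=20$) is small relative to the per-period counts, so the Pearson statistic behaves well; the regime the paper targets is the opposite one, where $k$ is comparable to or larger than the counts.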
</p>projecteuclid.org/euclid.aos/1534492827_20180817040040Fri, 17 Aug 2018 04:00 EDTLocal asymptotic normality property for fractional Gaussian noise under high-frequency observationshttps://projecteuclid.org/euclid.aos/1534492828<strong>Alexandre Brouste</strong>, <strong>Masaaki Fukasawa</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2045--2061.</p><p><strong>Abstract:</strong><br/>
Local Asymptotic Normality (LAN) property for fractional Gaussian noise under high-frequency observations is proved with nondiagonal rate matrices depending on the parameter to be estimated. In contrast to the LAN families in the literature, nondiagonal rate matrices are inevitable. As consequences of the LAN property, a maximum likelihood sequence of estimators is shown to be asymptotically efficient and the likelihood ratio test on the Hurst parameter is shown to be an asymptotically uniformly most powerful unbiased test for two-sided hypotheses.
</p>projecteuclid.org/euclid.aos/1534492828_20180817040040Fri, 17 Aug 2018 04:00 EDTGlobal testing against sparse alternatives under Ising modelshttps://projecteuclid.org/euclid.aos/1534492829<strong>Rajarshi Mukherjee</strong>, <strong>Sumit Mukherjee</strong>, <strong>Ming Yuan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2062--2093.</p><p><strong>Abstract:</strong><br/>
In this paper, we study the effect of dependence on detecting sparse signals. In particular, we focus on global testing against sparse alternatives for the means of binary outcomes following an Ising model, and establish how the interplay between the strength and sparsity of a signal determines its detectability under various notions of dependence. The profound impact of dependence is best illustrated under the Curie–Weiss model where we observe the effect of a “thermodynamic” phase transition. In particular, the critical state exhibits a subtle “blessing of dependence” phenomenon in that one can detect much weaker signals at criticality than otherwise. Furthermore, we develop a testing procedure that is broadly applicable to account for dependence and show that it is asymptotically minimax optimal under fairly general regularity conditions.
</p>projecteuclid.org/euclid.aos/1534492829_20180817040040Fri, 17 Aug 2018 04:00 EDTPrincipal component analysis for second-order stationary vector time serieshttps://projecteuclid.org/euclid.aos/1534492830<strong>Jinyuan Chang</strong>, <strong>Bin Guo</strong>, <strong>Qiwei Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2094--2124.</p><p><strong>Abstract:</strong><br/>
We extend the principal component analysis (PCA) to second-order stationary vector time series in the sense that we seek a contemporaneous linear transformation for a $p$-variate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially. Therefore, those lower-dimensional series can be analyzed separately as far as the linear dynamic structure is concerned. Technically, it boils down to an eigenanalysis for a positive definite matrix. When $p$ is large, an additional step is required to perform a permutation in terms of either maximum cross-correlations or FDR based on multiple tests. The asymptotic theory is established for both fixed $p$ and diverging $p$ when the sample size $n$ tends to infinity. Numerical experiments with both simulated and real data sets indicate that the proposed method is an effective initial step in analyzing multiple time series data, which leads to substantial dimension reduction in modelling and forecasting high-dimensional linear dynamical structures. Unlike PCA for independent data, there is no guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation which leads to the advantages in, for example, forecasting for future values. The method can also be adapted to segment multiple volatility processes.
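The eigenanalysis step can be sketched with a positive semidefinite matrix $W=\sum_{k}\widehat{\Sigma}_{x}(k)\widehat{\Sigma}_{x}(k)'$ built from lag-$k$ sample autocovariances. This is an assumed simplified form for illustration: the full method also standardizes the series first and performs the permutation step mentioned above.

```python
import numpy as np

rng = np.random.default_rng(6)

# Two independent AR(1) subseries hidden behind a linear mixing (p = 2 toy case).
T = 5000
z = np.zeros((T, 2))
for t in range(1, T):
    z[t] = np.array([0.8, -0.5]) * z[t - 1] + rng.normal(size=2)
A_mix = np.array([[1.0, 0.4], [0.3, 1.0]])
x = z @ A_mix.T

def ts_pca_transform(x, k0=5):
    """Eigenanalysis of W = sum_k S(k) S(k)', with S(k) the lag-k sample
    autocovariance matrix. Returns an orthogonal transformation whose rows
    are the eigenvector directions (a sketch of the eigenanalysis step only)."""
    xc = x - x.mean(0)
    n, p = xc.shape
    W = np.zeros((p, p))
    for k in range(k0 + 1):
        S = xc[k:].T @ xc[:n - k] / n
        W += S @ S.T
    _, vecs = np.linalg.eigh(W)
    return vecs.T

B = ts_pca_transform(x)
y = x @ B.T   # transformed series, ideally segmented into uncorrelated subseries
print(B)
```

Because $W$ is symmetric positive semidefinite, `eigh` returns an orthonormal eigenbasis, so the transformation is orthogonal by construction.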
</p>projecteuclid.org/euclid.aos/1534492830_20180817040040Fri, 17 Aug 2018 04:00 EDTEstimation of a monotone density in $s$-sample biased sampling modelshttps://projecteuclid.org/euclid.aos/1534492831<strong>Kwun Chuen Gary Chan</strong>, <strong>Hok Kan Ling</strong>, <strong>Tony Sit</strong>, <strong>Sheung Chi Phillip Yam</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2125--2152.</p><p><strong>Abstract:</strong><br/>
We study the nonparametric estimation of a decreasing density function $g_{0}$ in a general $s$-sample biased sampling model with weight (or bias) functions $w_{i}$ for $i=1,\ldots,s$. The determination of the monotone maximum likelihood estimator $\hat{g}_{n}$ and its asymptotic distribution, except for the case when $s=1$, has long been missing in the literature due to certain nonstandard structures of the likelihood function, such as nonseparability and a lack of strictly positive second order derivatives of the negative of the log-likelihood function. The existence, uniqueness, self-characterization, consistency of $\hat{g}_{n}$ and its asymptotic distribution at a fixed point are established in this article. To overcome the barriers caused by nonstandard likelihood structures, for instance, we show the tightness of $\hat{g}_{n}$ via a purely analytic argument instead of an intrinsic geometric one and propose an indirect approach to attain the $\sqrt{n}$-rate of convergence of the linear functional $\int w_{i}\hat{g}_{n}$.
</p>projecteuclid.org/euclid.aos/1534492831_20180817040040Fri, 17 Aug 2018 04:00 EDTCommunity detection in degree-corrected block modelshttps://projecteuclid.org/euclid.aos/1534492832<strong>Chao Gao</strong>, <strong>Zongming Ma</strong>, <strong>Anderson Y. Zhang</strong>, <strong>Harrison H. Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2153--2185.</p><p><strong>Abstract:</strong><br/>
Community detection is a central problem of network data analysis. Given a network, the goal of community detection is to partition the network nodes into a small number of clusters, which could often help reveal interesting structures. The present paper studies community detection in Degree-Corrected Block Models (DCBMs). We first derive asymptotic minimax risks of the problem for a misclassification proportion loss under appropriate conditions. The minimax risks are shown to depend on degree-correction parameters, community sizes and average within and between community connectivities in an intuitive and interpretable way. In addition, we propose a polynomial time algorithm to adaptively perform consistent and even asymptotically optimal community detection in DCBMs.
</p>projecteuclid.org/euclid.aos/1534492832_20180817040040Fri, 17 Aug 2018 04:00 EDTCLT for largest eigenvalues and unit root testing for high-dimensional nonstationary time serieshttps://projecteuclid.org/euclid.aos/1534492833<strong>Bo Zhang</strong>, <strong>Guangming Pan</strong>, <strong>Jiti Gao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2186--2215.</p><p><strong>Abstract:</strong><br/>
Let $\{Z_{ij}\}$ be independent and identically distributed (i.i.d.) random variables with $EZ_{ij}=0$, $E\vert Z_{ij}\vert^{2}=1$ and $E\vert Z_{ij}\vert^{4}<\infty$. Define linear processes $Y_{tj}=\sum_{k=0}^{\infty}b_{k}Z_{t-k,j}$ with $\sum_{i=0}^{\infty}\vert b_{i}\vert <\infty$. Consider a $p$-dimensional time series model of the form $\mathbf{x}_{t}=\boldsymbol{\Pi} \mathbf{x}_{t-1}+\Sigma^{1/2}\mathbf{y}_{t},\ 1\leq t\leq T$, where $\mathbf{y}_{t}=(Y_{t1},\ldots,Y_{tp})'$ and $\Sigma^{1/2}$ is the square root of a symmetric positive definite matrix. Let $\mathbf{B}=(1/p)\mathbf{X}\mathbf{X}^{*}$, where $\mathbf{X}=(\mathbf{x}_{1},\ldots,\mathbf{x}_{T})'$ and $\mathbf{X}^{*}$ denotes the conjugate transpose. This paper establishes both the convergence in probability and the asymptotic joint distribution of the first $k$ largest eigenvalues of $\mathbf{B}$ when $\mathbf{x}_{t}$ is nonstationary. As an application, two new unit root tests for possible nonstationarity of high-dimensional time series are proposed and then studied both theoretically and numerically.
</p>projecteuclid.org/euclid.aos/1534492833_20180817040040Fri, 17 Aug 2018 04:00 EDTSmooth backfitting for errors-in-variables additive modelshttps://projecteuclid.org/euclid.aos/1534492834<strong>Kyunghee Han</strong>, <strong>Byeong U. Park</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2216--2250.</p><p><strong>Abstract:</strong><br/>
In this work, we develop a new smooth backfitting method and theory for estimating additive nonparametric regression models when the covariates are contaminated by measurement errors. For this, we devise a new kernel function that suitably deconvolutes the bias due to measurement errors and renders a projection interpretation to the resulting estimator in the space of additive functions. The deconvolution property and the projection interpretation are essential for a successful solution of the problem. We prove that the method based on the new kernel weighting scheme achieves the optimal rate of convergence in one-dimensional deconvolution problems when the smoothness of the measurement error distribution is less than a threshold value. We find that the speed of convergence is slower than the univariate rate when the smoothness of the measurement error distribution is above the threshold, but it is still much faster than the optimal rate in multivariate deconvolution problems. The theory requires a deliberate analysis of the nonnegligible effects of measurement errors being propagated to other additive components through the backfitting operation. We present the finite sample performance of the deconvolution smooth backfitting estimators, which confirms our theoretical findings.
</p>projecteuclid.org/euclid.aos/1534492834_20180817040040Fri, 17 Aug 2018 04:00 EDTUnifying Markov properties for graphical modelshttps://projecteuclid.org/euclid.aos/1534492835<strong>Steffen Lauritzen</strong>, <strong>Kayvan Sadeghi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2251--2278.</p><p><strong>Abstract:</strong><br/>
Several types of graphs with different conditional independence interpretations—also known as Markov properties—have been proposed and used in graphical models. In this paper, we unify these Markov properties by introducing a class of graphs with four types of edges—lines, arrows, arcs and dotted lines—and a single separation criterion. We show that independence structures defined by this class specialize to each of the previously defined cases, when suitable subclasses of graphs are considered. In addition, we define a pairwise Markov property for the subclass of chain mixed graphs, which includes chain graphs with the LWF interpretation, as well as summary graphs (and consequently ancestral graphs). We prove the equivalence of this pairwise Markov property to the global Markov property for compositional graphoid independence models.
</p>projecteuclid.org/euclid.aos/1534492835_20180817040040Fri, 17 Aug 2018 04:00 EDTAdaptation in log-concave density estimationhttps://projecteuclid.org/euclid.aos/1534492836<strong>Arlene K. H. Kim</strong>, <strong>Adityanand Guntuboyina</strong>, <strong>Richard J. Samworth</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2279--2306.</p><p><strong>Abstract:</strong><br/>
The log-concave maximum likelihood estimator of a density on the real line based on a sample of size $n$ is known to attain the minimax optimal rate of convergence of $O(n^{-4/5})$ with respect to, for example, squared Hellinger distance. In this paper, we show that it also enjoys attractive adaptation properties, in the sense that it achieves a faster rate of convergence when the logarithm of the true density is $k$-affine (i.e., made up of $k$ affine pieces), or close to $k$-affine, provided in each case that $k$ is not too large. Our results use two different techniques: the first relies on a new Marshall’s inequality for log-concave density estimation, and reveals that when the true density is close to log-linear on its support, the log-concave maximum likelihood estimator can achieve the parametric rate of convergence in total variation distance. Our second approach depends on local bracketing entropy methods, and allows us to prove a sharp oracle inequality, which implies in particular a risk bound with respect to various global loss functions, including Kullback–Leibler divergence, of $O(\frac{k}{n}\log^{5/4}(en/k))$ when the true density is log-concave and its logarithm is close to $k$-affine.
</p>projecteuclid.org/euclid.aos/1534492836_20180817040040Fri, 17 Aug 2018 04:00 EDTWeak convergence of a pseudo maximum likelihood estimator for the extremal indexhttps://projecteuclid.org/euclid.aos/1534492837<strong>Betina Berghaus</strong>, <strong>Axel Bücher</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2307--2335.</p><p><strong>Abstract:</strong><br/>
The extremes of a stationary time series typically occur in clusters. A primary measure for this phenomenon is the extremal index, representing the reciprocal of the expected cluster size. Both the disjoint and the sliding blocks estimators for the extremal index are analyzed in detail. In contrast to many competitors, these estimators only depend on the choice of one parameter sequence. We derive an asymptotic expansion, prove asymptotic normality and show consistency of an estimator for the asymptotic variance. Explicit calculations in certain models and a finite-sample Monte Carlo simulation study reveal that the sliding blocks estimator outperforms other blocks estimators, and that it is competitive with runs- and inter-exceedance estimators in various models. The methods are applied to a variety of financial time series.
</p>projecteuclid.org/euclid.aos/1534492837_20180817040040Fri, 17 Aug 2018 04:00 EDTSemiparametric efficiency bounds for high-dimensional modelshttps://projecteuclid.org/euclid.aos/1534492838<strong>Jana Janková</strong>, <strong>Sara van de Geer</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2336--2359.</p><p><strong>Abstract:</strong><br/>
Asymptotic lower bounds for estimation play a fundamental role in assessing the quality of statistical procedures. In this paper, we propose a framework for obtaining semiparametric efficiency bounds for sparse high-dimensional models, where the dimension of the parameter is larger than the sample size. We adopt a semiparametric point of view: we concentrate on one-dimensional functions of a high-dimensional parameter. We follow two different approaches to reach the lower bounds: asymptotic Cramér–Rao bounds and Le Cam’s type of analysis. Both of these approaches allow us to define a class of asymptotically unbiased or “regular” estimators for which a lower bound is derived. Consequently, we show that certain estimators obtained by de-sparsifying (or de-biasing) an $\ell_{1}$-penalized M-estimator are asymptotically unbiased and achieve the lower bound on the variance: thus in this sense they are asymptotically efficient. The paper discusses in detail the linear regression model and the Gaussian graphical model.
</p>projecteuclid.org/euclid.aos/1534492838_20180817040040Fri, 17 Aug 2018 04:00 EDTLimit theorems for eigenvectors of the normalized Laplacian for random graphshttps://projecteuclid.org/euclid.aos/1534492839<strong>Minh Tang</strong>, <strong>Carey E. Priebe</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2360--2415.</p><p><strong>Abstract:</strong><br/>
We prove a central limit theorem for the components of the eigenvectors corresponding to the $d$ largest eigenvalues of the normalized Laplacian matrix of a finite dimensional random dot product graph. As a corollary, we show that for stochastic blockmodel graphs, the rows of the spectral embedding of the normalized Laplacian converge to multivariate normals and, furthermore, the mean and the covariance matrix of each row are functions of the associated vertex’s block membership. Together with prior results for the eigenvectors of the adjacency matrix, we then compare, via the Chernoff information between multivariate normal distributions, how the choice of embedding method impacts subsequent inference. We demonstrate that neither embedding method dominates with respect to the inference task of recovering the latent block assignments.
</p>projecteuclid.org/euclid.aos/1534492839_20180817040040Fri, 17 Aug 2018 04:00 EDTOptimality and sub-optimality of PCA I: Spiked random matrix modelshttps://projecteuclid.org/euclid.aos/1534492840<strong>Amelia Perry</strong>, <strong>Alexander S. Wein</strong>, <strong>Afonso S. Bandeira</strong>, <strong>Ankur Moitra</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2416--2451.</p><p><strong>Abstract:</strong><br/>
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, introduced by Johnstone, in which a prominent eigenvector (or “spike”) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the spike strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, under structural assumptions on the spike, not all information is necessarily contained in the spectrum. We study the statistical limits of tests for the presence of a spike, including nonspectral tests. Our results leverage Le Cam’s notion of contiguity and include:
(i) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for certain natural priors for the spike.
(ii) For any non-Gaussian Wigner ensemble, PCA is sub-optimal for detection. However, an efficient variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entries.
(iii) For the Gaussian Wishart ensemble, the PCA threshold is optimal for positive spikes (for natural priors) but this is not always the case for negative spikes.
</p>projecteuclid.org/euclid.aos/1534492840_20180817040040Fri, 17 Aug 2018 04:00 EDTOn the exponentially weighted aggregate with the Laplace priorhttps://projecteuclid.org/euclid.aos/1534492841<strong>Arnak S. Dalalyan</strong>, <strong>Edwin Grappin</strong>, <strong>Quentin Paris</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2452--2478.</p><p><strong>Abstract:</strong><br/>
In this paper, we study the statistical behaviour of the Exponentially Weighted Aggregate (EWA) in the problem of high-dimensional regression with fixed design. Under the assumption that the underlying regression vector is sparse, it is reasonable to use the Laplace distribution as a prior. The resulting estimator and, specifically, a particular instance of it referred to as the Bayesian lasso, was already used in the statistical literature because of its computational convenience, even though no thorough mathematical analysis of its statistical properties was carried out. The present work fills this gap by establishing sharp oracle inequalities for the EWA with the Laplace prior. These inequalities show that if the temperature parameter is small, the EWA with the Laplace prior satisfies the same type of oracle inequality as the lasso estimator does, as long as the quality of estimation is measured by the prediction loss. Extensions of the proposed methodology to the problem of prediction with low-rank matrices are considered.
</p>projecteuclid.org/euclid.aos/1534492841_20180817040040Fri, 17 Aug 2018 04:00 EDTGoodness-of-fit testing of error distribution in linear measurement error modelshttps://projecteuclid.org/euclid.aos/1534492842<strong>Hira L. Koul</strong>, <strong>Weixing Song</strong>, <strong>Xiaoqing Zhu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2479--2510.</p><p><strong>Abstract:</strong><br/>
This paper investigates a class of goodness-of-fit tests for fitting an error density in linear regression models with measurement error in covariates. Each test statistic is the integrated square difference between the deconvolution kernel density estimator of the regression model error density and a smoothed version of the null error density, an analog of the so-called Bickel and Rosenblatt test statistic. The asymptotic null distributions of the proposed test statistics are derived for both the ordinary smooth and super smooth cases. The asymptotic power behavior of the proposed tests against a fixed alternative and a class of local nonparametric alternatives is also described for both cases. The finite sample performance of the proposed test is evaluated by a simulation study, which shows some superiority of the proposed test over some other tests. Finally, a real data example is used to illustrate the proposed test.
</p>projecteuclid.org/euclid.aos/1534492842_20180817040040Fri, 17 Aug 2018 04:00 EDT