The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
Language: en-us
Copyright 2010 Cornell University Library
Contact: Euclid-L@cornell.edu (Project Euclid Team)
Published: Thu, 05 Aug 2010 15:41 EDT
Last build: Tue, 07 Jun 2011 09:09 EDT
Logo: http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif (Project Euclid)
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
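As a minimal numerical illustration of the automatic multiplicity adjustment (our own sketch, not taken from the paper): under a uniform Beta(1, 1) prior on the common variable-inclusion probability, the prior odds of any single-variable model against the null model are 1/p, so the penalty sharpens as the number of candidate variables p grows.

```python
from math import comb

def model_prior(k, p):
    # Prior probability of one specific model that includes k of the p
    # candidate variables, under a Beta(1, 1) (uniform) prior on the common
    # inclusion probability w: the integral of w^k (1-w)^(p-k) dw over [0, 1]
    # equals 1 / ((p + 1) * C(p, k)).
    return 1.0 / ((p + 1) * comb(p, k))

# Prior odds of any single-variable model against the null model are 1/p:
# an automatic multiplicity penalty that strengthens with p.
for p in (10, 100, 1000):
    print(p, model_prior(1, p) / model_prior(0, p))
```

No data enter this calculation; the correction is purely a property of the hierarchical prior, which is the point the paper distinguishes from the Ockham’s-razor effect of the marginal likelihood.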
</p>
Published: Thu, 05 Aug 2010 15:41 EDT

Covariate balancing propensity score by tailored loss functions
https://projecteuclid.org/euclid.aos/1547197245
<strong>Qingyuan Zhao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 965--993.</p><p><strong>Abstract:</strong><br/>
In observational studies, propensity scores are commonly estimated by maximum likelihood but may fail to balance high-dimensional pretreatment covariates even after specification search. We introduce a general framework that unifies and generalizes several recent proposals to improve covariate balance when designing an observational study. Instead of the likelihood function, we propose to optimize special loss functions—covariate balancing scoring rules (CBSR)—to estimate the propensity score. A CBSR is uniquely determined by the link function in the GLM and the estimand (a weighted average treatment effect). We show CBSR does not lose asymptotic efficiency in estimating the weighted average treatment effect compared to the Bernoulli likelihood, but CBSR is much more robust in finite samples. Borrowing tools developed in statistical learning, we propose practical strategies to balance covariate functions in rich function classes. This is useful to estimate the maximum bias of the inverse probability weighting (IPW) estimators and construct honest confidence intervals in finite samples. Lastly, we provide several numerical examples to demonstrate the tradeoff of bias and variance in the IPW-type estimators and the tradeoff in balancing different function classes of the covariates.
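The CBSR losses themselves are not reproduced here; the following sketch (simulation settings and variable names are our own) fits the maximum-likelihood baseline propensity score by gradient ascent and then checks the covariate balance of the implied IPW weights, the quantity the paper's tailored losses target directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 3
Z = rng.normal(size=(n, d))                       # pretreatment covariates
logit = Z @ np.array([0.8, -0.5, 0.3])            # true propensity model
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Baseline: estimate the propensity score by maximising the Bernoulli
# likelihood with plain gradient ascent (CBSR would swap in a balancing loss).
beta = np.zeros(d)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    beta += 0.5 * Z.T @ (T - p) / n

p = 1.0 / (1.0 + np.exp(-(Z @ beta)))
w_t, w_c = T / p, (1 - T) / (1 - p)               # IPW weights for the ATE

# Balance check: weighted covariate means of the two groups should agree
# approximately when the fitted propensity score is adequate.
mean_t = (w_t[:, None] * Z).sum(0) / w_t.sum()
mean_c = (w_c[:, None] * Z).sum(0) / w_c.sum()
imbalance = float(np.abs(mean_t - mean_c).max())
```

With a correctly specified model and n = 2000 the imbalance is small; the paper's concern is that in finite samples and under misspecification the likelihood fit can leave substantial imbalance that a CBSR loss removes by construction.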
</p>
Published: Fri, 11 Jan 2019 04:01 EST

The geometry of hypothesis testing over convex cones: Generalized likelihood ratio tests and minimax radii
https://projecteuclid.org/euclid.aos/1547197246
<strong>Yuting Wei</strong>, <strong>Martin J. Wainwright</strong>, <strong>Adityanand Guntuboyina</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 994--1024.</p><p><strong>Abstract:</strong><br/>
We consider a compound testing problem within the Gaussian sequence model in which the null and alternative are specified by a pair of closed, convex cones. Such cone testing problems arise in various applications, including detection of treatment effects, trend detection in econometrics, signal detection in radar processing and shape-constrained inference in nonparametric statistics. We provide a sharp characterization of the generalized likelihood ratio test (GLRT) testing radius up to a universal multiplicative constant in terms of the geometric structure of the underlying convex cones. When applied to concrete examples, this result reveals some interesting phenomena that do not arise in the analogous problems of estimation under convex constraints. In particular, in contrast to estimation error, the testing error no longer depends purely on the problem complexity via a volume-based measure (such as metric entropy or Gaussian complexity); other geometric properties of the cones also play an important role. In order to address the issue of optimality, we prove information-theoretic lower bounds for the minimax testing radius, again in terms of geometric quantities. Our general theorems are illustrated by examples including the cases of monotone and orthant cones, and involve some results of independent interest.
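For the nonnegative orthant cone the GLRT statistic has a closed form, namely the squared norm of the projection of the data onto the cone; a small sketch (our own, assuming unit noise variance):

```python
import numpy as np

def glrt_orthant(y):
    # GLRT statistic for testing mu = 0 against mu in the nonnegative orthant
    # in the Gaussian sequence model y = mu + eps, eps ~ N(0, I): the squared
    # Euclidean norm of the projection of y onto the cone.
    return float(np.sum(np.maximum(y, 0.0) ** 2))

rng = np.random.default_rng(1)
d = 500
t_null = glrt_orthant(rng.normal(size=d))        # concentrates near d/2 under the null
t_alt = glrt_orthant(rng.normal(size=d) + 0.3)   # mean shifted into the cone
```

The projection onto a general closed convex cone has no such simple form; the paper characterizes the testing radius of this statistic through the geometry of the cone rather than through any volume-based complexity alone.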
</p>
Published: Fri, 11 Jan 2019 04:01 EST

Nonparametric implied Lévy densities
https://projecteuclid.org/euclid.aos/1547197247
<strong>Likuan Qin</strong>, <strong>Viktor Todorov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1025--1060.</p><p><strong>Abstract:</strong><br/>
This paper develops a nonparametric estimator for the Lévy density of an asset price, following an Itô semimartingale, implied by short-maturity options. The asymptotic setup is one in which the time to maturity of the available options decreases, the mesh of the available strike grid shrinks and the strike range expands. The estimation is based on aggregating the observed option data into nonparametric estimates of the conditional characteristic function of the return distribution, the derivatives of which allow one to infer the Fourier transform of a known transform of the Lévy density in a way which is robust to the level of the unknown diffusive volatility of the asset price. The Lévy density estimate is then constructed via Fourier inversion. We derive an asymptotic bound for the integrated squared error of the estimator in the general case as well as its probability limit in the special Lévy case. We further show rate optimality of our Lévy density estimator in a minimax sense. An empirical application to market index options reveals relative stability of the left tail decay during high and low volatility periods.
</p>
Published: Fri, 11 Jan 2019 04:01 EST

On model selection from a finite family of possibly misspecified time series models
https://projecteuclid.org/euclid.aos/1547197248
<strong>Hsiang-Ling Hsu</strong>, <strong>Ching-Kang Ing</strong>, <strong>Howell Tong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1061--1087.</p><p><strong>Abstract:</strong><br/>
Consider finite parametric time series models. “I have $n$ observations and $k$ models, which model should I choose on the basis of the data alone?” is a frequently asked question in many practical situations. This poses the key problem of selecting a model from a collection of candidate models, none of which is necessarily the true data generating process (DGP). Although existing literature on model selection is vast, there is a serious lacuna in that the above problem does not seem to have received much attention. In fact, existing model selection criteria have avoided addressing the above problem directly, either by assuming that the true DGP is included among the candidate models and aiming at choosing this DGP, or by assuming that the true DGP can be asymptotically approximated by an increasing sequence of candidate models and aiming at choosing the candidate having the best predictive capability in some asymptotic sense. In this article, we propose a misspecification-resistant information criterion (MRIC) to address the key problem directly. We first prove the asymptotic efficiency of MRIC whether the true DGP is among the candidates or not, within the fixed-dimensional framework. We then extend this result to the high-dimensional case in which the number of candidate variables is much larger than the sample size. In particular, we show that MRIC can be used in conjunction with a high-dimensional model selection method to select the (asymptotically) best predictive model across several high-dimensional misspecified time series models.
</p>
Published: Fri, 11 Jan 2019 04:01 EST

Estimating the algorithmic variance of randomized ensembles via the bootstrap
https://projecteuclid.org/euclid.aos/1547197249
<strong>Miles E. Lopes</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1088--1112.</p><p><strong>Abstract:</strong><br/>
Although the methods of bagging and random forests are some of the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are not many theoretical guarantees for deciding when an ensemble is “large enough”—so that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of “algorithmic variance” (i.e., the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests and related methods in the context of classification. To be specific, suppose the training dataset is fixed, and let the random variable $\mathrm{ERR}_{t}$ denote the prediction error of a randomized ensemble of size $t$. Working under a “first-order model” for randomized ensembles, we prove that the centered law of $\mathrm{ERR}_{t}$ can be consistently approximated via the proposed method as $t\to\infty$. Meanwhile, the computational cost of the method is quite modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of $\mathrm{ERR}_{t}$ are negligible.
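This is not the paper's bootstrap-extrapolation estimator; the sketch below (our own toy setup, bagging decision stumps) measures the algorithmic variance of $\mathrm{ERR}_{t}$ by brute force, rerunning the whole randomized ensemble many times at two ensemble sizes on a fixed training set.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.normal(size=n)
    y = (x + 0.5 * rng.normal(size=n) > 0).astype(int)
    return x, y

def stump_threshold(x, y):
    # Fit a decision stump: choose the split c minimising training error.
    grid = np.linspace(-1.0, 1.0, 21)
    errs = [np.mean((x > c).astype(int) != y) for c in grid]
    return grid[int(np.argmin(errs))]

def ensemble_error(t, xtr, ytr, xte, yte):
    # ERR_t: test error of a majority vote over t bootstrap-trained stumps.
    votes = np.zeros(len(xte))
    for _ in range(t):
        idx = rng.integers(0, len(xtr), len(xtr))  # bootstrap resample
        c = stump_threshold(xtr[idx], ytr[idx])
        votes += (xte > c).astype(int)
    return float(np.mean((votes > t / 2).astype(int) != yte))

xtr, ytr = make_data(200)
xte, yte = make_data(2000)

# Algorithmic variance: spread of ERR_t over independent reruns of the
# (randomized) training algorithm; it shrinks as the ensemble grows.
var_small = np.var([ensemble_error(5, xtr, ytr, xte, yte) for _ in range(30)])
var_large = np.var([ensemble_error(50, xtr, ytr, xte, yte) for _ in range(30)])
```

The brute-force rerunning done here is exactly the cost the paper's bootstrap-plus-extrapolation technique is designed to avoid.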
</p>
Published: Fri, 11 Jan 2019 04:01 EST

Efficient nonparametric Bayesian inference for $X$-ray transforms
https://projecteuclid.org/euclid.aos/1547197250
<strong>François Monard</strong>, <strong>Richard Nickl</strong>, <strong>Gabriel P. Paternain</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1113--1147.</p><p><strong>Abstract:</strong><br/>
We consider the statistical inverse problem of recovering a function $f:M\to \mathbb{R}$, where $M$ is a smooth compact Riemannian manifold with boundary, from measurements of general $X$-ray transforms $I_{a}(f)$ of $f$, corrupted by additive Gaussian noise. For $M$ equal to the unit disk with “flat” geometry and $a=0$ this reduces to the standard Radon transform, but our general setting allows for anisotropic media $M$ and can further model local “attenuation” effects—both highly relevant in practical imaging problems such as SPECT tomography. We study a nonparametric Bayesian inference method based on standard Gaussian process priors for $f$. The posterior reconstruction of $f$ corresponds to a Tikhonov regulariser with a reproducing kernel Hilbert space norm penalty that does not require the calculation of the singular value decomposition of the forward operator $I_{a}$. We prove Bernstein–von Mises theorems for a large family of one-dimensional linear functionals of $f$, and they entail that posterior-based inferences such as credible sets are valid and optimal from a frequentist point of view. In particular we derive the asymptotic distribution of smooth linear functionals of the Tikhonov regulariser, which attains the semiparametric information lower bound. The proofs rely on an invertibility result for the “Fisher information” operator $I_{a}^{*}I_{a}$ between suitable function spaces, a result of independent interest that relies on techniques from microlocal analysis. We illustrate the performance of the proposed method via simulations in various settings.
</p>
Published: Fri, 11 Jan 2019 04:01 EST

Generalized random forests
https://projecteuclid.org/euclid.aos/1547197251
<strong>Susan Athey</strong>, <strong>Julie Tibshirani</strong>, <strong>Stefan Wager</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1148--1178.</p><p><strong>Abstract:</strong><br/>
We propose generalized random forests, a method for nonparametric statistical estimation based on random forests (Breiman [ Mach. Learn. 45 (2001) 5–32]) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: nonparametric quantile regression, conditional average partial effect estimation and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
</p>
Published: Fri, 11 Jan 2019 04:01 EST

A classification criterion for definitive screening designs
https://projecteuclid.org/euclid.aos/1547197252
<strong>Eric D. Schoen</strong>, <strong>Pieter T. Eendebak</strong>, <strong>Peter Goos</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1179--1202.</p><p><strong>Abstract:</strong><br/>
A conference design is a rectangular matrix with orthogonal columns, one zero in each column, at most one zero in each row and $-1$’s and $+1$’s elsewhere. A definitive screening design can be constructed by folding over a conference design and adding a row vector of zeroes. We prove that, for a given even number of rows, there is just one isomorphism class for conference designs with two or three columns. Next, we derive all isomorphism classes for conference designs with four columns. Based on our results, we propose a classification criterion for definitive screening designs founded on projections into four factors. We illustrate the potential of the criterion by studying designs with 24 and 82 factors.
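The fold-over construction can be verified directly; the tiny 4-by-2 conference design below is our own illustration (the paper works with far larger designs):

```python
import numpy as np

# A small conference design C: orthogonal columns, exactly one zero per
# column, at most one zero per row, and +-1 elsewhere. (This 4 x 2 example
# is our own; for m rows the columns satisfy C'C = (m - 1) I.)
C = np.array([[ 0,  1],
              [ 1,  0],
              [ 1,  1],
              [ 1, -1]])
assert np.array_equal(C.T @ C, (len(C) - 1) * np.eye(2, dtype=int))

# Fold over C and append a row of zeroes: a definitive screening design.
D = np.vstack([C, -C, np.zeros((1, 2), dtype=int)])
```

The folded design inherits orthogonal columns (D'D = 2(m - 1) I), runs every factor at three levels, and is mirror-symmetric, which is what decouples main effects from two-factor interactions in definitive screening designs.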
</p>
Published: Fri, 11 Jan 2019 04:01 EST

Approximating faces of marginal polytopes in discrete hierarchical models
https://projecteuclid.org/euclid.aos/1550026834
<strong>Nanwei Wang</strong>, <strong>Johannes Rauh</strong>, <strong>Hélène Massam</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1203--1233.</p><p><strong>Abstract:</strong><br/>
The existence of the maximum likelihood estimate in a hierarchical log-linear model is crucial to the reliability of inference for this model. Determining whether the estimate exists is equivalent to finding whether the sufficient statistics vector $t$ belongs to the boundary of the marginal polytope of the model. The dimension of the smallest face $\mathbf{F}_{t}$ containing $t$ determines the dimension of the reduced model which should be considered for correct inference. For higher-dimensional problems, it is not possible to compute $\mathbf{F}_{t}$ exactly. Massam and Wang (2015) found an outer approximation to $\mathbf{F}_{t}$ using a collection of submodels of the original model. This paper refines the methodology to find an outer approximation and devises a new methodology to find an inner approximation. The inner approximation is given not in terms of a face of the marginal polytope, but in terms of a subset of the vertices of $\mathbf{F}_{t}$.
Knowing $\mathbf{F}_{t}$ exactly indicates which cell probabilities have maximum likelihood estimates equal to $0$. When $\mathbf{F}_{t}$ cannot be obtained exactly, we can use, first, the outer approximation $\mathbf{F}_{2}$ to reduce the dimension of the problem and then the inner approximation $\mathbf{F}_{1}$ to obtain correct estimates of cell probabilities corresponding to elements of $\mathbf{F}_{1}$ and improve the estimates of the remaining probabilities corresponding to elements in $\mathbf{F}_{2}\setminus\mathbf{F}_{1}$. Using both real-world and simulated data, we illustrate our results, and show that our methodology scales to high dimensions.
</p>
Published: Tue, 12 Feb 2019 22:01 EST

CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality
https://projecteuclid.org/euclid.aos/1550026835
<strong>T. Tony Cai</strong>, <strong>Jing Ma</strong>, <strong>Linjun Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1234--1267.</p><p><strong>Abstract:</strong><br/>
Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. In this paper, we study clustering of high-dimensional Gaussian mixtures and propose a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both theoretical and numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess misclustering error and show that CHIME is minimax rate optimal. In addition, the optimality of the proposed estimator of the discriminant vector is also established. Simulation studies show that CHIME outperforms the existing methods under a variety of settings. The proposed CHIME procedure is also illustrated in an analysis of a glioblastoma gene expression data set and shown to have superior performance.
Clustering of Gaussian mixtures in the conventional low-dimensional setting is also considered. The technical tools developed for the high-dimensional setting are used to establish the optimality of the clustering procedure that is based on the classical EM algorithm.
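The classical low-dimensional setting of the second paragraph admits a compact illustration; below is a plain EM iteration for a univariate two-component mixture with unit variances (our own sketch; CHIME itself couples the EM updates with a sparse estimate of the discriminant vector).

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])

# Classical EM for a two-component univariate Gaussian mixture with unit
# variances: alternate posterior responsibilities (E-step) with weighted
# updates of the mixing weight and component means (M-step).
w, mu1, mu2 = 0.5, -1.0, 1.0
for _ in range(100):
    # E-step: responsibility of component 1 for each observation
    d1 = w * np.exp(-0.5 * (x - mu1) ** 2)
    d2 = (1.0 - w) * np.exp(-0.5 * (x - mu2) ** 2)
    g = d1 / (d1 + d2)
    # M-step: update the mixing weight and the component means
    w = float(g.mean())
    mu1 = float((g * x).sum() / g.sum())
    mu2 = float(((1.0 - g) * x).sum() / (1.0 - g).sum())
```

In the high-dimensional regime this vanilla M-step breaks down, which is where CHIME's direct, sparse estimation of the discriminant vector enters.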
</p>
Published: Tue, 12 Feb 2019 22:01 EST

Exponential ergodicity of the bouncy particle sampler
https://projecteuclid.org/euclid.aos/1550026836
<strong>George Deligiannidis</strong>, <strong>Alexandre Bouchard-Côté</strong>, <strong>Arnaud Doucet</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1268--1287.</p><p><strong>Abstract:</strong><br/>
Nonreversible Markov chain Monte Carlo schemes based on piecewise deterministic Markov processes have been recently introduced in applied probability, automatic control, physics and statistics. Although these algorithms demonstrate experimentally good performance and are accordingly increasingly used in a wide range of applications, geometric ergodicity results for such schemes have only been established so far under very restrictive assumptions. We give here verifiable conditions on the target distribution under which the Bouncy Particle Sampler algorithm introduced in [ Phys. Rev. E 85 (2012) 026703, 1671–1691] is geometrically ergodic and we provide a central limit theorem for the associated ergodic averages. This holds essentially whenever the target satisfies a curvature condition and the growth of the negative logarithm of the target is at least linear and at most quadratic. For target distributions with thinner tails, we propose an original modification of this scheme that is geometrically ergodic. For targets with thicker tails, we extend the idea pioneered in [ Ann. Statist. 40 (2012) 3050–3076] in a random walk Metropolis context. We establish geometric ergodicity of the Bouncy Particle Sampler with respect to an appropriate transformation of the target. Mapping the resulting process back to the original parameterization, we obtain a geometrically ergodic piecewise deterministic Markov process.
</p>
Published: Tue, 12 Feb 2019 22:01 EST

The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data
https://projecteuclid.org/euclid.aos/1550026838
<strong>Joris Bierkens</strong>, <strong>Paul Fearnhead</strong>, <strong>Gareth Roberts</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1288--1320.</p><p><strong>Abstract:</strong><br/>
Standard MCMC methods can scale poorly to big data settings due to the need to evaluate the likelihood at each iteration. There have been a number of approximate MCMC algorithms that use sub-sampling ideas to reduce this computational burden, but with the drawback that these algorithms no longer target the true posterior distribution. We introduce a new family of Monte Carlo methods based upon a multidimensional version of the Zig-Zag process of [ Ann. Appl. Probab. 27 (2017) 846–882], a continuous-time piecewise deterministic Markov process. While traditional MCMC methods are reversible by construction (a property which is known to inhibit rapid convergence) the Zig-Zag process offers a flexible nonreversible alternative which we observe to often have favourable convergence properties. We show how the Zig-Zag process can be simulated without discretisation error, and give conditions for the process to be ergodic. Most importantly, we introduce a sub-sampling version of the Zig-Zag process that is an example of an exact approximate scheme, that is, the resulting approximate process still has the posterior as its stationary distribution. Furthermore, if we use a control-variate idea to reduce the variance of our unbiased estimator, then the Zig-Zag process can be super-efficient: after an initial preprocessing step, essentially independent samples from the posterior distribution are obtained at a computational cost which does not depend on the size of the data.
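For a one-dimensional standard Gaussian target the Zig-Zag process can be simulated exactly, because the integrated event rate inverts in closed form; a sketch (our own, without the paper's sub-sampling or control-variate machinery):

```python
import numpy as np

def zigzag_standard_normal(t_end, seed=0):
    # One-dimensional Zig-Zag process targeting N(0, 1). The potential is
    # U(x) = x^2 / 2, so the switching rate is max(0, theta * x), and the
    # next event time solves the integrated rate exactly (no discretisation).
    rng = np.random.default_rng(seed)
    x, theta, t = 0.0, 1.0, 0.0
    ts, xs = [0.0], [0.0]
    while t < t_end:
        a = theta * x
        e = rng.exponential()
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)  # inverted rate
        x += theta * tau                  # deterministic linear flight
        t += tau
        theta = -theta                    # switch: flip the velocity
        ts.append(t); xs.append(x)
    return np.array(ts), np.array(xs)

ts, xs = zigzag_standard_normal(5000.0)
dt = np.diff(ts)
# Time averages over the piecewise-linear path (exact per linear segment).
mean = np.sum(0.5 * (xs[:-1] + xs[1:]) * dt) / ts[-1]
second_moment = np.sum((xs[:-1] ** 2 + xs[:-1] * xs[1:] + xs[1:] ** 2) / 3.0 * dt) / ts[-1]
```

The time averages recover the target's moments, illustrating ergodicity; the paper's contribution is to replace the exact switching rate with an unbiased sub-sampled bound so that each event costs O(1) in the data size.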
</p>
Published: Tue, 12 Feb 2019 22:01 EST

Estimation of large covariance and precision matrices from temporally dependent observations
https://projecteuclid.org/euclid.aos/1550026839
<strong>Hai Shu</strong>, <strong>Bin Nan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1321--1350.</p><p><strong>Abstract:</strong><br/>
We consider the estimation of large covariance and precision matrices from high-dimensional sub-Gaussian or heavier-tailed observations with slowly decaying temporal dependence. The temporal dependence is allowed to be long-range, with longer memory than is considered in the current literature. We show that several commonly used methods for independent observations can be applied to the temporally dependent data. In particular, the rates of convergence are obtained for the generalized thresholding estimation of covariance and correlation matrices, and for the constrained $\ell_{1}$ minimization and the $\ell_{1}$ penalized likelihood estimation of the precision matrix. Properties of sparsistency and sign-consistency are also established. A gap-block cross-validation method is proposed for the tuning parameter selection, which performs well in simulations. As a motivating example, we study the brain functional connectivity using resting-state fMRI time series data with long-range temporal dependence.
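A sketch of generalized (here, soft) thresholding applied to columns that are AR(1) time series, so the rows are temporally dependent; all settings below are our own illustration, not the paper's:

```python
import numpy as np

def soft_threshold_cov(X, lam):
    # Generalized (here: soft) thresholding of the sample covariance matrix:
    # shrink off-diagonal entries toward zero, leave the diagonal intact.
    S = np.cov(X, rowvar=False)
    Th = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(Th, np.diag(S))
    return Th

# Temporally dependent rows: each column is a stationary AR(1) series; the
# true covariance across columns is the identity (all off-diagonals are 0).
rng = np.random.default_rng(4)
n, p, rho = 400, 20, 0.5
eps = rng.normal(size=(n, p))
X = np.zeros((n, p))
X[0] = eps[0]
for t in range(1, n):
    X[t] = rho * X[t - 1] + np.sqrt(1.0 - rho ** 2) * eps[t]

S_hat = soft_threshold_cov(X, lam=0.2)
off = S_hat[~np.eye(p, dtype=bool)]
zero_frac = float(np.mean(off == 0.0))
```

Despite the serial dependence, thresholding recovers the sparsity pattern here; the paper quantifies how long-range memory slows the attainable rates and how to choose the tuning parameter by gap-block cross-validation.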
</p>
Published: Tue, 12 Feb 2019 22:01 EST

Bootstrap tuning in Gaussian ordered model selection
https://projecteuclid.org/euclid.aos/1550026841
<strong>Vladimir Spokoiny</strong>, <strong>Niklas Willrich</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1351--1380.</p><p><strong>Abstract:</strong><br/>
The paper focuses on the problem of model selection in linear Gaussian regression with unknown possibly inhomogeneous noise. For a given family of linear estimators $\{\widetilde{\boldsymbol{{\theta}}}_{m},m\in\mathscr{M}\}$, ordered by their variance, we offer a new “smallest accepted” approach motivated by Lepski’s device and the multiple testing idea. The procedure selects the smallest model which satisfies the acceptance rule based on comparison with all larger models. The method is completely data-driven and does not use any prior information about the variance structure of the noise: its parameters are adjusted to the underlying possibly heterogeneous noise by the so-called “propagation condition” using a wild bootstrap method. The validity of the bootstrap calibration is proved for finite samples with an explicit error bound. We provide a comprehensive theoretical study of the method, describe in detail the set of possible values of the selected model $\widehat{m}\in\mathscr{M}$ and establish some oracle error bounds for the corresponding estimator $\widehat{\boldsymbol{{\theta}}}=\widetilde{\boldsymbol{{\theta}}}_{\widehat{m}}$.
</p>
Published: Tue, 12 Feb 2019 22:01 EST

Sequential change-point detection based on nearest neighbors
https://projecteuclid.org/euclid.aos/1550026842
<strong>Hao Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1381--1407.</p><p><strong>Abstract:</strong><br/>
We propose a new framework for the detection of change-points in online, sequential data analysis. The approach utilizes nearest neighbor information and can be applied to sequences of multivariate observations or non-Euclidean data objects, such as network data. Different stopping rules are explored, and one specific rule is recommended due to its desirable properties. An accurate analytic approximation of the average run length is derived for the recommended rule, making it an easy off-the-shelf approach for real multivariate/object sequential data monitoring applications. Simulations reveal that the new approach has better performance than likelihood-based approaches for high dimensional data. The new approach is illustrated through a real dataset in detecting global structural changes in social networks.
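A toy offline version of the nearest-neighbor idea (the paper's procedure is sequential, with a formal stoppinging rule and an average-run-length approximation; this sketch only scans candidate split points on a fixed univariate sequence):

```python
import numpy as np

rng = np.random.default_rng(5)
# A sequence with a mean shift after observation 100.
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
n = len(x)

# 1-nearest-neighbor graph on the observed values.
dist = np.abs(x[:, None] - x[None, :])
np.fill_diagonal(dist, np.inf)
nn = dist.argmin(axis=1)

def crossings(t):
    # Observations whose nearest neighbor lies on the other side of a
    # candidate split at time t; a true change point yields few crossings.
    before = np.arange(n) < t
    return int(np.sum(before != before[nn]))

tau_hat = min(range(20, n - 20), key=crossings)
```

Because the statistic only uses nearest-neighbor relations, the same recipe applies unchanged to multivariate or non-Euclidean observations once a distance is defined, which is the paper's motivation.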
</p>
Published: Tue, 12 Feb 2019 22:01 EST

Prediction when fitting simple models to high-dimensional data
https://projecteuclid.org/euclid.aos/1550026843
<strong>Lukas Steinberger</strong>, <strong>Hannes Leeb</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1408--1442.</p><p><strong>Abstract:</strong><br/>
We study linear subset regression in the context of a high-dimensional linear model. Consider $y=\vartheta +\theta 'z+\epsilon $ with univariate response $y$ and a $d$-vector of random regressors $z$, and a submodel where $y$ is regressed on a set of $p$ explanatory variables that are given by $x=M'z$, for some $d\times p$ matrix $M$. Here, “high-dimensional” means that the number $d$ of available explanatory variables in the overall model is much larger than the number $p$ of variables in the submodel. In this paper, we present Pinsker-type results for prediction of $y$ given $x$. In particular, we show that the mean squared prediction error of the best linear predictor of $y$ given $x$ is close to the mean squared prediction error of the corresponding Bayes predictor $\mathbb{E}[y|x]$, provided only that $p/\log d$ is small. We also show that the mean squared prediction error of the (feasible) least-squares predictor computed from $n$ independent observations of $(y,x)$ is close to that of the Bayes predictor, provided only that both $p/\log d$ and $p/n$ are small. Our results hold uniformly in the regression parameters and over large collections of distributions for the design variables $z$.
</p>
Published: Tue, 12 Feb 2019 22:01 EST

Two-sample and ANOVA tests for high dimensional means
https://projecteuclid.org/euclid.aos/1550026845
<strong>Song Xi Chen</strong>, <strong>Jun Li</strong>, <strong>Ping-Shou Zhong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1443--1474.</p><p><strong>Abstract:</strong><br/>
This paper considers testing the equality of two high dimensional means. Two approaches are utilized to formulate $L_{2}$-type tests for better power performance when the two high dimensional mean vectors differ only in sparsely populated coordinates and the differences are faint. One is to conduct thresholding to remove the nonsignal bearing dimensions for variance reduction of the test statistics. The other is to transform the data via the precision matrix for signal enhancement. It is shown that the thresholding and data transformation lead to attractive detection boundaries for the tests. Furthermore, we demonstrate explicitly the effects of precision matrix estimation on the detection boundary for the test with thresholding and data transformation. Extension to multi-sample ANOVA tests is also investigated. Numerical studies are performed to confirm the theoretical findings and demonstrate the practical implementations.
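A sketch of the thresholding device (the exact form of the paper's statistics differs): standardized coordinate-wise mean differences are computed, and only coordinates exceeding a threshold contribute to the $L_{2}$-type statistic, which removes the non-signal dimensions.

```python
import numpy as np

def thresholded_l2_stat(X, Y, lam):
    # L2-type two-sample statistic with coordinate-wise thresholding: keep
    # only coordinates whose standardized mean difference exceeds lam, which
    # screens out non-signal dimensions and reduces the statistic's variance.
    n, m = len(X), len(Y)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    se2 = X.var(axis=0, ddof=1) / n + Y.var(axis=0, ddof=1) / m
    z2 = diff ** 2 / se2
    return float(np.sum(z2[z2 > lam ** 2] - 1.0))

rng = np.random.default_rng(6)
p = 200
X = rng.normal(size=(100, p))
Y = rng.normal(size=(100, p))
Y[:, :5] += 0.8            # sparse signal: only 5 of 200 coordinates differ

t_alt = thresholded_l2_stat(X, Y, lam=2.0)
t_null = thresholded_l2_stat(X, rng.normal(size=(100, p)), lam=2.0)
```

The paper's second device, transforming the data by (an estimate of) the precision matrix before testing, enhances faint signals further and shapes the detection boundary analyzed there.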
</p>
Published: Tue, 12 Feb 2019 22:01 EST

Valid confidence intervals for post-model-selection predictors
https://projecteuclid.org/euclid.aos/1550026846
<strong>François Bachoc</strong>, <strong>Hannes Leeb</strong>, <strong>Benedikt M. Pötscher</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1475--1504.</p><p><strong>Abstract:</strong><br/>
We consider inference post-model-selection in linear regression. In this setting, Berk et al. [ Ann. Statist. 41 (2013a) 802–837] recently introduced a class of confidence sets, the so-called PoSI intervals, that cover a certain nonstandard quantity of interest with a user-specified minimal coverage probability, irrespective of the model selection procedure that is being used. In this paper, we generalize the PoSI intervals to confidence intervals for post-model-selection predictors.
</p>
Published: Tue, 12 Feb 2019 22:01 EST

A robust and efficient approach to causal inference based on sparse sufficient dimension reduction
https://projecteuclid.org/euclid.aos/1550026847
<strong>Shujie Ma</strong>, <strong>Liping Zhu</strong>, <strong>Zhiwei Zhang</strong>, <strong>Chih-Ling Tsai</strong>, <strong>Raymond J. Carroll</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1505--1535.</p><p><strong>Abstract:</strong><br/>
A fundamental assumption used in causal inference with observational data is that treatment assignment is ignorable given measured confounding variables. This assumption of no missing confounders is plausible if a large number of baseline covariates are included in the analysis, as we often have no prior knowledge of which variables can be important confounders. Thus, estimation of treatment effects with a large number of covariates has received considerable attention in recent years. Most existing methods require specifying certain parametric models involving the outcome, treatment and confounding variables, and employ a variable selection procedure to identify confounders. However, selection of a proper set of confounders depends on correct specification of the working models. The bias due to model misspecification and incorrect selection of confounding variables can yield misleading results. We propose a robust and efficient approach for inference about the average treatment effect via a flexible modeling strategy incorporating penalized variable selection. Specifically, we consider an estimator constructed based on an efficient influence function that involves a propensity score and an outcome regression. We then propose a new sparse sufficient dimension reduction method to estimate these two functions without making restrictive parametric modeling assumptions. The proposed estimator of the average treatment effect is asymptotically normal and semiparametrically efficient without the need for variable selection consistency. The proposed methods are illustrated via simulation studies and a biomedical application.
</p>projecteuclid.org/euclid.aos/1550026847_20190212220137Tue, 12 Feb 2019 22:01 ESTThe maximum likelihood threshold of a path diagramhttps://projecteuclid.org/euclid.aos/1550026848<strong>Mathias Drton</strong>, <strong>Christopher Fox</strong>, <strong>Andreas Käufl</strong>, <strong>Guillaume Pouliot</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1536--1553.</p><p><strong>Abstract:</strong><br/>
Linear structural equation models postulate noisy linear relationships between variables of interest. Each model corresponds to a path diagram, which is a mixed graph with directed edges that encode the domains of the linear functions and bidirected edges that indicate possible correlations among noise terms. Using this graphical representation, we determine the maximum likelihood threshold, that is, the minimum sample size at which the likelihood function of a Gaussian structural equation model is almost surely bounded. Our result allows the model to have feedback loops and is based on decomposing the path diagram with respect to the connected components of its bidirected part. We also prove that if the sample size is below the threshold, then the likelihood function is almost surely unbounded. Our work clarifies, in particular, that standard likelihood inference is applicable to sparse high-dimensional models even if they feature feedback loops.
</p>projecteuclid.org/euclid.aos/1550026848_20190212220137Tue, 12 Feb 2019 22:01 ESTConvex regularization for high-dimensional multiresponse tensor regressionhttps://projecteuclid.org/euclid.aos/1550026849<strong>Garvesh Raskutti</strong>, <strong>Ming Yuan</strong>, <strong>Han Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1554--1584.</p><p><strong>Abstract:</strong><br/>
In this paper, we present a general convex optimization approach for solving high-dimensional multiple response tensor regression problems under low-dimensional structural assumptions. We consider using convex and weakly decomposable regularizers assuming that the underlying tensor lies in an unknown low-dimensional subspace. Within our framework, we derive general risk bounds of the resulting estimate under fairly general dependence structure among covariates. Our framework leads to upper bounds in terms of two very simple quantities, the Gaussian width of a convex set in tensor space and the intrinsic dimension of the low-dimensional tensor subspace. To the best of our knowledge, this is the first general framework that applies to multiple response problems. These general bounds provide useful upper bounds on rates of convergence for a number of fundamental statistical models of interest including multiresponse regression, vector autoregressive models, low-rank tensor models and pairwise interaction models. Moreover, in many of these settings we prove that the resulting estimates are minimax optimal. We also provide a numerical study that both validates our theoretical guarantees and demonstrates the breadth of our framework.
</p>projecteuclid.org/euclid.aos/1550026849_20190212220137Tue, 12 Feb 2019 22:01 ESTLarge sample theory for merged data from multiple sourceshttps://projecteuclid.org/euclid.aos/1550026850<strong>Takumi Saegusa</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1585--1615.</p><p><strong>Abstract:</strong><br/>
We develop large sample theory for merged data from multiple sources. The main statistical issues treated in this paper are (1) the same unit potentially appears in multiple datasets from overlapping data sources, (2) duplicated items are not identified, and (3) a sample from the same data source is dependent due to sampling without replacement. We propose and study a new weighted empirical process and extend empirical process theory to a dependent and biased sample with duplication. Specifically, we establish the uniform law of large numbers and uniform central limit theorem over a class of functions along with several empirical process results under conditions identical to those in the i.i.d. setting. As applications, we study infinite-dimensional $M$-estimation and develop its consistency, rates of convergence and asymptotic normality. Our theoretical results are illustrated with simulation studies and a real data example.
</p>projecteuclid.org/euclid.aos/1550026850_20190212220137Tue, 12 Feb 2019 22:01 ESTKhinchine’s theorem and Edgeworth approximations for weighted sumshttps://projecteuclid.org/euclid.aos/1550026851<strong>Sergey G. Bobkov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1616--1633.</p><p><strong>Abstract:</strong><br/>
Let $F_{n}$ denote the distribution function of the normalized sum of $n$ i.i.d. random variables. In this paper, polynomial rates of approximation of $F_{n}$ by the corrected normal laws are considered in the model where the underlying distribution has a convolution structure. As a basic tool, the convergence part of Khinchine’s theorem in metric theory of Diophantine approximations is extended to the class of product characteristic functions.
</p>projecteuclid.org/euclid.aos/1550026851_20190212220137Tue, 12 Feb 2019 22:01 ESTDistributed inference for quantile regression processeshttps://projecteuclid.org/euclid.aos/1550026852<strong>Stanislav Volgushev</strong>, <strong>Shih-Kang Chao</strong>, <strong>Guang Cheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1634--1662.</p><p><strong>Abstract:</strong><br/>
The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big data, we propose a two-step procedure: (i) estimate conditional quantile functions at different levels in a parallel computing environment; (ii) construct a conditional quantile regression process through projection based on these estimated quantile curves. Our general quantile regression framework covers both linear models with fixed or growing dimension and series approximation models. We prove that the proposed procedure does not sacrifice any statistical inferential accuracy provided that the number of distributed computing units and quantile levels are chosen properly. In particular, a sharp upper bound for the former and a sharp lower bound for the latter are derived to capture the minimal computational cost from a statistical perspective. As an important application, the statistical inference on conditional distribution functions is considered. Moreover, we propose computationally efficient approaches to conducting inference in the distributed estimation setting described above. Those approaches directly utilize the availability of estimators from subsamples and can be carried out at almost no additional computational cost. Simulations confirm our statistical inferential theory.
</p>projecteuclid.org/euclid.aos/1550026852_20190212220137Tue, 12 Feb 2019 22:01 ESTGaussian approximation of maxima of Wiener functionals and its application to high-frequency datahttps://projecteuclid.org/euclid.aos/1550026853<strong>Yuta Koike</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1663--1687.</p><p><strong>Abstract:</strong><br/>
This paper establishes an upper bound for the Kolmogorov distance between the maximum of a high-dimensional vector of smooth Wiener functionals and the maximum of a Gaussian random vector. As a special case, we show that the maximum of multiple Wiener–Itô integrals with common orders is well approximated by its Gaussian analog in terms of the Kolmogorov distance if their covariance matrices are close to each other and the maximum of the fourth cumulants of the multiple Wiener–Itô integrals is close to zero. This may be viewed as a new kind of fourth moment phenomenon, which has attracted considerable attention in the recent studies of probability. This type of Gaussian approximation result has many potential applications to statistics. To illustrate this point, we present two statistical applications in high-frequency financial econometrics: One is the hypothesis testing problem for the absence of lead-lag effects and the other is the construction of uniform confidence bands for spot volatility.
</p>projecteuclid.org/euclid.aos/1550026853_20190212220137Tue, 12 Feb 2019 22:01 ESTCausal Dantzig: Fast inference in linear structural equation models with hidden variables under additive interventionshttps://projecteuclid.org/euclid.aos/1550026854<strong>Dominik Rothenhäusler</strong>, <strong>Peter Bühlmann</strong>, <strong>Nicolai Meinshausen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1688--1722.</p><p><strong>Abstract:</strong><br/>
Causal inference is known to be very challenging when only observational data are available. Randomized experiments are often costly and impractical, and in instrumental variable regression the number of instruments has to exceed the number of causal predictors. It was recently shown in Peters, Bühlmann and Meinshausen (2016) (J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 947–1012) that causal inference for the full model is possible when data from distinct observational environments are available, exploiting that the conditional distribution of a response variable is invariant under the correct causal model. Two shortcomings of such an approach are the high computational effort for large-scale data and the assumed absence of hidden confounders. Here, we show that these two shortcomings can be addressed if one is willing to make a more restrictive assumption on the type of interventions that generate different environments. To this end, we consider a different notion of invariance, namely inner-product invariance. By avoiding a computationally cumbersome reverse-engineering approach such as in Peters, Bühlmann and Meinshausen (2016), it allows for large-scale causal inference in linear structural equation models. We discuss identifiability conditions for the causal parameter and derive asymptotic confidence intervals in the low-dimensional setting. In the case of nonidentifiability, we show that the solution set of causal Dantzig has predictive guarantees under certain interventions. We derive finite-sample bounds in the high-dimensional setting and investigate its performance on simulated datasets.
</p>projecteuclid.org/euclid.aos/1550026854_20190212220137Tue, 12 Feb 2019 22:01 ESTNonpenalized variable selection in high-dimensional linear model settings via generalized fiducial inferencehttps://projecteuclid.org/euclid.aos/1550026855<strong>Jonathan P. Williams</strong>, <strong>Jan Hannig</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1723--1753.</p><p><strong>Abstract:</strong><br/>
Standard penalized methods of variable selection and parameter estimation rely on the magnitude of coefficient estimates to decide which variables to include in the final model. However, coefficient estimates are unreliable when the design matrix is collinear. To overcome this challenge, an entirely new perspective on variable selection is presented within a generalized fiducial inference framework. This new procedure is able to effectively account for linear dependencies among subsets of covariates in a high-dimensional setting where $p$ can grow almost exponentially in $n$, as well as in the classical setting where $p\le n$. It is shown that the procedure very naturally assigns small probabilities to subsets of covariates which include redundancies by way of explicit $L_{0}$ minimization. Furthermore, with a typical sparsity assumption, it is shown that the proposed method is consistent in the sense that the probability assigned to the true sparse subset of covariates converges in probability to 1 as $n\to\infty$, or as $n\to\infty$ and $p\to\infty$. Only very reasonable conditions are needed to achieve this consistency result, and little restriction is placed on the class of possible subsets of covariates.
</p>projecteuclid.org/euclid.aos/1550026855_20190212220137Tue, 12 Feb 2019 22:01 ESTSuper-resolution estimation of cyclic arrival rateshttps://projecteuclid.org/euclid.aos/1550026856<strong>Ningyuan Chen</strong>, <strong>Donald K. K. Lee</strong>, <strong>Sahand N. Negahban</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1754--1775.</p><p><strong>Abstract:</strong><br/>
Exploiting the fact that most arrival processes exhibit cyclic behaviour, we propose a simple procedure for estimating the intensity of a nonhomogeneous Poisson process. The estimator is the super-resolution analogue to Shao (2010) and Shao and Lii [J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 (2011) 99–122], which is a sum of $p$ sinusoids where $p$ and the amplitude and phase of each wave are not known and need to be estimated. This results in an interpretable yet flexible specification that is suitable for use in modelling as well as in high resolution simulations.
Our estimation procedure sits in between classic periodogram methods and atomic/total variation norm thresholding. Through a novel use of window functions in the point process domain, our approach attains super-resolution without semidefinite programming. Under suitable conditions, finite sample guarantees can be derived for our procedure. These resolve some open questions and expand existing results in spectral estimation literature.
</p>projecteuclid.org/euclid.aos/1550026856_20190212220137Tue, 12 Feb 2019 22:01 ESTSequential multiple testing with generalized error control: An asymptotic optimality theoryhttps://projecteuclid.org/euclid.aos/1550026857<strong>Yanglei Song</strong>, <strong>Georgios Fellouris</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1776--1803.</p><p><strong>Abstract:</strong><br/>
The sequential multiple testing problem is considered under two generalized error metrics. Under the first one, the probability of at least $k$ mistakes, of any kind, is controlled. Under the second, the probabilities of at least $k_{1}$ false positives and at least $k_{2}$ false negatives are simultaneously controlled. For each formulation, the optimal expected sample size is characterized, to a first-order asymptotic approximation as the error probabilities go to 0, and a novel multiple testing procedure is proposed and shown to be asymptotically efficient under every signal configuration. These results are established when the data streams for the various hypotheses are independent and each local log-likelihood ratio statistic satisfies a certain strong law of large numbers. In the special case of i.i.d. observations in each stream, the gains of the proposed sequential procedures over fixed-sample size schemes are quantified.
</p>projecteuclid.org/euclid.aos/1550026857_20190212220137Tue, 12 Feb 2019 22:01 ESTExact recovery in the Ising blockmodelhttps://projecteuclid.org/euclid.aos/1558425631<strong>Quentin Berthet</strong>, <strong>Philippe Rigollet</strong>, <strong>Piyush Srivastava</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 1805--1834.</p><p><strong>Abstract:</strong><br/>
We consider the problem associated to recovering the block structure of an Ising model given independent observations on the binary hypercube. This new model, called the Ising blockmodel, is a perturbation of the mean field approximation of the Ising model known as the Curie–Weiss model: the sites are partitioned into two blocks of equal size and the interaction between those of the same block is stronger than across blocks, to account for more order within each block. We study probabilistic, statistical and computational aspects of this model in the high-dimensional case when the number of sites may be much larger than the sample size.
</p>projecteuclid.org/euclid.aos/1558425631_20190521040050Tue, 21 May 2019 04:00 EDTMaximum likelihood estimation in Gaussian models under total positivityhttps://projecteuclid.org/euclid.aos/1558425632<strong>Steffen Lauritzen</strong>, <strong>Caroline Uhler</strong>, <strong>Piotr Zwiernik</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 1835--1863.</p><p><strong>Abstract:</strong><br/>
We analyze the problem of maximum likelihood estimation for Gaussian distributions that are multivariate totally positive of order two ($\mathrm{MTP}_{2}$). By exploiting connections to phylogenetics and single-linkage clustering, we give a simple proof that the maximum likelihood estimator (MLE) for such distributions exists based on $n\geq2$ observations, irrespective of the underlying dimension. Slawski and Hein [Linear Algebra Appl. 473 (2015) 145–179], who first proved this result, also provided empirical evidence showing that the $\mathrm{MTP}_{2}$ constraint serves as an implicit regularizer and leads to sparsity in the estimated inverse covariance matrix, determining what we name the ML graph. We show that we can find an upper bound for the ML graph by adding edges corresponding to correlations in excess of those explained by the maximum weight spanning forest of the correlation matrix. Moreover, we provide globally convergent coordinate descent algorithms for calculating the MLE under the $\mathrm{MTP}_{2}$ constraint which are structurally similar to iterative proportional scaling. We conclude the paper with a discussion of signed $\mathrm{MTP}_{2}$ distributions.
</p>projecteuclid.org/euclid.aos/1558425632_20190521040050Tue, 21 May 2019 04:00 EDTMaximum likelihood estimation in transformed linear regression with nonnormal errorshttps://projecteuclid.org/euclid.aos/1558425633<strong>Xingwei Tong</strong>, <strong>Fuqing Gao</strong>, <strong>Kani Chen</strong>, <strong>Dingjiao Cai</strong>, <strong>Jianguo Sun</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 1864--1892.</p><p><strong>Abstract:</strong><br/>
This paper discusses the transformed linear regression with non-normal error distributions, a problem that often occurs in many areas such as economics and social sciences as well as medical studies. The linear transformation model is an important tool in survival analysis partly due to its flexibility. In particular, it includes the Cox model and the proportional odds model as special cases when the error follows the extreme value distribution and the logistic distribution, respectively. Despite the popularity and generality of linear transformation models, however, there is no general theory on the maximum likelihood estimation of the regression parameter and the transformation function. One main difficulty is that the transformation function diverges to infinity near the tails and can be quite unstable there, which affects the accuracy of the estimation of the transformation function and regression parameters. In this paper, we develop the maximum likelihood estimation approach and provide near optimal conditions on the error distribution under which the consistency and asymptotic normality of the resulting estimators can be established. Extensive numerical studies suggest that the methodology works well, and an application to typhoon forecast data is provided.
</p>projecteuclid.org/euclid.aos/1558425633_20190521040050Tue, 21 May 2019 04:00 EDTHypothesis testing for densities and high-dimensional multinomials: Sharp local minimax rateshttps://projecteuclid.org/euclid.aos/1558425634<strong>Sivaraman Balakrishnan</strong>, <strong>Larry Wasserman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 1893--1927.</p><p><strong>Abstract:</strong><br/>
We consider the goodness-of-fit testing problem of distinguishing whether the data are drawn from a specified distribution, versus a composite alternative separated from the null in the total variation metric. In the discrete case, we consider goodness-of-fit testing when the null distribution has a possibly growing or unbounded number of categories. In the continuous case, we consider testing a Hölder density with exponent $0<s\leq 1$, with possibly unbounded support, in the low-smoothness regime where the Hölder parameter is not assumed to be constant. In contrast to existing results, we show that the minimax rate and critical testing radius in these settings depend strongly, and in a precise way, on the null distribution being tested and this motivates the study of the (local) minimax rate as a function of the null distribution. For multinomials, the local minimax rate has been established in recent work. We revisit and extend these results and develop two modifications to the $\chi^{2}$-test whose performance we characterize. For testing Hölder densities, we show that the usual binning tests are inadequate in the low-smoothness regime and we design a spatially adaptive partitioning scheme that forms the basis for our locally minimax optimal tests. Furthermore, we provide the first local minimax lower bounds for this problem which yield a sharp characterization of the dependence of the critical radius on the null hypothesis being tested. In the low-smoothness regime, we also provide adaptive tests that adapt to the unknown smoothness parameter. We illustrate our results with a variety of simulations that demonstrate the practical utility of our proposed tests.
</p>projecteuclid.org/euclid.aos/1558425634_20190521040050Tue, 21 May 2019 04:00 EDTThe BLUE in continuous-time regression models with correlated errorshttps://projecteuclid.org/euclid.aos/1558425635<strong>Holger Dette</strong>, <strong>Andrey Pepelyshev</strong>, <strong>Anatoly Zhigljavsky</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 1928--1959.</p><p><strong>Abstract:</strong><br/>
In this paper, the problem of best linear unbiased estimation is investigated for continuous-time regression models. We prove several general statements concerning the explicit form of the best linear unbiased estimator (BLUE), in particular when the error process is a smooth process with one or several derivatives of the response process available for construction of the estimators. We derive the explicit form of the BLUE for many specific models including the cases of continuous autoregressive errors of order two and integrated error processes (such as integrated Brownian motion). The results are illustrated on many examples.
</p>projecteuclid.org/euclid.aos/1558425635_20190521040050Tue, 21 May 2019 04:00 EDTAdaptive-to-model checking for regressions with diverging number of predictorshttps://projecteuclid.org/euclid.aos/1558425636<strong>Falong Tan</strong>, <strong>Lixing Zhu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 1960--1994.</p><p><strong>Abstract:</strong><br/>
In this paper, we construct an adaptive-to-model residual-marked empirical process as the base for constructing a goodness-of-fit test for parametric single-index models with a diverging number of predictors. To study the relevant asymptotic properties, we first investigate, under the null and alternative hypotheses, the estimation consistency and asymptotically linear representation of the nonlinear least squares estimator for the parameters of interest, and then the convergence of the empirical process to a Gaussian process. We prove that under the null hypothesis the convergence of the process holds when the number of predictors diverges to infinity at a certain rate that can, in some cases, be of order $o(n^{1/3}/\log n)$, where $n$ is the sample size. The convergence is also studied under the local and global alternative hypotheses. These results are readily applicable to other model checking problems. Further, by modifying the approach in the literature to suit the diverging dimension settings, we construct a martingale transformation, and then the asymptotic properties of the test statistic are investigated. Numerical studies are conducted to examine the performance of the test.
</p>projecteuclid.org/euclid.aos/1558425636_20190521040050Tue, 21 May 2019 04:00 EDTNonparametric screening under conditional strictly convex loss for ultrahigh dimensional sparse datahttps://projecteuclid.org/euclid.aos/1558425637<strong>Xu Han</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 1995--2022.</p><p><strong>Abstract:</strong><br/>
The sure screening technique has been considered a powerful tool for handling ultrahigh dimensional variable selection problems, where the dimensionality $p$ and the sample size $n$ can satisfy the NP dimensionality $\log p=O(n^{a})$ for some $a>0$ [J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 (2008) 849–911]. The current paper aims to simultaneously tackle the “universality” and “effectiveness” of sure screening procedures. For the “universality,” we develop a general and unified framework for nonparametric screening methods from a loss function perspective. Consider a loss function to measure the divergence of the response variable and the underlying nonparametric function of covariates. We propose a new class of loss functions called conditional strictly convex loss, which contains, but is not limited to, negative log likelihood loss from one-parameter exponential families, exponential loss for binary classification and quantile regression loss. The sure screening property and model selection size control will be established within this class of loss functions. For the “effectiveness,” we focus on a goodness-of-fit nonparametric screening (Goffins) method under conditional strictly convex loss. Interestingly, we can achieve a better convergence probability of containing the true model compared with the related literature. The superior performance of our proposed method is further demonstrated by extensive simulation studies and a real scientific data example.
</p>projecteuclid.org/euclid.aos/1558425637_20190521040050Tue, 21 May 2019 04:00 EDTLocal stationarity and time-inhomogeneous Markov chainshttps://projecteuclid.org/euclid.aos/1558425638<strong>Lionel Truquet</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2023--2050.</p><p><strong>Abstract:</strong><br/>
A primary motivation of this contribution is to define new locally stationary Markov models for categorical or integer-valued data. For this initial purpose, we propose a new general approach for dealing with time-inhomogeneity that extends the local stationarity notion developed in the time series literature. We also introduce a probabilistic framework which is very flexible and allows us to consider a much larger class of Markov chain models on arbitrary state spaces, including most of the locally stationary autoregressive processes studied in the literature. We consider triangular arrays of time-inhomogeneous Markov chains, defined by some families of contracting and slowly-varying Markov kernels. The finite-dimensional distribution of such Markov chains can be approximated locally with the distribution of ergodic Markov chains and some mixing properties are also available for these triangular arrays. As a consequence of our results, some classical geometrically ergodic homogeneous Markov chain models have a locally stationary version, which lays the theoretical foundations for new statistical modeling. Statistical inference of finite-state Markov chains can be based on kernel smoothing and we provide a complete and fast implementation of such models, directly usable by the practitioners. We also illustrate the theory on a real data set. A central limit theorem for Markov chains on more general state spaces is also provided and illustrated with the statistical inference in INAR models, Poisson ARCH models and binary time series models. Additional examples such as locally stationary regime-switching or SETAR models are also discussed.
</p>projecteuclid.org/euclid.aos/1558425638_20190521040050Tue, 21 May 2019 04:00 EDTHigh-dimensional change-point detection under sparse alternativeshttps://projecteuclid.org/euclid.aos/1558425639<strong>Farida Enikeeva</strong>, <strong>Zaid Harchaoui</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2051--2079.</p><p><strong>Abstract:</strong><br/>
We consider the problem of detecting a change in mean in a sequence of high-dimensional Gaussian vectors. The change in mean may occur simultaneously in an unknown subset of the components. We propose a hypothesis test to detect the presence of a change-point and establish the detection boundary in different regimes under the assumption that the dimension tends to infinity and the length of the sequence grows with the dimension. A remarkable feature of the proposed test is that it does not require any knowledge of the subset of components in which the change in mean is occurring, and yet automatically adapts to yield optimal rates of convergence over a wide range of statistical regimes.
</p>projecteuclid.org/euclid.aos/1558425639_20190521040050Tue, 21 May 2019 04:00 EDTPerturbation bootstrap in adaptive Lassohttps://projecteuclid.org/euclid.aos/1558425640<strong>Debraj Das</strong>, <strong>Karl Gregory</strong>, <strong>S. N. Lahiri</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2080--2116.</p><p><strong>Abstract:</strong><br/>
The Adaptive Lasso (Alasso) was proposed by Zou [J. Amer. Statist. Assoc. 101 (2006) 1418–1429] as a modification of the Lasso for the purpose of simultaneous variable selection and estimation of the parameters in a linear regression model. Zou [J. Amer. Statist. Assoc. 101 (2006) 1418–1429] established that the Alasso estimator is variable-selection consistent as well as asymptotically Normal in the indices corresponding to the nonzero regression coefficients in certain fixed-dimensional settings. In an influential paper, Minnier, Tian and Cai [J. Amer. Statist. Assoc. 106 (2011) 1371–1382] proposed a perturbation bootstrap method and established its distributional consistency for the Alasso estimator in the fixed-dimensional setting. In this paper, however, we show that this (naive) perturbation bootstrap fails to achieve second-order correctness in approximating the distribution of the Alasso estimator. We propose a modification to the perturbation bootstrap objective function and show that a suitably Studentized version of our modified perturbation bootstrap Alasso estimator achieves second-order correctness even when the dimension of the model is allowed to grow to infinity with the sample size. As a consequence, inferences based on the modified perturbation bootstrap will be more accurate than the inferences based on the oracle Normal approximation. We give simulation studies demonstrating good finite-sample properties of our modified perturbation bootstrap method as well as an illustration of our method on a real data set.
</p>projecteuclid.org/euclid.aos/1558425640_20190521040050Tue, 21 May 2019 04:00 EDTEstimation bounds and sharp oracle inequalities of regularized procedures with Lipschitz loss functionshttps://projecteuclid.org/euclid.aos/1558425641<strong>Pierre Alquier</strong>, <strong>Vincent Cottet</strong>, <strong>Guillaume Lecué</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2117--2144.</p><p><strong>Abstract:</strong><br/>
We obtain estimation error rates and sharp oracle inequalities for regularization procedures of the form \begin{equation*}\hat{f}\in\mathop{\operatorname{argmin}}_{f\in F}\Bigg(\frac{1}{N}\sum_{i=1}^{N}\ell_{f}(X_{i},Y_{i})+\lambda \Vert f\Vert \Bigg)\end{equation*} when $\Vert \cdot \Vert $ is any norm, $F$ is a convex class of functions and $\ell$ is a Lipschitz loss function satisfying a Bernstein condition over $F$. We explore both the bounded and sub-Gaussian stochastic frameworks for the distribution of the $f(X_{i})$’s, with no assumption on the distribution of the $Y_{i}$’s. The general results rely on two main objects: a complexity function and a sparsity equation, that depend on the specific setting in hand (loss $\ell$ and norm $\Vert \cdot \Vert $).
As a proof of concept, we obtain minimax rates of convergence in the following problems: (1) matrix completion with any Lipschitz loss function, including the hinge and logistic loss for the so-called 1-bit matrix completion instance of the problem, and quantile losses for the general case, which enables estimation of any quantile of the entries of the matrix; (2) logistic LASSO and variants such as the logistic SLOPE, and also shape-constrained logistic regression; (3) kernel methods, where the loss is the hinge loss, and the regularization function is the RKHS norm.
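For the logistic LASSO instance in item (2), the displayed penalized program can be solved by proximal gradient descent with soft-thresholding. The sketch below is a minimal illustration under assumptions not in the abstract (linear functions $f(x)=\langle \beta ,x\rangle $, the $\ell_{1}$ norm, labels in $\{-1,+1\}$, and a hypothetical fixed step size); it is not the authors' computational procedure.

```python
import math

def soft_threshold(x, t):
    # Proximal operator of t * |.|: shrink x toward zero by t.
    return math.copysign(max(abs(x) - t, 0.0), x)

def logistic_lasso(X, y, lam, step=0.1, n_iter=2000):
    """Proximal-gradient sketch of the penalized program in the abstract,
    specialized to f(x) = <beta, x>, logistic loss
    ell(x, y) = log(1 + exp(-y * f(x))) with y in {-1, +1}, and the l1 norm."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the average logistic loss at the current beta.
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            m = yi * sum(b * v for b, v in zip(beta, xi))
            c = -yi / (1.0 + math.exp(m))  # derivative of log(1 + exp(-m)) in beta direction
            for j in range(p):
                grad[j] += c * xi[j] / n
        # Gradient step, then the proximal step for lam * ||.||_1.
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta
```

With a large enough penalty the estimate is exactly zero, which is the sparsity mechanism the SLOPE and LASSO variants share.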
</p>projecteuclid.org/euclid.aos/1558425641_20190521040050Tue, 21 May 2019 04:00 EDTGeneralized cluster trees and singular measureshttps://projecteuclid.org/euclid.aos/1558425642<strong>Yen-Chi Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2174--2203.</p><p><strong>Abstract:</strong><br/>
In this paper we study the $\alpha $-cluster tree ($\alpha $-tree) under both singular and nonsingular measures. The $\alpha $-tree uses probability contents within a set created by the ordering of points to construct a cluster tree so that it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an $\alpha $-tree under nonsingular measures. For singular measures, we study how the kernel density estimator (KDE) behaves and prove that the KDE is not uniformly consistent but pointwise consistent after rescaling. We further prove that the estimated $\alpha $-tree fails to converge in the $L_{\infty }$ metric but is still consistent under the integrated distance. We also observe a new type of critical points—the dimensional critical points (DCPs)—of a singular measure. DCPs are points that contribute to cluster tree topology but cannot be defined using density gradient. Building on the analysis of the KDE and DCPs, we prove the topological consistency of an estimated $\alpha $-tree.
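The divergence-and-rescaling behavior of the KDE under a singular measure can be seen already for a point mass, the simplest singular measure. A minimal sketch, assuming a Gaussian kernel (`kde` is an illustrative helper, not the paper's exact estimator):

```python
import math

def kde(x0, data, h):
    """Gaussian kernel density estimate at x0 with bandwidth h
    (an illustrative helper, not the paper's exact estimator)."""
    n = len(data)
    return sum(math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in data) \
        / (n * h * math.sqrt(2.0 * math.pi))
```

For data concentrated at a single point, kde(0, data, h) equals 1/(h * sqrt(2 * pi)), which diverges as h decreases, while the rescaled value h * kde(0, data, h) is stable in h; this is a one-dimensional analogue of the rescaling invoked above for pointwise consistency.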
</p>projecteuclid.org/euclid.aos/1558425642_20190521040050Tue, 21 May 2019 04:00 EDTSpectral method and regularized MLE are both optimal for top-$K$ rankinghttps://projecteuclid.org/euclid.aos/1558425643<strong>Yuxin Chen</strong>, <strong>Jianqing Fan</strong>, <strong>Cong Ma</strong>, <strong>Kaizheng Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2204--2235.</p><p><strong>Abstract:</strong><br/>
This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model—the Bradley–Terry–Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress toward characterizing the performance (e.g., the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled.
We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity—the number of paired comparisons needed to ensure exact top-$K$ identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and noniterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis–Kahan $\mathop{\mathrm{sin}}\nolimits \Theta $ theorem for symmetric matrices. This also allows us to close the gap between the $\ell_{2}$ error upper bound for the spectral method and the minimax lower limit.
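The spectral method referred to above can be sketched in the rank-centrality style: build a Markov chain whose off-diagonal transition probabilities are proportional to empirical defeat frequencies, and read the top-$K$ items off its stationary distribution. The code below is a toy illustration under an assumed input format (`comparisons` maps a pair `(i, j)` to the observed fraction of comparisons in which `j` beat `i`), not the authors' exact estimator.

```python
def spectral_ranking(n, comparisons, k, n_iter=500):
    """Rank-centrality-style spectral sketch: the stationary distribution
    of the induced Markov chain orders the latent BTL scores.
    `comparisons` maps (i, j) -> empirical fraction of times j beat i."""
    d = n  # normalization constant at least the maximal out-degree
    # Off-diagonal: P[i][j] proportional to the defeat frequency y_ij;
    # the diagonal absorbs the remaining mass so each row sums to one.
    P = [[0.0] * n for _ in range(n)]
    for (i, j), y in comparisons.items():
        P[i][j] = y / d
    for i in range(n):
        P[i][i] = 1.0 - sum(P[i])
    # Power iteration for the stationary distribution pi = pi P.
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Top-K items are the K largest stationary-probability entries.
    return sorted(range(n), key=lambda i: -pi[i])[:k]
```

Under the BTL model with a complete comparison graph, the chain satisfies detailed balance with stationary probabilities proportional to the latent scores, which is why the ordering is recovered.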
</p>projecteuclid.org/euclid.aos/1558425643_20190521040050Tue, 21 May 2019 04:00 EDTNegative association, ordering and convergence of resampling methodshttps://projecteuclid.org/euclid.aos/1558425644<strong>Mathieu Gerber</strong>, <strong>Nicolas Chopin</strong>, <strong>Nick Whiteley</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2236--2260.</p><p><strong>Abstract:</strong><br/>
We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost sure weak convergence of measures output from Kitagawa’s [ J. Comput. Graph. Statist. 5 (1996) 1–25] stratified resampling method. Carpenter, Clifford and Fearnhead’s [ IEE Proc. Radar Sonar Navig. 146 (1999) 2–7] systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of [In 42nd IEEE Symposium on Foundations of Computer Science ( Las Vegas , NV , 2001) (2001) 588–597 IEEE Computer Soc.], which shares some attractive properties of systematic resampling, but which exhibits negative association and, therefore, converges irrespective of the order of the input samples. We confirm a conjecture made by [ J. Comput. Graph. Statist. 5 (1996) 1–25] that ordering input samples by their states in $\mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $\mathbb{R}^{d}$, the variance of the resampling error is ${\scriptstyle\mathcal{O}}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.
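The two schemes contrasted above differ only in how the uniforms are drawn: stratified resampling uses one independent uniform per stratum, while systematic resampling shares a single uniform offset across all strata. A minimal sketch, assuming normalized weights:

```python
import random

def stratified_resample(weights, rng=random):
    """Kitagawa's stratified resampling: one independent uniform per
    stratum ((i + U_i)/N for i = 0..N-1), then CDF inversion."""
    n = len(weights)
    positions = [(i + rng.random()) / n for i in range(n)]
    return _inverse_cdf(weights, positions)

def systematic_resample(weights, rng=random):
    """Systematic resampling: a single uniform offset shared by all strata."""
    n = len(weights)
    u = rng.random()
    positions = [(i + u) / n for i in range(n)]
    return _inverse_cdf(weights, positions)

def _inverse_cdf(weights, positions):
    # positions must be sorted; walk the cumulative weight sums once.
    indices, cum, j = [], weights[0], 0
    for p in positions:
        while p > cum and j < len(weights) - 1:
            j += 1
            cum += weights[j]
        indices.append(j)
    return indices
```

The shared offset is what makes the output of systematic resampling sensitive to the ordering of the input samples, whereas the independent per-stratum uniforms give stratified resampling the negative-association property used in the consistency theorem.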
</p>projecteuclid.org/euclid.aos/1558425644_20190521040050Tue, 21 May 2019 04:00 EDTOn deep learning as a remedy for the curse of dimensionality in nonparametric regressionhttps://projecteuclid.org/euclid.aos/1558425645<strong>Benedikt Bauer</strong>, <strong>Michael Kohler</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2261--2285.</p><p><strong>Abstract:</strong><br/>
Assuming that a smoothness condition and a suitable restriction on the structure of the regression function hold, it is shown that least squares estimates based on multilayer feedforward neural networks are able to circumvent the curse of dimensionality in nonparametric regression. The proof is based on new approximation results concerning multilayer feedforward neural networks with bounded weights and a bounded number of hidden neurons. The estimates are compared with various other approaches by using simulated data.
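A toy stand-in for the least squares neural network estimates discussed above: a one-hidden-layer sigmoid network fit by plain gradient descent on the empirical squared error. This is only a sketch (the abstract concerns multilayer networks with bounded weights; the width, learning rate, and iteration count below are arbitrary choices).

```python
import math, random

def fit_shallow_net(xs, ys, width=8, lr=0.05, n_iter=3000, rng=None):
    """Least squares fit of a one-hidden-layer sigmoid network by plain
    gradient descent -- a toy stand-in for the multilayer least squares
    estimates studied in the abstract."""
    rng = rng or random.Random(0)
    a = [rng.uniform(-1, 1) for _ in range(width)]  # output weights
    w = [rng.uniform(-1, 1) for _ in range(width)]  # input weights
    b = [rng.uniform(-1, 1) for _ in range(width)]  # biases
    sig = lambda t: 1.0 / (1.0 + math.exp(-t))
    def predict(x):
        return sum(a[k] * sig(w[k] * x + b[k]) for k in range(width))
    n = len(xs)
    for _ in range(n_iter):
        ga = [0.0] * width; gw = [0.0] * width; gb = [0.0] * width
        for x, y in zip(xs, ys):
            r = predict(x) - y
            for k in range(width):
                s = sig(w[k] * x + b[k])
                ga[k] += 2 * r * s / n
                gw[k] += 2 * r * a[k] * s * (1 - s) * x / n
                gb[k] += 2 * r * a[k] * s * (1 - s) / n
        for k in range(width):
            a[k] -= lr * ga[k]; w[k] -= lr * gw[k]; b[k] -= lr * gb[k]
    return predict
```
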
</p>projecteuclid.org/euclid.aos/1558425645_20190521040050Tue, 21 May 2019 04:00 EDTConvergence rates of least squares regression estimators with heavy-tailed errorshttps://projecteuclid.org/euclid.aos/1558425646<strong>Qiyang Han</strong>, <strong>Jon A. Wellner</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2286--2319.</p><p><strong>Abstract:</strong><br/>
We study the performance of the least squares estimator (LSE) in a general nonparametric regression model, when the errors are independent of the covariates but may only have a $p$th moment ($p\geq1$). In such a heavy-tailed regression setting, we show that if the model satisfies a standard “entropy condition” with exponent $\alpha\in(0,2)$, then the $L_{2}$ loss of the LSE converges at a rate
\[\mathcal{O}_{\mathbf{P}}\bigl(n^{-\frac{1}{2+\alpha}}\vee n^{-\frac{1}{2}+\frac{1}{2p}}\bigr).\] Such a rate cannot be improved under the entropy condition alone.
This rate quantifies both some positive and negative aspects of the LSE in a heavy-tailed regression setting. On the positive side, as long as the errors have $p\geq1+2/\alpha$ moments, the $L_{2}$ loss of the LSE converges at the same rate as if the errors are Gaussian. On the negative side, if $p<1+2/\alpha$, there are (many) hard models at any entropy level $\alpha$ for which the $L_{2}$ loss of the LSE converges at a strictly slower rate than other robust estimators.
The validity of the above rate relies crucially on the independence of the covariates and the errors. In fact, the $L_{2}$ loss of the LSE can converge arbitrarily slowly when the independence fails.
The key technical ingredient is a new multiplier inequality that gives sharp bounds for the “multiplier empirical process” associated with the LSE. We further give an application to the sparse linear regression model with heavy-tailed covariates and errors to demonstrate the scope of this new inequality.
</p>projecteuclid.org/euclid.aos/1558425646_20190521040050Tue, 21 May 2019 04:00 EDTConvergence complexity analysis of Albert and Chib’s algorithm for Bayesian probit regressionhttps://projecteuclid.org/euclid.aos/1558425647<strong>Qian Qin</strong>, <strong>James P. Hobert</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2320--2347.</p><p><strong>Abstract:</strong><br/>
The use of MCMC algorithms in high dimensional Bayesian problems has become routine. This has spurred so-called convergence complexity analysis, the goal of which is to ascertain how the convergence rate of a Monte Carlo Markov chain scales with sample size, $n$, and/or number of covariates, $p$. This article provides a thorough convergence complexity analysis of Albert and Chib’s [ J. Amer. Statist. Assoc. 88 (1993) 669–679] data augmentation algorithm for the Bayesian probit regression model. The main tools used in this analysis are drift and minorization conditions. The usual pitfalls associated with this type of analysis are avoided by utilizing centered drift functions, which are minimized in high posterior probability regions, and by using a new technique to suppress high-dimensionality in the construction of minorization conditions. The main result is that the geometric convergence rate of the underlying Markov chain is bounded below 1 both as $n\rightarrow\infty$ (with $p$ fixed), and as $p\rightarrow\infty$ (with $n$ fixed). Furthermore, the first computable bounds on the total variation distance to stationarity are byproducts of the asymptotic analysis.
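The data augmentation algorithm analyzed above alternates truncated-normal draws for the latent variables with a normal draw for the coefficients. The sketch below runs it for a deliberately scalar special case (an intercept-only probit model with a flat prior, so the coefficient update is a univariate normal); the paper's analysis covers the general regression setting.

```python
import math, random

def sample_trunc_normal(mu, positive, rng):
    """Rejection sampler for N(mu, 1) truncated to (0, inf) if `positive`,
    else to (-inf, 0). Adequate for moderate |mu|; production code would
    use a dedicated truncated-normal sampler."""
    while True:
        z = rng.gauss(mu, 1.0)
        if (z > 0.0) == positive:
            return z

def albert_chib_intercept(y, n_iter=2000, rng=None):
    """Albert and Chib's data augmentation, specialized to the
    intercept-only probit model P(y_i = 1) = Phi(beta) with a flat prior
    on beta. (A scalar sketch of the general algorithm.)"""
    rng = rng or random.Random(0)
    n = len(y)
    beta, draws = 0.0, []
    for _ in range(n_iter):
        # z_i | beta, y_i ~ N(beta, 1), truncated to the side y_i dictates.
        z = [sample_trunc_normal(beta, yi == 1, rng) for yi in y]
        # beta | z ~ N(mean(z), 1/n) under the flat prior.
        beta = rng.gauss(sum(z) / n, 1.0 / math.sqrt(n))
        draws.append(beta)
    return draws
```

The convergence complexity question is how fast such a chain forgets its starting point as n or p grows; here, with p fixed at one, the chain mixes essentially immediately.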
</p>projecteuclid.org/euclid.aos/1558425647_20190521040050Tue, 21 May 2019 04:00 EDTOn testing conditional qualitative treatment effectshttps://projecteuclid.org/euclid.aos/1558425648<strong>Chengchun Shi</strong>, <strong>Rui Song</strong>, <strong>Wenbin Lu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2348--2377.</p><p><strong>Abstract:</strong><br/>
Precision medicine is an emerging medical paradigm that focuses on finding the most effective treatment strategy tailored for individual patients. In the literature, most of the existing works have focused on estimating the optimal treatment regime. However, there has been less attention devoted to hypothesis testing regarding the optimal treatment regime. In this paper, we first introduce the notion of conditional qualitative treatment effects (CQTE) of a set of variables given another set of variables and provide a class of equivalent representations for the null hypothesis of no CQTE. The proposed definition of CQTE does not assume any parametric form for the optimal treatment rule and plays an important role in assessing the incremental value of a set of new variables in optimal treatment decision making conditional on an existing set of prescriptive variables. We then propose novel testing procedures for no CQTE based on kernel estimation of the conditional contrast functions. We show that our test statistics have asymptotically correct size and nonnegligible power against some nonstandard local alternatives. The empirical performance of the proposed tests is evaluated by simulations and an application to an AIDS data set.
</p>projecteuclid.org/euclid.aos/1558425648_20190521040050Tue, 21 May 2019 04:00 EDTDynamic network models and graphon estimationhttps://projecteuclid.org/euclid.aos/1558425649<strong>Marianna Pensky</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2378--2403.</p><p><strong>Abstract:</strong><br/>
In the present paper, we consider a dynamic stochastic network model. The objective is estimation of the tensor of connection probabilities $\mathbf{{\Lambda}}$ when it is generated by a Dynamic Stochastic Block Model (DSBM) or a dynamic graphon. In particular, in the context of the DSBM, we derive a penalized least squares estimator $\widehat{\boldsymbol{\Lambda}}$ of $\mathbf{{\Lambda}}$ and show that $\widehat{\boldsymbol{\Lambda}}$ satisfies an oracle inequality and also attains minimax lower bounds for the risk. We extend those results to estimation of $\mathbf{{\Lambda}}$ when it is generated by a dynamic graphon function. The estimators constructed in the paper are adaptive to the unknown number of blocks in the context of the DSBM or to the smoothness of the graphon function. The technique relies on the vectorization of the model and leads to much simpler mathematical arguments than the ones used previously in the stationary setup. In addition, all results in the paper are nonasymptotic and allow a variety of extensions.
</p>projecteuclid.org/euclid.aos/1558425649_20190521040050Tue, 21 May 2019 04:00 EDTCross validation for locally stationary processeshttps://projecteuclid.org/euclid.aos/1558512018<strong>Stefan Richter</strong>, <strong>Rainer Dahlhaus</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 4, 2145--2173.</p><p><strong>Abstract:</strong><br/>
We propose an adaptive bandwidth selector via cross validation for local M-estimators in locally stationary processes. We prove asymptotic optimality of the procedure under mild conditions on the underlying parameter curves. The results are applicable to a wide range of locally stationary processes, such as linear and nonlinear processes. A simulation study shows that the method also works fairly well in misspecified situations.
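The flavor of cross-validated bandwidth selection can be sketched in the simplest stationary special case: leave-one-out cross validation for a Nadaraya-Watson estimator over a grid of bandwidths. This toy does not capture the local M-estimation or local stationarity treated in the paper; the function names and grid are illustrative.

```python
import math

def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel."""
    w = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    s = sum(w)
    return sum(wi * yi for wi, yi in zip(w, ys)) / s if s > 0 else 0.0

def cv_bandwidth(xs, ys, grid):
    """Leave-one-out cross validation over a bandwidth grid: a stationary,
    kernel-regression analogue of the adaptive selector in the abstract."""
    def loo_score(h):
        err = 0.0
        for i in range(len(xs)):
            xr = xs[:i] + xs[i + 1:]
            yr = ys[:i] + ys[i + 1:]
            err += (ys[i] - nw_estimate(xs[i], xr, yr, h)) ** 2
        return err
    return min(grid, key=loo_score)
```

The selector trades off bias (large h oversmooths curved signal) against variance (small h averages too few points), which is the balance the asymptotic optimality result formalizes for the locally stationary setting.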
</p>projecteuclid.org/euclid.aos/1558512018_20190522040104Wed, 22 May 2019 04:01 EDT