The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Tue, 07 Jun 2011 09:09 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
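The automatic multiplicity correction above comes from placing a prior on the common inclusion probability and integrating it out. A minimal sketch, assuming a uniform Beta(1, 1) prior on the inclusion probability (one standard choice in this setting, not necessarily the paper's exact prior):

```python
from math import comb

def model_prior_prob(k, p):
    """Prior probability of one particular model that includes k of p
    candidate variables, after integrating out a uniform Beta(1, 1)
    prior on the common inclusion probability: 1 / ((p + 1) * C(p, k))."""
    return 1.0 / ((p + 1) * comb(p, k))

# Sanity check: the probabilities of all 2^p models sum to one.
total = sum(comb(10, k) * model_prior_prob(k, 10) for k in range(11))

# Multiplicity correction: enlarging the candidate set (larger p)
# automatically shrinks the prior probability of any fixed-size model.
shrinks = model_prior_prob(2, 10) > model_prior_prob(2, 100)
```

Note how the penalty comes entirely from the prior: no tuning parameter is adjusted by hand as `p` grows.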
</p>

Uniformly valid post-regularization confidence regions for many functional parameters in z-estimation framework
https://projecteuclid.org/euclid.aos/1536631286
<strong>Alexandre Belloni</strong>, <strong>Victor Chernozhukov</strong>, <strong>Denis Chetverikov</strong>, <strong>Ying Wei</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3643--3675.</p><p><strong>Abstract:</strong><br/>
In this paper, we develop procedures to construct simultaneous confidence bands for ${\tilde{p}}$ potentially infinite-dimensional parameters after model selection for general moment condition models where ${\tilde{p}}$ is potentially much larger than the sample size of available data, $n$. This allows us to cover settings with functional response data where each of the ${\tilde{p}}$ parameters is a function. The procedure is based on the construction of score functions that approximately satisfy the Neyman orthogonality condition. The proposed simultaneous confidence bands rely on uniform central limit theorems for high-dimensional vectors (and not on Donsker arguments as we allow for ${{\tilde{p}}\gg n}$). To construct the bands, we employ a multiplier bootstrap procedure which is computationally efficient as it only involves resampling the estimated score functions (and does not require re-solving the high-dimensional optimization problems). We formally apply the general theory to inference on the regression coefficient process in the distribution regression model with a logistic link, where two implementations are analyzed in detail. Simulations and an application to real data are provided to help illustrate the applicability of the results.
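The multiplier bootstrap step can be sketched in a few lines: only the i.i.d. Gaussian multipliers are redrawn while the estimated score functions stay fixed, so no high-dimensional problem is re-solved. A stdlib-only toy sketch (the function name and interface are illustrative, not from the paper):

```python
import math
import random

def multiplier_bootstrap_quantile(scores, level=0.95, B=1000, seed=0):
    """Gaussian multiplier bootstrap for the sup-statistic
    max_j | n^{-1/2} * sum_i xi_i * scores[i][j] |.
    Only the N(0, 1) multipliers xi_i are redrawn on each replication;
    the (estimated) scores themselves are held fixed."""
    rng = random.Random(seed)
    n, p = len(scores), len(scores[0])
    draws = []
    for _ in range(B):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stat = max(
            abs(sum(xi[i] * scores[i][j] for i in range(n))) / math.sqrt(n)
            for j in range(p)
        )
        draws.append(stat)
    draws.sort()
    return draws[int(level * B) - 1]  # empirical level-quantile

# toy scores: 50 observations, 5 "parameters"
rng = random.Random(1)
toy = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(50)]
crit95 = multiplier_bootstrap_quantile(toy, level=0.95)
crit80 = multiplier_bootstrap_quantile(toy, level=0.80)
```

The returned quantile serves as the simultaneous critical value; its cost per replication is a single matrix-vector product, not an optimization.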
</p>

Local asymptotic equivalence of pure states ensembles and quantum Gaussian white noise
https://projecteuclid.org/euclid.aos/1536631287
<strong>Cristina Butucea</strong>, <strong>Mădălin Guţă</strong>, <strong>Michael Nussbaum</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3676--3706.</p><p><strong>Abstract:</strong><br/>
Quantum technology is increasingly relying on specialised statistical inference methods for analysing quantum measurement data. This motivates the development of “quantum statistics”, a field that is shaping up at the overlap of quantum physics and “classical” statistics. One of the less investigated topics to date is that of statistical inference for infinite-dimensional quantum systems, which can be seen as the quantum counterpart of nonparametric statistics. In this paper, we analyse the asymptotic theory of quantum statistical models consisting of ensembles of quantum systems which are identically prepared in a pure state. In the limit of large ensembles, we establish the local asymptotic equivalence (LAE) of this i.i.d. model to a quantum Gaussian white noise model. We use the LAE result in order to establish minimax rates for the estimation of pure states belonging to Hermite–Sobolev classes of wave functions. Moreover, for quadratic functional estimation of the same states we note an elbow effect in the rates, whereas for testing a pure state a sharp parametric rate is attained over the nonparametric Hermite–Sobolev class.
</p>

Extremal quantile treatment effects
https://projecteuclid.org/euclid.aos/1536631288
<strong>Yichong Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3707--3740.</p><p><strong>Abstract:</strong><br/>
This paper establishes an asymptotic theory and inference method for quantile treatment effect estimators when the quantile index is close to or equal to zero. Such quantile treatment effects are of interest in many applications, such as the effect of maternal smoking on an infant’s adverse birth outcomes. When the quantile index is close to zero, the sparsity of data jeopardizes conventional asymptotic theory and bootstrap inference. When the quantile index is zero, there are no existing inference methods directly applicable in the treatment effect context. This paper addresses both of these issues by proposing new inference methods that are shown to be asymptotically valid and to have adequate finite-sample properties.
</p>

Optimal maximin $L_{1}$-distance Latin hypercube designs based on good lattice point designs
https://projecteuclid.org/euclid.aos/1536631289
<strong>Lin Wang</strong>, <strong>Qian Xiao</strong>, <strong>Hongquan Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3741--3766.</p><p><strong>Abstract:</strong><br/>
Maximin distance Latin hypercube designs are commonly used for computer experiments, but the construction of such designs is challenging. We construct a series of maximin Latin hypercube designs via Williams transformations of good lattice point designs. Some constructed designs are optimal under the maximin $L_{1}$-distance criterion, while others are asymptotically optimal. Moreover, these designs are also shown to have small pairwise correlations between columns.
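As a rough illustration of the construction route (not the paper's optimality analysis), a good lattice point design is generated from multipliers coprime to the run size and then mapped level-wise through the Williams transformation; each column remains a permutation of the levels, so the result is a Latin hypercube design. The generator choice below is arbitrary:

```python
from math import gcd

def glp_design(n, generators):
    """Good lattice point design: entry (i, j) is (i * h_j) mod n,
    with each generator h_j coprime to the run size n."""
    assert all(gcd(h, n) == 1 for h in generators)
    return [[(i * h) % n for h in generators] for i in range(1, n + 1)]

def williams(x, n):
    """Williams transformation on levels {0, ..., n-1}:
    2x if 2x < n, else 2(n - x) - 1 (a bijection of the level set)."""
    return 2 * x if 2 * x < n else 2 * (n - x) - 1

def williams_lhd(n, generators):
    """Apply the Williams transformation entrywise to a GLP design."""
    return [[williams(x, n) for x in row] for row in glp_design(n, generators)]

design = williams_lhd(7, [1, 2, 3])  # 7 runs, 3 columns
```

Because both steps permute the level set within each column, the Latin hypercube property is preserved by construction.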
</p>

Rho-estimators revisited: General theory and applications
https://projecteuclid.org/euclid.aos/1536631290
<strong>Yannick Baraud</strong>, <strong>Lucien Birgé</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3767--3804.</p><p><strong>Abstract:</strong><br/>
Following Baraud, Birgé and Sart [Invent. Math. 207 (2017) 425–517], we pursue our attempt to design a robust universal estimator of the joint distribution of $n$ independent (but not necessarily i.i.d.) observations for a Hellinger-type loss. Given such observations with an unknown joint distribution $\mathbf{P}$ and a dominated model $\mathscr{Q}$ for $\mathbf{P}$, we build an estimator $\widehat{\mathbf{P}}$ based on $\mathscr{Q}$ (a $\rho$-estimator) and measure its risk by a Hellinger-type distance. When $\mathbf{P}$ does belong to the model, this risk is bounded by some quantity which relies on the local complexity of the model in a vicinity of $\mathbf{P}$. In most situations, this bound corresponds to the minimax risk over the model (up to a possible logarithmic factor). When $\mathbf{P}$ does not belong to the model, its risk involves an additional bias term proportional to the distance between $\mathbf{P}$ and $\mathscr{Q}$, whatever the true distribution $\mathbf{P}$ may be. From this point of view, this new version of $\rho$-estimators improves upon the previous one described in Baraud, Birgé and Sart [Invent. Math. 207 (2017) 425–517], which required that $\mathbf{P}$ be absolutely continuous with respect to some known reference measure. Further improvements have been made relative to the former construction. In particular, it provides a very general treatment of the regression framework with random design as well as a computationally tractable procedure for aggregating estimators. We also give some conditions for the maximum likelihood estimator to be a $\rho$-estimator. Finally, we consider the situation where the statistician has at her or his disposal many different models and we build a penalized version of the $\rho$-estimator for model selection and adaptation purposes. In the regression setting, this penalized estimator not only allows one to estimate the regression function but also the distribution of the errors.
</p>

Think globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embedding
https://projecteuclid.org/euclid.aos/1536631291
<strong>Hau-Tieng Wu</strong>, <strong>Nan Wu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3805--3837.</p><p><strong>Abstract:</strong><br/>
Since its introduction in 2000, Locally Linear Embedding (LLE) has been widely applied in data science. We provide an asymptotic analysis of LLE under the manifold setup. We show that for a general manifold, asymptotically we may not obtain the Laplace–Beltrami operator, and the result may depend on nonuniform sampling unless a correct regularization is chosen. We also derive the corresponding kernel function, which indicates that LLE is not a Markov process. A comparison with other commonly applied nonlinear algorithms, particularly a diffusion map, is provided and its relationship with locally linear regression is also discussed.
</p>

Nonparametric covariate-adjusted response-adaptive design based on a functional urn model
https://projecteuclid.org/euclid.aos/1536631292
<strong>Giacomo Aletti</strong>, <strong>Andrea Ghiglietti</strong>, <strong>William F. Rosenberger</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3838--3866.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a general class of covariate-adjusted response-adaptive (CARA) designs based on a new functional urn model. We prove strong consistency concerning the functional urn proportion and the proportion of subjects assigned to the treatment groups, in the whole study and for each covariate profile, allowing the distribution of the responses conditioned on covariates to be estimated nonparametrically. In addition, we establish joint central limit theorems for the above quantities and the sufficient statistics of features of interest, which allow us to construct procedures to make inference on the conditional response distributions. These results are then applied to typical situations concerning Gaussian and binary responses.
</p>

Functional data analysis by matrix completion
https://projecteuclid.org/euclid.aos/1543568580
<strong>Marie-Hélène Descary</strong>, <strong>Victor M. Panaretos</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 1--38.</p><p><strong>Abstract:</strong><br/>
Functional data analyses typically proceed by smoothing, followed by functional PCA. This paradigm implicitly assumes that rough variation is due to nuisance noise. Nevertheless, relevant functional features such as time-localised or short scale fluctuations may indeed be rough relative to the global scale, but still smooth at shorter scales. These may be confounded with the global smooth components of variation by the smoothing and PCA, potentially distorting the parsimony and interpretability of the analysis. The goal of this paper is to investigate how both smooth and rough variations can be recovered on the basis of discretely observed functional data. Assuming that a functional datum arises as the sum of two uncorrelated components, one smooth and one rough, we develop identifiability conditions for the recovery of the two corresponding covariance operators. The key insight is that they should possess complementary forms of parsimony: one smooth and finite rank (large scale), and the other banded and potentially infinite rank (small scale). Our conditions elucidate the precise interplay between rank, bandwidth and grid resolution. Under these conditions, we show that the recovery problem is equivalent to rank-constrained matrix completion, and exploit this to construct estimators of the two covariances, without assuming knowledge of the true bandwidth or rank; we study their asymptotic behaviour, and then use them to recover the smooth and rough components of each functional datum by best linear prediction. As a result, we effectively produce separate functional PCAs for smooth and rough variation.
</p>

Bayesian fractional posteriors
https://projecteuclid.org/euclid.aos/1543568581
<strong>Anirban Bhattacharya</strong>, <strong>Debdeep Pati</strong>, <strong>Yun Yang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 39--66.</p><p><strong>Abstract:</strong><br/>
We consider the fractional posterior distribution that is obtained by updating a prior distribution via Bayes’ theorem with a fractional likelihood function, that is, a usual likelihood function raised to a fractional power. First, we analyze the contraction property of the fractional posterior in a general misspecified framework. Our contraction results only require a prior mass condition on a certain Kullback–Leibler (KL) neighborhood of the true parameter (or the KL divergence minimizer in the misspecified case), and obviate constructions of test functions and sieves commonly used in the literature for analyzing the contraction property of a regular posterior. We show through a counterexample that some condition controlling the complexity of the parameter space is necessary for the regular posterior to contract, which grants additional flexibility in the choice of the prior for the fractional posterior. Second, we derive a novel Bayesian oracle inequality based on a PAC-Bayes inequality in misspecified models. Our derivation reveals several advantages of averaging-based Bayesian procedures over optimization-based frequentist procedures. As an application of the Bayesian oracle inequality, we derive a sharp oracle inequality in multivariate convex regression problems. We also illustrate the theory in Gaussian process regression and density estimation problems.
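In a conjugate toy model the fractional update is explicit: raising a Gaussian likelihood to the power $\alpha$ before the Bayes update simply rescales the effective sample size to $\alpha n$. A minimal sketch, assuming known observation variance (names and defaults are illustrative):

```python
def fractional_posterior_normal(data, alpha, m0=0.0, v0=100.0, sigma2=1.0):
    """Fractional posterior for the mean of N(theta, sigma2) under a
    N(m0, v0) prior: the likelihood enters raised to the power alpha,
    so the data contribute precision alpha * n / sigma2 instead of
    n / sigma2, widening the posterior for alpha < 1."""
    n = len(data)
    xbar = sum(data) / n
    post_var = 1.0 / (1.0 / v0 + alpha * n / sigma2)
    post_mean = post_var * (m0 / v0 + alpha * n * xbar / sigma2)
    return post_mean, post_var

data = [1.1, 0.9, 1.3, 0.7, 1.0]
m_full, v_full = fractional_posterior_normal(data, alpha=1.0)  # regular posterior
m_frac, v_frac = fractional_posterior_normal(data, alpha=0.5)  # fractional posterior
```

Setting `alpha=1.0` recovers the regular posterior; smaller `alpha` tempers the likelihood and inflates the posterior variance.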
</p>

Distribution theory for hierarchical processes
https://projecteuclid.org/euclid.aos/1543568582
<strong>Federico Camerlenghi</strong>, <strong>Antonio Lijoi</strong>, <strong>Peter Orbanz</strong>, <strong>Igor Prünster</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 67--92.</p><p><strong>Abstract:</strong><br/>
Hierarchies of discrete probability measures are remarkably popular as nonparametric priors in applications, arguably due to two key properties: (i) they naturally represent multiple heterogeneous populations; (ii) they produce ties across populations, resulting in a shrinkage property often described as “sharing of information.” In this paper, we establish a distribution theory for hierarchical random measures that are generated via normalization, thus encompassing both the hierarchical Dirichlet and hierarchical Pitman–Yor processes. These results provide a probabilistic characterization of the induced (partially exchangeable) partition structure, including the distribution and the asymptotics of the number of partition sets, and a complete posterior characterization. They are obtained by representing hierarchical processes in terms of completely random measures, and by applying a novel technique for deriving the associated distributions. Moreover, they also serve as building blocks for new simulation algorithms, and we derive marginal and conditional algorithms for Bayesian inference.
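The "ties across populations" property can be made concrete with a toy Chinese restaurant franchise sampler, the standard predictive scheme associated with the hierarchical Dirichlet process (parameter names and defaults below are illustrative, and this is only a simulation sketch, not the paper's distribution theory):

```python
import random

def chinese_restaurant_franchise(n_groups, n_customers, alpha=2.0, gamma=0.1, seed=0):
    """Chinese restaurant franchise: each group runs a local CRP over
    tables (concentration alpha); every newly opened table orders a
    dish from a single global CRP (concentration gamma). Dishes play
    the role of shared atoms, so ties occur across populations."""
    rng = random.Random(seed)
    dish_counts = []                 # global CRP: number of tables per dish
    group_dishes = []                # set of dishes observed in each group
    for _ in range(n_groups):
        tables = []                  # local CRP: customers per table
        table_dish = []              # dish served at each table
        for _ in range(n_customers):
            u = rng.random() * (sum(tables) + alpha)
            acc = 0.0
            for t, c in enumerate(tables):
                acc += c
                if u < acc:          # join an existing table
                    tables[t] += 1
                    break
            else:                    # open a new table...
                tables.append(1)
                gu = rng.random() * (sum(dish_counts) + gamma)
                gacc = 0.0
                for d, c in enumerate(dish_counts):
                    gacc += c
                    if gu < gacc:    # ...serving an existing (shared) dish
                        dish_counts[d] += 1
                        table_dish.append(d)
                        break
                else:                # ...or a brand-new dish
                    dish_counts.append(1)
                    table_dish.append(len(dish_counts) - 1)
        group_dishes.append(set(table_dish))
    return group_dishes

groups = chinese_restaurant_franchise(3, 200)
```

With a small global concentration `gamma`, distinct groups almost surely end up sharing dishes, which is exactly the shrinkage behaviour described above.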
</p>

Adaptive estimation of the sparsity in the Gaussian vector model
https://projecteuclid.org/euclid.aos/1543568583
<strong>Alexandra Carpentier</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 93--126.</p><p><strong>Abstract:</strong><br/>
Consider the Gaussian vector model with mean value $\theta$. We study the twin problems of estimating the number $\Vert \theta \Vert_{0}$ of nonzero components of $\theta$ and testing whether $\Vert \theta \Vert_{0}$ is smaller than some value. For testing, we establish the minimax separation distances for this model and introduce a minimax adaptive test. Extensions to the case of unknown variance are also discussed. Rewriting the estimation of $\Vert \theta \Vert_{0}$ as a multiple testing problem of all hypotheses $\{\Vert \theta \Vert_{0}\leq q\}$, we both derive a new way of assessing the optimality of a sparsity estimator and exhibit such an optimal procedure. This general approach provides a roadmap for estimating the complexity of the signal in various statistical models.
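For intuition only, a naive baseline (not the adaptive multiple-testing procedure developed above) estimates $\Vert\theta\Vert_{0}$ by counting coordinates above the universal threshold $\sigma\sqrt{2\log p}$, which pure noise rarely exceeds:

```python
import math
import random

def naive_sparsity_estimate(y, sigma=1.0):
    """Count coordinates exceeding sigma * sqrt(2 * log p). A pure-noise
    N(0, sigma^2) coordinate crosses this threshold only with small
    probability, so the count tracks ||theta||_0 for strong signals
    (weak signals below the threshold are missed)."""
    p = len(y)
    t = sigma * math.sqrt(2.0 * math.log(p))
    return sum(1 for v in y if abs(v) > t)

rng = random.Random(42)
p, s = 1000, 5
theta = [10.0] * s + [0.0] * (p - s)             # 5 strong signals among 1000
y = [th + rng.gauss(0.0, 1.0) for th in theta]   # add N(0, 1) noise
est = naive_sparsity_estimate(y)
```

The shortcoming that motivates the adaptive treatment is visible here: signals of size comparable to the threshold are undercounted, so a single cutoff cannot be optimal across sparsity regimes.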
</p>

Approximate optimal designs for multivariate polynomial regression
https://projecteuclid.org/euclid.aos/1543568584
<strong>Yohann De Castro</strong>, <strong>Fabrice Gamboa</strong>, <strong>Didier Henrion</strong>, <strong>Roxana Hess</strong>, <strong>Jean-Bernard Lasserre</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 127--155.</p><p><strong>Abstract:</strong><br/>
We introduce a new approach for computing approximate optimal designs for multivariate polynomial regressions on compact (semialgebraic) design spaces. We use the moment-sum-of-squares hierarchy of semidefinite programming problems to solve numerically the approximate optimal design problem. The geometry of the design is recovered via semidefinite programming duality theory. This article shows that the hierarchy converges to the approximate optimal design as the order of the hierarchy increases. Furthermore, we provide a dual certificate ensuring finite convergence of the hierarchy and showing that the approximate optimal design can be computed numerically with our method. As a byproduct, we revisit the equivalence theorem of experimental design theory: it is linked to the Christoffel polynomial and it characterizes finite convergence of the moment-sum-of-squares hierarchies.
</p>

Efficient estimation of integrated volatility functionals via multiscale jackknife
https://projecteuclid.org/euclid.aos/1543568585
<strong>Jia Li</strong>, <strong>Yunxiao Liu</strong>, <strong>Dacheng Xiu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 156--176.</p><p><strong>Abstract:</strong><br/>
We propose semiparametrically efficient estimators for general integrated volatility functionals of multivariate semimartingale processes. A plug-in method that uses nonparametric estimates of spot volatilities is known to induce high-order biases that need to be corrected to obey a central limit theorem. Such bias terms arise from boundary effects, the diffusive and jump movements of stochastic volatility and the sampling error from the nonparametric spot volatility estimation. We propose a novel jackknife method for bias correction. The jackknife estimator is simply formed as a linear combination of a few uncorrected estimators associated with different local window sizes used in the estimation of spot volatility. We show theoretically that our estimator is asymptotically mixed Gaussian, semiparametrically efficient, and more robust to the choice of local windows. To facilitate practical use, we introduce a simulation-based estimator of the asymptotic variance, so that our inference is derivative-free and hence convenient to implement.
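The jackknife idea of combining a few window sizes so that the leading bias cancels can be seen in a stripped-down toy: if an estimator's leading bias is proportional to its window size $k$, weights solving $w_1 + w_2 = 1$ and $w_1 k_1 + w_2 k_2 = 0$ remove it. This is a schematic sketch of the principle, not the paper's volatility estimator:

```python
def two_scale_jackknife(estimator, k1, k2):
    """Linear combination of two estimators, with weights chosen so
    that a bias term proportional to the window size k cancels:
    w1 + w2 = 1 and w1 * k1 + w2 * k2 = 0."""
    w1 = k2 / (k2 - k1)
    w2 = -k1 / (k2 - k1)
    return w1 * estimator(k1) + w2 * estimator(k2)

# toy estimator with a known linear-in-k bias
true_value, bias_slope = 3.0, 0.7
toy = lambda k: true_value + bias_slope * k
debiased = two_scale_jackknife(toy, 5, 10)  # the O(k) bias cancels exactly here
```

In the paper's setting the bias has several components of different orders, hence the use of more than two scales, but the weight-solving logic is the same.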
</p>

Nonasymptotic rates for manifold, tangent space and curvature estimation
https://projecteuclid.org/euclid.aos/1543568586
<strong>Eddie Aamari</strong>, <strong>Clément Levrard</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 177--204.</p><p><strong>Abstract:</strong><br/>
Given a noisy sample from a submanifold $M\subset\mathbb{R}^{D}$, we derive optimal rates for the estimation of tangent spaces $T_{X}M$, the second fundamental form $\mathit{II}_{X}^{M}$ and the submanifold $M$. After motivating their study, we introduce a quantitative class of $\mathcal{C}^{k}$-submanifolds in analogy with Hölder classes. The proposed estimators are based on local polynomials and allow us to deal simultaneously with the three problems at stake. Minimax lower bounds are derived using a conditional version of Assouad’s lemma when the base point $X$ is random.
</p>

Nonparametric testing for multiple survival functions with noninferiority margins
https://projecteuclid.org/euclid.aos/1543568587
<strong>Hsin-wen Chang</strong>, <strong>Ian W. McKeague</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 205--232.</p><p><strong>Abstract:</strong><br/>
New nonparametric tests for the ordering of multiple survival functions are developed with the possibility of right censorship taken into account. The motivation comes from noninferiority trials with multiple treatments. The proposed tests are based on nonparametric likelihood ratio statistics, which are known to provide more powerful tests than Wald-type procedures, but in this setting have only been studied for pairs of survival functions or in the absence of censoring. We introduce a novel type of pool adjacent violator algorithm that leads to a complete solution of the problem. The limit distributions can be expressed as weighted sums of squares involving projections of certain Gaussian processes onto the given ordered alternative. A simulation study shows that the new procedures have superior power to a competing combined-pairwise Cox model approach. We illustrate the proposed methods using data from a three-arm noninferiority trial.
</p>

Oracle inequalities and adaptive estimation in the convolution structure density model
https://projecteuclid.org/euclid.aos/1543568588
<strong>O. V. Lepski</strong>, <strong>T. Willer</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 233--287.</p><p><strong>Abstract:</strong><br/>
We study the problem of nonparametric estimation under $\mathbb{L}_{p}$-loss, $p\in[1,\infty)$, in the framework of the convolution structure density model on $\mathbb{R}^{d}$. This observation scheme is a generalization of two classical statistical models, namely density estimation under direct and indirect observations. An original pointwise selection rule from a family of “kernel-type” estimators is proposed. For the selected estimator, we prove an $\mathbb{L}_{p}$-norm oracle inequality and several of its consequences. Next, the problem of adaptive minimax estimation under $\mathbb{L}_{p}$-loss over the scale of anisotropic Nikol’skii classes is addressed. We fully characterize the behavior of the minimax risk for different relationships between regularity parameters and norm indexes in the definitions of the functional class and of the risk. We prove that the proposed selection rule leads to the construction of an optimally or nearly optimally (up to logarithmic factors) adaptive estimator.
</p>

Efficient multivariate entropy estimation via $k$-nearest neighbour distances
https://projecteuclid.org/euclid.aos/1543568589
<strong>Thomas B. Berrett</strong>, <strong>Richard J. Samworth</strong>, <strong>Ming Yuan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 288--318.</p><p><strong>Abstract:</strong><br/>
Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko [Probl. Inform. Transm. 23 (1987), 95–101], based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^{d}$. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness, while the original unweighted estimator is typically only efficient when $d\leq 3$. Beyond the new estimator and the theoretical understanding provided, our results facilitate the construction of asymptotically valid confidence intervals for the entropy that are of asymptotically minimal width.
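For the unweighted $k=1$, $d=1$ special case, the Kozachenko–Leonenko estimator is short enough to sketch with the standard library (sorting makes the nearest-neighbour search linear); the weighting across several values of $k$ proposed above is omitted:

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # gamma = -psi(1)

def kl_entropy_1d(xs):
    """Kozachenko–Leonenko entropy estimate with k = 1 in d = 1:
    H_hat = log 2 + log(n - 1) + gamma + (1/n) * sum_i log rho_i,
    where rho_i is the distance from x_i to its nearest neighbour
    (log 2 is the volume term V_1 of the unit ball in R^1)."""
    xs = sorted(xs)
    n = len(xs)
    total = 0.0
    for i in range(n):
        left = xs[i] - xs[i - 1] if i > 0 else float("inf")
        right = xs[i + 1] - xs[i] if i < n - 1 else float("inf")
        total += math.log(min(left, right))
    return math.log(2.0) + math.log(n - 1) + EULER_GAMMA + total / n

random.seed(0)
sample = [random.random() for _ in range(5000)]
estimate = kl_entropy_1d(sample)  # Uniform(0, 1) has differential entropy 0
```

The estimate concentrates near the true entropy (here 0) at roughly the $n^{-1/2}$ scale, which is the regime where the weighting scheme above matters for higher dimensions.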
</p>

Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models
https://projecteuclid.org/euclid.aos/1543568590
<strong>Xuan Cao</strong>, <strong>Kshitij Khare</strong>, <strong>Malay Ghosh</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 319--348.</p><p><strong>Abstract:</strong><br/>
Covariance estimation and selection for high-dimensional multivariate datasets is a fundamental problem in modern statistics. Gaussian directed acyclic graph (DAG) models are a popular class of models used for this purpose. Gaussian DAG models introduce sparsity in the Cholesky factor of the inverse covariance matrix, and the sparsity pattern in turn corresponds to specific conditional independence assumptions on the underlying variables. A variety of priors have been developed in recent years for Bayesian inference in DAG models, yet crucial convergence and sparsity selection properties for these models have not been thoroughly investigated. Most of these priors are adaptations/generalizations of the Wishart distribution in the DAG context. In this paper, we consider a flexible and general class of these “DAG-Wishart” priors with multiple shape parameters. Under mild regularity assumptions, we establish strong graph selection consistency and posterior convergence rates for estimation when the number of variables $p$ is allowed to grow at an appropriate subexponential rate with the sample size $n$.
</p>

Locally adaptive confidence bands
https://projecteuclid.org/euclid.aos/1543568591
<strong>Tim Patschkowski</strong>, <strong>Angelika Rohde</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 349--381.</p><p><strong>Abstract:</strong><br/>
We develop honest and locally adaptive confidence bands for probability densities. They provide substantially improved confidence statements in case of inhomogeneous smoothness, and are easily implemented and visualized. The article contributes conceptual work on locally adaptive inference as a straightforward modification of the global setting imposes severe obstacles for statistical purposes. Among others, we introduce a statistical notion of local Hölder regularity and prove a correspondingly strong version of local adaptivity. We substantially relax the straightforward localization of the self-similarity condition in order not to rule out prototypical densities. The set of densities permanently excluded from the consideration is shown to be pathological in a mathematically rigorous sense. On a technical level, the crucial component for the verification of honesty is the identification of an asymptotically least favorable stationary case by means of Slepian’s comparison inequality.
</p>

Asymptotic distribution-free change-point detection for multivariate and non-Euclidean data
https://projecteuclid.org/euclid.aos/1543568592
<strong>Lynna Chu</strong>, <strong>Hao Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 382--414.</p><p><strong>Abstract:</strong><br/>
We consider the testing and estimation of change-points, locations where the distribution abruptly changes, in a sequence of multivariate or non-Euclidean observations. We study a nonparametric framework that utilizes similarity information among observations, which can be applied to various data types as long as an informative similarity measure on the sample space can be defined. The existing approach along this line has low power and/or biased estimates for change-points under some common scenarios. We address these problems by considering new tests based on similarity information. Simulation studies show that the new approaches exhibit substantial improvements in detecting and estimating change-points. In addition, under some mild conditions, the new test statistics are asymptotically distribution-free under the null hypothesis of no change. Analytic $p$-value approximations to the significance of the new test statistics for the single change-point alternative and changed interval alternative are derived, making the new approaches easy off-the-shelf tools for large datasets. The new approaches are illustrated in an analysis of New York taxi data.
</p>

Statistics on the Stiefel manifold: Theory and applications
https://projecteuclid.org/euclid.aos/1543568593
<strong>Rudrasis Chakraborty</strong>, <strong>Baba C. Vemuri</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 415--438.</p><p><strong>Abstract:</strong><br/>
A Stiefel manifold of the compact type is often encountered in many fields of engineering including signal and image processing, machine learning, numerical optimization and others. The Stiefel manifold is a Riemannian homogeneous space but not a symmetric space. In previous work, researchers have defined probability distributions on symmetric spaces and performed statistical analysis of data residing in these spaces. In this paper, we present original work involving definition of Gaussian distributions on a homogeneous space and show that the maximum-likelihood estimate of the location parameter of a Gaussian distribution on the homogeneous space yields the Fréchet mean (FM) of the samples drawn from this distribution. Further, we present an algorithm to sample from the Gaussian distribution on the Stiefel manifold and recursively compute the FM of these samples. We also prove the weak consistency of this recursive FM estimator. Several synthetic and real data experiments are then presented, demonstrating the superior computational performance of this estimator over the gradient descent based nonrecursive counterpart as well as the stochastic gradient descent based method prevalent in the literature.
</p>projecteuclid.org/euclid.aos/1543568593_20181130040326Fri, 30 Nov 2018 04:03 ESTGoodness-of-fit tests for the functional linear model based on randomly projected empirical processeshttps://projecteuclid.org/euclid.aos/1543568594<strong>Juan A. Cuesta-Albertos</strong>, <strong>Eduardo García-Portugués</strong>, <strong>Manuel Febrero-Bande</strong>, <strong>Wenceslao González-Manteiga</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 439--467.</p><p><strong>Abstract:</strong><br/>
We consider marked empirical processes indexed by a randomly projected functional covariate to construct goodness-of-fit tests for the functional linear model with scalar response. The test statistics are built from continuous functionals over the projected process, resulting in computationally efficient tests that exhibit root-$n$ convergence rates and circumvent the curse of dimensionality. The weak convergence of the empirical process is obtained conditionally on a random direction, whilst the almost-sure equivalence between testing for significance on the original and on the projected functional covariate is proved. The computation of the test in practice involves calibration by wild bootstrap resampling and the combination of several $p$-values, arising from different projections, by means of the false discovery rate method. The finite-sample properties of the tests are illustrated in a simulation study for a variety of linear models, underlying processes, and alternatives. The software provided implements the tests and allows the replication of simulations and data applications.
</p>projecteuclid.org/euclid.aos/1543568594_20181130040326Fri, 30 Nov 2018 04:03 ESTConvolved subsampling estimation with applications to block bootstraphttps://projecteuclid.org/euclid.aos/1543568595<strong>Johannes Tewes</strong>, <strong>Dimitris N. Politis</strong>, <strong>Daniel J. Nordman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 468--496.</p><p><strong>Abstract:</strong><br/>
The block bootstrap approximates sampling distributions from dependent data by resampling data blocks. A fundamental problem is establishing its consistency for the distribution of a sample mean, as a prototypical statistic. We use a structural relationship with subsampling to characterize the bootstrap in a new and general manner. While subsampling and block bootstrap differ, the block bootstrap distribution of a sample mean equals that of a $k$-fold self-convolution of a subsampling distribution. Motivated by this, we provide simple necessary and sufficient conditions for a convolved subsampling estimator to produce a normal limit that matches the target of bootstrap estimation. These conditions may be linked to consistency properties of an original subsampling distribution, which are often obtainable under minimal assumptions. Through several examples, the results are shown to validate the block bootstrap for means under significantly weakened assumptions in many existing (and some new) dependence settings, which also addresses a standing conjecture of Politis, Romano and Wolf [ Subsampling (1999) Springer]. Beyond sample means, convolved subsampling may not match the block bootstrap, but instead provides an alternative resampling estimator that may be of interest. Under minimal dependence conditions, results also broadly establish convolved subsampling for general statistics having normal limits.
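As an illustrative aside (not part of the article), the convolution identity can be sketched numerically: averaging $k$ independently drawn length-$b$ block means draws from the $k$-fold self-convolution of the subsampling distribution, which is the block bootstrap distribution of the sample mean. The i.i.d. toy series and all names here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(400)   # stand-in series; serial dependence omitted
b = 20                         # block length
n = len(x)
k = n // b                     # number of blocks the bootstrap resamples

# Subsampling distribution: means of all overlapping length-b blocks.
block_means = np.array([x[i:i + b].mean() for i in range(n - b + 1)])

def convolved_subsampling_draws(block_means, k, size, rng):
    """Draw from the k-fold self-convolution of the subsampling distribution
    by averaging k independently drawn block means; this reproduces the block
    bootstrap distribution of the sample mean."""
    idx = rng.integers(0, len(block_means), size=(size, k))
    return block_means[idx].mean(axis=1)

draws = convolved_subsampling_draws(block_means, k, 5000, rng)
```

The draws center on the sample mean and have spread of order $n^{-1/2}$, matching what the block bootstrap targets for this statistic.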
</p>projecteuclid.org/euclid.aos/1543568595_20181130040326Fri, 30 Nov 2018 04:03 ESTFeature elimination in kernel machines in moderately high dimensionshttps://projecteuclid.org/euclid.aos/1543568596<strong>Sayan Dasgupta</strong>, <strong>Yair Goldberg</strong>, <strong>Michael R. Kosorok</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 497--526.</p><p><strong>Abstract:</strong><br/>
We develop an approach for feature elimination in statistical learning with kernel machines, based on recursive elimination of features. We present theoretical properties of this method and show that it is uniformly consistent in finding the correct feature space under certain generalized assumptions. We present a few case studies to show that the assumptions are met in most practical situations and present simulation results to demonstrate performance of the proposed approach.
</p>projecteuclid.org/euclid.aos/1543568596_20181130040326Fri, 30 Nov 2018 04:03 ESTHigh-dimensional covariance matrices in elliptical distributions with application to spherical testhttps://projecteuclid.org/euclid.aos/1543568597<strong>Jiang Hu</strong>, <strong>Weiming Li</strong>, <strong>Zhi Liu</strong>, <strong>Wang Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 527--555.</p><p><strong>Abstract:</strong><br/>
This paper discusses fluctuations of linear spectral statistics of high-dimensional sample covariance matrices when the underlying population follows an elliptical distribution. Such populations often possess high-order correlations among their coordinates, which have a great impact on the asymptotic behavior of linear spectral statistics. Taking this kind of dependence into consideration, we establish a new central limit theorem for linear spectral statistics for a class of elliptical populations. This general theoretical result has wide applications and, as an example, it is then applied to test the sphericity of elliptical populations.
</p>projecteuclid.org/euclid.aos/1543568597_20181130040326Fri, 30 Nov 2018 04:03 ESTA critical threshold for design effects in network samplinghttps://projecteuclid.org/euclid.aos/1543568598<strong>Karl Rohe</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 556--582.</p><p><strong>Abstract:</strong><br/>
Web crawling, snowball sampling, and respondent-driven sampling (RDS) are three types of network sampling techniques used to contact individuals in hard-to-reach populations. This paper studies these procedures as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree (instead of a chain) allows for the sampled units to refer multiple future units into the sample.
In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is some value $\operatorname{DE}$, then constructing an estimator from the novel design makes the variance of the estimator $\operatorname{DE}$ times greater than it would be under a simple random sample with the same sample size $n$. Under certain assumptions on the referral tree, the design effect of network sampling has a critical threshold that is a function of the referral rate $m$ and the clustering structure in the social network, represented by the second eigenvalue of the Markov transition matrix, $\lambda_{2}$. If $m<1/\lambda_{2}^{2}$, then the design effect is finite (i.e., the standard estimator is $\sqrt{n}$-consistent). However, if $m>1/\lambda_{2}^{2}$, then the design effect grows with $n$ (i.e., the standard estimator is no longer $\sqrt{n}$-consistent). Past this critical threshold, the standard error of the estimator converges at the slower rate of $n^{\log_{m}\lambda_{2}}$. The Markov model allows for nodes to be resampled; computational results show that the findings hold under without-replacement sampling. To estimate confidence intervals that adapt to the correct level of uncertainty, a novel resampling procedure is proposed. Computational experiments compare this procedure to previous techniques.
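As an illustrative aside (not part of the article), the threshold and the slow rate stated above are simple to compute; the value of $\lambda_{2}$ below is a hypothetical choice for illustration:

```python
import math

def critical_referral_rate(lambda2):
    """Critical threshold on the referral rate m: for m < 1/lambda2**2 the
    design effect is finite; for m > 1/lambda2**2 it grows with n."""
    return 1.0 / lambda2 ** 2

def rate_exponent(m, lambda2):
    """Past the threshold, the standard error decays like n**(log_m lambda2)."""
    return math.log(lambda2) / math.log(m)

lambda2 = 0.5                              # hypothetical second eigenvalue
m_crit = critical_referral_rate(lambda2)   # 4.0 referrals per respondent
slow_rate = rate_exponent(8, lambda2)      # -1/3, slower than the usual -1/2
```

With stronger clustering (larger $\lambda_{2}$) the tolerable referral rate drops quadratically, which is the practical content of the threshold.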
</p>projecteuclid.org/euclid.aos/1543568598_20181130040326Fri, 30 Nov 2018 04:03 ESTPermutation $p$-value approximation via generalized Stolarsky invariancehttps://projecteuclid.org/euclid.aos/1543568599<strong>Hera Y. He</strong>, <strong>Kinjal Basu</strong>, <strong>Qingyuan Zhao</strong>, <strong>Art B. Owen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 583--611.</p><p><strong>Abstract:</strong><br/>
It is common for genomic data analysis to use $p$-values from a large number of permutation tests. The multiplicity of tests may require very tiny $p$-values in order to reject any null hypotheses and the common practice of using randomly sampled permutations then becomes very expensive. We propose an inexpensive approximation to $p$-values for two-sample linear test statistics, derived from Stolarsky’s invariance principle. The method creates a geometrically derived reference set of approximate $p$-values for each hypothesis. The average of that set is used as a point estimate $\hat{p}$ and our generalization of the invariance principle allows us to compute the variance of the $p$-values in that set. We find that in cases where the point estimate is small, the variance is a modest multiple of the square of that point estimate, yielding a relative error property similar to that of saddlepoint approximations. On a Parkinson’s disease data set, the new approximation is faster and more accurate than the saddlepoint approximation. We also obtain a simple probabilistic explanation of Stolarsky’s invariance principle.
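As an illustrative aside (not part of the article), the baseline the paper improves on is the Monte Carlo permutation $p$-value for a two-sample linear statistic, whose cost grows with the number of permutations needed to resolve tiny $p$-values; this sketch and its names are assumptions for illustration:

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for the two-sample difference in means,
    a linear statistic; the expensive step the paper's approximation avoids."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = pooled[:len(x)].mean() - pooled[len(x):].mean()
        count += stat >= observed
    return (count + 1) / (n_perm + 1)   # add-one to avoid a zero p-value

rng = np.random.default_rng(2)
p = permutation_pvalue(rng.standard_normal(30) + 2.0, rng.standard_normal(30))
```

The smallest resolvable $p$-value is roughly $1/(n_{\text{perm}}+1)$, so multiplicity-corrected thresholds force $n_{\text{perm}}$ into the millions, which motivates closed-form approximations.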
</p>projecteuclid.org/euclid.aos/1543568599_20181130040326Fri, 30 Nov 2018 04:03 ESTCanonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank casehttps://projecteuclid.org/euclid.aos/1543568600<strong>Zhigang Bao</strong>, <strong>Jiang Hu</strong>, <strong>Guangming Pan</strong>, <strong>Wang Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 612--640.</p><p><strong>Abstract:</strong><br/>
Consider a Gaussian vector $\mathbf{z}=(\mathbf{x}',\mathbf{y}')'$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$, respectively. With $n$ independent observations of $\mathbf{z}$, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$, from the perspective of the canonical correlation analysis. We investigate the high-dimensional case: both $p$ and $q$ are proportional to the sample size $n$. Denote by $\Sigma_{uv}$ the population cross-covariance matrix of random vectors $\mathbf{u}$ and $\mathbf{v}$, and denote by $S_{uv}$ the sample counterpart. The canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are known as the square roots of the nonzero eigenvalues of the canonical correlation matrix $\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$. In this paper, we focus on the case that $\Sigma_{xy}$ is of finite rank $k$, that is, there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_{1}\geq\cdots\geq r_{k}>0$. We study the sample counterparts of $r_{i},i=1,\ldots,k$, that is, the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$, denoted by $\lambda_{1}\geq\cdots\geq\lambda_{k}$. We show that there exists a threshold $r_{c}\in(0,1)$, such that for each $i\in\{1,\ldots,k\}$, when $r_{i}\leq r_{c}$, $\lambda_{i}$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_{+}$. When $r_{i}>r_{c}$, $\lambda_{i}$ possesses an almost sure limit in $(d_{+},1]$, from which we can recover the $r_{i}$’s in turn, thus providing an estimate of the latter in the high-dimensional scenario. We also obtain the limiting distribution of $\lambda_{i}$’s under appropriate normalization. Specifically, $\lambda_{i}$ exhibits Gaussian-type fluctuations if $r_{i}>r_{c}$ and follows a Tracy–Widom distribution if $r_{i}<r_{c}$.
Some applications of our results are also discussed.
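As an illustrative aside (not part of the article), the sample canonical correlation matrix defined above is directly computable; the rank-one toy model and all names below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 2000, 5, 4
# Hypothetical finite-rank model: one shared latent factor z, so Sigma_xy has
# rank one and the single squared canonical correlation is r_1 = 0.25.
z = rng.standard_normal((n, 1))
x = np.hstack([z + rng.standard_normal((n, 1)), rng.standard_normal((n, p - 1))])
y = np.hstack([z + rng.standard_normal((n, 1)), rng.standard_normal((n, q - 1))])

def sample_ccor_sq(x, y):
    """Eigenvalues of S_xx^{-1} S_xy S_yy^{-1} S_yx, i.e., the squared sample
    canonical correlations, sorted in decreasing order."""
    xc, yc = x - x.mean(0), y - y.mean(0)
    sxx, syy = xc.T @ xc / len(x), yc.T @ yc / len(y)
    sxy = xc.T @ yc / len(x)
    m = np.linalg.solve(sxx, sxy) @ np.linalg.solve(syy, sxy.T)
    return np.sort(np.linalg.eigvals(m).real)[::-1]

lam = sample_ccor_sq(x, y)   # lam[0] near 0.25; the rest near the bulk
```

Here $p/n$ and $q/n$ are small, so the spiked eigenvalue sits close to $r_{1}$; in the proportional regime the paper studies, the bulk edge $d_{+}$ and the threshold $r_{c}$ become nontrivial.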
</p>projecteuclid.org/euclid.aos/1543568600_20181130040326Fri, 30 Nov 2018 04:03 ESTUniform projection designshttps://projecteuclid.org/euclid.aos/1543568601<strong>Fasheng Sun</strong>, <strong>Yaping Wang</strong>, <strong>Hongquan Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 641--661.</p><p><strong>Abstract:</strong><br/>
Efficient designs are in high demand in practice for both computer and physical experiments. Existing designs (such as maximin distance designs and uniform designs) may have bad low-dimensional projections, which is undesirable when only a few factors are active. We propose a new design criterion, called uniform projection criterion, by focusing on projection uniformity. Uniform projection designs generated under the new criterion scatter points uniformly in all dimensions and have good space-filling properties in terms of distance, uniformity and orthogonality. We show that the new criterion is a function of the pairwise $L_{1}$-distances between the rows, so that the new criterion can be computed at no more cost than a design criterion that ignores projection properties. We develop some theoretical results and show that maximin $L_{1}$-equidistant designs are uniform projection designs. In addition, a class of asymptotically optimal uniform projection designs based on good lattice point sets are constructed. We further illustrate an application of uniform projection designs via a multidrug combination experiment.
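As an illustrative aside (not part of the article), the ingredient the criterion is built from, the pairwise $L_{1}$-distances between design rows, is cheap to compute; the toy design below is an assumption for illustration, not a design from the paper:

```python
import numpy as np

def pairwise_l1_distances(design):
    """All pairwise L1-distances between the rows of a design matrix; the
    uniform projection criterion is a function of these distances alone,
    which is why it costs no more than criteria ignoring projections."""
    d = np.abs(design[:, None, :] - design[None, :, :]).sum(axis=2)
    iu = np.triu_indices(len(design), k=1)
    return d[iu]

# A 4-run, 2-factor toy design on levels {0, 1, 2, 3} (illustrative only).
design = np.array([[0, 1], [1, 3], [2, 0], [3, 2]])
dists = pairwise_l1_distances(design)   # one distance per pair of runs
```

A design whose rows are $L_{1}$-equidistant at the maximin distance would, per the result quoted above, be a uniform projection design.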
</p>projecteuclid.org/euclid.aos/1543568601_20181130040326Fri, 30 Nov 2018 04:03 ESTComputation of maximum likelihood estimates in cyclic structural equation modelshttps://projecteuclid.org/euclid.aos/1547197234<strong>Mathias Drton</strong>, <strong>Christopher Fox</strong>, <strong>Y. Samuel Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 663--690.</p><p><strong>Abstract:</strong><br/>
Software for computation of maximum likelihood estimates in linear structural equation models typically employs general techniques from nonlinear optimization, such as quasi-Newton methods. In practice, careful tuning of initial values is often required to avoid convergence issues. As an alternative approach, we propose a block-coordinate descent method that cycles through the considered variables, updating only the parameters related to a given variable in each step. We show that the resulting block update problems can be solved in closed form even when the structural equation model comprises feedback cycles. Furthermore, we give a characterization of the models for which the block-coordinate descent algorithm is well defined, meaning that for generic data and starting values all block optimization problems admit a unique solution. For the characterization, we represent each model by its mixed graph (also known as path diagram), which leads to criteria that can be checked in time that is polynomial in the number of considered variables.
</p>projecteuclid.org/euclid.aos/1547197234_20190111040129Fri, 11 Jan 2019 04:01 ESTFréchet regression for random objects with Euclidean predictorshttps://projecteuclid.org/euclid.aos/1547197235<strong>Alexander Petersen</strong>, <strong>Hans-Georg Müller</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 691--719.</p><p><strong>Abstract:</strong><br/>
Increasingly, statisticians are faced with the task of analyzing complex data that are non-Euclidean and specifically do not lie in a vector space. To address the need for statistical methods for such data, we introduce the concept of Fréchet regression. This is a general approach to regression when responses are complex random objects in a metric space and predictors are in $\mathcal{R}^{p}$, achieved by extending the classical concept of a Fréchet mean to the notion of a conditional Fréchet mean. We develop generalized versions of both global least squares regression and local weighted least squares smoothing. The target quantities are appropriately defined population versions of global and local regression for response objects in a metric space. We derive asymptotic rates of convergence for the corresponding fitted regressions using observed data to the population targets under suitable regularity conditions by applying empirical process methods. For the special case of random objects that reside in a Hilbert space, such as regression models with vector predictors and functional data as responses, we obtain a limit distribution. The proposed methods have broad applicability. Illustrative examples include responses that consist of probability distributions and correlation matrices, and we demonstrate both global and local Fréchet regression for demographic and brain imaging data. Local Fréchet regression is also illustrated via a simulation with response data which lie on the sphere.
</p>projecteuclid.org/euclid.aos/1547197235_20190111040129Fri, 11 Jan 2019 04:01 ESTDivide and conquer in nonstandard problems and the super-efficiency phenomenonhttps://projecteuclid.org/euclid.aos/1547197236<strong>Moulinath Banerjee</strong>, <strong>Cécile Durot</strong>, <strong>Bodhisattva Sen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 720--757.</p><p><strong>Abstract:</strong><br/>
We study how the divide and conquer principle works in nonstandard problems where rates of convergence are typically slower than $\sqrt{n}$ and limit distributions are non-Gaussian, and provide a detailed treatment for a variety of important and well-studied problems involving nonparametric estimation of a monotone function. We find that for a fixed model, the pooled estimator, obtained by averaging nonstandard estimates across mutually exclusive subsamples, outperforms the nonstandard monotonicity-constrained (global) estimator based on the entire sample in the sense of pointwise estimation of the function. We also show that, under appropriate conditions, if the number of subsamples is allowed to increase at appropriate rates, the pooled estimator is asymptotically normally distributed with a variance that is empirically estimable from the subsample-level estimates. Further, in the context of monotone regression, we show that this gain in efficiency under a fixed model comes at a price—the pooled estimator’s performance, in a uniform sense (maximal risk) over a class of models worsens as the number of subsamples increases, leading to a version of the super-efficiency phenomenon. In the process, we develop analytical results for the order of the bias in isotonic regression, which are of independent interest.
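As an illustrative aside (not part of the article), the pooled estimator described above can be sketched for isotonic regression: fit the monotonicity-constrained estimator on each of several disjoint subsamples and average the fits at a point. The helper names and simulated data are assumptions for illustration:

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators algorithm for isotonic (monotone) regression."""
    levels, weights = [], []
    for v in map(float, y):
        levels.append(v); weights.append(1.0)
        while len(levels) > 1 and levels[-2] > levels[-1]:
            w = weights[-2] + weights[-1]
            lv = (levels[-2] * weights[-2] + levels[-1] * weights[-1]) / w
            levels[-2:], weights[-2:] = [lv], [w]
    return np.repeat(levels, [int(w) for w in weights])

def pooled_isotonic_at(x0, xs, ys, n_subsamples, rng):
    """Divide-and-conquer sketch: fit isotonic regression on each disjoint
    random subsample and average the fitted values at the point x0."""
    idx = rng.permutation(len(xs))
    fits = []
    for part in np.array_split(idx, n_subsamples):
        part = np.sort(part)
        fit = pava(ys[part])
        j = np.searchsorted(xs[part], x0)
        fits.append(fit[min(j, len(fit) - 1)])
    return float(np.mean(fits))

rng = np.random.default_rng(4)
xs = np.linspace(0, 1, 1200)
ys = xs ** 2 + 0.1 * rng.standard_normal(1200)   # monotone signal plus noise
est = pooled_isotonic_at(0.5, xs, ys, n_subsamples=8, rng=rng)  # near 0.25
```

Averaging smooths the cube-root-rate, non-Gaussian subsample estimates, which is the source of both the pointwise gain and, over a class of models, the super-efficiency trade-off the abstract describes.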
</p>projecteuclid.org/euclid.aos/1547197236_20190111040129Fri, 11 Jan 2019 04:01 ESTRank verification for exponential familieshttps://projecteuclid.org/euclid.aos/1547197237<strong>Kenneth Hung</strong>, <strong>William Fithian</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 758--782.</p><p><strong>Abstract:</strong><br/>
Many statistical experiments involve comparing multiple population groups. For example, a public opinion poll may ask which of several political candidates commands the most support; a social scientific survey may report the most common of several responses to a question; or a clinical trial may compare binary patient outcomes under several treatment conditions to determine the most effective treatment. Having observed the “winner” (largest observed response) in a noisy experiment, it is natural to ask whether that candidate, survey response or treatment is actually the “best” (stochastically largest response). This article concerns the problem of rank verification: post hoc significance tests of whether the orderings discovered in the data reflect the population ranks. For exponential family models, we show under mild conditions that an unadjusted two-tailed pairwise test comparing the top two order statistics (i.e., comparing the “winner” to the “runner-up”) is a valid test of whether the winner is truly the best. We extend our analysis to provide equally simple procedures to obtain lower confidence bounds on the gap between the winning population and the others, and to verify ranks beyond the first.
</p>projecteuclid.org/euclid.aos/1547197237_20190111040129Fri, 11 Jan 2019 04:01 ESTSub-Gaussian estimators of the mean of a random vectorhttps://projecteuclid.org/euclid.aos/1547197238<strong>Gábor Lugosi</strong>, <strong>Shahar Mendelson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 783--794.</p><p><strong>Abstract:</strong><br/>
We study the problem of estimating the mean of a random vector $X$ given a sample of $N$ independent, identically distributed points. We introduce a new estimator that achieves a purely sub-Gaussian performance under the only condition that the second moment of $X$ exists. The estimator is based on a novel concept of a multivariate median.
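As an illustrative aside (not part of the article), a simpler relative of the paper's construction is the coordinatewise median-of-means, which already tolerates heavy tails far better than the empirical mean; the paper's estimator rests on a more refined multivariate median, so the sketch below is only a labeled simplification:

```python
import numpy as np

def median_of_means(sample, n_blocks):
    """Median-of-means sketch: split the sample into blocks, average each
    block, then take the coordinatewise median of the block means. The
    paper's estimator uses a novel multivariate median instead; this
    simpler variant is for illustration only."""
    blocks = np.array_split(sample, n_blocks)
    means = np.array([b.mean(axis=0) for b in blocks])
    return np.median(means, axis=0)

rng = np.random.default_rng(3)
# Heavy-tailed data with finite second moment (t distribution, df > 2).
heavy = rng.standard_t(df=2.5, size=(1000, 3))
est = median_of_means(heavy, n_blocks=20)   # close to the true mean, 0
```

The median step discards the influence of the few blocks corrupted by extreme draws, which is what delivers sub-Gaussian-style concentration under only a second-moment condition.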
</p>projecteuclid.org/euclid.aos/1547197238_20190111040129Fri, 11 Jan 2019 04:01 ESTCombinatorial inference for graphical modelshttps://projecteuclid.org/euclid.aos/1547197239<strong>Matey Neykov</strong>, <strong>Junwei Lu</strong>, <strong>Han Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 795--827.</p><p><strong>Abstract:</strong><br/>
We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference where the main interest is point estimation or parameter testing, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of certain size, or the maximum degree of the graph. To begin with, we study the information-theoretic limits of a large family of combinatorial inference problems. We propose new concepts including structural packing and buffer entropies to characterize how the complexity of combinatorial graph structures impacts the corresponding minimax lower bounds. On the other hand, we propose a family of novel and practical structural testing algorithms to match the lower bounds. We provide numerical results on both synthetic graphical models and brain networks to illustrate the usefulness of these proposed methods.
</p>projecteuclid.org/euclid.aos/1547197239_20190111040129Fri, 11 Jan 2019 04:01 ESTEstimation and prediction using generalized Wendland covariance functions under fixed domain asymptoticshttps://projecteuclid.org/euclid.aos/1547197240<strong>Moreno Bevilacqua</strong>, <strong>Tarik Faouzi</strong>, <strong>Reinhard Furrer</strong>, <strong>Emilio Porcu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 828--856.</p><p><strong>Abstract:</strong><br/>
We study estimation and prediction of Gaussian random fields with covariance models belonging to the generalized Wendland (GW) class, under fixed domain asymptotics. As for the Matérn case, this class allows for a continuous parameterization of smoothness of the underlying Gaussian random field, being additionally compactly supported. The paper is divided into three parts: first, we characterize the equivalence of two Gaussian measures with GW covariance function, and we provide sufficient conditions for the equivalence of two Gaussian measures with Matérn and GW covariance functions. In the second part, we establish strong consistency and asymptotic distribution of the maximum likelihood estimator of the microergodic parameter associated with the GW covariance model, under fixed domain asymptotics. The third part elucidates the consequences of our results in terms of (misspecified) best linear unbiased predictor, under fixed domain asymptotics. Our findings are illustrated through two simulation studies: the former compares the finite-sample behavior of the maximum likelihood estimation of the microergodic parameter with the given asymptotic distribution; the latter compares the finite-sample behavior of the prediction and its associated mean square error when using two equivalent Gaussian measures with Matérn and GW covariance models, using covariance tapering as a benchmark.
</p>projecteuclid.org/euclid.aos/1547197240_20190111040129Fri, 11 Jan 2019 04:01 ESTChebyshev polynomials, moment matching, and optimal estimation of the unseenhttps://projecteuclid.org/euclid.aos/1547197241<strong>Yihong Wu</strong>, <strong>Pengkun Yang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 857--883.</p><p><strong>Abstract:</strong><br/>
We consider the problem of estimating the support size of a discrete distribution whose minimum nonzero mass is at least $\frac{1}{k}$. Under the independent sampling model, we show that the sample complexity, that is, the minimal sample size to achieve an additive error of $\varepsilon k$ with probability at least 0.1 is within universal constant factors of $\frac{k}{\log k}\log^{2}\frac{1}{\varepsilon }$, which improves the state-of-the-art result of $\frac{k}{\varepsilon^{2}\log k}$ in [In Advances in Neural Information Processing Systems (2013) 2157–2165]. Similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties, which can be evaluated in $O(n+\log^{2}k)$ time and attains the sample complexity within constant factors. The superiority of the proposed estimator in terms of accuracy, computational efficiency and scalability is demonstrated in a variety of synthetic and real datasets.
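As an illustrative aside (not part of the article), the improvement in the sample complexity is easy to see numerically by comparing the two orders quoted above; universal constants are ignored, so only the ratio's direction is meaningful:

```python
import math

def sample_complexity_new(k, eps):
    """Optimal order established in the paper: (k / log k) * log(1/eps)**2."""
    return k / math.log(k) * math.log(1 / eps) ** 2

def sample_complexity_old(k, eps):
    """Previous state-of-the-art order: k / (eps**2 * log k)."""
    return k / (eps ** 2 * math.log(k))

k, eps = 10 ** 6, 0.1
improvement = sample_complexity_old(k, eps) / sample_complexity_new(k, eps)
```

The gain comes from the accuracy parameter entering poly-logarithmically, $\log^{2}\frac{1}{\varepsilon}$, rather than polynomially as $\varepsilon^{-2}$.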
</p>projecteuclid.org/euclid.aos/1547197241_20190111040129Fri, 11 Jan 2019 04:01 ESTPartial least squares prediction in high-dimensional regressionhttps://projecteuclid.org/euclid.aos/1547197242<strong>R. Dennis Cook</strong>, <strong>Liliana Forzani</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 884--908.</p><p><strong>Abstract:</strong><br/>
We study the asymptotic behavior of predictions from partial least squares (PLS) regression as the sample size and number of predictors diverge in various alignments. We show that there is a range of regression scenarios where PLS predictions have the usual root-$n$ convergence rate, even when the sample size is substantially smaller than the number of predictors, and an even wider range where the rate is slower but may still produce practically useful results. We show also that PLS predictions achieve their best asymptotic behavior in abundant regressions where many predictors contribute information about the response. Their asymptotic behavior tends to be undesirable in sparse regressions where few predictors contribute information about the response.
</p>projecteuclid.org/euclid.aos/1547197242_20190111040129Fri, 11 Jan 2019 04:01 ESTSignal aliasing in Gaussian random fields for experiments with qualitative factorshttps://projecteuclid.org/euclid.aos/1547197243<strong>Ming-Chung Chang</strong>, <strong>Shao-Wei Cheng</strong>, <strong>Ching-Shui Cheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 909--935.</p><p><strong>Abstract:</strong><br/>
Signal aliasing is an inevitable consequence of using fractional factorial designs. Unlike linear models with fixed factorial effects, for Gaussian random field models advocated in some Bayesian design and computer experiment literature, the issue of signal aliasing has not received comparable attention. In the present article, this issue is tackled for experiments with qualitative factors. The signals in a Gaussian random field can be characterized by the random effects identified from the covariance function. The aliasing severity of the signals is determined by two key elements: (i) the aliasing pattern, which depends only on the chosen design, and (ii) the effect priority, which is related to the variances of the random effects and depends on the model parameters. We first apply this framework to study the signal-aliasing problem for regular fractional factorial designs. For general factorial designs including nonregular ones, we propose an aliasing severity index to quantify the severity of signal aliasing. We also observe that the aliasing severity index is highly correlated with the prediction variance.
</p>projecteuclid.org/euclid.aos/1547197243_20190111040129Fri, 11 Jan 2019 04:01 ESTCross: Efficient low-rank tensor completionhttps://projecteuclid.org/euclid.aos/1547197244<strong>Anru Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 936--964.</p><p><strong>Abstract:</strong><br/>
The completion of tensors, or high-order arrays, has attracted significant attention in recent research. Current literature on tensor completion primarily focuses on recovery from a set of uniformly randomly measured entries, and the required number of measurements to achieve recovery is not guaranteed to be optimal. In addition, some previous methods are NP-hard to implement. In this article, we propose a framework for low-rank tensor completion via a novel tensor measurement scheme that we name Cross. The proposed procedure is efficient and easy to implement. In particular, we show that a third-order tensor of Tucker rank-$(r_{1},r_{2},r_{3})$ in $p_{1}$-by-$p_{2}$-by-$p_{3}$ dimensional space can be recovered from as few as $r_{1}r_{2}r_{3}+r_{1}(p_{1}-r_{1})+r_{2}(p_{2}-r_{2})+r_{3}(p_{3}-r_{3})$ noiseless measurements, which matches the sample complexity lower bound. In the case of noisy measurements, we also develop a theoretical upper bound and the matching minimax lower bound for recovery error over certain classes of low-rank tensors for the proposed procedure. The results can be further extended to fourth or higher-order tensors. Simulation studies show that the method performs well under a variety of settings. Finally, the procedure is illustrated through a real dataset in neuroimaging.
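As an illustrative aside (not part of the article), the stated measurement count shows how dramatically the Cross scheme undercuts the total number of entries; the helper below simply evaluates the formula from the abstract:

```python
def cross_measurement_count(ranks, dims):
    """Number of noiseless measurements sufficient for exact recovery of a
    Tucker rank-(r1, r2, r3) tensor under the Cross scheme, per the abstract:
    r1*r2*r3 + sum_i r_i * (p_i - r_i)."""
    r1, r2, r3 = ranks
    core = r1 * r2 * r3
    arms = sum(r * (p - r) for r, p in zip(ranks, dims))
    return core + arms

total_entries = 50 * 50 * 50                              # 125,000 entries
needed = cross_measurement_count((2, 2, 2), (50, 50, 50))  # 8 + 3 * 96 = 296
```

For a rank-$(2,2,2)$ tensor in $50\times 50\times 50$ space, 296 measurements suffice versus 125,000 entries, roughly a 400-fold reduction.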
</p>projecteuclid.org/euclid.aos/1547197244_20190111040129Fri, 11 Jan 2019 04:01 ESTCovariate balancing propensity score by tailored loss functionshttps://projecteuclid.org/euclid.aos/1547197245<strong>Qingyuan Zhao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 965--993.</p><p><strong>Abstract:</strong><br/>
In observational studies, propensity scores are commonly estimated by maximum likelihood but may fail to balance high-dimensional pretreatment covariates even after specification search. We introduce a general framework that unifies and generalizes several recent proposals to improve covariate balance when designing an observational study. Instead of the likelihood function, we propose to optimize special loss functions—covariate balancing scoring rules (CBSR)—to estimate the propensity score. A CBSR is uniquely determined by the link function in the GLM and the estimand (a weighted average treatment effect). We show CBSR does not lose asymptotic efficiency in estimating the weighted average treatment effect compared to the Bernoulli likelihood, but CBSR is much more robust in finite samples. Borrowing tools developed in statistical learning, we propose practical strategies to balance covariate functions in rich function classes. This is useful to estimate the maximum bias of the inverse probability weighting (IPW) estimators and construct honest confidence intervals in finite samples. Lastly, we provide several numerical examples to demonstrate the tradeoff of bias and variance in the IPW-type estimators and the tradeoff in balancing different function classes of the covariates.
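As an illustrative aside (not part of the article), the IPW estimator whose bias-variance trade-off the abstract discusses is the standard Horvitz-Thompson-style weighting; the simulated design below, with the true logistic propensity plugged in, is an assumption for illustration:

```python
import numpy as np

def ipw_ate(y, t, pscore):
    """Inverse probability weighting estimate of the average treatment
    effect given (estimated) propensity scores."""
    return np.mean(t * y / pscore) - np.mean((1 - t) * y / (1 - pscore))

rng = np.random.default_rng(5)
n = 20000
x = rng.standard_normal(n)
p = 1 / (1 + np.exp(-x))                    # true propensity, logistic in x
t = rng.binomial(1, p)                      # treatment assignment
y = 2.0 * t + x + rng.standard_normal(n)    # outcome; true ATE = 2
est = ipw_ate(y, t, p)                      # close to 2
```

With misspecified or poorly balanced propensity scores the weights in this estimator inflate both bias and variance, which is what motivates fitting them via covariate balancing scoring rules rather than maximum likelihood.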
</p>projecteuclid.org/euclid.aos/1547197245_20190111040129Fri, 11 Jan 2019 04:01 ESTThe geometry of hypothesis testing over convex cones: Generalized likelihood ratio tests and minimax radiihttps://projecteuclid.org/euclid.aos/1547197246<strong>Yuting Wei</strong>, <strong>Martin J. Wainwright</strong>, <strong>Adityanand Guntuboyina</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 994--1024.</p><p><strong>Abstract:</strong><br/>
We consider a compound testing problem within the Gaussian sequence model in which the null and alternative are specified by a pair of closed, convex cones. Such cone testing problems arise in various applications, including detection of treatment effects, trend detection in econometrics, signal detection in radar processing and shape-constrained inference in nonparametric statistics. We provide a sharp characterization of the generalized likelihood ratio test (GLRT) testing radius up to a universal multiplicative constant in terms of the geometric structure of the underlying convex cones. When applied to concrete examples, this result reveals some interesting phenomena that do not arise in the analogous problems of estimation under convex constraints. In particular, in contrast to estimation error, the testing error no longer depends purely on the problem complexity via a volume-based measure (such as metric entropy or Gaussian complexity); other geometric properties of the cones also play an important role. In order to address the issue of optimality, we prove information-theoretic lower bounds for the minimax testing radius again in terms of geometric quantities. Our general theorems are illustrated by examples including the cases of monotone and orthant cones, and involve some results of independent interest.
</p>projecteuclid.org/euclid.aos/1547197246_20190111040129Fri, 11 Jan 2019 04:01 ESTNonparametric implied Lévy densitieshttps://projecteuclid.org/euclid.aos/1547197247<strong>Likuan Qin</strong>, <strong>Viktor Todorov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1025--1060.</p><p><strong>Abstract:</strong><br/>
This paper develops a nonparametric estimator for the Lévy density of an asset price following an Itô semimartingale, implied by short-maturity options. The asymptotic setup is one in which the time to maturity of the available options decreases, the mesh of the available strike grid shrinks and the strike range expands. The estimation is based on aggregating the observed option data into nonparametric estimates of the conditional characteristic function of the return distribution, the derivatives of which allow us to infer the Fourier transform of a known transform of the Lévy density in a way that is robust to the level of the unknown diffusive volatility of the asset price. The Lévy density estimate is then constructed via Fourier inversion. We derive an asymptotic bound for the integrated squared error of the estimator in the general case, as well as its probability limit in the special Lévy case. We further show rate optimality of our Lévy density estimator in a minimax sense. An empirical application to market index options reveals relative stability of the left tail decay during high and low volatility periods.
</p>projecteuclid.org/euclid.aos/1547197247_20190111040129Fri, 11 Jan 2019 04:01 ESTOn model selection from a finite family of possibly misspecified time series modelshttps://projecteuclid.org/euclid.aos/1547197248<strong>Hsiang-Ling Hsu</strong>, <strong>Ching-Kang Ing</strong>, <strong>Howell Tong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1061--1087.</p><p><strong>Abstract:</strong><br/>
Consider finite parametric time series models. “I have $n$ observations and $k$ models; which model should I choose on the basis of the data alone?” is a frequently asked question in many practical situations. This poses the key problem of selecting a model from a collection of candidate models, none of which is necessarily the true data generating process (DGP). Although the existing literature on model selection is vast, there is a serious lacuna in that the above problem does not seem to have received much attention. In fact, existing model selection criteria have avoided addressing the above problem directly, either by assuming that the true DGP is included among the candidate models and aiming at choosing this DGP, or by assuming that the true DGP can be asymptotically approximated by an increasing sequence of candidate models and aiming at choosing the candidate having the best predictive capability in some asymptotic sense. In this article, we propose a misspecification-resistant information criterion (MRIC) to address the key problem directly. We first prove the asymptotic efficiency of MRIC whether the true DGP is among the candidates or not, within the fixed-dimensional framework. We then extend this result to the high-dimensional case in which the number of candidate variables is much larger than the sample size. In particular, we show that MRIC can be used in conjunction with a high-dimensional model selection method to select the (asymptotically) best predictive model across several high-dimensional misspecified time series models.
</p>projecteuclid.org/euclid.aos/1547197248_20190111040129Fri, 11 Jan 2019 04:01 ESTEstimating the algorithmic variance of randomized ensembles via the bootstraphttps://projecteuclid.org/euclid.aos/1547197249<strong>Miles E. Lopes</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1088--1112.</p><p><strong>Abstract:</strong><br/>
Although the methods of bagging and random forests are some of the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are few theoretical guarantees for deciding when an ensemble is “large enough”—so that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of “algorithmic variance” (i.e., the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests and related methods in the context of classification. To be specific, suppose the training dataset is fixed, and let the random variable $\mathrm{ERR}_{t}$ denote the prediction error of a randomized ensemble of size $t$. Working under a “first-order model” for randomized ensembles, we prove that the centered law of $\mathrm{ERR}_{t}$ can be consistently approximated via the proposed method as $t\to\infty$. Meanwhile, the computational cost of the method is quite modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of $\mathrm{ERR}_{t}$ are negligible.
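The bootstrap idea can be conveyed with a toy sketch on synthetic data (a hypothetical pool of base classifiers; this is not the paper's first-order model or its extrapolation technique): hold the training and test data fixed, resample ensembles of size $t$ from a pool of randomized classifiers, and take the variance of the resulting error rates.

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: a fixed test set with 0/1 labels and a pool of B
# randomized base classifiers, each stored as its vector of predictions
# (each base classifier is 70% accurate per test point).
n_test, B, t = 50, 200, 25
labels = [random.randint(0, 1) for _ in range(n_test)]
pool = [[lab if random.random() < 0.7 else 1 - lab for lab in labels]
        for _ in range(B)]

def ensemble_error(members):
    """Majority-vote error rate of an ensemble over the fixed test set."""
    wrong = 0
    for i, lab in enumerate(labels):
        votes = sum(m[i] for m in members)
        wrong += (1 if 2 * votes > len(members) else 0) != lab
    return wrong / n_test

# Bootstrap only the algorithmic randomness: resample t classifiers from
# the pool (with replacement), recompute the error, take the variance.
boot_errs = [ensemble_error(random.choices(pool, k=t)) for _ in range(300)]
algo_var = statistics.pvariance(boot_errs)
```

The data stay fixed throughout; only the training algorithm's randomness is resampled, which is exactly the variance component the abstract calls algorithmic.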
</p>projecteuclid.org/euclid.aos/1547197249_20190111040129Fri, 11 Jan 2019 04:01 ESTEfficient nonparametric Bayesian inference for $X$-ray transformshttps://projecteuclid.org/euclid.aos/1547197250<strong>François Monard</strong>, <strong>Richard Nickl</strong>, <strong>Gabriel P. Paternain</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1113--1147.</p><p><strong>Abstract:</strong><br/>
We consider the statistical inverse problem of recovering a function $f:M\to \mathbb{R}$, where $M$ is a smooth compact Riemannian manifold with boundary, from measurements of general $X$-ray transforms $I_{a}(f)$ of $f$, corrupted by additive Gaussian noise. For $M$ equal to the unit disk with “flat” geometry and $a=0$ this reduces to the standard Radon transform, but our general setting allows for anisotropic media $M$ and can further model local “attenuation” effects—both highly relevant in practical imaging problems such as SPECT tomography. We study a nonparametric Bayesian inference method based on standard Gaussian process priors for $f$. The posterior reconstruction of $f$ corresponds to a Tikhonov regulariser with a reproducing kernel Hilbert space norm penalty that does not require the calculation of the singular value decomposition of the forward operator $I_{a}$. We prove Bernstein–von Mises theorems for a large family of one-dimensional linear functionals of $f$, and they entail that posterior-based inferences such as credible sets are valid and optimal from a frequentist point of view. In particular we derive the asymptotic distribution of smooth linear functionals of the Tikhonov regulariser, which attains the semiparametric information lower bound. The proofs rely on an invertibility result for the “Fisher information” operator $I_{a}^{*}I_{a}$ between suitable function spaces, a result of independent interest that relies on techniques from microlocal analysis. We illustrate the performance of the proposed method via simulations in various settings.
</p>projecteuclid.org/euclid.aos/1547197250_20190111040129Fri, 11 Jan 2019 04:01 ESTGeneralized random forestshttps://projecteuclid.org/euclid.aos/1547197251<strong>Susan Athey</strong>, <strong>Julie Tibshirani</strong>, <strong>Stefan Wager</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1148--1178.</p><p><strong>Abstract:</strong><br/>
We propose generalized random forests, a method for nonparametric statistical estimation based on random forests (Breiman [ Mach. Learn. 45 (2001) 5–32]) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: nonparametric quantile regression, conditional average partial effect estimation and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
</p>projecteuclid.org/euclid.aos/1547197251_20190111040129Fri, 11 Jan 2019 04:01 ESTA classification criterion for definitive screening designshttps://projecteuclid.org/euclid.aos/1547197252<strong>Eric D. Schoen</strong>, <strong>Pieter T. Eendebak</strong>, <strong>Peter Goos</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1179--1202.</p><p><strong>Abstract:</strong><br/>
A conference design is a rectangular matrix with orthogonal columns, one zero in each column, at most one zero in each row and $-1$’s and $+1$’s elsewhere. A definitive screening design can be constructed by folding over a conference design and adding a row vector of zeroes. We prove that, for a given even number of rows, there is just one isomorphism class for conference designs with two or three columns. Next, we derive all isomorphism classes for conference designs with four columns. Based on our results, we propose a classification criterion for definitive screening designs founded on projections into four factors. We illustrate the potential of the criterion by studying designs with 24 and 82 factors.
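The fold-over construction described above can be sketched directly; the small conference design below is a hypothetical example for illustration, not one of the designs classified in the paper.

```python
# Hypothetical 4-run, 2-column conference design: orthogonal columns,
# exactly one zero per column, at most one zero per row, +/-1 elsewhere.
C = [[ 0,  1],
     [ 1,  0],
     [ 1,  1],
     [-1,  1]]

# Definitive screening design: fold over C and append a row of zeroes.
dsd = C + [[-v for v in row] for row in C] + [[0, 0]]

def dot(i, j, mat):
    """Inner product of columns i and j of mat."""
    return sum(row[i] * row[j] for row in mat)
```

Folding over preserves column orthogonality and makes every column of the definitive screening design sum to zero.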
</p>projecteuclid.org/euclid.aos/1547197252_20190111040129Fri, 11 Jan 2019 04:01 ESTApproximating faces of marginal polytopes in discrete hierarchical modelshttps://projecteuclid.org/euclid.aos/1550026834<strong>Nanwei Wang</strong>, <strong>Johannes Rauh</strong>, <strong>Hélène Massam</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1203--1233.</p><p><strong>Abstract:</strong><br/>
The existence of the maximum likelihood estimate in a hierarchical log-linear model is crucial to the reliability of inference for this model. Determining whether the estimate exists is equivalent to finding whether the sufficient statistics vector $t$ belongs to the boundary of the marginal polytope of the model. The dimension of the smallest face $\mathbf{F}_{t}$ containing $t$ determines the dimension of the reduced model which should be considered for correct inference. For higher-dimensional problems, it is not possible to compute $\mathbf{F}_{t}$ exactly. Massam and Wang (2015) found an outer approximation to $\mathbf{F}_{t}$ using a collection of submodels of the original model. This paper refines the methodology to find an outer approximation and devises a new methodology to find an inner approximation. The inner approximation is given not in terms of a face of the marginal polytope, but in terms of a subset of the vertices of $\mathbf{F}_{t}$.
Knowing $\mathbf{F}_{t}$ exactly indicates which cell probabilities have maximum likelihood estimates equal to $0$. When $\mathbf{F}_{t}$ cannot be obtained exactly, we can use, first, the outer approximation $\mathbf{F}_{2}$ to reduce the dimension of the problem and then the inner approximation $\mathbf{F}_{1}$ to obtain correct estimates of cell probabilities corresponding to elements of $\mathbf{F}_{1}$ and improve the estimates of the remaining probabilities corresponding to elements in $\mathbf{F}_{2}\setminus\mathbf{F}_{1}$. Using both real-world and simulated data, we illustrate our results, and show that our methodology scales to high dimensions.
</p>projecteuclid.org/euclid.aos/1550026834_20190212220137Tue, 12 Feb 2019 22:01 ESTCHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimalityhttps://projecteuclid.org/euclid.aos/1550026835<strong>T. Tony Cai</strong>, <strong>Jing Ma</strong>, <strong>Linjun Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1234--1267.</p><p><strong>Abstract:</strong><br/>
Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. In this paper, we study clustering of high-dimensional Gaussian mixtures and propose a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both theoretical and numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess misclustering error and show that CHIME is minimax rate optimal. In addition, the optimality of the proposed estimator of the discriminant vector is also established. Simulation studies show that CHIME outperforms the existing methods under a variety of settings. The proposed CHIME procedure is also illustrated in an analysis of a glioblastoma gene expression data set and shown to have superior performance.
Clustering of Gaussian mixtures in the conventional low-dimensional setting is also considered. The technical tools developed for the high-dimensional setting are used to establish the optimality of the clustering procedure that is based on the classical EM algorithm.
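The classical EM baseline mentioned above can be sketched in one dimension for a two-component mixture with known unit variances (synthetic data; CHIME's high-dimensional sparse-discriminant step is not reproduced here):

```python
import math
import random

random.seed(5)

# Hypothetical 1-D data from a two-component Gaussian mixture.
xs = ([random.gauss(-2, 1) for _ in range(150)] +
      [random.gauss(2, 1) for _ in range(150)])

def em_two_gaussians(xs, iters=50):
    """Classical EM for a two-component mixture with known unit variances:
    estimates the mixing weight and the two component means."""
    w, m1, m2 = 0.5, -1.0, 1.0                    # initial guesses
    for _ in range(iters):
        # E-step: posterior probability each point came from component 1
        r = []
        for x in xs:
            p1 = w * math.exp(-0.5 * (x - m1) ** 2)
            p2 = (1 - w) * math.exp(-0.5 * (x - m2) ** 2)
            r.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted updates
        s = sum(r)
        w = s / len(xs)
        m1 = sum(ri * x for ri, x in zip(r, xs)) / s
        m2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / (len(xs) - s)
    return w, m1, m2

w, m1, m2 = em_two_gaussians(xs)
```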
</p>projecteuclid.org/euclid.aos/1550026835_20190212220137Tue, 12 Feb 2019 22:01 ESTExponential ergodicity of the bouncy particle samplerhttps://projecteuclid.org/euclid.aos/1550026836<strong>George Deligiannidis</strong>, <strong>Alexandre Bouchard-Côté</strong>, <strong>Arnaud Doucet</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1268--1287.</p><p><strong>Abstract:</strong><br/>
Nonreversible Markov chain Monte Carlo schemes based on piecewise deterministic Markov processes have recently been introduced in applied probability, automatic control, physics and statistics. Although these algorithms demonstrate good performance experimentally and are accordingly increasingly used in a wide range of applications, geometric ergodicity results for such schemes have so far been established only under very restrictive assumptions. We give here verifiable conditions on the target distribution under which the Bouncy Particle Sampler algorithm introduced in [ Phys. Rev. E 85 (2012) 026703] is geometrically ergodic, and we provide a central limit theorem for the associated ergodic averages. This holds essentially whenever the target satisfies a curvature condition and the growth of the negative logarithm of the target is at least linear and at most quadratic. For target distributions with thinner tails, we propose an original modification of this scheme that is geometrically ergodic. For targets with thicker tails, we extend the idea pioneered in [ Ann. Statist. 40 (2012) 3050–3076] in a random walk Metropolis context. We establish geometric ergodicity of the Bouncy Particle Sampler with respect to an appropriate transformation of the target. Mapping the resulting process back to the original parameterization, we obtain a geometrically ergodic piecewise deterministic Markov process.
</p>projecteuclid.org/euclid.aos/1550026836_20190212220137Tue, 12 Feb 2019 22:01 ESTThe Zig-Zag process and super-efficient sampling for Bayesian analysis of big datahttps://projecteuclid.org/euclid.aos/1550026838<strong>Joris Bierkens</strong>, <strong>Paul Fearnhead</strong>, <strong>Gareth Roberts</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1288--1320.</p><p><strong>Abstract:</strong><br/>
Standard MCMC methods can scale poorly to big data settings due to the need to evaluate the likelihood at each iteration. There have been a number of approximate MCMC algorithms that use sub-sampling ideas to reduce this computational burden, but with the drawback that these algorithms no longer target the true posterior distribution. We introduce a new family of Monte Carlo methods based upon a multidimensional version of the Zig-Zag process of [ Ann. Appl. Probab. 27 (2017) 846–882], a continuous-time piecewise deterministic Markov process. While traditional MCMC methods are reversible by construction (a property which is known to inhibit rapid convergence), the Zig-Zag process offers a flexible nonreversible alternative which we observe to often have favourable convergence properties. We show how the Zig-Zag process can be simulated without discretisation error, and give conditions for the process to be ergodic. Most importantly, we introduce a sub-sampling version of the Zig-Zag process that is an example of an exact approximate scheme, that is, the resulting approximate process still has the posterior as its stationary distribution. Furthermore, if we use a control-variate idea to reduce the variance of our unbiased estimator, then the Zig-Zag process can be super-efficient: after an initial preprocessing step, essentially independent samples from the posterior distribution are obtained at a computational cost which does not depend on the size of the data.
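A minimal one-dimensional sketch conveys the mechanics for a standard Gaussian target (this is not the paper's multidimensional, sub-sampled or control-variate version): the velocity flips at inhomogeneous Poisson events with rate $(vx)_{+}$, and the next event time is drawn exactly by inverting the integrated rate, so there is no discretisation error.

```python
import math
import random

def zigzag_gaussian(n_events=100000, seed=1):
    """1-D Zig-Zag sampler for a standard Gaussian target.  For
    U(x) = x^2/2 the switching rate along the trajectory x + v*s is
    (v*x + s)_+, so event times are simulated exactly by inversion.
    Returns time-averaged first and second moments along the
    continuous trajectory."""
    rng = random.Random(seed)
    x, v = 0.0, 1.0
    total_t = sum_x = sum_x2 = 0.0
    for _ in range(n_events):
        a = v * x
        e = -math.log(rng.random())                  # Exp(1) draw
        tau = math.sqrt(max(a, 0.0) ** 2 + 2.0 * e) - a
        x_new = x + v * tau                          # deterministic drift
        sum_x += tau * (x + x_new) / 2.0             # integral of x(t) dt
        sum_x2 += (x_new ** 3 - x ** 3) / (3.0 * v)  # integral of x(t)^2 dt
        total_t += tau
        x, v = x_new, -v                             # event: flip velocity
    return sum_x / total_t, sum_x2 / total_t

m1, m2 = zigzag_gaussian()
```

The time averages along the trajectory should approach the target's moments (0 and 1), which is a quick sanity check on the exact event-time inversion.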
</p>projecteuclid.org/euclid.aos/1550026838_20190212220137Tue, 12 Feb 2019 22:01 ESTEstimation of large covariance and precision matrices from temporally dependent observationshttps://projecteuclid.org/euclid.aos/1550026839<strong>Hai Shu</strong>, <strong>Bin Nan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1321--1350.</p><p><strong>Abstract:</strong><br/>
We consider the estimation of large covariance and precision matrices from high-dimensional sub-Gaussian or heavier-tailed observations with slowly decaying temporal dependence. The temporal dependence is allowed to be long-range, with longer memory than is considered in the current literature. We show that several commonly used methods for independent observations can be applied to the temporally dependent data. In particular, the rates of convergence are obtained for the generalized thresholding estimation of covariance and correlation matrices, and for the constrained $\ell_{1}$ minimization and the $\ell_{1}$ penalized likelihood estimation of the precision matrix. Properties of sparsistency and sign-consistency are also established. A gap-block cross-validation method is proposed for the tuning parameter selection, which performs well in simulations. As a motivating example, we study brain functional connectivity using resting-state fMRI time series data with long-range temporal dependence.
</p>projecteuclid.org/euclid.aos/1550026839_20190212220137Tue, 12 Feb 2019 22:01 ESTBootstrap tuning in Gaussian ordered model selectionhttps://projecteuclid.org/euclid.aos/1550026841<strong>Vladimir Spokoiny</strong>, <strong>Niklas Willrich</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1351--1380.</p><p><strong>Abstract:</strong><br/>
The paper focuses on the problem of model selection in linear Gaussian regression with unknown, possibly inhomogeneous noise. For a given family of linear estimators $\{\widetilde{\boldsymbol{{\theta}}}_{m},m\in\mathscr{M}\}$, ordered by their variance, we offer a new “smallest accepted” approach motivated by Lepski’s device and the multiple testing idea. The procedure selects the smallest model which satisfies the acceptance rule based on comparison with all larger models. The method is completely data-driven and does not use any prior information about the variance structure of the noise: its parameters are adjusted to the underlying possibly heterogeneous noise by the so-called “propagation condition” using a wild bootstrap method. The validity of the bootstrap calibration is proved for finite samples with an explicit error bound. We provide a comprehensive theoretical study of the method, describe in detail the set of possible values of the selected model $\widehat{m}\in\mathscr{M}$ and establish some oracle error bounds for the corresponding estimator $\widehat{\boldsymbol{{\theta}}}=\widetilde{\boldsymbol{{\theta}}}_{\widehat{m}}$.
</p>projecteuclid.org/euclid.aos/1550026841_20190212220137Tue, 12 Feb 2019 22:01 ESTSequential change-point detection based on nearest neighborshttps://projecteuclid.org/euclid.aos/1550026842<strong>Hao Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1381--1407.</p><p><strong>Abstract:</strong><br/>
We propose a new framework for the detection of change-points in online, sequential data analysis. The approach utilizes nearest neighbor information and can be applied to sequences of multivariate observations or non-Euclidean data objects, such as network data. Different stopping rules are explored, and one specific rule is recommended due to its desirable properties. An accurate analytic approximation of the average run length is derived for the recommended rule, making it an easy off-the-shelf approach for real multivariate/object sequential data monitoring applications. Simulations reveal that the new approach has better performance than likelihood-based approaches for high dimensional data. The new approach is illustrated through a real dataset in detecting global structural changes in social networks.
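A crude one-dimensional analogue conveys the nearest-neighbor idea on synthetic data (offline rather than sequential, and not the paper's statistic, stopping rule or run-length approximation): observations whose nearest neighbor falls on the same side of a candidate change-point are counted, and the count peaks near the true change.

```python
import random

random.seed(2)

# Hypothetical sequence with a distribution change halfway through.
n = 120
seq = ([random.gauss(0, 1) for _ in range(n // 2)] +
       [random.gauss(5, 1) for _ in range(n // 2)])

def same_side_nn_fraction(xs, tau):
    """Fraction of points whose nearest neighbor (in value) lies on the
    same side of the candidate change-point tau.  Under a real change at
    tau this fraction is inflated, since neighbors concentrate within
    each regime."""
    hits = 0
    for i, xi in enumerate(xs):
        nn = min((j for j in range(len(xs)) if j != i),
                 key=lambda j: abs(xs[j] - xi))
        hits += (i < tau) == (nn < tau)
    return hits / len(xs)

at_change = same_side_nn_fraction(seq, n // 2)   # true split
off_change = same_side_nn_fraction(seq, 10)      # misplaced split
```

Because only neighbor relations are used, the same counting idea carries over to multivariate or non-Euclidean observations once a distance is defined, which is the point of the graph-based framework.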
</p>projecteuclid.org/euclid.aos/1550026842_20190212220137Tue, 12 Feb 2019 22:01 ESTPrediction when fitting simple models to high-dimensional datahttps://projecteuclid.org/euclid.aos/1550026843<strong>Lukas Steinberger</strong>, <strong>Hannes Leeb</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1408--1442.</p><p><strong>Abstract:</strong><br/>
We study linear subset regression in the context of a high-dimensional linear model. Consider $y=\vartheta +\theta 'z+\epsilon $ with univariate response $y$ and a $d$-vector of random regressors $z$, and a submodel where $y$ is regressed on a set of $p$ explanatory variables that are given by $x=M'z$, for some $d\times p$ matrix $M$. Here, “high-dimensional” means that the number $d$ of available explanatory variables in the overall model is much larger than the number $p$ of variables in the submodel. In this paper, we present Pinsker-type results for prediction of $y$ given $x$. In particular, we show that the mean squared prediction error of the best linear predictor of $y$ given $x$ is close to the mean squared prediction error of the corresponding Bayes predictor $\mathbb{E}[y\mid x]$, provided only that $p/\log d$ is small. We also show that the mean squared prediction error of the (feasible) least-squares predictor computed from $n$ independent observations of $(y,x)$ is close to that of the Bayes predictor, provided only that both $p/\log d$ and $p/n$ are small. Our results hold uniformly in the regression parameters and over large collections of distributions for the design variables $z$.
</p>projecteuclid.org/euclid.aos/1550026843_20190212220137Tue, 12 Feb 2019 22:01 ESTTwo-sample and ANOVA tests for high dimensional meanshttps://projecteuclid.org/euclid.aos/1550026845<strong>Song Xi Chen</strong>, <strong>Jun Li</strong>, <strong>Ping-Shou Zhong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1443--1474.</p><p><strong>Abstract:</strong><br/>
This paper considers testing the equality of two high dimensional means. Two approaches are utilized to formulate $L_{2}$-type tests for better power performance when the two high dimensional mean vectors differ only in sparsely populated coordinates and the differences are faint. One is to conduct thresholding to remove the nonsignal bearing dimensions for variance reduction of the test statistics. The other is to transform the data via the precision matrix for signal enhancement. It is shown that the thresholding and data transformation lead to attractive detection boundaries for the tests. Furthermore, we demonstrate explicitly the effects of precision matrix estimation on the detection boundary for the test with thresholding and data transformation. Extension to multi-sample ANOVA tests is also investigated. Numerical studies are performed to confirm the theoretical findings and demonstrate the practical implementations.
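The thresholding device can be sketched as follows on synthetic data (the threshold $2\log p$ and this calibration are illustrative only, not the paper's detection-boundary analysis): squared standardized mean differences are summed only over coordinates that clear the threshold, so the many non-signal-bearing dimensions contribute no variance.

```python
import math
import random

random.seed(3)

def thresholded_l2_stat(x, y, lam):
    """Sum of squared standardized mean differences, kept only over
    coordinates whose statistic exceeds lam; discarding non-signal
    dimensions reduces the variance of the test statistic.  A sketch of
    the idea, not the paper's calibrated test."""
    p, n1, n2 = len(x[0]), len(x), len(y)
    stat = 0.0
    for j in range(p):
        xj = [row[j] for row in x]
        yj = [row[j] for row in y]
        mx, my = sum(xj) / n1, sum(yj) / n2
        vx = sum((v - mx) ** 2 for v in xj) / (n1 - 1)
        vy = sum((v - my) ** 2 for v in yj) / (n2 - 1)
        t2 = (mx - my) ** 2 / (vx / n1 + vy / n2)
        if t2 > lam:
            stat += t2
    return stat

# Sparse alternative: only 3 of 100 coordinates carry a signal.
p, n = 100, 40
x = [[random.gauss(1.5 if j < 3 else 0.0, 1.0) for j in range(p)]
     for _ in range(n)]
y = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
stat = thresholded_l2_stat(x, y, lam=2.0 * math.log(p))
```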
</p>projecteuclid.org/euclid.aos/1550026845_20190212220137Tue, 12 Feb 2019 22:01 ESTValid confidence intervals for post-model-selection predictorshttps://projecteuclid.org/euclid.aos/1550026846<strong>François Bachoc</strong>, <strong>Hannes Leeb</strong>, <strong>Benedikt M. Pötscher</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1475--1504.</p><p><strong>Abstract:</strong><br/>
We consider inference post-model-selection in linear regression. In this setting, Berk et al. [ Ann. Statist. 41 (2013a) 802–837] recently introduced a class of confidence sets, the so-called PoSI intervals, that cover a certain nonstandard quantity of interest with a user-specified minimal coverage probability, irrespective of the model selection procedure that is being used. In this paper, we generalize the PoSI intervals to confidence intervals for post-model-selection predictors.
</p>projecteuclid.org/euclid.aos/1550026846_20190212220137Tue, 12 Feb 2019 22:01 ESTA robust and efficient approach to causal inference based on sparse sufficient dimension reductionhttps://projecteuclid.org/euclid.aos/1550026847<strong>Shujie Ma</strong>, <strong>Liping Zhu</strong>, <strong>Zhiwei Zhang</strong>, <strong>Chih-Ling Tsai</strong>, <strong>Raymond J. Carroll</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1505--1535.</p><p><strong>Abstract:</strong><br/>
A fundamental assumption used in causal inference with observational data is that treatment assignment is ignorable given measured confounding variables. This assumption of no missing confounders is plausible if a large number of baseline covariates are included in the analysis, as we often have no prior knowledge of which variables can be important confounders. Thus, estimation of treatment effects with a large number of covariates has received considerable attention in recent years. Most existing methods require specifying certain parametric models involving the outcome, treatment and confounding variables, and employ a variable selection procedure to identify confounders. However, selection of a proper set of confounders depends on correct specification of the working models. The bias due to model misspecification and incorrect selection of confounding variables can yield misleading results. We propose a robust and efficient approach for inference about the average treatment effect via a flexible modeling strategy incorporating penalized variable selection. Specifically, we consider an estimator constructed based on an efficient influence function that involves a propensity score and an outcome regression. We then propose a new sparse sufficient dimension reduction method to estimate these two functions without making restrictive parametric modeling assumptions. The proposed estimator of the average treatment effect is asymptotically normal and semiparametrically efficient without the need for variable selection consistency. The proposed methods are illustrated via simulation studies and a biomedical application.
</p>projecteuclid.org/euclid.aos/1550026847_20190212220137Tue, 12 Feb 2019 22:01 ESTThe maximum likelihood threshold of a path diagramhttps://projecteuclid.org/euclid.aos/1550026848<strong>Mathias Drton</strong>, <strong>Christopher Fox</strong>, <strong>Andreas Käufl</strong>, <strong>Guillaume Pouliot</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1536--1553.</p><p><strong>Abstract:</strong><br/>
Linear structural equation models postulate noisy linear relationships between variables of interest. Each model corresponds to a path diagram, which is a mixed graph with directed edges that encode the domains of the linear functions and bidirected edges that indicate possible correlations among noise terms. Using this graphical representation, we determine the maximum likelihood threshold, that is, the minimum sample size at which the likelihood function of a Gaussian structural equation model is almost surely bounded. Our result allows the model to have feedback loops and is based on decomposing the path diagram with respect to the connected components of its bidirected part. We also prove that if the sample size is below the threshold, then the likelihood function is almost surely unbounded. Our work clarifies, in particular, that standard likelihood inference is applicable to sparse high-dimensional models even if they feature feedback loops.
</p>projecteuclid.org/euclid.aos/1550026848_20190212220137Tue, 12 Feb 2019 22:01 ESTConvex regularization for high-dimensional multiresponse tensor regressionhttps://projecteuclid.org/euclid.aos/1550026849<strong>Garvesh Raskutti</strong>, <strong>Ming Yuan</strong>, <strong>Han Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1554--1584.</p><p><strong>Abstract:</strong><br/>
In this paper, we present a general convex optimization approach for solving high-dimensional multiple response tensor regression problems under low-dimensional structural assumptions. We consider using convex and weakly decomposable regularizers assuming that the underlying tensor lies in an unknown low-dimensional subspace. Within our framework, we derive general risk bounds of the resulting estimate under fairly general dependence structure among covariates. Our framework leads to upper bounds in terms of two very simple quantities, the Gaussian width of a convex set in tensor space and the intrinsic dimension of the low-dimensional tensor subspace. To the best of our knowledge, this is the first general framework that applies to multiple response problems. These general bounds provide useful upper bounds on rates of convergence for a number of fundamental statistical models of interest including multiresponse regression, vector autoregressive models, low-rank tensor models and pairwise interaction models. Moreover, in many of these settings we prove that the resulting estimates are minimax optimal. We also provide a numerical study that both validates our theoretical guarantees and demonstrates the breadth of our framework.
</p>projecteuclid.org/euclid.aos/1550026849_20190212220137Tue, 12 Feb 2019 22:01 ESTLarge sample theory for merged data from multiple sourceshttps://projecteuclid.org/euclid.aos/1550026850<strong>Takumi Saegusa</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1585--1615.</p><p><strong>Abstract:</strong><br/>
We develop large sample theory for merged data from multiple sources. The main statistical issues treated in this paper are that (1) the same unit potentially appears in multiple datasets from overlapping data sources, (2) duplicated items are not identified and (3) a sample from the same data source is dependent due to sampling without replacement. We propose and study a new weighted empirical process and extend empirical process theory to a dependent and biased sample with duplication. Specifically, we establish the uniform law of large numbers and uniform central limit theorem over a class of functions along with several empirical process results under conditions identical to those in the i.i.d. setting. As applications, we study infinite-dimensional $M$-estimation and develop its consistency, rates of convergence and asymptotic normality. Our theoretical results are illustrated with simulation studies and a real data example.
</p>projecteuclid.org/euclid.aos/1550026850_20190212220137Tue, 12 Feb 2019 22:01 ESTKhinchine’s theorem and Edgeworth approximations for weighted sumshttps://projecteuclid.org/euclid.aos/1550026851<strong>Sergey G. Bobkov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1616--1633.</p><p><strong>Abstract:</strong><br/>
Let $F_{n}$ denote the distribution function of the normalized sum of $n$ i.i.d. random variables. In this paper, polynomial rates of approximation of $F_{n}$ by the corrected normal laws are considered in the model where the underlying distribution has a convolution structure. As a basic tool, the convergence part of Khinchine’s theorem in metric theory of Diophantine approximations is extended to the class of product characteristic functions.
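The simplest corrected normal law is the one-term Edgeworth approximation $F_n(x)\approx\Phi(x)-\varphi(x)\,\lambda_3(x^2-1)/(6\sqrt{n})$, with $\lambda_3$ the skewness of the summands. The generic check below (standardized sums of $\mathrm{Exp}(1)$ variables, $\lambda_3=2$; not the paper's convolution model) shows the correction shrinking the Kolmogorov error relative to the plain normal approximation:

```python
import bisect, math, random

random.seed(0)
n, reps = 5, 100_000
lam3 = 2.0                                   # skewness of Exp(1)

# Monte Carlo sample of normalized sums (Exp(1) has mean 1, variance 1)
sums = sorted((sum(random.expovariate(1.0) for _ in range(n)) - n) / math.sqrt(n)
              for _ in range(reps))

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def edgeworth(x):
    # One-term Edgeworth correction to the normal approximation
    return Phi(x) - phi(x) * lam3 * (x * x - 1) / (6 * math.sqrt(n))

ecdf = lambda x: bisect.bisect_right(sums, x) / reps
grid = [i / 10 for i in range(-20, 31)]
err_normal = max(abs(ecdf(x) - Phi(x)) for x in grid)
err_edgeworth = max(abs(ecdf(x) - edgeworth(x)) for x in grid)
```

At $n=5$ the plain normal error is of order $\lambda_3/(6\sqrt{n})\approx 0.06$, while the one-term correction leaves only the $O(1/n)$ residual; the rates studied in the paper quantify how fast such residuals vanish for weighted sums with a convolution structure.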
</p>projecteuclid.org/euclid.aos/1550026851_20190212220137Tue, 12 Feb 2019 22:01 ESTDistributed inference for quantile regression processeshttps://projecteuclid.org/euclid.aos/1550026852<strong>Stanislav Volgushev</strong>, <strong>Shih-Kang Chao</strong>, <strong>Guang Cheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1634--1662.</p><p><strong>Abstract:</strong><br/>
The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big data, we propose a two-step procedure: (i) estimate conditional quantile functions at different levels in a parallel computing environment; (ii) construct a conditional quantile regression process through projection based on these estimated quantile curves. Our general quantile regression framework covers both linear models with fixed or growing dimension and series approximation models. We prove that the proposed procedure does not sacrifice any statistical inferential accuracy provided that the number of distributed computing units and quantile levels are chosen properly. In particular, a sharp upper bound for the former and a sharp lower bound for the latter are derived to capture the minimal computational cost from a statistical perspective. As an important application, the statistical inference on conditional distribution functions is considered. Moreover, we propose computationally efficient approaches to conducting inference in the distributed estimation setting described above. Those approaches directly utilize the availability of estimators from subsamples and can be carried out at almost no additional computational cost. Simulations confirm our statistical inferential theory.
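Step (i) can be illustrated in its most stripped-down form — a single quantile level, no covariates, local estimates averaged across machines (a divide-and-conquer caricature of the paper's procedure; the names and tuning choices below are invented, and step (ii)'s projection across quantile levels is omitted):

```python
import random, statistics

random.seed(2)
S, m, tau = 20, 1_000, 0.9            # S machines, m observations each
data = [random.gauss(0, 1) for _ in range(S * m)]

def sample_quantile(xs, tau):
    """Order-statistic estimate of the tau-quantile."""
    xs = sorted(xs)
    return xs[int(tau * len(xs))]

# Step (i): each computing unit estimates the quantile on its own shard
local = [sample_quantile(data[k * m:(k + 1) * m], tau) for k in range(S)]

# Divide-and-conquer: average the local estimates
dc_estimate = statistics.fmean(local)

# Benchmark: the full-sample ("oracle") estimate using all S*m observations
full_estimate = sample_quantile(data, tau)
```

Averaging reduces the variance by a factor of $S$ but each local estimate carries an $O(1/m)$ bias, which is why the paper's sharp upper bound on the number of computing units matters: too many machines (too small $m$) and the bias dominates, sacrificing inferential accuracy.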
</p>projecteuclid.org/euclid.aos/1550026852_20190212220137Tue, 12 Feb 2019 22:01 ESTGaussian approximation of maxima of Wiener functionals and its application to high-frequency datahttps://projecteuclid.org/euclid.aos/1550026853<strong>Yuta Koike</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1663--1687.</p><p><strong>Abstract:</strong><br/>
This paper establishes an upper bound for the Kolmogorov distance between the maximum of a high-dimensional vector of smooth Wiener functionals and the maximum of a Gaussian random vector. As a special case, we show that the maximum of multiple Wiener–Itô integrals with common orders is well approximated by its Gaussian analog in terms of the Kolmogorov distance if their covariance matrices are close to each other and the maximum of the fourth cumulants of the multiple Wiener–Itô integrals is close to zero. This may be viewed as a new kind of fourth moment phenomenon, which has attracted considerable attention in the recent studies of probability. This type of Gaussian approximation result has many potential applications to statistics. To illustrate this point, we present two statistical applications in high-frequency financial econometrics: One is the hypothesis testing problem for the absence of lead-lag effects and the other is the construction of uniform confidence bands for spot volatility.
</p>projecteuclid.org/euclid.aos/1550026853_20190212220137Tue, 12 Feb 2019 22:01 ESTCausal Dantzig: Fast inference in linear structural equation models with hidden variables under additive interventionshttps://projecteuclid.org/euclid.aos/1550026854<strong>Dominik Rothenhäusler</strong>, <strong>Peter Bühlmann</strong>, <strong>Nicolai Meinshausen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1688--1722.</p><p><strong>Abstract:</strong><br/>
Causal inference is known to be very challenging when only observational data are available. Randomized experiments are often costly and impractical, and in instrumental variable regression the number of instruments has to exceed the number of causal predictors. It was recently shown in Peters, Bühlmann and Meinshausen (2016) ( J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 947–1012) that causal inference for the full model is possible when data from distinct observational environments are available, exploiting that the conditional distribution of a response variable is invariant under the correct causal model. Two shortcomings of such an approach are the high computational effort for large-scale data and the assumed absence of hidden confounders. Here, we show that these two shortcomings can be addressed if one is willing to make a more restrictive assumption on the type of interventions that generate different environments. To this end, we consider a different notion of invariance, namely inner-product invariance. Because it avoids the computationally cumbersome reverse-engineering approach of Peters, Bühlmann and Meinshausen (2016), this notion allows for large-scale causal inference in linear structural equation models. We discuss identifiability conditions for the causal parameter and derive asymptotic confidence intervals in the low-dimensional setting. In the case of nonidentifiability, we show that the solution set of causal Dantzig has predictive guarantees under certain interventions. We derive finite-sample bounds in the high-dimensional setting and investigate its performance on simulated datasets.
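In the one-dimensional identifiable case, inner-product invariance — $\mathbb{E}[X(Y-X\beta)]$ being equal across environments — gives the estimator the closed form $\hat{\beta}=(\hat{Z}^1-\hat{Z}^2)/(\hat{G}^1-\hat{G}^2)$, with $\hat{G}^e$ and $\hat{Z}^e$ the empirical second and cross moments in environment $e$. The simulation below (all variable names invented) is a sketch of how this removes the hidden-confounding bias that plain OLS retains, assuming an additive intervention on $X$ in the second environment:

```python
import numpy as np

rng = np.random.default_rng(3)
beta_true, n = 1.5, 50_000

def gen(shift_sd):
    """One environment of a confounded SEM with an additive intervention on X."""
    H = rng.standard_normal(n)                              # hidden confounder
    X = H + rng.standard_normal(n) + shift_sd * rng.standard_normal(n)
    Y = beta_true * X + H + rng.standard_normal(n)
    return X, Y

X1, Y1 = gen(0.0)       # observational environment
X2, Y2 = gen(2.0)       # environment with an additive intervention on X

G1, G2 = (X1 * X1).mean(), (X2 * X2).mean()   # empirical Gram (second moments)
Z1, Z2 = (X1 * Y1).mean(), (X2 * Y2).mean()   # empirical cross moments

beta_hat = (Z1 - Z2) / (G1 - G2)              # inner-product-invariance estimate
beta_ols = (X1 * Y1).mean() / (X1 * X1).mean()  # plain OLS, confounded by H
```

The intervention shifts $\mathbb{E}[X^2]$ but not the confounding term $\mathbb{E}[XH]$, so differencing across environments cancels the bias — no reverse engineering of the causal order is needed.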
</p>projecteuclid.org/euclid.aos/1550026854_20190212220137Tue, 12 Feb 2019 22:01 ESTNonpenalized variable selection in high-dimensional linear model settings via generalized fiducial inferencehttps://projecteuclid.org/euclid.aos/1550026855<strong>Jonathan P. Williams</strong>, <strong>Jan Hannig</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1723--1753.</p><p><strong>Abstract:</strong><br/>
Standard penalized methods of variable selection and parameter estimation rely on the magnitude of coefficient estimates to decide which variables to include in the final model. However, coefficient estimates are unreliable when the design matrix is collinear. To overcome this challenge, an entirely new perspective on variable selection is presented within a generalized fiducial inference framework. This new procedure is able to effectively account for linear dependencies among subsets of covariates in a high-dimensional setting where $p$ can grow almost exponentially in $n$, as well as in the classical setting where $p\le n$. It is shown that the procedure very naturally assigns small probabilities to subsets of covariates which include redundancies by way of explicit $L_{0}$ minimization. Furthermore, with a typical sparsity assumption, it is shown that the proposed method is consistent in the sense that the probability assigned to the true sparse subset of covariates converges in probability to 1 as $n\to\infty$, or as $n\to\infty$ and $p\to\infty$. Only very reasonable conditions are needed, and little restriction is placed on the class of possible subsets of covariates, to achieve this consistency result.
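The paper's fiducial model probabilities have their own explicit form; as a loose stand-in that only conveys the flavor of probabilities over subsets with an explicit $L_0$ penalty, the sketch below scores every subset of a small design with a BIC-style criterion and normalizes the scores into pseudo-probabilities (all names invented; this is not the fiducial formula):

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 6
X = rng.standard_normal((n, p))
beta = np.array([2.0, -1.5, 0, 0, 0, 0])          # true support is {0, 1}
y = X @ beta + rng.standard_normal(n)

def rss(subset):
    """Residual sum of squares of least squares on a subset of columns."""
    if not subset:
        return float(y @ y)
    Xs = X[:, list(subset)]
    r = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    return float(r @ r)

# Score every subset with a BIC-style L0 criterion ...
subsets = [s for k in range(p + 1) for s in itertools.combinations(range(p), k)]
bic = {s: n * np.log(rss(s) / n) + len(s) * np.log(n) for s in subsets}

# ... and normalize into pseudo-probabilities over models
b0 = min(bic.values())
w = {s: np.exp(-(b - b0) / 2) for s, b in bic.items()}
Z = sum(w.values())
prob = {s: wi / Z for s, wi in w.items()}

best = max(prob, key=prob.get)
incl = {j: sum(prob[s] for s in subsets if j in s) for j in range(p)}
```

Exhaustive enumeration is only feasible for tiny $p$; the point of the paper's procedure is to make this subset-probability viewpoint work when $p$ grows almost exponentially in $n$.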
</p>projecteuclid.org/euclid.aos/1550026855_20190212220137Tue, 12 Feb 2019 22:01 ESTSuper-resolution estimation of cyclic arrival rateshttps://projecteuclid.org/euclid.aos/1550026856<strong>Ningyuan Chen</strong>, <strong>Donald K. K. Lee</strong>, <strong>Sahand N. Negahban</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1754--1775.</p><p><strong>Abstract:</strong><br/>
Exploiting the fact that most arrival processes exhibit cyclic behaviour, we propose a simple procedure for estimating the intensity of a nonhomogeneous Poisson process. The estimator is the super-resolution analogue of those in Shao (2010) and Shao and Lii [ J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 (2011) 99–122]: a sum of $p$ sinusoids, where $p$ and the amplitude and phase of each wave are unknown and must be estimated. This results in an interpretable yet flexible specification that is suitable for use in modelling as well as in high resolution simulations.
Our estimation procedure sits in between classic periodogram methods and atomic/total variation norm thresholding. Through a novel use of window functions in the point process domain, our approach attains super-resolution without semidefinite programming. Under suitable conditions, finite sample guarantees can be derived for our procedure. These resolve some open questions and expand existing results in spectral estimation literature.
</p>projecteuclid.org/euclid.aos/1550026856_20190212220137Tue, 12 Feb 2019 22:01 ESTSequential multiple testing with generalized error control: An asymptotic optimality theoryhttps://projecteuclid.org/euclid.aos/1550026857<strong>Yanglei Song</strong>, <strong>Georgios Fellouris</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 3, 1776--1803.</p><p><strong>Abstract:</strong><br/>
The sequential multiple testing problem is considered under two generalized error metrics. Under the first one, the probability of at least $k$ mistakes, of any kind, is controlled. Under the second, the probabilities of at least $k_{1}$ false positives and at least $k_{2}$ false negatives are simultaneously controlled. For each formulation, the optimal expected sample size is characterized, to a first-order asymptotic approximation as the error probabilities go to 0, and a novel multiple testing procedure is proposed and shown to be asymptotically efficient under every signal configuration. These results are established when the data streams for the various hypotheses are independent and each local log-likelihood ratio statistic satisfies a certain strong law of large numbers. In the special case of i.i.d. observations in each stream, the gains of the proposed sequential procedures over fixed-sample size schemes are quantified.
</p>projecteuclid.org/euclid.aos/1550026857_20190212220137Tue, 12 Feb 2019 22:01 EST