The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Tue, 07 Jun 2011 09:09 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
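The automatic multiplicity correction can be seen in a small illustrative calculation (a sketch of one standard mechanism in this area, not the paper's full analysis): integrating a uniform Beta(1,1) prior on a common inclusion probability induces a marginal prior on models under which each additional variable is penalized by a factor of order $p$.

```python
from math import comb

def model_prior_prob(k, p):
    """Marginal prior probability of one specific model containing k of p
    candidate variables when a Beta(1,1) (uniform) prior is placed on the
    common inclusion probability w:
    integral_0^1 w**k * (1 - w)**(p - k) dw = 1 / ((p + 1) * comb(p, k))."""
    return 1.0 / ((p + 1) * comb(p, k))

# Prior odds of the null model versus any particular one-variable model:
# the odds equal p, so adding a variable is penalized more heavily as the
# candidate pool grows.
for p in (10, 100, 1000):
    odds = model_prior_prob(0, p) / model_prior_prob(1, p)
    print(p, round(odds))
```

With $p$ candidate predictors, the prior odds of the null model against any particular one-variable model equal $p$, so the penalty for adding a variable automatically scales with the number of candidates.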
</p>
Thu, 05 Aug 2010 15:41 EDT

Estimating variance of random effects to solve multiple problems simultaneously
https://projecteuclid.org/euclid.aos/1530086431
<strong>Masayo Yoshimori Hirose</strong>, <strong>Partha Lahiri</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1721--1741.</p><p><strong>Abstract:</strong><br/>
The two-level normal hierarchical model (NHM) has played a critical role in statistical theory for the last several decades. In this paper, we propose a random effects variance estimator that simultaneously (i) improves on the estimation of the related shrinkage factors, (ii) protects the empirical best linear unbiased predictors (EBLUP) [the same as empirical Bayes (EB)] of the random effects from the common overshrinkage problem, and (iii) avoids complex bias correction in generating a strictly positive second-order unbiased mean squared error (MSE) (the same as integrated Bayes risk) estimator by either the Taylor series or the single parametric bootstrap method. The idea of achieving multiple desirable properties in an EBLUP or EB method through a suitably devised random effects variance estimator is the first of its kind and holds promise for providing good inferences for random effects under the EBLUP or EB framework. The proposed methodology is also evaluated using a Monte Carlo simulation study and a real data analysis.
</p>
Wed, 27 Jun 2018 04:00 EDT

Optimal shrinkage of eigenvalues in the spiked covariance model
https://projecteuclid.org/euclid.aos/1530086432
<strong>David Donoho</strong>, <strong>Matan Gavish</strong>, <strong>Iain Johnstone</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1742--1778.</p><p><strong>Abstract:</strong><br/>
We show that in a common high-dimensional covariance model, the choice of loss function has a profound effect on optimal estimation.
In an asymptotic framework based on the spiked covariance model and use of orthogonally invariant estimators, we show that optimal estimation of the population covariance matrix boils down to design of an optimal shrinker $\eta$ that acts elementwise on the sample eigenvalues. Indeed, to each loss function there corresponds a unique admissible eigenvalue shrinker $\eta^{*}$ dominating all other shrinkers. The shape of the optimal shrinker is determined by the choice of loss function and, crucially, by inconsistency of both eigenvalues and eigenvectors of the sample covariance matrix.
Details of these phenomena and closed form formulas for the optimal eigenvalue shrinkers are worked out for a menagerie of 26 loss functions for covariance estimation found in the literature, including the Stein, Entropy, Divergence, Fréchet, Bhattacharya/Matusita, Frobenius Norm, Operator Norm, Nuclear Norm and Condition Number losses.
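The "orthogonally invariant" recipe is simple to sketch: keep the sample eigenvectors and shrink the sample eigenvalues elementwise. The `debulk_shrinker` below is only an illustrative stand-in (collapse every eigenvalue below the Marchenko-Pastur bulk edge $(1+\sqrt{\gamma})^{2}$ to the noise level); the paper derives a different optimal shrinker for each loss.

```python
import numpy as np

def shrink_covariance(S, eta):
    """Orthogonally invariant covariance estimator: eigendecompose the sample
    covariance S, apply the scalar shrinker eta elementwise to the
    eigenvalues, and reconstruct with the (unchanged) sample eigenvectors."""
    lam, V = np.linalg.eigh(S)
    return (V * eta(lam)) @ V.T

def debulk_shrinker(gamma, noise=1.0):
    """Illustrative shrinker only, not one of the paper's optimal shrinkers:
    collapse eigenvalues below the Marchenko-Pastur bulk edge
    noise * (1 + sqrt(gamma))**2 to the noise level."""
    edge = noise * (1.0 + np.sqrt(gamma)) ** 2
    return lambda lam: np.where(lam > edge, lam, noise)

# Toy use: p = 50 dimensions, n = 200 samples from an identity covariance,
# so aspect ratio gamma = p / n = 0.25.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
S = X.T @ X / n
S_hat = shrink_covariance(S, debulk_shrinker(p / n))
```

The design point is that every orthogonally invariant estimator has this form, so the whole problem reduces to choosing the scalar map `eta`.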
</p>
Wed, 27 Jun 2018 04:00 EDT

A Bayesian approach to the selection of two-level multi-stratum factorial designs
https://projecteuclid.org/euclid.aos/1530086433
<strong>Ming-Chung Chang</strong>, <strong>Ching-Shui Cheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1779--1806.</p><p><strong>Abstract:</strong><br/>
In a multi-stratum factorial experiment, there are multiple error terms (strata) with different variances that arise from complicated structures of the experimental units. For unstructured experimental units, minimum aberration is a popular criterion for choosing regular fractional factorial designs. One difficulty in extending this criterion to multi-stratum factorial designs is that the formulation of a word length pattern based on which minimum aberration is defined requires an order of desirability among the relevant words, but a natural order is often lacking. Furthermore, a criterion based only on word length patterns does not account for the different stratum variances. Mitchell, Morris and Ylvisaker [ Statist. Sinica 5 (1995) 559–573] proposed a framework for Bayesian factorial designs. A Gaussian process is used as the prior for the treatment effects, from which a prior distribution of the factorial effects is induced. This approach is applied to study optimal and efficient multi-stratum factorial designs. Good surrogates for the Bayesian criteria that can be related to word length and generalized word length patterns for regular and nonregular designs, respectively, are derived. A tool is developed for eliminating inferior designs and reducing the designs that need to be considered without requiring any knowledge of stratum variances. Numerical examples are used to illustrate the theory in several settings.
</p>
Wed, 27 Jun 2018 04:00 EDT

Accuracy assessment for high-dimensional linear regression
https://projecteuclid.org/euclid.aos/1530086434
<strong>T. Tony Cai</strong>, <strong>Zijian Guo</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 4, 1807--1836.</p><p><strong>Abstract:</strong><br/>
This paper considers point and interval estimation of the $\ell_{q}$ loss of an estimator in high-dimensional linear regression with random design. We establish the minimax rate for estimating the $\ell_{q}$ loss and the minimax expected length of confidence intervals for the $\ell_{q}$ loss of rate-optimal estimators of the regression vector, including commonly used estimators such as Lasso, scaled Lasso, square-root Lasso and Dantzig Selector. Adaptivity of confidence intervals for the $\ell_{q}$ loss is also studied. Both the setting of the known identity design covariance matrix and known noise level and the setting of unknown design covariance matrix and unknown noise level are studied. The results reveal interesting and significant differences between estimating the $\ell_{2}$ loss and $\ell_{q}$ loss with $1\le q<2$ as well as between the two settings.
New technical tools are developed to establish rate-sharp lower bounds for the minimax estimation error and the expected length of minimax and adaptive confidence intervals for the $\ell_{q}$ loss. A significant difference between loss estimation and traditional parameter estimation is that for loss estimation the constraint is on the performance of the estimator of the regression vector, while the lower bounds are on the difficulty of estimating its $\ell_{q}$ loss. The technical tools developed in this paper can also be of independent interest.
</p>
Wed, 27 Jun 2018 04:00 EDT

Variable selection with Hamming loss
https://projecteuclid.org/euclid.aos/1534492821
<strong>Cristina Butucea</strong>, <strong>Mohamed Ndaoud</strong>, <strong>Natalia A. Stepanova</strong>, <strong>Alexandre B. Tsybakov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1837--1875.</p><p><strong>Abstract:</strong><br/>
We derive nonasymptotic bounds for the minimax risk of variable selection under expected Hamming loss in the Gaussian mean model in $\mathbb{R}^{d}$ for classes of at most $s$-sparse vectors separated from 0 by a constant $a>0$. In some cases, we get exact expressions for the nonasymptotic minimax risk as a function of $d,s,a$ and find explicitly the minimax selectors. These results are extended to dependent or non-Gaussian observations and to the problem of crowdsourcing. Analogous conclusions are obtained for the probability of wrong recovery of the sparsity pattern. As corollaries, we derive necessary and sufficient conditions for such asymptotic properties as almost full recovery and exact recovery. Moreover, we propose data-driven selectors that provide almost full and exact recovery adaptively to the parameters of the classes.
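A minimal sketch of selection under Hamming loss in the Gaussian mean model follows; the midpoint threshold below is a generic illustrative choice, not necessarily the paper's minimax selector, and the dimensions and signal strength are made-up values.

```python
import numpy as np

def threshold_selector(y, t):
    """Coordinatewise selector: declare component j active iff |y_j| > t."""
    return (np.abs(y) > t).astype(int)

def hamming_loss(eta_hat, eta):
    """Number of wrongly classified components (misses plus false alarms)."""
    return int(np.sum(np.asarray(eta_hat) != np.asarray(eta)))

# Gaussian mean model in R^d with an s-sparse mean whose active entries
# equal a > 0 (all values here are illustrative).
rng = np.random.default_rng(1)
d, s, a = 1000, 10, 6.0
eta = np.zeros(d, dtype=int)
eta[:s] = 1
y = a * eta + rng.standard_normal(d)

eta_hat = threshold_selector(y, t=a / 2)   # generic midpoint threshold
loss = hamming_loss(eta_hat, eta)
```

The Hamming loss simply counts misclassified coordinates, which is why exact recovery corresponds to the event that this count is zero.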
</p>
Fri, 17 Aug 2018 04:00 EDT

Randomization-based causal inference from split-plot designs
https://projecteuclid.org/euclid.aos/1534492822
<strong>Anqi Zhao</strong>, <strong>Peng Ding</strong>, <strong>Rahul Mukerjee</strong>, <strong>Tirthankar Dasgupta</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1876--1903.</p><p><strong>Abstract:</strong><br/>
Under the potential outcomes framework, we propose a randomization-based estimation procedure for causal inference from split-plot designs, with special emphasis on $2^{2}$ designs that naturally arise in many social, behavioral and biomedical experiments. Point estimators of factorial effects are obtained and their sampling variances are derived in closed form as linear combinations of the between- and within-group covariances of the potential outcomes. Results are compared to those under complete randomization as measures of design efficiency. Conservative estimators of these sampling variances are proposed. Connection of the randomization-based approach to inference based on the linear mixed effects model is explored. Results on sampling variances of point estimators and their estimators are extended to general split-plot designs. The superiority over existing model-based alternatives in frequency coverage properties is reported under a variety of simulation settings for both binary and continuous outcomes.
</p>
Fri, 17 Aug 2018 04:00 EDT

A new perspective on robust $M$-estimation: Finite sample theory and applications to dependence-adjusted multiple testing
https://projecteuclid.org/euclid.aos/1534492823
<strong>Wen-Xin Zhou</strong>, <strong>Koushiki Bose</strong>, <strong>Jianqing Fan</strong>, <strong>Han Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1904--1931.</p><p><strong>Abstract:</strong><br/>
Heavy-tailed errors impair the accuracy of the least squares estimate, which can be spoiled by a single grossly outlying observation. As argued in the seminal work of Peter Huber in 1973 [ Ann. Statist. 1 (1973) 799–821], robust alternatives to the method of least squares are sorely needed. To achieve robustness against heavy-tailed sampling distributions, we revisit the Huber estimator from a new perspective by letting the tuning parameter involved diverge with the sample size. In this paper, we develop nonasymptotic concentration results for such an adaptive Huber estimator, namely, the Huber estimator with the tuning parameter adapted to sample size, dimension and the variance of the noise. Specifically, we obtain a sub-Gaussian-type deviation inequality and a nonasymptotic Bahadur representation when noise variables only have finite second moments. The nonasymptotic results further yield two conventional normal approximation results that are of independent interest, the Berry–Esseen inequality and Cramér-type moderate deviation. As an important application to large-scale simultaneous inference, we apply these robust normal approximation results to analyze a dependence-adjusted multiple testing procedure for moderately heavy-tailed data. It is shown that the robust dependence-adjusted procedure asymptotically controls the overall false discovery proportion at the nominal level under mild moment conditions. Thorough numerical results on both simulated and real datasets are also provided to back up our theory.
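The adaptive-Huber idea, letting the tuning parameter diverge with the sample size, can be sketched for location estimation as follows. The rate $\tau \asymp \hat{\sigma}\sqrt{n/\log n}$ mirrors the abstract's prescription of adapting $\tau$ to sample size and noise variance, but the constants and the IRLS solver are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def adaptive_huber_mean(x, n_iter=100):
    """Huber location estimate with a tuning parameter that diverges with the
    sample size, tau ~ sigma_hat * sqrt(n / log n), computed by iteratively
    reweighted least squares (IRLS). Constants are illustrative."""
    x = np.asarray(x, dtype=float)
    n = x.size
    tau = np.std(x) * np.sqrt(n / np.log(n))   # adaptive robustification level
    mu = np.median(x)                          # robust starting point
    for _ in range(n_iter):
        r = np.abs(x - mu)
        w = np.where(r <= tau, 1.0, tau / np.maximum(r, 1e-12))  # Huber weights
        mu = np.sum(w * x) / np.sum(w)
    return mu

# Heavy-tailed sample: Student t with 2.5 degrees of freedom (finite second
# moment, infinite higher moments), centered at 5.
rng = np.random.default_rng(2)
x = 5.0 + rng.standard_t(2.5, size=5000)
mu_hat = adaptive_huber_mean(x)
```

Observations within $\tau$ of the current estimate get full weight; the influence of more distant ones is capped, which is what protects the estimate from gross outliers.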
</p>
Fri, 17 Aug 2018 04:00 EDT

Robust covariance and scatter matrix estimation under Huber’s contamination model
https://projecteuclid.org/euclid.aos/1534492824
<strong>Mengjie Chen</strong>, <strong>Chao Gao</strong>, <strong>Zhao Ren</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1932--1960.</p><p><strong>Abstract:</strong><br/>
Covariance matrix estimation is one of the most important problems in statistics. To accommodate the complexity of modern datasets, it is desired to have estimation procedures that not only can incorporate the structural assumptions of covariance matrices, but are also robust to outliers from arbitrary sources. In this paper, we define a new concept called matrix depth and then propose a robust covariance matrix estimator by maximizing the empirical depth function. The proposed estimator is shown to achieve minimax optimal rate under Huber’s $\varepsilon$-contamination model for estimating covariance/scatter matrices with various structures including bandedness and sparsity.
</p>
Fri, 17 Aug 2018 04:00 EDT

Empirical best prediction under a nested error model with log transformation
https://projecteuclid.org/euclid.aos/1534492825
<strong>Isabel Molina</strong>, <strong>Nirian Martín</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1961--1993.</p><p><strong>Abstract:</strong><br/>
In regression models involving economic variables such as income, log transformation is typically taken to achieve approximate normality and stabilize the variance. However, often the interest is predicting individual values or means of the variable in the original scale. Under a nested error model for the log transformation of the target variable, we show that the usual approach of back transforming the predicted values may introduce a substantial bias. We obtain the optimal (or “best”) predictors of individual values of the original variable and of small area means under that model. Empirical best predictors are defined by estimating the unknown model parameters in the best predictors. When estimation is desired for subpopulations with small sample sizes (small areas), nested error models are widely used to “borrow strength” from the other areas and obtain estimators with greater efficiency than direct estimators based on the scarce area-specific data. We show that naive predictors of small area means obtained by back-transformation under the mentioned model may even underperform direct estimators. Moreover, assessing the uncertainty of the considered predictor is not straightforward. Exact mean squared errors of the best predictors and second-order approximations to the mean squared errors of the empirical best predictors are derived. Estimators of the mean squared errors that are second-order correct are also obtained. Simulation studies and an example with Mexican data on living conditions illustrate the procedures.
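The back-transformation bias is easy to reproduce in the simplest lognormal special case: if $\log Y \sim N(\mu,\sigma^{2})$, then $E[Y]=\exp(\mu+\sigma^{2}/2)$, so the naive predictor $\exp(\hat{\mu})$ is biased downward. The simulation below is a toy demonstration, not the paper's nested error model.

```python
import numpy as np

# If log(Y) ~ N(mu, sigma^2), the mean on the original scale is
# exp(mu + sigma^2 / 2), not exp(mu): naive back-transformation is biased.
rng = np.random.default_rng(3)
mu, sigma = 1.0, 1.0
z = rng.normal(mu, sigma, size=200_000)       # log-scale observations
y = np.exp(z)                                 # original-scale variable

naive = np.exp(z.mean())                      # back-transformed log-mean
corrected = np.exp(z.mean() + 0.5 * z.var())  # lognormal bias correction
truth = np.exp(mu + 0.5 * sigma**2)
```

Here `naive` is close to $e \approx 2.72$ against a true mean of $e^{1.5} \approx 4.48$, an understatement of roughly 40 percent, which is the kind of substantial bias the abstract warns about.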
</p>
Fri, 17 Aug 2018 04:00 EDT

Backward nested descriptors asymptotics with inference on stem cell differentiation
https://projecteuclid.org/euclid.aos/1534492826
<strong>Stephan F. Huckemann</strong>, <strong>Benjamin Eltzner</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 1994--2019.</p><p><strong>Abstract:</strong><br/>
For sequences of random backward nested subspaces as occur, say, in dimension reduction for manifold or stratified space valued data, asymptotic results are derived. In fact, we formulate our results more generally for backward nested families of descriptors (BNFD). Under rather general conditions, asymptotic strong consistency holds. Under additional, still rather general hypotheses, among them existence of a.s. local twice differentiable charts, asymptotic joint normality of a BNFD can be shown. If charts factor suitably, this leads to individual asymptotic normality for the last element, a principal nested mean or a principal nested geodesic, say. It turns out that these results pertain to principal nested spheres (PNS) and principal nested great subsphere (PNGS) analysis by Jung, Dryden and Marron [ Biometrika 99 (2012) 551–568] as well as to the intrinsic mean on a first geodesic principal component (IMo1GPC) for manifolds and Kendall’s shape spaces. A nested bootstrap two-sample test is derived and illustrated with simulations. In a study on real data, PNGS is applied to track early human mesenchymal stem cell differentiation over a coarse time grid and, among others, to locate a change point with direct consequences for the design of further studies.
</p>
Fri, 17 Aug 2018 04:00 EDT

Change-point detection in multinomial data with a large number of categories
https://projecteuclid.org/euclid.aos/1534492827
<strong>Guanghui Wang</strong>, <strong>Changliang Zou</strong>, <strong>Guosheng Yin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2020--2044.</p><p><strong>Abstract:</strong><br/>
We consider a sequence of multinomial data for which the probabilities associated with the categories are subject to abrupt changes of unknown magnitudes at unknown locations. When the number of categories is comparable to or even larger than the number of subjects allocated to these categories, conventional methods such as the classical Pearson’s chi-squared test and the deviance test may not work well. Motivated by high-dimensional homogeneity tests, we propose a novel change-point detection procedure that allows the number of categories to tend to infinity. The null distribution of our test statistic is asymptotically normal and the test performs well with finite samples. The number of change-points is determined by minimizing a penalized objective function based on segmentation, and the locations of the change-points are estimated by minimizing the objective function with the dynamic programming algorithm. Under some mild conditions, the consistency of the estimators of multiple change-points is established. Simulation studies show that the proposed method performs satisfactorily for identifying change-points in terms of power and estimation accuracy, and it is illustrated with an analysis of a real data set.
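The "minimize a penalized objective by dynamic programming" step can be sketched with a generic optimal-partitioning routine. The quadratic segment cost below is an illustrative stand-in for the paper's multinomial homogeneity statistic; only the penalized-minimization structure is shared with the abstract.

```python
import numpy as np

def dp_changepoints(x, penalty):
    """Optimal partitioning by O(n^2) dynamic programming: minimize the sum
    of per-segment costs plus `penalty` for each segment. The cost used here
    is the residual sum of squares around the segment mean (an illustrative
    stand-in for a multinomial homogeneity statistic)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s1 = np.concatenate(([0.0], np.cumsum(x)))       # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))   # prefix sums of squares

    def cost(i, j):  # RSS of the segment x[i:j]
        m = j - i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m

    F = np.full(n + 1, np.inf)    # best penalized objective for x[:j]
    F[0] = -penalty               # so a single segment pays the penalty once
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = F[i] + cost(i, j) + penalty
            if c < F[j]:
                F[j], last[j] = c, i
    cps, j = [], n                # backtrack the change-point locations
    while j > 0:
        i = int(last[j])
        if i > 0:
            cps.append(i)
        j = i
    return sorted(cps)
```

Increasing `penalty` trades detection power for fewer spurious change-points, mirroring how the number of change-points is determined by the penalized objective in the abstract.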
</p>
Fri, 17 Aug 2018 04:00 EDT

Local asymptotic normality property for fractional Gaussian noise under high-frequency observations
https://projecteuclid.org/euclid.aos/1534492828
<strong>Alexandre Brouste</strong>, <strong>Masaaki Fukasawa</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2045--2061.</p><p><strong>Abstract:</strong><br/>
Local Asymptotic Normality (LAN) property for fractional Gaussian noise under high-frequency observations is proved with nondiagonal rate matrices depending on the parameter to be estimated. In contrast to the LAN families in the literature, nondiagonal rate matrices are inevitable. As consequences of the LAN property, a maximum likelihood sequence of estimators is shown to be asymptotically efficient and the likelihood ratio test on the Hurst parameter is shown to be an asymptotically uniformly most powerful unbiased test for two-sided hypotheses.
</p>
Fri, 17 Aug 2018 04:00 EDT

Global testing against sparse alternatives under Ising models
https://projecteuclid.org/euclid.aos/1534492829
<strong>Rajarshi Mukherjee</strong>, <strong>Sumit Mukherjee</strong>, <strong>Ming Yuan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2062--2093.</p><p><strong>Abstract:</strong><br/>
In this paper, we study the effect of dependence on detecting sparse signals. In particular, we focus on global testing against sparse alternatives for the means of binary outcomes following an Ising model, and establish how the interplay between the strength and sparsity of a signal determines its detectability under various notions of dependence. The profound impact of dependence is best illustrated under the Curie–Weiss model where we observe the effect of a “thermodynamic” phase transition. In particular, the critical state exhibits a subtle “blessing of dependence” phenomenon in that one can detect much weaker signals at criticality than otherwise. Furthermore, we develop a testing procedure that is broadly applicable to account for dependence and show that it is asymptotically minimax optimal under fairly general regularity conditions.
</p>
Fri, 17 Aug 2018 04:00 EDT

Principal component analysis for second-order stationary vector time series
https://projecteuclid.org/euclid.aos/1534492830
<strong>Jinyuan Chang</strong>, <strong>Bin Guo</strong>, <strong>Qiwei Yao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2094--2124.</p><p><strong>Abstract:</strong><br/>
We extend the principal component analysis (PCA) to second-order stationary vector time series in the sense that we seek a contemporaneous linear transformation for a $p$-variate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially. Therefore, those lower-dimensional series can be analyzed separately as far as the linear dynamic structure is concerned. Technically, it boils down to an eigenanalysis for a positive definite matrix. When $p$ is large, an additional step is required to perform a permutation in terms of either maximum cross-correlations or FDR based on multiple tests. The asymptotic theory is established for both fixed $p$ and diverging $p$ when the sample size $n$ tends to infinity. Numerical experiments with both simulated and real data sets indicate that the proposed method is an effective initial step in analyzing multiple time series data, which leads to substantial dimension reduction in modelling and forecasting high-dimensional linear dynamical structures. Unlike PCA for independent data, there is no guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation which leads to the advantages in, for example, forecasting for future values. The method can also be adapted to segment multiple volatility processes.
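One way to operationalize the eigenanalysis step is sketched below, under the assumption that the positive semi-definite matrix is accumulated from products of lagged sample autocovariances so that serial correlation at different lags cannot cancel; the pre-standardization and the subsequent permutation/grouping step of the full method are omitted.

```python
import numpy as np

def segment_transform(X, k0=5):
    """Compute a candidate contemporaneous linear transformation from an
    eigenanalysis of W = sum_{k=0..k0} S_k S_k', where S_k is the lag-k
    sample autocovariance of the centered (n, p) series X. Returns the
    transformed series X @ A, whose columns are candidates for grouping
    into mutually uncorrelated subseries."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    n, p = X.shape
    W = np.zeros((p, p))
    for k in range(k0 + 1):
        S_k = X[k:].T @ X[:n - k] / n   # lag-k sample autocovariance
        W += S_k @ S_k.T                # PSD accumulation across lags
    _, A = np.linalg.eigh(W)            # orthogonal eigenvector matrix
    return X @ A
```

Squaring each lagged autocovariance before summing keeps the accumulated matrix positive semi-definite, so the eigenanalysis is well posed regardless of the signs of the serial correlations.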
</p>
Fri, 17 Aug 2018 04:00 EDT

Estimation of a monotone density in $s$-sample biased sampling models
https://projecteuclid.org/euclid.aos/1534492831
<strong>Kwun Chuen Gary Chan</strong>, <strong>Hok Kan Ling</strong>, <strong>Tony Sit</strong>, <strong>Sheung Chi Phillip Yam</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2125--2152.</p><p><strong>Abstract:</strong><br/>
We study the nonparametric estimation of a decreasing density function $g_{0}$ in a general $s$-sample biased sampling model with weight (or bias) functions $w_{i}$ for $i=1,\ldots,s$. The determination of the monotone maximum likelihood estimator $\hat{g}_{n}$ and its asymptotic distribution, except for the case when $s=1$, has been long missing in the literature due to certain nonstandard structures of the likelihood function, such as nonseparability and a lack of strictly positive second order derivatives of the negative of the log-likelihood function. The existence, uniqueness, self-characterization, consistency of $\hat{g}_{n}$ and its asymptotic distribution at a fixed point are established in this article. To overcome the barriers caused by nonstandard likelihood structures, for instance, we show the tightness of $\hat{g}_{n}$ via a purely analytic argument instead of an intrinsic geometric one and propose an indirect approach to attain the $\sqrt{n}$-rate of convergence of the linear functional $\int w_{i}\hat{g}_{n}$.
</p>
Fri, 17 Aug 2018 04:00 EDT

Community detection in degree-corrected block models
https://projecteuclid.org/euclid.aos/1534492832
<strong>Chao Gao</strong>, <strong>Zongming Ma</strong>, <strong>Anderson Y. Zhang</strong>, <strong>Harrison H. Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2153--2185.</p><p><strong>Abstract:</strong><br/>
Community detection is a central problem of network data analysis. Given a network, the goal of community detection is to partition the network nodes into a small number of clusters, which could often help reveal interesting structures. The present paper studies community detection in Degree-Corrected Block Models (DCBMs). We first derive asymptotic minimax risks of the problem for a misclassification proportion loss under appropriate conditions. The minimax risks are shown to depend on degree-correction parameters, community sizes and average within and between community connectivities in an intuitive and interpretable way. In addition, we propose a polynomial time algorithm to adaptively perform consistent and even asymptotically optimal community detection in DCBMs.
</p>
Fri, 17 Aug 2018 04:00 EDT

CLT for largest eigenvalues and unit root testing for high-dimensional nonstationary time series
https://projecteuclid.org/euclid.aos/1534492833
<strong>Bo Zhang</strong>, <strong>Guangming Pan</strong>, <strong>Jiti Gao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2186--2215.</p><p><strong>Abstract:</strong><br/>
Let $\{Z_{ij}\}$ be independent and identically distributed (i.i.d.) random variables with $EZ_{ij}=0$, $E\vert Z_{ij}\vert^{2}=1$ and $E\vert Z_{ij}\vert^{4}<\infty$. Define linear processes $Y_{tj}=\sum_{k=0}^{\infty}b_{k}Z_{t-k,j}$ with $\sum_{k=0}^{\infty}\vert b_{k}\vert <\infty$. Consider a $p$-dimensional time series model of the form $\mathbf{x}_{t}=\boldsymbol{\Pi} \mathbf{x}_{t-1}+\Sigma^{1/2}\mathbf{y}_{t},\ 1\leq t\leq T$, with $\mathbf{y}_{t}=(Y_{t1},\ldots,Y_{tp})'$, where $\Sigma^{1/2}$ is the square root of a symmetric positive definite matrix $\Sigma$. Let $\mathbf{B}=(1/p)\mathbf{X}\mathbf{X}^{*}$ with $\mathbf{X}=(\mathbf{x}_{1},\ldots,\mathbf{x}_{T})'$, where $\mathbf{X}^{*}$ denotes the conjugate transpose. This paper establishes both the convergence in probability and the asymptotic joint distribution of the first $k$ largest eigenvalues of $\mathbf{B}$ when $\mathbf{x}_{t}$ is nonstationary. As an application, two new unit root tests for possible nonstationarity of high-dimensional time series are proposed and then studied both theoretically and numerically.
</p>
Fri, 17 Aug 2018 04:00 EDT

Smooth backfitting for errors-in-variables additive models
https://projecteuclid.org/euclid.aos/1534492834
<strong>Kyunghee Han</strong>, <strong>Byeong U. Park</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2216--2250.</p><p><strong>Abstract:</strong><br/>
In this work, we develop a new smooth backfitting method and theory for estimating additive nonparametric regression models when the covariates are contaminated by measurement errors. For this, we devise a new kernel function that suitably deconvolutes the bias due to measurement errors as well as renders a projection interpretation to the resulting estimator in the space of additive functions. The deconvolution property and the projection interpretation are essential for a successful solution of the problem. We prove that the method based on the new kernel weighting scheme achieves the optimal rate of convergence in one-dimensional deconvolution problems when the smoothness of measurement error distribution is less than a threshold value. We find that the speed of convergence is slower than the univariate rate when the smoothness of measurement error distribution is above the threshold, but it is still much faster than the optimal rate in multivariate deconvolution problems. The theory requires a deliberate analysis of the nonnegligible effects of measurement errors being propagated to other additive components through backfitting operation. We present the finite sample performance of the deconvolution smooth backfitting estimators that confirms our theoretical findings.
</p>
Fri, 17 Aug 2018 04:00 EDT

Unifying Markov properties for graphical models
https://projecteuclid.org/euclid.aos/1534492835
<strong>Steffen Lauritzen</strong>, <strong>Kayvan Sadeghi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2251--2278.</p><p><strong>Abstract:</strong><br/>
Several types of graphs with different conditional independence interpretations—also known as Markov properties—have been proposed and used in graphical models. In this paper, we unify these Markov properties by introducing a class of graphs with four types of edges—lines, arrows, arcs and dotted lines—and a single separation criterion. We show that independence structures defined by this class specialize to each of the previously defined cases, when suitable subclasses of graphs are considered. In addition, we define a pairwise Markov property for the subclass of chain mixed graphs, which includes chain graphs with the LWF interpretation, as well as summary graphs (and consequently ancestral graphs). We prove the equivalence of this pairwise Markov property to the global Markov property for compositional graphoid independence models.
</p>
Fri, 17 Aug 2018 04:00 EDT

Adaptation in log-concave density estimation
https://projecteuclid.org/euclid.aos/1534492836
<strong>Arlene K. H. Kim</strong>, <strong>Adityanand Guntuboyina</strong>, <strong>Richard J. Samworth</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2279--2306.</p><p><strong>Abstract:</strong><br/>
The log-concave maximum likelihood estimator of a density on the real line based on a sample of size $n$ is known to attain the minimax optimal rate of convergence of $O(n^{-4/5})$ with respect to, for example, squared Hellinger distance. In this paper, we show that it also enjoys attractive adaptation properties, in the sense that it achieves a faster rate of convergence when the logarithm of the true density is $k$-affine (i.e., made up of $k$ affine pieces), or close to $k$-affine, provided in each case that $k$ is not too large. Our results use two different techniques: the first relies on a new Marshall’s inequality for log-concave density estimation, and reveals that when the true density is close to log-linear on its support, the log-concave maximum likelihood estimator can achieve the parametric rate of convergence in total variation distance. Our second approach depends on local bracketing entropy methods, and allows us to prove a sharp oracle inequality, which implies in particular a risk bound with respect to various global loss functions, including Kullback–Leibler divergence, of $O(\frac{k}{n}\log^{5/4}(en/k))$ when the true density is log-concave and its logarithm is close to $k$-affine.
</p>projecteuclid.org/euclid.aos/1534492836_20180817040040Fri, 17 Aug 2018 04:00 EDTWeak convergence of a pseudo maximum likelihood estimator for the extremal indexhttps://projecteuclid.org/euclid.aos/1534492837<strong>Betina Berghaus</strong>, <strong>Axel Bücher</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2307--2335.</p><p><strong>Abstract:</strong><br/>
The extremes of a stationary time series typically occur in clusters. A primary measure for this phenomenon is the extremal index, representing the reciprocal of the expected cluster size. Both the disjoint and the sliding blocks estimators for the extremal index are analyzed in detail. In contrast to many competitors, these estimators depend only on the choice of one parameter sequence. We derive an asymptotic expansion, prove asymptotic normality and show consistency of an estimator for the asymptotic variance. Explicit calculations in certain models and a finite-sample Monte Carlo simulation study reveal that the sliding blocks estimator outperforms other blocks estimators, and that it is competitive with runs- and inter-exceedance estimators in various models. The methods are applied to a variety of financial time series.
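As an illustration (not the paper's exact pseudo maximum likelihood estimator), the disjoint-blocks idea can be sketched as follows: block maxima transformed by the empirical cdf behave approximately like Exp(1/θ) variables, so the reciprocal of their mean estimates the extremal index. Function names, the block length and the cap at 1 are my own choices.

```python
import bisect
import math
import random

def disjoint_blocks_extremal_index(x, b):
    """Illustrative disjoint-blocks estimator of the extremal index theta.

    Split the series into complete blocks of length b, take block maxima,
    map them through the empirical cdf Fhat, and match an exponential rate:
    Z_i = -b * log(Fhat(M_i)) is approximately Exp(1/theta).
    """
    n = len(x)
    k = n // b                        # number of complete blocks
    sx = sorted(x)
    z = []
    for i in range(k):
        m = max(x[i * b:(i + 1) * b])                  # block maximum
        fhat = bisect.bisect_right(sx, m) / (n + 1)    # empirical cdf, < 1
        z.append(-b * math.log(fhat))
    theta = k / sum(z)                # reciprocal of the mean of the Z_i
    return min(theta, 1.0)            # the extremal index lies in (0, 1]

random.seed(0)
iid = [random.gauss(0.0, 1.0) for _ in range(5000)]
theta_hat = disjoint_blocks_extremal_index(iid, 50)  # iid noise has theta = 1
```

For serially independent data the clusters are trivial and the estimate should be close to 1; clustered extremes (e.g., from an ARCH-type series) would push it below 1.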
</p>projecteuclid.org/euclid.aos/1534492837_20180817040040Fri, 17 Aug 2018 04:00 EDTSemiparametric efficiency bounds for high-dimensional modelshttps://projecteuclid.org/euclid.aos/1534492838<strong>Jana Janková</strong>, <strong>Sara van de Geer</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2336--2359.</p><p><strong>Abstract:</strong><br/>
Asymptotic lower bounds for estimation play a fundamental role in assessing the quality of statistical procedures. In this paper, we propose a framework for obtaining semiparametric efficiency bounds for sparse high-dimensional models, where the dimension of the parameter is larger than the sample size. We adopt a semiparametric point of view: we concentrate on one-dimensional functions of a high-dimensional parameter. We follow two different approaches to reach the lower bounds: asymptotic Cramér–Rao bounds and Le Cam’s type of analysis. Both of these approaches allow us to define a class of asymptotically unbiased or “regular” estimators for which a lower bound is derived. Consequently, we show that certain estimators obtained by de-sparsifying (or de-biasing) an $\ell_{1}$-penalized M-estimator are asymptotically unbiased and achieve the lower bound on the variance: thus in this sense they are asymptotically efficient. The paper discusses in detail the linear regression model and the Gaussian graphical model.
</p>projecteuclid.org/euclid.aos/1534492838_20180817040040Fri, 17 Aug 2018 04:00 EDTLimit theorems for eigenvectors of the normalized Laplacian for random graphshttps://projecteuclid.org/euclid.aos/1534492839<strong>Minh Tang</strong>, <strong>Carey E. Priebe</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2360--2415.</p><p><strong>Abstract:</strong><br/>
We prove a central limit theorem for the components of the eigenvectors corresponding to the $d$ largest eigenvalues of the normalized Laplacian matrix of a finite dimensional random dot product graph. As a corollary, we show that for stochastic blockmodel graphs, the rows of the spectral embedding of the normalized Laplacian converge to multivariate normals and, furthermore, the mean and the covariance matrix of each row are functions of the associated vertex’s block membership. Together with prior results for the eigenvectors of the adjacency matrix, we then compare, via the Chernoff information between multivariate normal distributions, how the choice of embedding method impacts subsequent inference. We demonstrate that neither embedding method dominates with respect to the inference task of recovering the latent block assignments.
</p>projecteuclid.org/euclid.aos/1534492839_20180817040040Fri, 17 Aug 2018 04:00 EDTOptimality and sub-optimality of PCA I: Spiked random matrix modelshttps://projecteuclid.org/euclid.aos/1534492840<strong>Amelia Perry</strong>, <strong>Alexander S. Wein</strong>, <strong>Afonso S. Bandeira</strong>, <strong>Ankur Moitra</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2416--2451.</p><p><strong>Abstract:</strong><br/>
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, introduced by Johnstone, in which a prominent eigenvector (or “spike”) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the spike strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, under structural assumptions on the spike, not all information is necessarily contained in the spectrum. We study the statistical limits of tests for the presence of a spike, including nonspectral tests. Our results leverage Le Cam’s notion of contiguity and include:
(i) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for certain natural priors for the spike.
(ii) For any non-Gaussian Wigner ensemble, PCA is sub-optimal for detection. However, an efficient variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entries.
(iii) For the Gaussian Wishart ensemble, the PCA threshold is optimal for positive spikes (for natural priors) but this is not always the case for negative spikes.
</p>projecteuclid.org/euclid.aos/1534492840_20180817040040Fri, 17 Aug 2018 04:00 EDTOn the exponentially weighted aggregate with the Laplace priorhttps://projecteuclid.org/euclid.aos/1534492841<strong>Arnak S. Dalalyan</strong>, <strong>Edwin Grappin</strong>, <strong>Quentin Paris</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2452--2478.</p><p><strong>Abstract:</strong><br/>
In this paper, we study the statistical behaviour of the Exponentially Weighted Aggregate (EWA) in the problem of high-dimensional regression with fixed design. Under the assumption that the underlying regression vector is sparse, it is reasonable to use the Laplace distribution as a prior. The resulting estimator and, specifically, a particular instance of it referred to as the Bayesian lasso, was already used in the statistical literature because of its computational convenience, even though no thorough mathematical analysis of its statistical properties was carried out. The present work fills this gap by establishing sharp oracle inequalities for the EWA with the Laplace prior. These inequalities show that if the temperature parameter is small, the EWA with the Laplace prior satisfies the same type of oracle inequality as the lasso estimator does, as long as the quality of estimation is measured by the prediction loss. Extensions of the proposed methodology to the problem of prediction with low-rank matrices are considered.
</p>projecteuclid.org/euclid.aos/1534492841_20180817040040Fri, 17 Aug 2018 04:00 EDTGoodness-of-fit testing of error distribution in linear measurement error modelshttps://projecteuclid.org/euclid.aos/1534492842<strong>Hira L. Koul</strong>, <strong>Weixing Song</strong>, <strong>Xiaoqing Zhu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 5, 2479--2510.</p><p><strong>Abstract:</strong><br/>
This paper investigates a class of goodness-of-fit tests for fitting an error density in linear regression models with measurement error in covariates. Each test statistic is the integrated square difference between the deconvolution kernel density estimator of the regression model error density and a smoothed version of the null error density, an analog of the so-called Bickel and Rosenblatt test statistic. The asymptotic null distributions of the proposed test statistics are derived for both the ordinary smooth and super smooth cases. The asymptotic power behavior of the proposed tests against a fixed alternative and a class of local nonparametric alternatives for both cases is also described. The finite sample performance of the proposed test is evaluated by a simulation study. The simulation study shows some superiority of the proposed test over several existing tests. Finally, a real data set is used to illustrate the proposed test.
</p>projecteuclid.org/euclid.aos/1534492842_20180817040040Fri, 17 Aug 2018 04:00 EDTFinding a large submatrix of a Gaussian random matrixhttps://projecteuclid.org/euclid.aos/1536307224<strong>David Gamarnik</strong>, <strong>Quan Li</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2511--2561.</p><p><strong>Abstract:</strong><br/>
We consider the problem of finding a $k\times k$ submatrix of an $n\times n$ matrix with i.i.d. standard Gaussian entries, which has a large average entry. It was shown in [Bhamidi, Dey and Nobel (2012)] using nonconstructive methods that the largest average value of a $k\times k$ submatrix is $2(1+o(1))\sqrt{\log n/k}$, with high probability (w.h.p.), when $k=O(\log n/\log\log n)$. In the same paper, evidence was provided that a natural greedy algorithm called the Largest Average Submatrix ($\mathcal{LAS}$) for a constant $k$ should produce a matrix with average entry at most $(1+o(1))\sqrt{2\log n/k}$, namely approximately a factor of $\sqrt{2}$ smaller than the global optimum, though no formal proof of this fact was provided.
In this paper, we show that the average entry of the matrix produced by the $\mathcal{LAS}$ algorithm is indeed $(1+o(1))\sqrt{2\log n/k}$ w.h.p. when $k$ is constant and $n$ grows. Then, by drawing an analogy with the problem of finding cliques in random graphs, we propose a simple greedy algorithm which produces a $k\times k$ matrix with asymptotically the same average value $(1+o(1))\sqrt{2\log n/k}$ w.h.p., for $k=o(\log n)$. Since the greedy algorithm is the best known algorithm for finding cliques in random graphs, it is tempting to believe that beating the factor $\sqrt{2}$ performance gap suffered by both algorithms might be very challenging. Surprisingly, we construct a very simple algorithm which produces a $k\times k$ matrix with average value $(1+o_{k}(1)+o(1))(4/3)\sqrt{2\log n/k}$ for $k=o((\log n)^{1.5})$, that is, with the asymptotic factor $4/3$ when $k$ grows.
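A minimal sketch of the alternating maximization behind the $\mathcal{LAS}$ heuristic: given a current set of rows, pick the $k$ columns with the largest sums over those rows, then re-pick the rows, and repeat until the selection stabilizes. Initialization and tie-breaking details here are my assumptions, not the paper's.

```python
import random

def las(a, k, seed=0):
    """Largest Average Submatrix heuristic (sketch).

    Alternate between choosing the k columns with the largest sums over the
    current rows and the k rows with the largest sums over the current
    columns. Each step cannot decrease the submatrix sum, so with
    continuous entries the loop terminates at a local optimum.
    """
    rng = random.Random(seed)
    n = len(a)
    rows = rng.sample(range(n), k)
    cols = []
    while True:
        new_cols = sorted(range(n), key=lambda j: -sum(a[i][j] for i in rows))[:k]
        new_rows = sorted(range(n), key=lambda i: -sum(a[i][j] for j in new_cols))[:k]
        if set(new_rows) == set(rows) and set(new_cols) == set(cols):
            break
        rows, cols = new_rows, new_cols
    avg = sum(a[i][j] for i in rows for j in cols) / (k * k)
    return rows, cols, avg

random.seed(1)
n, k = 200, 3
a = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
rows, cols, avg = las(a, k)  # avg is well above 0 for a random Gaussian matrix
```

On a 200 x 200 Gaussian matrix with $k=3$ this finds a submatrix whose average entry is on the order of $\sqrt{2\log n/k}$, consistent with the analysis above.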
To get an insight into the algorithmic hardness of this problem, and motivated by methods originating in the theory of spin glasses, we conduct the so-called expected overlap analysis of matrices with average value asymptotically $(1+o(1))\alpha\sqrt{2\log n/k}$ for a fixed value $\alpha\in[1,\sqrt{2}]$. The overlap corresponds to the number of common rows and the number of common columns for pairs of matrices achieving this value (see the paper for details). We discover numerically an intriguing phase transition at $\alpha^{*}\triangleq5\sqrt{2}/(3\sqrt{3})\approx1.3608\ldots\in[4/3,\sqrt{2}]$: when $\alpha<\alpha^{*}$ the space of overlaps is a continuous subset of $[0,1]^{2}$, whereas $\alpha=\alpha^{*}$ marks the onset of discontinuity, and as a result the model exhibits the Overlap Gap Property (OGP) when $\alpha>\alpha^{*}$, appropriately defined. We conjecture that the OGP observed for $\alpha>\alpha^{*}$ also marks the onset of the algorithmic hardness—no polynomial time algorithm exists for finding matrices with average value at least $(1+o(1))\alpha\sqrt{2\log n/k}$, when $\alpha>\alpha^{*}$ and $k$ is a mildly growing function of $n$.
</p>projecteuclid.org/euclid.aos/1536307224_20180907040116Fri, 07 Sep 2018 04:01 EDTSupport pointshttps://projecteuclid.org/euclid.aos/1536307226<strong>Simon Mak</strong>, <strong>V. Roshan Joseph</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2562--2592.</p><p><strong>Abstract:</strong><br/>
This paper introduces a new way to compact a continuous probability distribution $F$ into a set of representative points called support points. These points are obtained by minimizing the energy distance, a statistical potential measure initially proposed by Székely and Rizzo [ InterStat 5 (2004) 1–6] for testing goodness-of-fit. The energy distance has two appealing features. First, its distance-based structure allows us to exploit the duality between powers of the Euclidean distance and its Fourier transform for theoretical analysis. Using this duality, we show that support points converge in distribution to $F$, and enjoy an improved error rate over Monte Carlo for integrating a large class of functions. Second, the minimization of the energy distance can be formulated as a difference-of-convex program, which we manipulate using two algorithms to efficiently generate representative point sets. In simulation studies, support points provide improved integration performance over both Monte Carlo and a specific quasi-Monte Carlo method. Two important applications of support points are then highlighted: (a) as a way to quantify the propagation of uncertainty in expensive simulations and (b) as a method to optimally compact Markov chain Monte Carlo (MCMC) samples in Bayesian computation.
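The energy-distance criterion that support points minimize can be written down directly. Here is a one-dimensional sketch against a Monte Carlo sample from $F$, dropping the sample-only term, which is constant in the candidate points; the two candidate sets below are purely illustrative.

```python
import random

def energy_distance_criterion(points, sample):
    """Energy-distance criterion (1-D) for candidate representative points
    against a sample y_1,...,y_N from F, up to a constant not depending on
    `points`:
        2/(n*N) * sum_{i,j} |x_i - y_j|  -  1/n^2 * sum_{i,k} |x_i - x_k|.
    Support points minimize this over the candidate set."""
    n, big_n = len(points), len(sample)
    cross = sum(abs(x - y) for x in points for y in sample)
    within = sum(abs(x - z) for x in points for z in points)
    return 2.0 * cross / (n * big_n) - within / (n * n)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
near_quantiles = [-1.28, -0.52, 0.0, 0.52, 1.28]  # spread through the mass
far_away = [3.0, 3.5, 4.0, 4.5, 5.0]              # far from the mass
e_good = energy_distance_criterion(near_quantiles, sample)
e_bad = energy_distance_criterion(far_away, sample)
```

Points spread through the bulk of the distribution score strictly lower than points placed in the tail, which is exactly the behavior the minimization exploits.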
</p>projecteuclid.org/euclid.aos/1536307226_20180907040116Fri, 07 Sep 2018 04:01 EDTDebiasing the lasso: Optimal sample size for Gaussian designshttps://projecteuclid.org/euclid.aos/1536307227<strong>Adel Javanmard</strong>, <strong>Andrea Montanari</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2593--2622.</p><p><strong>Abstract:</strong><br/>
Performing statistical inference in high-dimensional models is challenging because of the lack of precise information on the distribution of high-dimensional regularized estimators.
Here, we consider linear regression in the high-dimensional regime $p\gg n$ and the Lasso estimator: we would like to perform inference on the parameter vector $\theta^{*}\in\mathbb{R}^{p}$. Important progress has been achieved in computing confidence intervals and $p$-values for single coordinates $\theta^{*}_{i}$, $i\in\{1,\dots,p\}$. A key role in these new inferential methods is played by a certain debiased estimator $\widehat{\theta}^{\mathrm{d}}$. Earlier work establishes that, under suitable assumptions on the design matrix, the coordinates of $\widehat{\theta}^{\mathrm{d}}$ are asymptotically Gaussian provided the true parameter vector $\theta^{*}$ is $s_{0}$-sparse with $s_{0}=o(\sqrt{n}/\log p)$.
The condition $s_{0}=o(\sqrt{n}/\log p)$ is considerably stronger than the one for consistent estimation, namely $s_{0}=o(n/\log p)$. In this paper, we consider Gaussian designs with known or unknown population covariance. When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_{0}=o(n/(\log p)^{2})$.
The same conclusion holds if the population covariance is unknown but can be estimated sufficiently well. For intermediate regimes, we describe the trade-off between sparsity in the coefficients $\theta^{*}$, and sparsity in the inverse covariance of the design. We further discuss several applications of our results beyond high-dimensional inference. In particular, we propose a thresholded Lasso estimator that is minimax optimal up to a factor $1+o_{n}(1)$ for i.i.d. Gaussian designs.
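Schematically, in the form used in this line of work (with $M$ an approximate inverse of the sample covariance $\widehat{\Sigma}=X^{\top}X/n$), the debiased estimator is

```latex
\widehat{\theta}^{\mathrm{d}}
  = \widehat{\theta}^{\mathrm{Lasso}}
  + \frac{1}{n}\, M X^{\top}\bigl(y - X\widehat{\theta}^{\mathrm{Lasso}}\bigr).
```

The correction term removes, to leading order, the shrinkage bias of the Lasso; this is what makes the coordinates of $\widehat{\theta}^{\mathrm{d}}$ asymptotically Gaussian under the sparsity conditions discussed above.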
</p>projecteuclid.org/euclid.aos/1536307227_20180907040116Fri, 07 Sep 2018 04:01 EDTMargins of discrete Bayesian networkshttps://projecteuclid.org/euclid.aos/1536307228<strong>Robin J. Evans</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2623--2656.</p><p><strong>Abstract:</strong><br/>
Bayesian network models with latent variables are widely used in statistics and machine learning. In this paper, we provide a complete algebraic characterization of these models when the observed variables are discrete and no assumption is made about the state-space of the latent variables. We show that it is algebraically equivalent to the so-called nested Markov model, meaning that the two are the same up to inequality constraints on the joint probabilities. In particular, these two models have the same dimension, differing only by inequality constraints for which there is no general description. The nested Markov model is therefore the closest possible description of the latent variable model that avoids consideration of inequalities. A consequence of this is that the constraint finding algorithm of Tian and Pearl [In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (2002) 519–527] is complete for finding equality constraints.
Latent variable models suffer from difficulties of unidentifiable parameters and nonregular asymptotics; in contrast the nested Markov model is fully identifiable, represents a curved exponential family of known dimension, and can easily be fitted using an explicit parameterization.
</p>projecteuclid.org/euclid.aos/1536307228_20180907040116Fri, 07 Sep 2018 04:01 EDTMulti-threshold accelerated failure time modelhttps://projecteuclid.org/euclid.aos/1536307229<strong>Jialiang Li</strong>, <strong>Baisuo Jin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2657--2682.</p><p><strong>Abstract:</strong><br/>
A two-stage procedure for simultaneously detecting multiple thresholds and achieving model selection in the segmented accelerated failure time (AFT) model is developed in this paper. In the first stage, we formulate the threshold problem as a group model selection problem so that a concave 2-norm group selection method can be applied. In the second stage, the thresholds are finalized via a refining method. We establish the strong consistency of the threshold estimates and regression coefficient estimates under some mild technical conditions. The proposed procedure performs satisfactorily in our simulation studies. Its real world applicability is demonstrated via the analysis of a follicular lymphoma data set.
</p>projecteuclid.org/euclid.aos/1536307229_20180907040116Fri, 07 Sep 2018 04:01 EDTMeasuring and testing for interval quantile dependencehttps://projecteuclid.org/euclid.aos/1536307230<strong>Liping Zhu</strong>, <strong>Yaowu Zhang</strong>, <strong>Kai Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2683--2710.</p><p><strong>Abstract:</strong><br/>
In this article, we introduce the notion of interval quantile independence which generalizes the notions of statistical independence and quantile independence. We suggest an index to measure and test departure from interval quantile independence. The proposed index is invariant to monotone transformations, nonnegative and equals zero if and only if the interval quantile independence holds. We suggest a moment estimate to implement the test. The resultant estimator is root-$n$-consistent if the index is positive and $n$-consistent otherwise, leading to a consistent test of interval quantile independence. The asymptotic distribution of the moment estimator is free of the parent distribution, which facilitates determining the critical values for tests of interval quantile independence. When our proposed index is used to perform feature screening for ultrahigh dimensional data, it has the desirable sure screening property.
</p>projecteuclid.org/euclid.aos/1536307230_20180907040116Fri, 07 Sep 2018 04:01 EDTBarycentric subspace analysis on manifoldshttps://projecteuclid.org/euclid.aos/1536307231<strong>Xavier Pennec</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2711--2746.</p><p><strong>Abstract:</strong><br/>
This paper investigates the generalization of Principal Component Analysis (PCA) to Riemannian manifolds. We first propose a new and general type of family of subspaces in manifolds that we call barycentric subspaces. They are implicitly defined as the locus of points which are weighted means of $k+1$ reference points. As this definition relies on points and not on tangent vectors, it can also be extended to geodesic spaces which are not Riemannian. For instance, in stratified spaces, it naturally allows principal subspaces that span several strata, which is impossible in previous generalizations of PCA. We show that barycentric subspaces locally define a submanifold of dimension $k$ which generalizes geodesic subspaces.
Second, we rephrase PCA in Euclidean spaces as an optimization on flags of linear subspaces (a hierarchy of properly embedded linear subspaces of increasing dimension). We show that the Euclidean PCA minimizes the Accumulated Unexplained Variances by all the subspaces of the flag (AUV). Barycentric subspaces are naturally nested, allowing the construction of hierarchically nested subspaces. Optimizing the AUV criterion to approximate data points with flags of affine spans in Riemannian manifolds leads to a particularly appealing generalization of PCA on manifolds called Barycentric Subspace Analysis (BSA).
</p>projecteuclid.org/euclid.aos/1536307231_20180907040116Fri, 07 Sep 2018 04:01 EDTThe landscape of empirical risk for nonconvex losseshttps://projecteuclid.org/euclid.aos/1536307232<strong>Song Mei</strong>, <strong>Yu Bai</strong>, <strong>Andrea Montanari</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2747--2774.</p><p><strong>Abstract:</strong><br/>
Most high-dimensional estimation methods propose to minimize a cost function (empirical risk) that is a sum of losses associated with each data point (each example). In this paper, we focus on the case of nonconvex losses. Classical empirical process theory implies uniform convergence of the empirical (or sample) risk to the population risk. While under additional assumptions, uniform convergence implies consistency of the resulting M-estimator, it does not ensure that the latter can be computed efficiently.
In order to capture the complexity of computing M-estimators, we study the landscape of the empirical risk, namely its stationary points and their properties. We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors). Consequently, good properties of the population risk can be carried to the empirical risk, and we are able to establish one-to-one correspondence of their stationary points. We demonstrate that in several problems such as nonconvex binary classification, robust regression and Gaussian mixture model, this result implies a complete characterization of the landscape of the empirical risk, and of the convergence properties of descent algorithms.
We extend our analysis to the very high-dimensional setting in which the number of parameters exceeds the number of samples, and provide a characterization of the empirical risk landscape under a nearly information-theoretically minimal condition. Namely, if the number of samples exceeds the sparsity of the parameters vector (modulo logarithmic factors), then a suitable uniform convergence result holds. We apply this result to nonconvex binary classification and robust regression in very high dimensions.
</p>projecteuclid.org/euclid.aos/1536307232_20180907040116Fri, 07 Sep 2018 04:01 EDTDesigns with blocks of size two and applications to microarray experimentshttps://projecteuclid.org/euclid.aos/1536307233<strong>Janet Godolphin</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2775--2805.</p><p><strong>Abstract:</strong><br/>
Designs with blocks of size two have numerous applications. In experimental situations where observation loss is common, it is important for a design to be robust against breakdown. For designs with one treatment factor and a single blocking factor, with blocks of size two, conditions for connectivity and robustness are obtained using combinatorial arguments and results from graph theory. Lower bounds are given for the breakdown number in terms of design parameters. For designs with equal or near equal treatment replication, the concepts of treatment and block partitions, and of linking blocks, are used to obtain information on the number of blocks required to guarantee various levels of robustness. The results provide guidance for construction of designs with good robustness properties.
Robustness conditions are also established for row–column designs in which one of the blocking factors involves blocks of size two. Such designs are particularly relevant for microarray experiments, where the high risk of observation loss makes robustness important. Disconnectivity in row–column designs can be classified into three types. Techniques are given to assess design robustness according to each type, leading to lower bounds for the breakdown number. Guidance is given for robust design construction.
Cyclic designs and interwoven loop designs are shown to have good robustness properties.
</p>projecteuclid.org/euclid.aos/1536307233_20180907040116Fri, 07 Sep 2018 04:01 EDTLocal robust estimation of the Pickands dependence functionhttps://projecteuclid.org/euclid.aos/1536307234<strong>Mikael Escobar-Bach</strong>, <strong>Yuri Goegebeur</strong>, <strong>Armelle Guillou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2806--2843.</p><p><strong>Abstract:</strong><br/>
We consider the robust estimation of the Pickands dependence function in the random covariate framework. Our estimator is based on local estimation with the minimum density power divergence criterion. We provide the main asymptotic properties, in particular the convergence of the stochastic process, correctly normalized, towards a tight centered Gaussian process. The finite sample performance of our estimator is evaluated with a simulation study involving both uncontaminated and contaminated samples. The method is illustrated on a dataset of air pollution measurements.
</p>projecteuclid.org/euclid.aos/1536307234_20180907040116Fri, 07 Sep 2018 04:01 EDTStrong identifiability and optimal minimax rates for finite mixture estimationhttps://projecteuclid.org/euclid.aos/1536307235<strong>Philippe Heinrich</strong>, <strong>Jonas Kahn</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2844--2870.</p><p><strong>Abstract:</strong><br/>
We study the rates of estimation of finite mixing distributions, that is, the parameters of the mixture. We prove that under some regularity and strong identifiability conditions, around a given mixing distribution with $m_{0}$ components, the optimal local minimax rate of estimation of a mixing distribution with $m$ components is $n^{-1/(4(m-m_{0})+2)}$. This corrects a previous paper by Chen [ Ann. Statist. 23 (1995) 221–233].
By contrast, it turns out that there are estimators with a (nonuniform) pointwise rate of estimation of $n^{-1/2}$ for all mixing distributions with a finite number of components.
</p>projecteuclid.org/euclid.aos/1536307235_20180907040116Fri, 07 Sep 2018 04:01 EDTSub-Gaussian estimators of the mean of a random matrix with heavy-tailed entrieshttps://projecteuclid.org/euclid.aos/1536307236<strong>Stanislav Minsker</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2871--2903.</p><p><strong>Abstract:</strong><br/>
Estimation of the covariance matrix has attracted a lot of attention from the statistical research community over the years, partially due to important applications such as principal component analysis. However, the frequently used empirical covariance estimator and its modifications are very sensitive to the presence of outliers in the data. As P. Huber wrote [ Ann. Math. Stat. 35 (1964) 73–101], “…This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance….” Motivated by Tukey’s question, we develop a new estimator of the (element-wise) mean of a random matrix, which includes the covariance estimation problem as a special case. Assuming that the entries of a matrix possess only finite second moment, this new estimator admits sub-Gaussian or sub-exponential concentration around the unknown mean in the operator norm. We explain the key ideas behind our construction, and discuss applications to covariance estimation and matrix completion problems.
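The matrix construction is the paper's contribution; as an illustrative analogue of the underlying principle (not the paper's estimator), the classical scalar median-of-means already achieves sub-Gaussian-style deviations under only a finite second moment. Block count and data model below are my choices.

```python
import random
import statistics

def median_of_means(xs, num_blocks):
    """Scalar median-of-means: average within blocks, then take the median
    of the block means. Robust to heavy tails with a finite second moment,
    unlike the plain sample mean."""
    m = len(xs) // num_blocks        # block length (remainder is dropped)
    block_means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(num_blocks)]
    return statistics.median(block_means)

random.seed(0)
# Symmetric heavy-tailed data with mean 0 and finite variance (Pareto tail,
# shape 3, randomly signed).
data = [random.choice((-1, 1)) * random.paretovariate(3.0) for _ in range(10000)]
mom = median_of_means(data, 50)  # concentrates near the true mean 0
```

The median step discards blocks corrupted by a few extreme observations, which is the same mechanism the matrix estimator implements in the operator norm.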
</p>projecteuclid.org/euclid.aos/1536307236_20180907040116Fri, 07 Sep 2018 04:01 EDTCausal inference in partially linear structural equation modelshttps://projecteuclid.org/euclid.aos/1536307237<strong>Dominik Rothenhäusler</strong>, <strong>Jan Ernest</strong>, <strong>Peter Bühlmann</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2904--2938.</p><p><strong>Abstract:</strong><br/>
We consider identifiability of partially linear additive structural equation models with Gaussian noise (PLSEMs) and estimation of distributionally equivalent models to a given PLSEM. Thereby, we also include robustness results for errors in the neighborhood of Gaussian distributions. Existing identifiability results in the framework of additive SEMs with Gaussian noise are limited to linear and nonlinear SEMs, which can be considered as special cases of PLSEMs with vanishing nonparametric or parametric part, respectively. We close the wide gap between these two special cases by providing a comprehensive theory of the identifiability of PLSEMs by means of (A) a graphical, (B) a transformational, (C) a functional and (D) a causal ordering characterization of PLSEMs that generate a given distribution $\mathbb{P}$. In particular, the characterizations (C) and (D) answer the fundamental question to which extent nonlinear functions in additive SEMs with Gaussian noise restrict the set of potential causal models, and hence influence the identifiability.
On the basis of the transformational characterization (B) we provide a score-based estimation procedure that outputs the graphical representation (A) of the distribution equivalence class of a given PLSEM. We derive its (high-dimensional) consistency and demonstrate its performance on simulated datasets.
</p>projecteuclid.org/euclid.aos/1536307237_20180907040116Fri, 07 Sep 2018 04:01 EDTOn MSE-optimal crossover designshttps://projecteuclid.org/euclid.aos/1536307238<strong>Christoph Neumann</strong>, <strong>Joachim Kunert</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2939--2959.</p><p><strong>Abstract:</strong><br/>
In crossover designs, each subject receives a series of treatments one after the other. Most papers on optimal crossover designs consider an estimate which is corrected for carryover effects. We look at the estimate for direct effects of treatment, which is not corrected for carryover effects. If there are carryover effects, this estimate will be biased. We try to find a design that minimizes the mean square error, that is, the sum of the squared bias and the variance. It turns out that the designs which are optimal for the corrected estimate are highly efficient for the uncorrected estimate.
</p>projecteuclid.org/euclid.aos/1536307238_20180907040116Fri, 07 Sep 2018 04:01 EDTTesting for periodicity in functional time serieshttps://projecteuclid.org/euclid.aos/1536307239<strong>Siegfried Hörmann</strong>, <strong>Piotr Kokoszka</strong>, <strong>Gilles Nisol</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2960--2984.</p><p><strong>Abstract:</strong><br/>
We derive several tests for the presence of a periodic component in a time series of functions. We consider both the traditional setting in which the periodic functional signal is contaminated by functional white noise, and a more general setting of a weakly dependent contaminating process. Several forms of the periodic component are considered. Our tests are motivated by the likelihood principle and fall into two broad categories, which we term multivariate and fully functional. Generally, for the functional series that motivate this research, the fully functional tests exhibit a superior balance of size and power. Asymptotic null distributions of all tests are derived and their consistency is established. Their finite sample performance is examined and compared by numerical studies and application to pollution data.
</p>projecteuclid.org/euclid.aos/1536307239_20180907040116Fri, 07 Sep 2018 04:01 EDTLimiting behavior of eigenvalues in high-dimensional MANOVA via RMThttps://projecteuclid.org/euclid.aos/1536307240<strong>Zhidong Bai</strong>, <strong>Kwok Pui Choi</strong>, <strong>Yasunori Fujikoshi</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 2985--3013.</p><p><strong>Abstract:</strong><br/>
In this paper, we derive the asymptotic joint distributions of the eigenvalues under the null case and the local alternative cases in the MANOVA model and multiple discriminant analysis when both the dimension and the sample size are large. Our results are obtained by random matrix theory (RMT) without assuming normality in the populations. It is worth pointing out that the null and nonnull distributions of the eigenvalues and invariant test statistics are asymptotically robust against departure from normality in high-dimensional situations. Similar properties are pointed out for the null distributions of the invariant tests in the multivariate regression model. Some new formulas in RMT are also presented.
</p>projecteuclid.org/euclid.aos/1536307240_20180907040116Fri, 07 Sep 2018 04:01 EDTTwo-sample Kolmogorov–Smirnov-type tests revisited: Old and new tests in terms of local levelshttps://projecteuclid.org/euclid.aos/1536307241<strong>Helmut Finner</strong>, <strong>Veronika Gontscharuk</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3014--3037.</p><p><strong>Abstract:</strong><br/>
From a multiple testing viewpoint, Kolmogorov–Smirnov (KS)-type tests are union-intersection tests which can be redefined in terms of local levels. The local level perspective offers a new viewpoint on ranges of sensitivity of KS-type tests and the design of new tests. We study the finite and asymptotic local level behavior of weighted KS tests which are either tail, intermediate or central sensitive. Furthermore, we provide new tests with approximately equal local levels and prove that the asymptotics of such tests with sample sizes $m$ and $n$ coincides with the asymptotics of one-sample higher criticism tests with sample size $\min (m,n)$. We compare the overall power of various tests and introduce local powers that are in line with local levels. Finally, suitably parameterized local level shape functions can be used to design new tests. We illustrate how to combine tests with different sensitivity in terms of local levels.
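As an illustration of the basic object these weighted tests refine, the following standalone sketch (a toy example with illustrative sample sizes, not the authors' local-level construction) computes the classical two-sample KS statistic $\sup_t|F_m(t)-G_n(t)|$ by brute force over the pooled sample and checks it against `scipy.stats.ks_2samp`:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)  # sample of size m
y = rng.normal(0.5, 1.0, size=300)  # sample of size n, with a mean shift

# Two-sample KS statistic: sup_t |F_m(t) - G_n(t)|, attained on the pooled sample.
pooled = np.sort(np.concatenate([x, y]))
F = np.searchsorted(np.sort(x), pooled, side="right") / x.size
G = np.searchsorted(np.sort(y), pooled, side="right") / y.size
D_manual = float(np.max(np.abs(F - G)))

res = ks_2samp(x, y)
print(D_manual, res.statistic)  # the two computations agree
```

The union-intersection view in the abstract corresponds to reading this supremum as a maximum over pointwise comparisons, one local test per threshold.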
</p>projecteuclid.org/euclid.aos/1536307241_20180907040116Fri, 07 Sep 2018 04:01 EDTRobust Gaussian stochastic process emulationhttps://projecteuclid.org/euclid.aos/1536307242<strong>Mengyang Gu</strong>, <strong>Xiaojing Wang</strong>, <strong>James O. Berger</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3038--3066.</p><p><strong>Abstract:</strong><br/>
We consider estimation of the parameters of a Gaussian Stochastic Process (GaSP), in the context of emulation (approximation) of computer models for which the outcomes are real-valued scalars. The main focus is on estimation of the GaSP parameters through various generalized maximum likelihood methods, mostly involving finding posterior modes; this is because full Bayesian analysis in computer model emulation is typically prohibitively expensive.
The posterior modes that are studied arise from objective priors, such as the reference prior. These priors have been studied in the literature for the situation of an isotropic covariance function or under the assumption of separability in the design of inputs for model runs used in the GaSP construction. In this paper, we consider more general designs (e.g., a Latin Hypercube Design) with a class of commonly used anisotropic correlation functions, which can be written as a product of isotropic correlation functions, each having an unknown range parameter and a fixed roughness parameter. We discuss properties of the objective priors and marginal likelihoods for the parameters of the GaSP and establish the posterior propriety of the GaSP parameters, but our main focus is to demonstrate that certain parameterizations result in more robust estimation of the GaSP parameters than others, and that some parameterizations that are in common use should clearly be avoided. These results are applicable to many frequently used covariance functions, for example, power exponential, Matérn, rational quadratic and spherical covariance. We also generalize the results to the GaSP model with a nugget parameter. Both theoretical and numerical evidence is presented concerning the performance of the studied procedures.
</p>projecteuclid.org/euclid.aos/1536307242_20180907040116Fri, 07 Sep 2018 04:01 EDTConvergence of contrastive divergence algorithm in exponential familyhttps://projecteuclid.org/euclid.aos/1536307243<strong>Bai Jiang</strong>, <strong>Tung-Yu Wu</strong>, <strong>Yifan Jin</strong>, <strong>Wing H. Wong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3067--3098.</p><p><strong>Abstract:</strong><br/>
The Contrastive Divergence (CD) algorithm has achieved notable success in training energy-based models including Restricted Boltzmann Machines and played a key role in the emergence of deep learning. The idea of this algorithm is to approximate the intractable term in the exact gradient of the log-likelihood function by using short Markov chain Monte Carlo (MCMC) runs. The approximate gradient is computationally cheap but biased. Whether and why the CD algorithm provides an asymptotically consistent estimate remain open questions. This paper studies the asymptotic properties of the CD algorithm in canonical exponential families, which are special cases of the energy-based model. Suppose the CD algorithm runs $m$ MCMC transition steps at each iteration $t$ and iteratively generates a sequence of parameter estimates $\{\theta_{t}\}_{t\ge 0}$ given an i.i.d. data sample $\{X_{i}\}_{i=1}^{n}\sim p_{\theta_{\star }}$. Under conditions which are commonly obeyed by the CD algorithm in practice, we prove the existence of some bounded $m$ such that any limit point of the time average $\sum_{s=0}^{t-1}\theta_{s}/t$ as $t\to\infty $ is a consistent estimate for the true parameter $\theta_{\star }$. Our proof is based on the fact that $\{\theta_{t}\}_{t\ge 0}$ is a homogeneous Markov chain conditional on the data sample $\{X_{i}\}_{i=1}^{n}$. This chain meets the Foster–Lyapunov drift criterion and converges to a random walk around the maximum likelihood estimate. The range of the random walk shrinks to zero at rate $\mathcal{O}(1/\sqrt[3]{n})$ as the sample size $n\to \infty $.
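The CD-$m$ recipe can be sketched in a few lines for the simplest canonical exponential family, the $N(\theta,1)$ location model, where $T(x)=x$ and the intractable term $E_{\theta}[X]$ is approximated by short Metropolis chains started at the data. The step size, number of iterations, and proposal below are illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_star = 0.7
X = rng.normal(theta_star, 1.0, size=2000)  # N(theta, 1): T(x) = x

def mcmc_step(x, theta):
    """One vectorized Metropolis step per chain, targeting N(theta, 1)."""
    prop = x + rng.normal(0.0, 1.0, size=x.shape)
    log_alpha = 0.5 * (x - theta) ** 2 - 0.5 * (prop - theta) ** 2
    accept = np.log(rng.uniform(size=x.shape)) < log_alpha
    return np.where(accept, prop, x)

def cd_estimate(X, m=2, steps=3000, lr=0.1):
    theta, avg = 0.0, 0.0
    for _ in range(steps):
        x_neg = X.copy()                 # negative chains start at the data
        for _ in range(m):
            x_neg = mcmc_step(x_neg, theta)
        theta += lr * (X.mean() - x_neg.mean())  # cheap, biased gradient
        avg += theta
    return avg / steps                   # time average, as in the paper's analysis

theta_hat = cd_estimate(X)
```

Despite the biased gradient, the time-averaged iterate lands close to $\theta_{\star}=0.7$, which is the phenomenon the consistency theorem formalizes.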
</p>projecteuclid.org/euclid.aos/1536307243_20180907040116Fri, 07 Sep 2018 04:01 EDTOvercoming the limitations of phase transition by higher order analysis of regularization techniqueshttps://projecteuclid.org/euclid.aos/1536307244<strong>Haolei Weng</strong>, <strong>Arian Maleki</strong>, <strong>Le Zheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3099--3129.</p><p><strong>Abstract:</strong><br/>
We study the problem of estimating a sparse vector $\beta\in\mathbb{R}^{p}$ from the response variables $y=X\beta+w$, where $w\sim N(0,\sigma_{w}^{2}I_{n\times n})$, under the following high-dimensional asymptotic regime: given a fixed number $\delta$, $p\rightarrow\infty$, while $n/p\rightarrow\delta$. We consider the popular class of $\ell_{q}$-regularized least squares (LQLS), a.k.a. bridge estimators, given by the optimization problem \begin{equation*}\hat{\beta}(\lambda,q)\in\arg\min_{\beta}\frac{1}{2}\|y-X\beta\|_{2}^{2}+\lambda\|\beta\|_{q}^{q},\end{equation*} and characterize the almost sure limit of $\frac{1}{p}\|\hat{\beta}(\lambda,q)-\beta\|_{2}^{2}$, which we call the asymptotic mean square error (AMSE). The expression we derive for this limit does not have an explicit form, and hence is not useful for comparing LQLS across different values of $q$, or for evaluating the effect of $\delta$ or of the sparsity level of $\beta$. To simplify the expression, researchers have considered the ideal “error-free” regime, that is, $w=0$, and have characterized the values of $\delta$ for which AMSE is zero. This is known as the phase transition analysis.
In this paper, we first perform the phase transition analysis of LQLS. Our results reveal some of the limitations and misleading features of the phase transition analysis. To overcome these limitations, we propose the small error analysis of LQLS. Our new analysis framework not only sheds light on the results of the phase transition analysis, but also describes when phase transition analysis is reliable, and presents a more accurate comparison among different regularizers.
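For $q=1$ the bridge estimator is the lasso, and a minimal proximal-gradient (ISTA) sketch under the paper's $n/p\to\delta$ scaling might look as follows. The design scaling, choice of $\lambda$, and iteration count are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 200                            # n/p = delta = 0.5
beta = np.zeros(p)
beta[:10] = 3.0                            # sparse truth
X = rng.normal(size=(n, p)) / np.sqrt(n)   # columns of roughly unit norm
y = X @ beta + 0.1 * rng.normal(size=n)

def lqls_q1(X, y, lam, iters=500):
    """ISTA for the q = 1 bridge estimator (the lasso special case of LQLS)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the quadratic part
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        z = b - X.T @ (X @ b - y) / L      # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox of lam*||.||_1
    return b

beta_hat = lqls_q1(X, y, lam=0.3)
mse = float(np.mean((beta_hat - beta) ** 2))  # finite-p stand-in for the AMSE
```

The quantity `mse` is the finite-dimensional analogue of $\frac{1}{p}\|\hat{\beta}(\lambda,q)-\beta\|_{2}^{2}$, whose almost-sure limit the paper characterizes.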
</p>projecteuclid.org/euclid.aos/1536307244_20180907040116Fri, 07 Sep 2018 04:01 EDTOptimal adaptive estimation of linear functionals under sparsityhttps://projecteuclid.org/euclid.aos/1536307245<strong>Olivier Collier</strong>, <strong>Laëtitia Comminges</strong>, <strong>Alexandre B. Tsybakov</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3130--3150.</p><p><strong>Abstract:</strong><br/>
We consider the problem of estimation of a linear functional in the Gaussian sequence model where the unknown vector $\theta \in\mathbb{R}^{d}$ belongs to a class of $s$-sparse vectors with unknown $s$. We suggest an adaptive estimator achieving a nonasymptotic rate of convergence that differs from the minimax rate at most by a logarithmic factor. We also show that this optimal adaptive rate cannot be improved when $s$ is unknown. Furthermore, we address the issue of simultaneous adaptation to $s$ and to the variance $\sigma^{2}$ of the noise. We suggest an estimator that achieves the optimal adaptive rate when both $s$ and $\sigma^{2}$ are unknown.
</p>projecteuclid.org/euclid.aos/1536307245_20180907040116Fri, 07 Sep 2018 04:01 EDTHigh-dimensional consistency in score-based and hybrid structure learninghttps://projecteuclid.org/euclid.aos/1536307246<strong>Preetam Nandy</strong>, <strong>Alain Hauser</strong>, <strong>Marloes H. Maathuis</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3151--3183.</p><p><strong>Abstract:</strong><br/>
Main approaches for learning Bayesian networks can be classified as constraint-based, score-based or hybrid methods. Although high-dimensional consistency results are available for constraint-based methods like the PC algorithm, such results have not been proved for score-based or hybrid methods, and most of the hybrid methods have not even been shown to be consistent in the classical setting where the number of variables remains fixed and the sample size tends to infinity. In this paper, we show that consistency of hybrid methods based on greedy equivalence search (GES) can be achieved in the classical setting with adaptive restrictions on the search space that depend on the current state of the algorithm. Moreover, we prove consistency of GES and adaptively restricted GES (ARGES) in several sparse high-dimensional settings. ARGES scales well to sparse graphs with thousands of variables and our simulation study indicates that both GES and ARGES generally outperform the PC algorithm.
</p>projecteuclid.org/euclid.aos/1536307246_20180907040116Fri, 07 Sep 2018 04:01 EDTA new scope of penalized empirical likelihood with high-dimensional estimating equationshttps://projecteuclid.org/euclid.aos/1536631271<strong>Jinyuan Chang</strong>, <strong>Cheng Yong Tang</strong>, <strong>Tong Tong Wu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3185--3216.</p><p><strong>Abstract:</strong><br/>
Statistical methods with empirical likelihood (EL) are appealing and effective especially in conjunction with estimating equations for flexibly and adaptively incorporating data information. It is known that EL approaches encounter difficulties when dealing with high-dimensional problems. To overcome the challenges, we begin our study by investigating high-dimensional EL from a new scope targeting high-dimensional sparse model parameters. We show that the new scope provides an opportunity for relaxing the stringent requirement on the dimensionality of the model parameters. Motivated by the new scope, we then propose a new penalized EL by applying two penalty functions respectively regularizing the model parameters and the associated Lagrange multiplier in the optimizations of EL. By penalizing the Lagrange multiplier to encourage its sparsity, a drastic dimension reduction in the number of estimating equations can be achieved. Most attractively, such a reduction in dimensionality of estimating equations can be viewed as a selection among those high-dimensional estimating equations, resulting in a highly parsimonious and effective device for estimating high-dimensional sparse model parameters. Allowing both the dimensionality of the model parameters and the number of estimating equations to grow exponentially with the sample size, our theory demonstrates that our new penalized EL estimator is sparse and consistent with asymptotically normally distributed nonzero components. Numerical simulations and a real data analysis show that the proposed penalized EL performs promisingly.
</p>projecteuclid.org/euclid.aos/1536631271_20180910220136Mon, 10 Sep 2018 22:01 EDTApproximate $\ell_{0}$-penalized estimation of piecewise-constant signals on graphshttps://projecteuclid.org/euclid.aos/1536631272<strong>Zhou Fan</strong>, <strong>Leying Guan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3217--3245.</p><p><strong>Abstract:</strong><br/>
We study recovery of piecewise-constant signals on graphs by the estimator minimizing an $\ell_{0}$-edge-penalized objective. Although exact minimization of this objective may be computationally intractable, we show that the same statistical risk guarantees are achieved by the $\alpha$-expansion algorithm which computes an approximate minimizer in polynomial time. We establish that for graphs with small average vertex degree, these guarantees are minimax rate-optimal over classes of edge-sparse signals. For spatially inhomogeneous graphs, we propose minimization of an edge-weighted objective where each edge is weighted by its effective resistance or another measure of its contribution to the graph’s connectivity. We establish minimax optimality of the resulting estimators over corresponding edge-weighted sparsity classes. We show theoretically that these risk guarantees are not always achieved by the estimator minimizing the $\ell_{1}$/total-variation relaxation, and empirically that the $\ell_{0}$-based estimates are more accurate in high signal-to-noise settings.
</p>projecteuclid.org/euclid.aos/1536631272_20180910220136Mon, 10 Sep 2018 22:01 EDTMulticlass classification, information, divergence and surrogate riskhttps://projecteuclid.org/euclid.aos/1536631273<strong>John Duchi</strong>, <strong>Khashayar Khosravi</strong>, <strong>Feng Ruan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3246--3275.</p><p><strong>Abstract:</strong><br/>
We provide a unifying view of statistical information measures, multiway Bayesian hypothesis testing, loss functions for multiclass classification problems and multidistribution $f$-divergences, elaborating equivalence results between all of these objects, and extending existing results for binary outcome spaces to more general ones. We consider a generalization of $f$-divergences to multiple distributions, and we provide a constructive equivalence between divergences, statistical information (in the sense of DeGroot) and losses for multiclass classification. A major application of our results is in multiclass classification problems in which we must both infer a discriminant function $\gamma$—for making predictions on a label $Y$ from datum $X$—and a data representation (or, in the setting of a hypothesis testing problem, an experimental design), represented as a quantizer $\mathsf{q}$ from a family of possible quantizers $\mathsf{Q}$. In this setting, we characterize the equivalence between loss functions, meaning that optimizing either of two losses yields an optimal discriminant and quantizer $\mathsf{q}$, complementing and extending earlier results of Nguyen et al. [ Ann. Statist. 37 (2009) 876–904] to the multiclass case. Our results provide a more substantial basis than standard classification calibration results for comparing different losses: we describe the convex losses that are consistent for jointly choosing a data representation and minimizing the (weighted) probability of error in multiclass classification problems.
</p>projecteuclid.org/euclid.aos/1536631273_20180910220136Mon, 10 Sep 2018 22:01 EDTHalfspace depths for scatter, concentration and shape matriceshttps://projecteuclid.org/euclid.aos/1536631274<strong>Davy Paindaveine</strong>, <strong>Germain Van Bever</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3276--3307.</p><p><strong>Abstract:</strong><br/>
We propose halfspace depth concepts for scatter, concentration and shape matrices. For scatter matrices, our concept is similar to those from Chen, Gao and Ren [Robust covariance and scatter matrix estimation under Huber’s contamination model (2018)] and Zhang [ J. Multivariate Anal. 82 (2002) 134–165]. Rather than focusing, as in these earlier works, on deepest scatter matrices, we thoroughly investigate the properties of the proposed depth and of the corresponding depth regions. We do so under minimal assumptions and, in particular, we do not restrict to elliptical distributions nor to absolutely continuous distributions. Interestingly, fully understanding scatter halfspace depth requires considering different geometries/topologies on the space of scatter matrices. We also discuss, in the spirit of Zuo and Serfling [ Ann. Statist. 28 (2000) 461–482], the structural properties a scatter depth should satisfy, and investigate whether or not these are met by scatter halfspace depth. Companion concepts of depth for concentration matrices and shape matrices are also proposed and studied. We show the practical relevance of the depth concepts considered in a real-data example from finance.
</p>projecteuclid.org/euclid.aos/1536631274_20180910220136Mon, 10 Sep 2018 22:01 EDTMultilayer tensor factorization with applications to recommender systemshttps://projecteuclid.org/euclid.aos/1536631275<strong>Xuan Bi</strong>, <strong>Annie Qu</strong>, <strong>Xiaotong Shen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3308--3333.</p><p><strong>Abstract:</strong><br/>
Recommender systems have been widely adopted by electronic commerce and entertainment industries for individualized prediction and recommendation, which benefit consumers and improve business intelligence. In this article, we propose an innovative method, namely the recommendation engine of multilayers (REM), for tensor recommender systems. The proposed method utilizes the structure of a tensor response to integrate information from multiple modes, and creates an additional layer of nested latent factors to accommodate between-subjects dependency. One major advantage is that the proposed method is able to address the “cold-start” issue in the absence of information from new customers, new products or new contexts. Specifically, it provides more effective recommendations through sub-group information. To achieve scalable computation, we develop a new algorithm for the proposed method, which incorporates a maximum block improvement strategy into the cyclic blockwise-coordinate-descent algorithm. In theory, we investigate algorithmic properties for convergence from an arbitrary initial point and local convergence, along with the asymptotic consistency of estimated parameters. Finally, the proposed method is applied to simulations and to IRI marketing data with 116 million observations of product sales. Numerical studies demonstrate that the proposed method outperforms existing competitors in the literature.
</p>projecteuclid.org/euclid.aos/1536631275_20180910220136Mon, 10 Sep 2018 22:01 EDTPrincipal component analysis for functional data on Riemannian manifolds and sphereshttps://projecteuclid.org/euclid.aos/1536631276<strong>Xiongtao Dai</strong>, <strong>Hans-Georg Müller</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3334--3361.</p><p><strong>Abstract:</strong><br/>
Functional data analysis on nonlinear manifolds has drawn recent interest. Sphere-valued functional data, which are encountered, for example, as movement trajectories on the surface of the earth, are an important special case. We consider an intrinsic principal component analysis for smooth Riemannian manifold-valued functional data and study its asymptotic properties. Riemannian functional principal component analysis (RFPCA) is carried out by first mapping the manifold-valued data through Riemannian logarithm maps to tangent spaces around the Fréchet mean function, and then performing a classical functional principal component analysis (FPCA) on the linear tangent spaces. Representations of the Riemannian manifold-valued functions and the eigenfunctions on the original manifold are then obtained with exponential maps. The tangent-space approximation yields upper bounds to residual variances if the Riemannian manifold has nonnegative curvature. We derive a central limit theorem for the mean function, as well as root-$n$ uniform convergence rates for other model components. Our applications include a novel framework for the analysis of longitudinal compositional data, achieved by mapping longitudinal compositional data to trajectories on the sphere, illustrated with longitudinal fruit fly behavior patterns. RFPCA is shown to outperform an unrestricted FPCA in terms of trajectory recovery and prediction in applications and simulations.
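The core ingredients of the tangent-space construction are the Riemannian log and exp maps. For the unit sphere they have the closed forms sketched below, a standalone toy in which `mu` plays the role of the Fréchet mean at one time point:

```python
import numpy as np

def sphere_log(mu, x):
    """Riemannian log map on the unit sphere: tangent vector at mu pointing to x."""
    c = float(np.clip(np.dot(mu, x), -1.0, 1.0))
    theta = np.arccos(c)                # geodesic distance from mu to x
    v = x - c * mu                      # component of x orthogonal to mu
    nv = np.linalg.norm(v)
    return theta * v / nv if nv > 1e-12 else np.zeros_like(mu)

def sphere_exp(mu, v):
    """Riemannian exp map: follow the geodesic from mu in tangent direction v."""
    t = np.linalg.norm(v)
    return mu.copy() if t < 1e-12 else np.cos(t) * mu + np.sin(t) * v / t

mu = np.array([0.0, 0.0, 1.0])
x = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
v = sphere_log(mu, x)                   # lies in the tangent plane at mu
x_back = sphere_exp(mu, v)              # round trip recovers x
```

RFPCA applies `sphere_log` pointwise in time to linearize the data, runs classical FPCA on the tangent-space curves, and maps representations back with `sphere_exp`.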
</p>projecteuclid.org/euclid.aos/1536631276_20180910220136Mon, 10 Sep 2018 22:01 EDTAssessing robustness of classification using an angular breakdown pointhttps://projecteuclid.org/euclid.aos/1536631277<strong>Junlong Zhao</strong>, <strong>Guan Yu</strong>, <strong>Yufeng Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3362--3389.</p><p><strong>Abstract:</strong><br/>
Robustness is a desirable property for many statistical techniques. As an important measure of robustness, the breakdown point has been widely used for regression problems and many other settings. Despite the existing development, we observe that the standard breakdown point criterion is not directly applicable for many classification problems. In this paper, we propose a new breakdown point criterion, namely angular breakdown point, to better quantify the robustness of different classification methods. Using this new breakdown point criterion, we study the robustness of binary large margin classification techniques, although the idea is applicable to general classification methods. Both bounded and unbounded loss functions with linear and kernel learning are considered. These studies provide useful insights on the robustness of different classification methods. Numerical results further confirm our theoretical findings.
</p>projecteuclid.org/euclid.aos/1536631277_20180910220136Mon, 10 Sep 2018 22:01 EDTTail-greedy bottom-up data decompositions and fast multiple change-point detectionhttps://projecteuclid.org/euclid.aos/1536631278<strong>Piotr Fryzlewicz</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3390--3421.</p><p><strong>Abstract:</strong><br/>
This article proposes a “tail-greedy”, bottom-up transform for one-dimensional data, which results in a nonlinear but conditionally orthonormal, multiscale decomposition of the data with respect to an adaptively chosen unbalanced Haar wavelet basis. The “tail-greediness” of the decomposition algorithm, whereby multiple greedy steps are taken in a single pass through the data, both enables fast computation and makes the algorithm applicable in the problem of consistent estimation of the number and locations of multiple change-points in data. The resulting agglomerative change-point detection method avoids the disadvantages of the classical divisive binary segmentation, and offers very good practical performance. It is implemented in the R package breakfast, available from CRAN.
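For the actual method, see the R package breakfast. Purely to illustrate the agglomerative (bottom-up) idea — not the tail-greedy unbalanced Haar transform itself — here is a toy greedy merge of adjacent constant segments with an ad hoc stopping threshold:

```python
import numpy as np

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(0, 0.3, 50), rng.normal(2, 0.3, 50)])  # change at 50

def bottom_up_changepoints(y, threshold):
    """Toy bottom-up segmentation: repeatedly merge the adjacent pair of
    segments whose merge increases the squared error least; stop when the
    cheapest merge exceeds the threshold."""
    segs = [(i, i + 1) for i in range(len(y))]        # half-open segments [a, b)
    def cost(a, b):                                    # SSE around the segment mean
        s = y[a:b]
        return float(np.sum((s - s.mean()) ** 2))
    while len(segs) > 1:
        inc = [cost(segs[i][0], segs[i + 1][1]) - cost(*segs[i]) - cost(*segs[i + 1])
               for i in range(len(segs) - 1)]
        i = int(np.argmin(inc))
        if inc[i] > threshold:
            break
        segs[i:i + 2] = [(segs[i][0], segs[i + 1][1])]
    return [a for a, _ in segs[1:]]                    # left endpoints = change-points

cps = bottom_up_changepoints(y, threshold=5.0)
```

The bottom-up pass tends to defer merges across the true jump until last, which is the intuition the abstract contrasts with divisive binary segmentation; the tail-greedy transform additionally takes multiple greedy steps per pass for speed and consistency.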
</p>projecteuclid.org/euclid.aos/1536631278_20180910220136Mon, 10 Sep 2018 22:01 EDTROCKET: Robust confidence intervals via Kendall’s tau for transelliptical graphical modelshttps://projecteuclid.org/euclid.aos/1536631279<strong>Rina Foygel Barber</strong>, <strong>Mladen Kolar</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3422--3450.</p><p><strong>Abstract:</strong><br/>
Understanding complex relationships between random variables is of fundamental importance in high-dimensional statistics, with numerous applications in biological and social sciences. Undirected graphical models are often used to represent dependencies between random variables, where an edge between two random variables is drawn if they are conditionally dependent given all the other measured variables. A large body of literature exists on methods that estimate the structure of an undirected graphical model; however, little is known about the distributional properties of the estimators beyond the Gaussian setting. In this paper, we focus on inference for edge parameters in a high-dimensional transelliptical model, which generalizes Gaussian and nonparanormal graphical models. We propose ROCKET, a novel procedure for estimating parameters in the latent inverse covariance matrix. We establish asymptotic normality of ROCKET in an ultra high-dimensional setting under mild assumptions, without relying on oracle model selection results. ROCKET requires the same number of samples that are known to be necessary for obtaining a $\sqrt{n}$ consistent estimator of an element in the precision matrix under a Gaussian model. Hence, it is an optimal estimator under a much larger family of distributions. The result hinges on a tight control of the sparse spectral norm of the nonparametric Kendall’s tau estimator of the correlation matrix, which is of independent interest. Empirically, ROCKET outperforms the nonparanormal and Gaussian models in terms of achieving accurate inference on simulated data. We also compare the three methods on real data (daily stock returns), and find that the ROCKET estimator is the only method whose behavior across subsamples agrees with the distribution predicted by the theory.
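A building block behind transelliptical estimation is the classical bridge from Kendall's tau to the latent correlation, $\hat{\Sigma}_{jk}=\sin(\pi\hat{\tau}_{jk}/2)$, which is invariant to monotone marginal transforms. The sketch below demonstrates this on one simulated pair; the transforms and sample size are illustrative:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(4)
rho = 0.6
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=5000)
x = np.exp(z[:, 0])       # monotone marginal transforms distort Pearson's r ...
y = z[:, 1] ** 3          # ... but leave Kendall's tau unchanged

tau, _ = kendalltau(x, y)
rho_hat = float(np.sin(np.pi * tau / 2.0))   # latent correlation estimate
```

Even though `x` and `y` are heavily transformed, `rho_hat` recovers the latent correlation 0.6; ROCKET builds confidence intervals for entries of the inverse of such a latent correlation matrix.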
</p>projecteuclid.org/euclid.aos/1536631279_20180910220136Mon, 10 Sep 2018 22:01 EDTAdaptive invariant density estimation for ergodic diffusions over anisotropic classeshttps://projecteuclid.org/euclid.aos/1536631280<strong>Claudia Strauch</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3451--3480.</p><p><strong>Abstract:</strong><br/>
Consider some multivariate diffusion process $\mathbf{X}=(X_{t})_{t\geq0}$ with unique invariant probability measure and associated invariant density $\rho$, and assume that a continuous record of observations $X^{T}=(X_{t})_{0\leq t\leq T}$ of $\mathbf{X}$ is available. Recent results on functional inequalities for symmetric Markov semigroups are used in the statistical analysis of kernel estimators $\widehat{\rho}_{T}=\widehat{\rho}_{T}(X^{T})$ of $\rho$. For the basic problem of estimation with respect to $\mathrm{sup}$-norm risk under anisotropic Hölder smoothness constraints, the proposed approach yields an adaptive estimator which converges at a substantially faster rate than in standard multivariate density estimation from i.i.d. observations.
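A toy discrete-time stand-in for this setting: simulate an ergodic Ornstein-Uhlenbeck path by Euler-Maruyama and form a Gaussian-kernel estimate of its invariant density. The bandwidth and horizon are ad hoc choices; the paper works with a continuous record, multivariate state and anisotropic smoothness classes:

```python
import numpy as np

rng = np.random.default_rng(5)
dt, nsteps = 0.01, 200_000            # horizon T = 2000
x = np.empty(nsteps)
x[0] = 0.0
for i in range(1, nsteps):            # Euler-Maruyama for dX = -X dt + dW
    x[i] = x[i - 1] * (1.0 - dt) + np.sqrt(dt) * rng.normal()

def kde(path, grid, h):
    """Gaussian-kernel estimator of the invariant density from the path."""
    u = (grid[:, None] - path[None, :]) / h
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

grid = np.linspace(-2.0, 2.0, 9)
rho_hat = kde(x, grid, h=0.15)
rho_true = np.exp(-grid ** 2) / np.sqrt(np.pi)   # invariant law is N(0, 1/2)
```

The faster-than-i.i.d. rates in the abstract reflect that a long ergodic path revisits the whole state space, so the effective sample size for density estimation grows with the observation horizon $T$.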
</p>projecteuclid.org/euclid.aos/1536631280_20180910220136Mon, 10 Sep 2018 22:01 EDTRobust low-rank matrix estimationhttps://projecteuclid.org/euclid.aos/1536631281<strong>Andreas Elsener</strong>, <strong>Sara van de Geer</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3481--3509.</p><p><strong>Abstract:</strong><br/>
Many results have been proved for various nuclear norm penalized estimators of the uniform sampling matrix completion problem. However, most of these estimators are not robust: in most of the cases the quadratic loss function and its modifications are used. We consider robust nuclear norm penalized estimators using two well-known robust loss functions: the absolute value loss and the Huber loss. Under several conditions on the sparsity of the problem (i.e., the rank of the parameter matrix) and on the regularity of the risk function, sharp and nonsharp oracle inequalities for these estimators are shown to hold with high probability. As a consequence, the asymptotic behavior of the estimators is derived. Similar error bounds are obtained under the assumption of weak sparsity, that is, the case where the matrix is assumed to be only approximately low-rank. In all of our results, we consider a high-dimensional setting. In this case, this means that we assume $n\leq pq$. Finally, various simulations confirm our theoretical results.
</p>projecteuclid.org/euclid.aos/1536631281_20180910220136Mon, 10 Sep 2018 22:01 EDTSieve bootstrap for functional time serieshttps://projecteuclid.org/euclid.aos/1536631282<strong>Efstathios Paparoditis</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3510--3538.</p><p><strong>Abstract:</strong><br/>
A bootstrap procedure for functional time series is proposed which exploits a general vector autoregressive representation of the time series of Fourier coefficients appearing in the Karhunen–Loève expansion of the functional process. A double sieve-type bootstrap method is developed, which avoids the estimation of process operators and generates functional pseudo-time series that appropriately mimic the dependence structure of the functional time series at hand. The method uses a finite set of functional principal components to capture the essential driving parts of the infinite dimensional process and a finite order vector autoregressive process to imitate the temporal dependence structure of the corresponding vector time series of Fourier coefficients. By allowing the number of functional principal components as well as the autoregressive order used to increase to infinity (at some appropriate rate) as the sample size increases, consistency of the functional sieve bootstrap can be established. We demonstrate this by proving a basic bootstrap central limit theorem for functional finite Fourier transforms and by establishing bootstrap validity in the context of a fully functional testing problem. A novel procedure to select the number of functional principal components is introduced while simulations illustrate the good finite sample performance of the new bootstrap method proposed.
</p>projecteuclid.org/euclid.aos/1536631282_20180910220136Mon, 10 Sep 2018 22:01 EDTRestricted strong convexity implies weak submodularityhttps://projecteuclid.org/euclid.aos/1536631283<strong>Ethan R. Elenberg</strong>, <strong>Rajiv Khanna</strong>, <strong>Alexandros G. Dimakis</strong>, <strong>Sahand Negahban</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3539--3568.</p><p><strong>Abstract:</strong><br/>
We connect high-dimensional subset selection and submodular maximization. Our results extend the work of Das and Kempe [In ICML (2011) 1057–1064] from the setting of linear regression to arbitrary objective functions. For greedy feature selection, this connection allows us to obtain strong multiplicative performance bounds on several methods without statistical modeling assumptions. We also derive recovery guarantees of this form under standard assumptions. Our work shows that greedy algorithms perform within a constant factor of the best possible subset-selection solution for a broad class of general objective functions. Our methods allow direct control over the number of obtained features, as opposed to regularization parameters that only implicitly control sparsity. Our proof technique uses the concept of weak submodularity initially defined by Das and Kempe. We draw a connection between convex analysis and submodular set function theory which may be of independent interest for other statistical learning applications that have combinatorial structure.
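The greedy algorithm in question is plain forward selection on a set function; with the $R^2$ of least squares as the objective, weak submodularity is what underlies the constant-factor guarantee. A minimal sketch on toy data (all sizes and noise levels are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sparse regression: only features {0, 1, 2} carry signal.
n, p, k = 100, 20, 3
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.8 * X[:, 1] + 0.6 * X[:, 2] + 0.1 * rng.normal(size=n)

def r2(S):
    """Set function f(S): R^2 of least squares restricted to the columns in S."""
    if not S:
        return 0.0
    beta = np.linalg.lstsq(X[:, list(S)], y, rcond=None)[0]
    resid = y - X[:, list(S)] @ beta
    return 1 - resid @ resid / (y @ y)

# Greedy forward selection: add the feature with the largest marginal gain.
S = []
for _ in range(k):
    gains = {j: r2(S + [j]) - r2(S) for j in range(p) if j not in S}
    S.append(max(gains, key=gains.get))
```

Weak submodularity bounds how far such greedy steps can fall behind the best size-$k$ subset, without requiring the set function to be submodular.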
</p>projecteuclid.org/euclid.aos/1536631283_20180910220136Mon, 10 Sep 2018 22:01 EDTMultiscale scanning in inverse problemshttps://projecteuclid.org/euclid.aos/1536631284<strong>Katharina Proksch</strong>, <strong>Frank Werner</strong>, <strong>Axel Munk</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3569--3602.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a multiscale scanning method to determine active components of a quantity $f$ w.r.t. a dictionary $\mathcal{U}$ from observations $Y$ in an inverse regression model $Y=Tf+\xi$ with linear operator $T$ and general random error $\xi$. To this end, we provide uniform confidence statements for the coefficients $\langle\varphi,f\rangle$, $\varphi\in\mathcal{U}$, under the assumption that $(T^{*})^{-1}(\mathcal{U})$ is of wavelet-type. Based on this, we obtain a multiple test that allows us to identify the active components of $\mathcal{U}$, that is, $\langle f,\varphi\rangle\neq0$, $\varphi\in\mathcal{U}$, at controlled, family-wise error rate. Our results rely on a Gaussian approximation of the underlying multiscale statistic with a novel scale penalty adapted to the ill-posedness of the problem. The scale penalty furthermore ensures convergence of the statistic’s distribution towards a Gumbel limit under reasonable assumptions. The important special cases of tomography and deconvolution are discussed in detail. Further, the regression case, when $T=\text{id}$ and the dictionary consists of moving windows of various sizes (scales), is included, generalizing previous results for this setting. We show that our method satisfies an oracle optimality property, that is, it attains the same asymptotic power as a single-scale testing procedure at the correct scale. Simulations support our theory and we illustrate the potential of the method as an inferential tool for imaging. As a particular application, we discuss super-resolution microscopy and analyze experimental STED data to locate single DNA origami.
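For the regression case $T=\text{id}$ with moving windows, a toy scale-penalized scan looks as follows; the penalty used here is a simplified stand-in for the paper's scale calibration, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Signal on a grid with one active bump plus standard Gaussian noise.
n = 512
y = rng.normal(size=n)
y[100:140] += 1.5      # the single active component

def scan(y, lengths):
    """Scale-calibrated scan: max over all windows of the standardized window
    sum, minus a penalty that grows for small scales (a simplified analogue
    of the paper's scale penalty)."""
    n = len(y)
    best = -np.inf
    for h in lengths:
        pen = np.sqrt(2 * np.log(n / h))
        sums = np.convolve(y, np.ones(h), mode="valid") / np.sqrt(h)
        best = max(best, np.abs(sums).max() - pen)
    return best

stat = scan(y, lengths=[8, 16, 32, 64])
null = scan(rng.normal(size=n), lengths=[8, 16, 32, 64])
```

The penalty keeps the many small windows from dominating the maximum, which is what makes a single critical value across scales possible.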
</p>projecteuclid.org/euclid.aos/1536631284_20180910220136Mon, 10 Sep 2018 22:01 EDTSlope meets Lasso: Improved oracle bounds and optimalityhttps://projecteuclid.org/euclid.aos/1536631285<strong>Pierre C. Bellec</strong>, <strong>Guillaume Lecué</strong>, <strong>Alexandre B. Tsybakov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3603--3642.</p><p><strong>Abstract:</strong><br/>
We show that two polynomial time methods, a Lasso estimator with adaptively chosen tuning parameter and a Slope estimator, adaptively achieve the minimax prediction and $\ell_{2}$ estimation rate $(s/n)\log(p/s)$ in high-dimensional linear regression on the class of $s$-sparse vectors in $\mathbb{R}^{p}$. This is done under the Restricted Eigenvalue (RE) condition for the Lasso and under a slightly more constraining assumption on the design for the Slope. The main results have the form of sharp oracle inequalities accounting for the model misspecification error. The minimax optimal bounds are also obtained for the $\ell_{q}$ estimation errors with $1\le q\le2$ when the model is well specified. The results are nonasymptotic, and hold both in probability and in expectation. The assumptions that we impose on the design are satisfied with high probability for a large class of random matrices with independent and possibly anisotropically distributed rows. We give a comparative analysis of conditions, under which oracle bounds for the Lasso and Slope estimators can be obtained. In particular, we show that several known conditions, such as the RE condition and the sparse eigenvalue condition are equivalent if the $\ell_{2}$-norms of regressors are uniformly bounded.
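The Slope estimator penalizes the sorted magnitudes of the coefficients; its key computational primitive is the proximal operator of the sorted-$\ell_{1}$ norm, which a pool-adjacent-violators pass computes exactly. A sketch (the weight sequence at the end is one common choice, not necessarily the paper's):

```python
import numpy as np

def prox_sorted_l1(v, lam):
    """Prox of the sorted-l1 (Slope) penalty; lam must be non-increasing.

    Sort |v| decreasingly, subtract lam, project onto the non-increasing
    cone by pool-adjacent-violators, clip at zero, restore order and signs.
    """
    sign = np.sign(v)
    order = np.argsort(-np.abs(v))
    w = np.abs(v)[order] - lam
    vals, lens = [], []
    for x in w:
        vals.append(x)
        lens.append(1)
        # pool whenever a later block exceeds the one before it
        while len(vals) > 1 and vals[-1] > vals[-2]:
            l2, l1 = lens.pop(), lens.pop()
            v2, v1 = vals.pop(), vals.pop()
            vals.append((l1 * v1 + l2 * v2) / (l1 + l2))
            lens.append(l1 + l2)
    out = np.repeat(vals, lens).clip(min=0)
    res = np.empty_like(out)
    res[order] = out
    return sign * res

# Weights qualitatively like lambda_j ~ sqrt(log(2p/j)), a common Slope choice.
p = 3
lam = np.sqrt(np.log(2 * p / np.arange(1, p + 1)))
shrunk = prox_sorted_l1(np.array([3.0, -1.0, 2.0]), lam)
```

With this prox in hand, Slope can be fitted by standard proximal gradient iterations, which is one reason both methods in the abstract are polynomial time.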
</p>projecteuclid.org/euclid.aos/1536631285_20180910220136Mon, 10 Sep 2018 22:01 EDTUniformly valid post-regularization confidence regions for many functional parameters in z-estimation frameworkhttps://projecteuclid.org/euclid.aos/1536631286<strong>Alexandre Belloni</strong>, <strong>Victor Chernozhukov</strong>, <strong>Denis Chetverikov</strong>, <strong>Ying Wei</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3643--3675.</p><p><strong>Abstract:</strong><br/>
In this paper, we develop procedures to construct simultaneous confidence bands for ${\tilde{p}}$ potentially infinite-dimensional parameters after model selection for general moment condition models where ${\tilde{p}}$ is potentially much larger than the sample size of available data, $n$. This allows us to cover settings with functional response data where each of the ${\tilde{p}}$ parameters is a function. The procedure is based on the construction of score functions that approximately satisfy the Neyman orthogonality condition. The proposed simultaneous confidence bands rely on uniform central limit theorems for high-dimensional vectors (and not on Donsker arguments as we allow for ${{\tilde{p}}\gg n}$). To construct the bands, we employ a multiplier bootstrap procedure which is computationally efficient as it only involves resampling the estimated score functions (and does not require resolving the high-dimensional optimization problems). We formally apply the general theory to inference on the regression coefficient process in the distribution regression model with a logistic link, where two implementations are analyzed in detail. Simulations and an application to real data are provided to help illustrate the applicability of the results.
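The multiplier bootstrap step described above only reweights the estimated score functions, with no re-optimization. A minimal sketch with synthetic scores (all dimensions illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Estimated score functions: one row per observation, one column per parameter.
n, p = 300, 1000          # many parameters relative to n is allowed in the theory
psi = rng.normal(size=(n, p))

# Sup-statistic over the parameters and its multiplier-bootstrap critical value:
# each bootstrap draw reweights the same scores with i.i.d. Gaussian multipliers.
stat = np.abs(psi.sum(axis=0) / np.sqrt(n)).max()
B = 500
e = rng.normal(size=(B, n))
boot = np.abs(e @ psi / np.sqrt(n)).max(axis=1)
crit = np.quantile(boot, 0.95)
```

Because each bootstrap replicate is a single matrix product, the cost is trivial compared with re-solving ${\tilde{p}}$ high-dimensional estimation problems per draw.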
</p>projecteuclid.org/euclid.aos/1536631286_20180910220136Mon, 10 Sep 2018 22:01 EDTLocal asymptotic equivalence of pure states ensembles and quantum Gaussian white noisehttps://projecteuclid.org/euclid.aos/1536631287<strong>Cristina Butucea</strong>, <strong>Mădălin Guţă</strong>, <strong>Michael Nussbaum</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3676--3706.</p><p><strong>Abstract:</strong><br/>
Quantum technology is increasingly relying on specialised statistical inference methods for analysing quantum measurement data. This motivates the development of “quantum statistics”, a field that is shaping up at the overlap of quantum physics and “classical” statistics. One of the less investigated topics to date is that of statistical inference for infinite-dimensional quantum systems, which can be seen as a quantum counterpart of nonparametric statistics. In this paper, we analyse the asymptotic theory of quantum statistical models consisting of ensembles of quantum systems which are identically prepared in a pure state. In the limit of large ensembles, we establish the local asymptotic equivalence (LAE) of this i.i.d. model to a quantum Gaussian white noise model. We use the LAE result in order to establish minimax rates for the estimation of pure states belonging to Hermite–Sobolev classes of wave functions. Moreover, for quadratic functional estimation of the same states we note an elbow effect in the rates, whereas for testing a pure state a sharp parametric rate is attained over the nonparametric Hermite–Sobolev class.
</p>projecteuclid.org/euclid.aos/1536631287_20180910220136Mon, 10 Sep 2018 22:01 EDTExtremal quantile treatment effectshttps://projecteuclid.org/euclid.aos/1536631288<strong>Yichong Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3707--3740.</p><p><strong>Abstract:</strong><br/>
This paper establishes an asymptotic theory and inference method for quantile treatment effect estimators when the quantile index is close to or equal to zero. Such quantile treatment effects are of interest in many applications, such as the effect of maternal smoking on an infant’s adverse birth outcomes. When the quantile index is close to zero, the sparsity of data jeopardizes conventional asymptotic theory and bootstrap inference. When the quantile index is zero, there are no existing inference methods directly applicable in the treatment effect context. This paper addresses both of these issues by proposing new inference methods that are shown to be asymptotically valid and to have adequate finite-sample properties.
</p>projecteuclid.org/euclid.aos/1536631288_20180910220136Mon, 10 Sep 2018 22:01 EDTOptimal maximin $L_{1}$-distance Latin hypercube designs based on good lattice point designshttps://projecteuclid.org/euclid.aos/1536631289<strong>Lin Wang</strong>, <strong>Qian Xiao</strong>, <strong>Hongquan Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3741--3766.</p><p><strong>Abstract:</strong><br/>
Maximin distance Latin hypercube designs are commonly used for computer experiments, but the construction of such designs is challenging. We construct a series of maximin Latin hypercube designs via Williams transformations of good lattice point designs. Some constructed designs are optimal under the maximin $L_{1}$-distance criterion, while others are asymptotically optimal. Moreover, these designs are also shown to have small pairwise correlations between columns.
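Both ingredients of the construction are simple to write down; a sketch with an illustrative run size $n=7$ (the paper's constructions and optimality claims concern particular families of run sizes and generators):

```python
import numpy as np

def glp_design(n, generators):
    """Good lattice point design: entry (i, j) is i * h_j mod n, for i = 1..n.
    When each generator is coprime to n, every column is a permutation of
    {0, ..., n-1}, i.e., the design is a Latin hypercube."""
    return np.array([[(i * h) % n for h in generators] for i in range(1, n + 1)])

def williams(x, n):
    """Williams transformation, a bijection on {0, ..., n-1} applied elementwise:
    W(x) = 2x if 2x < n, else 2(n - x) - 1."""
    return np.where(2 * x < n, 2 * x, 2 * (n - x) - 1)

def min_l1_distance(D):
    """Smallest pairwise L1 distance between rows (the maximin criterion)."""
    m = len(D)
    return min(int(np.abs(D[i] - D[j]).sum())
               for i in range(m) for j in range(i + 1, m))

n = 7
D = glp_design(n, [1, 2, 3])   # generators coprime to n
W = williams(D, n)             # still a Latin hypercube, often better separated
```

Since the Williams transformation is a bijection within each column, the transformed design remains a Latin hypercube; the paper's results concern when the transformation yields (asymptotically) optimal $L_{1}$ separation.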
</p>projecteuclid.org/euclid.aos/1536631289_20180910220136Mon, 10 Sep 2018 22:01 EDTRho-estimators revisited: General theory and applicationshttps://projecteuclid.org/euclid.aos/1536631290<strong>Yannick Baraud</strong>, <strong>Lucien Birgé</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3767--3804.</p><p><strong>Abstract:</strong><br/>
Following Baraud, Birgé and Sart [Invent. Math. 207 (2017) 425–517], we pursue our attempt to design a robust universal estimator of the joint distribution of $n$ independent (but not necessarily i.i.d.) observations for a Hellinger-type loss. Given such observations with an unknown joint distribution $\mathbf{P}$ and a dominated model $\mathscr{Q}$ for $\mathbf{P}$, we build an estimator $\widehat{\mathbf{P}}$ based on $\mathscr{Q}$ (a $\rho$-estimator) and measure its risk by a Hellinger-type distance. When $\mathbf{P}$ does belong to the model, this risk is bounded by some quantity which relies on the local complexity of the model in a vicinity of $\mathbf{P}$. In most situations, this bound corresponds to the minimax risk over the model (up to a possible logarithmic factor). When $\mathbf{P}$ does not belong to the model, its risk involves an additional bias term proportional to the distance between $\mathbf{P}$ and $\mathscr{Q}$, whatever the true distribution $\mathbf{P}$. From this point of view, this new version of $\rho$-estimators improves upon the previous one described in Baraud, Birgé and Sart [Invent. Math. 207 (2017) 425–517], which required that $\mathbf{P}$ be absolutely continuous with respect to some known reference measure. Further improvements have been made relative to the former construction. In particular, it provides a very general treatment of the regression framework with random design as well as a computationally tractable procedure for aggregating estimators. We also give some conditions for the maximum likelihood estimator to be a $\rho$-estimator. Finally, we consider the situation where the statistician has at her or his disposal many different models and we build a penalized version of the $\rho$-estimator for model selection and adaptation purposes. In the regression setting, this penalized estimator allows one to estimate not only the regression function but also the distribution of the errors.
</p>projecteuclid.org/euclid.aos/1536631290_20180910220136Mon, 10 Sep 2018 22:01 EDTThink globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embeddinghttps://projecteuclid.org/euclid.aos/1536631291<strong>Hau-Tieng Wu</strong>, <strong>Nan Wu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3805--3837.</p><p><strong>Abstract:</strong><br/>
Since its introduction in 2000, Locally Linear Embedding (LLE) has been widely applied in data science. We provide an asymptotic analysis of LLE under the manifold setup. We show that for a general manifold, asymptotically we may not obtain the Laplace–Beltrami operator, and the result may depend on nonuniform sampling unless a correct regularization is chosen. We also derive the corresponding kernel function, which indicates that LLE is not a Markov process. A comparison with other commonly applied nonlinear algorithms, particularly the diffusion map, is provided, and its relationship with locally linear regression is also discussed.
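For reference, the standard two-step LLE algorithm that the paper analyzes, in a minimal numpy form; the regularization constant here is an arbitrary illustrative choice, whereas its correct scaling is precisely what the paper shows to matter:

```python
import numpy as np

rng = np.random.default_rng(4)

# Points sampled nonuniformly is the interesting case; here, a circle in R^2.
n, K, eps = 200, 10, 1e-3
theta = np.sort(rng.uniform(0, 2 * np.pi, n))
X = np.c_[np.cos(theta), np.sin(theta)]

# Step 1: barycentric reconstruction weights from the K nearest neighbours,
# with a regularized local Gram matrix (the rank-deficient case needs it).
W = np.zeros((n, n))
for i in range(n):
    d = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(d)[1:K + 1]            # skip the point itself
    G = (X[nbrs] - X[i]) @ (X[nbrs] - X[i]).T
    w = np.linalg.solve(G + eps * np.trace(G) * np.eye(K), np.ones(K))
    W[i, nbrs] = w / w.sum()                 # weights sum to one

# Step 2: embed via the bottom eigenvectors of (I - W)^T (I - W),
# discarding the constant eigenvector with eigenvalue zero.
M = (np.eye(n) - W).T @ (np.eye(n) - W)
vals, vecs = np.linalg.eigh(M)
Y = vecs[:, 1:2]
```

The rows of $W$ summing to one is what places the constant vector in the kernel of $M$; the paper's analysis concerns what operator $I-W$ approximates as $n\to\infty$ under different regularization scalings.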
</p>projecteuclid.org/euclid.aos/1536631291_20180910220136Mon, 10 Sep 2018 22:01 EDTNonparametric covariate-adjusted response-adaptive design based on a functional urn modelhttps://projecteuclid.org/euclid.aos/1536631292<strong>Giacomo Aletti</strong>, <strong>Andrea Ghiglietti</strong>, <strong>William F. Rosenberger</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3838--3866.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a general class of covariate-adjusted response-adaptive (CARA) designs based on a new functional urn model. We prove strong consistency concerning the functional urn proportion and the proportion of subjects assigned to the treatment groups, in the whole study and for each covariate profile, allowing the distribution of the responses conditioned on covariates to be estimated nonparametrically. In addition, we establish joint central limit theorems for the above quantities and the sufficient statistics of features of interest, which allow one to construct procedures for inference on the conditional response distributions. These results are then applied to typical situations concerning Gaussian and binary responses.
</p>projecteuclid.org/euclid.aos/1536631292_20180910220136Mon, 10 Sep 2018 22:01 EDT