The Annals of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.aos
The latest articles from The Annals of Statistics on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Tue, 07 Jun 2011 09:09 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
http://projecteuclid.org/euclid.aos/1278861454
<strong>James G. Scott</strong>, <strong>James O. Berger</strong><p><strong>Source: </strong>Ann. Statist., Volume 38, Number 5, 2587--2619.</p><p><strong>Abstract:</strong><br/>
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
</p>projecteuclid.org/euclid.aos/1278861454_Thu, 05 Aug 2010 15:41 EDT

Convergence of contrastive divergence algorithm in exponential family
https://projecteuclid.org/euclid.aos/1536307243
<strong>Bai Jiang</strong>, <strong>Tung-Yu Wu</strong>, <strong>Yifan Jin</strong>, <strong>Wing H. Wong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3067--3098.</p><p><strong>Abstract:</strong><br/>
The Contrastive Divergence (CD) algorithm has achieved notable success in training energy-based models including Restricted Boltzmann Machines and played a key role in the emergence of deep learning. The idea of this algorithm is to approximate the intractable term in the exact gradient of the log-likelihood function by using short Markov chain Monte Carlo (MCMC) runs. The approximate gradient is computationally cheap but biased. Whether and why the CD algorithm provides an asymptotically consistent estimate are still open questions. This paper studies the asymptotic properties of the CD algorithm in canonical exponential families, which are special cases of the energy-based model. Suppose the CD algorithm runs $m$ MCMC transition steps at each iteration $t$ and iteratively generates a sequence of parameter estimates $\{\theta_{t}\}_{t\ge 0}$ given an i.i.d. data sample $\{X_{i}\}_{i=1}^{n}\sim p_{\theta_{\star }}$. Under conditions which are commonly obeyed by the CD algorithm in practice, we prove the existence of some bounded $m$ such that any limit point of the time average $\sum_{s=0}^{t-1}\theta_{s}/t$ as $t\to\infty $ is a consistent estimate for the true parameter $\theta_{\star }$. Our proof is based on the fact that $\{\theta_{t}\}_{t\ge 0}$ is a homogeneous Markov chain conditional on the data sample $\{X_{i}\}_{i=1}^{n}$. This chain meets the Foster–Lyapunov drift criterion and converges to a random walk around the maximum likelihood estimate. The range of the random walk shrinks to zero at rate $\mathcal{O}(1/\sqrt[3]{n})$ as the sample size $n\to \infty $.
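The CD-$m$ update described in this abstract can be sketched on a toy one-parameter exponential family. Everything below (the Gaussian family $p_{\theta}(x)\propto\exp(\theta x-x^{2}/2)$, the Metropolis kernel, the step size, and the function names) is an illustrative assumption, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy exponential family: p_theta(x) ∝ exp(theta*x - x^2/2), i.e. N(theta, 1).
# Sufficient statistic T(x) = x; the exact log-likelihood gradient is
# mean(data) - E_theta[X], which CD replaces by a short-MCMC estimate.

def mcmc_step(x, theta, rng):
    # One Metropolis step per chain, all chains targeting p_theta.
    prop = x + rng.normal(0.0, 1.0, size=x.shape)
    log_ratio = theta * (prop - x) - 0.5 * (prop**2 - x**2)
    accept = rng.random(x.shape) < np.exp(np.minimum(log_ratio, 0.0))
    return np.where(accept, prop, x)

def cd_estimate(data, m=2, eta=0.1, iters=2000, rng=rng):
    theta, avg = 0.0, 0.0
    for t in range(iters):
        x = data.copy()                  # chains restart from the data (CD-m)
        for _ in range(m):
            x = mcmc_step(x, theta, rng)
        grad = data.mean() - x.mean()    # cheap but biased gradient estimate
        theta += eta * grad
        avg += theta
    return avg / iters                   # time average, as in the paper's analysis

data = rng.normal(1.5, 1.0, size=500)    # true parameter theta* = 1.5
theta_hat = cd_estimate(data)
```

With noiseless restarts from the data, the recursion settles near the sample mean, matching the consistency phenomenon the paper studies.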
</p>projecteuclid.org/euclid.aos/1536307243_20180907040116Fri, 07 Sep 2018 04:01 EDT

Overcoming the limitations of phase transition by higher order analysis of regularization techniques
https://projecteuclid.org/euclid.aos/1536307244
<strong>Haolei Weng</strong>, <strong>Arian Maleki</strong>, <strong>Le Zheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3099--3129.</p><p><strong>Abstract:</strong><br/>
We study the problem of estimating a sparse vector $\beta\in\mathbb{R}^{p}$ from the response variables $y=X\beta+w$, where $w\sim N(0,\sigma_{w}^{2}I_{n\times n})$, under the following high-dimensional asymptotic regime: given a fixed number $\delta$, $p\rightarrow\infty$, while $n/p\rightarrow\delta$. We consider the popular class of $\ell_{q}$-regularized least squares (LQLS), a.k.a. bridge estimators, given by the optimization problem \begin{equation*}\hat{\beta}(\lambda,q)\in\arg\min_{\beta}\frac{1}{2}\|y-X\beta\|_{2}^{2}+\lambda\|\beta\|_{q}^{q},\end{equation*} and characterize the almost sure limit of $\frac{1}{p}\|\hat{\beta}(\lambda,q)-\beta\|_{2}^{2}$, and call it asymptotic mean square error (AMSE). The expression we derive for this limit does not have explicit forms, and hence is not useful in comparing LQLS for different values of $q$, or providing information in evaluating the effect of $\delta$ or sparsity level of $\beta$. To simplify the expression, researchers have considered the ideal “error-free” regime, that is, $w=0$, and have characterized the values of $\delta$ for which AMSE is zero. This is known as the phase transition analysis.
In this paper, we first perform the phase transition analysis of LQLS. Our results reveal some of the limitations and misleading features of the phase transition analysis. To overcome these limitations, we propose the small error analysis of LQLS. Our new analysis framework not only sheds light on the results of the phase transition analysis, but also describes when phase transition analysis is reliable, and presents a more accurate comparison among different regularizers.
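The $q=1$ member of the LQLS family above is the Lasso, whose proximal operator is soft thresholding. A minimal sketch via ISTA, in the $n/p=\delta$ regime the abstract describes (the problem sizes, $\lambda$, and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def soft_threshold(z, t):
    # Proximal operator of t*||.||_1, used for the q = 1 bridge (Lasso) case.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lqls_q1(X, y, lam, iters=2000):
    # ISTA for (1/2)||y - X b||_2^2 + lam * ||b||_1.
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - grad / L, lam / L)
    return b

# Sparse recovery with n/p = delta fixed, as in the asymptotic regime above.
p, delta = 200, 0.5
n = int(delta * p)
beta = np.zeros(p); beta[:5] = 3.0
X = rng.normal(size=(n, p)) / np.sqrt(n)
w = 0.1 * rng.normal(size=n)             # sigma_w = 0.1
y = X @ beta + w
beta_hat = lqls_q1(X, y, lam=0.05)
amse = np.mean((beta_hat - beta) ** 2)   # empirical analogue of the AMSE
```

The quantity `amse` is the finite-sample analogue of $\frac{1}{p}\|\hat{\beta}(\lambda,q)-\beta\|_{2}^{2}$ whose almost-sure limit the paper characterizes.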
</p>projecteuclid.org/euclid.aos/1536307245_20180907040116Fri, 07 Sep 2018 04:01 EDT

Optimal adaptive estimation of linear functionals under sparsity
https://projecteuclid.org/euclid.aos/1536307245
<strong>Olivier Collier</strong>, <strong>Laëtitia Comminges</strong>, <strong>Alexandre B. Tsybakov</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3130--3150.</p><p><strong>Abstract:</strong><br/>
We consider the problem of estimation of a linear functional in the Gaussian sequence model where the unknown vector $\theta \in\mathbb{R}^{d}$ belongs to a class of $s$-sparse vectors with unknown $s$. We suggest an adaptive estimator achieving a nonasymptotic rate of convergence that differs from the minimax rate at most by a logarithmic factor. We also show that this optimal adaptive rate cannot be improved when $s$ is unknown. Furthermore, we address the issue of simultaneous adaptation to $s$ and to the variance $\sigma^{2}$ of the noise. We suggest an estimator that achieves the optimal adaptive rate when both $s$ and $\sigma^{2}$ are unknown.
</p>projecteuclid.org/euclid.aos/1536307246_20180907040116Fri, 07 Sep 2018 04:01 EDT

High-dimensional consistency in score-based and hybrid structure learning
https://projecteuclid.org/euclid.aos/1536307246
<strong>Preetam Nandy</strong>, <strong>Alain Hauser</strong>, <strong>Marloes H. Maathuis</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6A, 3151--3183.</p><p><strong>Abstract:</strong><br/>
The main approaches for learning Bayesian networks can be classified as constraint-based, score-based or hybrid methods. Although high-dimensional consistency results are available for constraint-based methods like the PC algorithm, such results have not been proved for score-based or hybrid methods, and most of the hybrid methods have not even been shown to be consistent in the classical setting where the number of variables remains fixed and the sample size tends to infinity. In this paper, we show that consistency of hybrid methods based on greedy equivalence search (GES) can be achieved in the classical setting with adaptive restrictions on the search space that depend on the current state of the algorithm. Moreover, we prove consistency of GES and adaptively restricted GES (ARGES) in several sparse high-dimensional settings. ARGES scales well to sparse graphs with thousands of variables and our simulation study indicates that both GES and ARGES generally outperform the PC algorithm.
</p>projecteuclid.org/euclid.aos/1536307246_20180907040116Fri, 07 Sep 2018 04:01 EDT

A new scope of penalized empirical likelihood with high-dimensional estimating equations
https://projecteuclid.org/euclid.aos/1536631271
<strong>Jinyuan Chang</strong>, <strong>Cheng Yong Tang</strong>, <strong>Tong Tong Wu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3185--3216.</p><p><strong>Abstract:</strong><br/>
Statistical methods with empirical likelihood (EL) are appealing and effective especially in conjunction with estimating equations for flexibly and adaptively incorporating data information. It is known that EL approaches encounter difficulties when dealing with high-dimensional problems. To overcome the challenges, we begin our study by investigating high-dimensional EL from a new scope targeting high-dimensional sparse model parameters. We show that the new scope provides an opportunity for relaxing the stringent requirement on the dimensionality of the model parameters. Motivated by the new scope, we then propose a new penalized EL by applying two penalty functions respectively regularizing the model parameters and the associated Lagrange multiplier in the optimizations of EL. By penalizing the Lagrange multiplier to encourage its sparsity, a drastic dimension reduction in the number of estimating equations can be achieved. Most attractively, such a reduction in dimensionality of estimating equations can be viewed as a selection among those high-dimensional estimating equations, resulting in a highly parsimonious and effective device for estimating high-dimensional sparse model parameters. Allowing the dimensionalities of both the model parameters and the estimating equations to grow exponentially with the sample size, our theory demonstrates that our new penalized EL estimator is sparse and consistent with asymptotically normally distributed nonzero components. Numerical simulations and a real data analysis show that the proposed penalized EL works promisingly.
</p>projecteuclid.org/euclid.aos/1536631271_20180910220136Mon, 10 Sep 2018 22:01 EDT

Approximate $\ell_{0}$-penalized estimation of piecewise-constant signals on graphs
https://projecteuclid.org/euclid.aos/1536631272
<strong>Zhou Fan</strong>, <strong>Leying Guan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3217--3245.</p><p><strong>Abstract:</strong><br/>
We study recovery of piecewise-constant signals on graphs by the estimator minimizing an $\ell_{0}$-edge-penalized objective. Although exact minimization of this objective may be computationally intractable, we show that the same statistical risk guarantees are achieved by the $\alpha$-expansion algorithm which computes an approximate minimizer in polynomial time. We establish that for graphs with small average vertex degree, these guarantees are minimax rate-optimal over classes of edge-sparse signals. For spatially inhomogeneous graphs, we propose minimization of an edge-weighted objective where each edge is weighted by its effective resistance or another measure of its contribution to the graph’s connectivity. We establish minimax optimality of the resulting estimators over corresponding edge-weighted sparsity classes. We show theoretically that these risk guarantees are not always achieved by the estimator minimizing the $\ell_{1}$/total-variation relaxation, and empirically that the $\ell_{0}$-based estimates are more accurate in high signal-to-noise settings.
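On the special case of a path graph, the $\ell_{0}$-edge-penalized objective can be minimized exactly by the classical segmentation dynamic program, with no need for $\alpha$-expansion. A minimal sketch (the objective scaling, penalty value, and toy signal below are illustrative assumptions):

```python
import numpy as np

def l0_segment(y, lam):
    # Exact minimizer of 0.5*||y - theta||^2 + lam * (#jumps of theta)
    # on a path graph, via the standard O(n^2) segmentation DP.
    n = len(y)
    cs = np.concatenate([[0.0], np.cumsum(y)])
    cs2 = np.concatenate([[0.0], np.cumsum(y ** 2)])
    def sse(i, j):                     # best constant fit to y[i:j]
        s, s2, m = cs[j] - cs[i], cs2[j] - cs2[i], j - i
        return 0.5 * (s2 - s * s / m)
    F = np.full(n + 1, np.inf)
    F[0] = -lam                        # cancels the penalty of the first segment
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = F[i] + lam + sse(i, j)
            if c < F[j]:
                F[j], last[j] = c, i
    theta, bounds, j = np.empty(n), [], n   # backtrack the optimal segmentation
    while j > 0:
        i = last[j]
        theta[i:j] = y[i:j].mean()
        bounds.append(i)
        j = i
    return theta, sorted(bounds)

y = np.concatenate([np.zeros(20), 4.0 * np.ones(20)])
theta, bounds = l0_segment(y, lam=1.0)      # one jump survives the penalty
```

Each entry of `bounds` is the start of a recovered constant segment; on general graphs this exact DP is unavailable, which is where the paper's approximation guarantees enter.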
</p>projecteuclid.org/euclid.aos/1536631272_20180910220136Mon, 10 Sep 2018 22:01 EDT

Multiclass classification, information, divergence and surrogate risk
https://projecteuclid.org/euclid.aos/1536631273
<strong>John Duchi</strong>, <strong>Khashayar Khosravi</strong>, <strong>Feng Ruan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3246--3275.</p><p><strong>Abstract:</strong><br/>
We provide a unifying view of statistical information measures, multiway Bayesian hypothesis testing, loss functions for multiclass classification problems and multidistribution $f$-divergences, elaborating equivalence results between all of these objects, and extending existing results for binary outcome spaces to more general ones. We consider a generalization of $f$-divergences to multiple distributions, and we provide a constructive equivalence between divergences, statistical information (in the sense of DeGroot) and losses for multiclass classification. A major application of our results is in multiclass classification problems in which we must both infer a discriminant function $\gamma$—for making predictions on a label $Y$ from datum $X$—and a data representation (or, in the setting of a hypothesis testing problem, an experimental design), represented as a quantizer $\mathsf{q}$ from a family of possible quantizers $\mathsf{Q}$. In this setting, we characterize the equivalence between loss functions, meaning that optimizing either of two losses yields an optimal discriminant and quantizer $\mathsf{q}$, complementing and extending earlier results of Nguyen et al. [ Ann. Statist. 37 (2009) 876–904] to the multiclass case. Our results provide a more substantial basis than standard classification calibration results for comparing different losses: we describe the convex losses that are consistent for jointly choosing a data representation and minimizing the (weighted) probability of error in multiclass classification problems.
</p>projecteuclid.org/euclid.aos/1536631273_20180910220136Mon, 10 Sep 2018 22:01 EDT

Halfspace depths for scatter, concentration and shape matrices
https://projecteuclid.org/euclid.aos/1536631274
<strong>Davy Paindaveine</strong>, <strong>Germain Van Bever</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3276--3307.</p><p><strong>Abstract:</strong><br/>
We propose halfspace depth concepts for scatter, concentration and shape matrices. For scatter matrices, our concept is similar to those from Chen, Gao and Ren [Robust covariance and scatter matrix estimation under Huber’s contamination model (2018)] and Zhang [ J. Multivariate Anal. 82 (2002) 134–165]. Rather than focusing, as in these earlier works, on deepest scatter matrices, we thoroughly investigate the properties of the proposed depth and of the corresponding depth regions. We do so under minimal assumptions and, in particular, we do not restrict to elliptical distributions nor to absolutely continuous distributions. Interestingly, fully understanding scatter halfspace depth requires considering different geometries/topologies on the space of scatter matrices. We also discuss, in the spirit of Zuo and Serfling [ Ann. Statist. 28 (2000) 461–482], the structural properties a scatter depth should satisfy, and investigate whether or not these are met by scatter halfspace depth. Companion concepts of depth for concentration matrices and shape matrices are also proposed and studied. We show the practical relevance of the depth concepts considered in a real-data example from finance.
</p>projecteuclid.org/euclid.aos/1536631274_20180910220136Mon, 10 Sep 2018 22:01 EDT

Multilayer tensor factorization with applications to recommender systems
https://projecteuclid.org/euclid.aos/1536631275
<strong>Xuan Bi</strong>, <strong>Annie Qu</strong>, <strong>Xiaotong Shen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3308--3333.</p><p><strong>Abstract:</strong><br/>
Recommender systems have been widely adopted by electronic commerce and entertainment industries for individualized prediction and recommendation, which benefit consumers and improve business intelligence. In this article, we propose an innovative method, namely the recommendation engine of multilayers (REM), for tensor recommender systems. The proposed method utilizes the structure of a tensor response to integrate information from multiple modes, and creates an additional layer of nested latent factors to accommodate between-subjects dependency. One major advantage is that the proposed method is able to address the “cold-start” issue in the absence of information from new customers, new products or new contexts. Specifically, it provides more effective recommendations through sub-group information. To achieve scalable computation, we develop a new algorithm for the proposed method, which incorporates a maximum block improvement strategy into the cyclic blockwise-coordinate-descent algorithm. In theory, we investigate algorithmic properties for convergence from an arbitrary initial point and local convergence, along with the asymptotic consistency of estimated parameters. Finally, the proposed method is applied in simulations and IRI marketing data with 116 million observations of product sales. Numerical studies demonstrate that the proposed method outperforms existing competitors in the literature.
</p>projecteuclid.org/euclid.aos/1536631275_20180910220136Mon, 10 Sep 2018 22:01 EDT

Principal component analysis for functional data on Riemannian manifolds and spheres
https://projecteuclid.org/euclid.aos/1536631276
<strong>Xiongtao Dai</strong>, <strong>Hans-Georg Müller</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3334--3361.</p><p><strong>Abstract:</strong><br/>
Functional data analysis on nonlinear manifolds has drawn recent interest. Sphere-valued functional data, which are encountered, for example, as movement trajectories on the surface of the earth, are an important special case. We consider an intrinsic principal component analysis for smooth Riemannian manifold-valued functional data and study its asymptotic properties. Riemannian functional principal component analysis (RFPCA) is carried out by first mapping the manifold-valued data through Riemannian logarithm maps to tangent spaces around the Fréchet mean function, and then performing a classical functional principal component analysis (FPCA) on the linear tangent spaces. Representations of the Riemannian manifold-valued functions and the eigenfunctions on the original manifold are then obtained with exponential maps. The tangent-space approximation yields upper bounds to residual variances if the Riemannian manifold has nonnegative curvature. We derive a central limit theorem for the mean function, as well as root-$n$ uniform convergence rates for other model components. Our applications include a novel framework for the analysis of longitudinal compositional data, achieved by mapping longitudinal compositional data to trajectories on the sphere, illustrated with longitudinal fruit fly behavior patterns. RFPCA is shown to outperform an unrestricted FPCA in terms of trajectory recovery and prediction in applications and simulations.
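The logarithm/exponential maps at the heart of the RFPCA pipeline have closed forms on the unit sphere. A minimal sketch of that single building block (mapping a point to the tangent space at a base point and back; the base point and test point are arbitrary illustrations):

```python
import numpy as np

def sphere_log(mu, x):
    # Riemannian log map on the unit sphere: sends x to a tangent vector at mu
    # whose length equals the geodesic distance from mu to x.
    d = np.arccos(np.clip(np.dot(mu, x), -1.0, 1.0))
    if d < 1e-12:
        return np.zeros_like(mu)
    v = x - np.dot(mu, x) * mu          # component of x orthogonal to mu
    return d * v / np.linalg.norm(v)

def sphere_exp(mu, v):
    # Riemannian exponential map at mu, the inverse of the log map.
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return mu.copy()
    return np.cos(nv) * mu + np.sin(nv) * v / nv

mu = np.array([0.0, 0.0, 1.0])                   # base point (Frechet-mean role)
x = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)     # sphere-valued observation
v = sphere_log(mu, x)                            # linearize in the tangent space
x_back = sphere_exp(mu, v)                       # map back to the sphere
```

In RFPCA the log map is applied pointwise along each trajectory before a classical FPCA on the tangent-space curves; the exp map carries the fitted components back to the manifold.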
</p>projecteuclid.org/euclid.aos/1536631276_20180910220136Mon, 10 Sep 2018 22:01 EDT

Assessing robustness of classification using an angular breakdown point
https://projecteuclid.org/euclid.aos/1536631277
<strong>Junlong Zhao</strong>, <strong>Guan Yu</strong>, <strong>Yufeng Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3362--3389.</p><p><strong>Abstract:</strong><br/>
Robustness is a desirable property for many statistical techniques. As an important measure of robustness, the breakdown point has been widely used for regression problems and many other settings. Despite the existing development, we observe that the standard breakdown point criterion is not directly applicable for many classification problems. In this paper, we propose a new breakdown point criterion, namely angular breakdown point, to better quantify the robustness of different classification methods. Using this new breakdown point criterion, we study the robustness of binary large margin classification techniques, although the idea is applicable to general classification methods. Both bounded and unbounded loss functions with linear and kernel learning are considered. These studies provide useful insights on the robustness of different classification methods. Numerical results further confirm our theoretical findings.
</p>projecteuclid.org/euclid.aos/1536631277_20180910220136Mon, 10 Sep 2018 22:01 EDT

Tail-greedy bottom-up data decompositions and fast multiple change-point detection
https://projecteuclid.org/euclid.aos/1536631278
<strong>Piotr Fryzlewicz</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3390--3421.</p><p><strong>Abstract:</strong><br/>
This article proposes a “tail-greedy”, bottom-up transform for one-dimensional data, which results in a nonlinear but conditionally orthonormal, multiscale decomposition of the data with respect to an adaptively chosen unbalanced Haar wavelet basis. The “tail-greediness” of the decomposition algorithm, whereby multiple greedy steps are taken in a single pass through the data, both enables fast computation and makes the algorithm applicable in the problem of consistent estimation of the number and locations of multiple change-points in data. The resulting agglomerative change-point detection method avoids the disadvantages of the classical divisive binary segmentation, and offers very good practical performance. It is implemented in the R package breakfast, available from CRAN.
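A stripped-down cousin of the agglomerative idea above is plain greedy bottom-up merging, where adjacent segments are fused whenever their unbalanced Haar detail coefficient is small. This is a one-merge-at-a-time sketch, not the paper's "tail-greedy" multiple-merge scheme, and the threshold and toy signal are illustrative assumptions:

```python
import numpy as np

def bottomup_merge(y, threshold):
    # Greedy bottom-up merging of adjacent constant segments; the merge cost
    # is the unbalanced Haar detail coefficient of the two segments.
    segs = [(i, i + 1, y[i]) for i in range(len(y))]   # (start, end, mean)
    def coef(a, b):
        n1, n2 = a[1] - a[0], b[1] - b[0]
        return np.sqrt(n1 * n2 / (n1 + n2)) * (a[2] - b[2])
    while len(segs) > 1:
        costs = [abs(coef(segs[k], segs[k + 1])) for k in range(len(segs) - 1)]
        k = int(np.argmin(costs))
        if costs[k] > threshold:       # all remaining details are significant
            break
        a, b = segs[k], segs[k + 1]
        n1, n2 = a[1] - a[0], b[1] - b[0]
        merged = (a[0], b[1], (n1 * a[2] + n2 * b[2]) / (n1 + n2))
        segs[k:k + 2] = [merged]
    return [(s, e) for s, e, _ in segs]

y = np.concatenate([np.zeros(10), 5.0 * np.ones(10)])
segments = bottomup_merge(y, threshold=1.0)   # the change-point survives merging
```

Taking multiple merges per pass, as in the tail-greedy transform, is what makes the full algorithm both fast and provably consistent.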
</p>projecteuclid.org/euclid.aos/1536631278_20180910220136Mon, 10 Sep 2018 22:01 EDT

ROCKET: Robust confidence intervals via Kendall’s tau for transelliptical graphical models
https://projecteuclid.org/euclid.aos/1536631279
<strong>Rina Foygel Barber</strong>, <strong>Mladen Kolar</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3422--3450.</p><p><strong>Abstract:</strong><br/>
Understanding complex relationships between random variables is of fundamental importance in high-dimensional statistics, with numerous applications in biological and social sciences. Undirected graphical models are often used to represent dependencies between random variables, where an edge between two random variables is drawn if they are conditionally dependent given all the other measured variables. A large body of literature exists on methods that estimate the structure of an undirected graphical model; however, little is known about the distributional properties of the estimators beyond the Gaussian setting. In this paper, we focus on inference for edge parameters in a high-dimensional transelliptical model, which generalizes Gaussian and nonparanormal graphical models. We propose ROCKET, a novel procedure for estimating parameters in the latent inverse covariance matrix. We establish asymptotic normality of ROCKET in an ultra high-dimensional setting under mild assumptions, without relying on oracle model selection results. ROCKET requires the same number of samples that are known to be necessary for obtaining a $\sqrt{n}$ consistent estimator of an element in the precision matrix under a Gaussian model. Hence, it is an optimal estimator under a much larger family of distributions. The result hinges on a tight control of the sparse spectral norm of the nonparametric Kendall’s tau estimator of the correlation matrix, which is of independent interest. Empirically, ROCKET outperforms the nonparanormal and Gaussian models in terms of achieving accurate inference on simulated data. We also compare the three methods on real data (daily stock returns), and find that the ROCKET estimator is the only method whose behavior across subsamples agrees with the distribution predicted by the theory.
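The Kendall's tau ingredient can be illustrated in isolation: under a transelliptical model, $\sin(\frac{\pi}{2}\hat{\tau})$ estimates the latent correlation and is invariant to monotone marginal transforms. A hedged toy sketch (the pairwise $O(n^{2})$ estimator, sample size, and distortions are illustrative assumptions, not ROCKET itself):

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    # Pairwise-concordance estimate of Kendall's tau (O(n^2) for clarity).
    n, conc = len(x), 0.0
    for i, j in combinations(range(n), 2):
        conc += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return 2.0 * conc / (n * (n - 1))

def latent_correlation(x, y):
    # sin(pi/2 * tau) recovers the latent correlation under the model,
    # regardless of monotone marginal transformations.
    return np.sin(0.5 * np.pi * kendall_tau(x, y))

rng = np.random.default_rng(2)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=400)
x, y = np.exp(z[:, 0]), z[:, 1] ** 3      # monotone marginal distortions
rho_hat = latent_correlation(x, y)        # estimates the latent rho = 0.6
```

ROCKET builds on the full Kendall correlation matrix of all variables; controlling its sparse spectral norm is the technical core of the paper.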
</p>projecteuclid.org/euclid.aos/1536631279_20180910220136Mon, 10 Sep 2018 22:01 EDT

Adaptive invariant density estimation for ergodic diffusions over anisotropic classes
https://projecteuclid.org/euclid.aos/1536631280
<strong>Claudia Strauch</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3451--3480.</p><p><strong>Abstract:</strong><br/>
Consider some multivariate diffusion process $\mathbf{X}=(X_{t})_{t\geq0}$ with unique invariant probability measure and associated invariant density $\rho$, and assume that a continuous record of observations $X^{T}=(X_{t})_{0\leq t\leq T}$ of $\mathbf{X}$ is available. Recent results on functional inequalities for symmetric Markov semigroups are used in the statistical analysis of kernel estimators $\widehat{\rho}_{T}=\widehat{\rho}_{T}(X^{T})$ of $\rho$. For the basic problem of estimation with respect to $\mathrm{sup}$-norm risk under anisotropic Hölder smoothness constraints, the proposed approach yields an adaptive estimator which converges at a substantially faster rate than in standard multivariate density estimation from i.i.d. observations.
</p>projecteuclid.org/euclid.aos/1536631280_20180910220136Mon, 10 Sep 2018 22:01 EDT

Robust low-rank matrix estimation
https://projecteuclid.org/euclid.aos/1536631281
<strong>Andreas Elsener</strong>, <strong>Sara van de Geer</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3481--3509.</p><p><strong>Abstract:</strong><br/>
Many results have been proved for various nuclear norm penalized estimators of the uniform sampling matrix completion problem. However, most of these estimators are not robust: in most of the cases the quadratic loss function and its modifications are used. We consider robust nuclear norm penalized estimators using two well-known robust loss functions: the absolute value loss and the Huber loss. Under several conditions on the sparsity of the problem (i.e., the rank of the parameter matrix) and on the regularity of the risk function, sharp and nonsharp oracle inequalities for these estimators are shown to hold with high probability. As a consequence, the asymptotic behavior of the estimators is derived. Similar error bounds are obtained under the assumption of weak sparsity, that is, the case where the matrix is assumed to be only approximately low-rank. In all of our results, we consider a high-dimensional setting. In this case, this means that we assume $n\leq pq$. Finally, various simulations confirm our theoretical results.
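One standard way to compute a Huber-loss, nuclear-norm-penalized completion estimator of this kind is proximal gradient descent, where the proximal step soft-thresholds singular values. A hedged sketch under uniform sampling (the Huber cutoff, $\lambda$, step size, and problem sizes are illustrative assumptions, not the paper's tuning):

```python
import numpy as np

rng = np.random.default_rng(3)

def huber_grad(r, delta=1.0):
    # Derivative of the Huber loss: linear in the tails, quadratic near 0.
    return np.clip(r, -delta, delta)

def svd_shrink(M, t):
    # Proximal operator of t*||.||_nuclear: soft-threshold singular values.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def robust_complete(Y, mask, lam=0.5, step=1.0, iters=300):
    # Proximal gradient for sum_observed huber(Y - B) + lam * ||B||_nuclear.
    B = np.zeros_like(Y)
    for _ in range(iters):
        G = -huber_grad(Y - B) * mask      # gradient only over observed entries
        B = svd_shrink(B - step * G, step * lam)
    return B

p, q, r = 30, 30, 2
truth = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))   # rank-2 target
mask = rng.random((p, q)) < 0.5                             # uniform sampling
Y = (truth + 0.1 * rng.normal(size=(p, q))) * mask
B_hat = robust_complete(Y, mask)
```

Swapping `huber_grad` for `np.sign` gives (a smoothed-free version of) the absolute value loss; the clipping is what caps the influence of gross outliers.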
</p>projecteuclid.org/euclid.aos/1536631281_20180910220136Mon, 10 Sep 2018 22:01 EDT

Sieve bootstrap for functional time series
https://projecteuclid.org/euclid.aos/1536631282
<strong>Efstathios Paparoditis</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3510--3538.</p><p><strong>Abstract:</strong><br/>
A bootstrap procedure for functional time series is proposed which exploits a general vector autoregressive representation of the time series of Fourier coefficients appearing in the Karhunen–Loève expansion of the functional process. A double sieve-type bootstrap method is developed, which avoids the estimation of process operators and generates functional pseudo-time series that appropriately mimic the dependence structure of the functional time series at hand. The method uses a finite set of functional principal components to capture the essential driving parts of the infinite dimensional process and a finite order vector autoregressive process to imitate the temporal dependence structure of the corresponding vector time series of Fourier coefficients. By allowing the number of functional principal components as well as the autoregressive order used to increase to infinity (at some appropriate rate) as the sample size increases, consistency of the functional sieve bootstrap can be established. We demonstrate this by proving a basic bootstrap central limit theorem for functional finite Fourier transforms and by establishing bootstrap validity in the context of a fully functional testing problem. A novel procedure to select the number of functional principal components is introduced while simulations illustrate the good finite sample performance of the new bootstrap method proposed.
</p>projecteuclid.org/euclid.aos/1536631282_20180910220136Mon, 10 Sep 2018 22:01 EDT

Restricted strong convexity implies weak submodularity
https://projecteuclid.org/euclid.aos/1536631283
<strong>Ethan R. Elenberg</strong>, <strong>Rajiv Khanna</strong>, <strong>Alexandros G. Dimakis</strong>, <strong>Sahand Negahban</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3539--3568.</p><p><strong>Abstract:</strong><br/>
We connect high-dimensional subset selection and submodular maximization. Our results extend the work of Das and Kempe [In ICML (2011) 1057–1064] from the setting of linear regression to arbitrary objective functions. For greedy feature selection, this connection allows us to obtain strong multiplicative performance bounds on several methods without statistical modeling assumptions. We also derive recovery guarantees of this form under standard assumptions. Our work shows that greedy algorithms perform within a constant factor from the best possible subset-selection solution for a broad class of general objective functions. Our methods allow a direct control over the number of obtained features as opposed to regularization parameters that only implicitly control sparsity. Our proof technique uses the concept of weak submodularity initially defined by Das and Kempe. We draw a connection between convex analysis and submodular set function theory which may be of independent interest for other statistical learning applications that have combinatorial structure.
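The greedy feature selection procedure the guarantees apply to is simple to state: repeatedly add the feature that most improves the fit. A minimal sketch with the $R^{2}$ objective (the set-function form whose weak submodularity Das and Kempe studied; problem sizes and signal are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

def r2(X, y, S):
    # Goodness of fit of least squares restricted to the feature set S.
    if not S:
        return 0.0
    Xs = X[:, sorted(S)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return 1.0 - resid @ resid / (y @ y)

def greedy_select(X, y, k):
    # Forward greedy maximization of the (weakly submodular) R^2 objective;
    # k directly controls the number of selected features.
    S = set()
    for _ in range(k):
        best = max(set(range(X.shape[1])) - S, key=lambda j: r2(X, y, S | {j}))
        S.add(best)
    return sorted(S)

n, p = 200, 20
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.normal(size=n)
selected = greedy_select(X, y, k=2)
```

Weak submodularity is what converts this myopic procedure into a constant-factor approximation of the best size-$k$ subset, for any objective satisfying restricted strong convexity.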
</p>projecteuclid.org/euclid.aos/1536631283_20180910220136Mon, 10 Sep 2018 22:01 EDT

Multiscale scanning in inverse problems
https://projecteuclid.org/euclid.aos/1536631284
<strong>Katharina Proksch</strong>, <strong>Frank Werner</strong>, <strong>Axel Munk</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3569--3602.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a multiscale scanning method to determine active components of a quantity $f$ w.r.t. a dictionary $\mathcal{U}$ from observations $Y$ in an inverse regression model $Y=Tf+\xi$ with linear operator $T$ and general random error $\xi$. To this end, we provide uniform confidence statements for the coefficients $\langle\varphi,f\rangle$, $\varphi\in\mathcal{U}$, under the assumption that $(T^{*})^{-1}(\mathcal{U})$ is of wavelet-type. Based on this, we obtain a multiple test that allows us to identify the active components of $\mathcal{U}$, that is, $\langle f,\varphi\rangle\neq0$, $\varphi\in\mathcal{U}$, at controlled, family-wise error rate. Our results rely on a Gaussian approximation of the underlying multiscale statistic with a novel scale penalty adapted to the ill-posedness of the problem. The scale penalty furthermore ensures convergence of the statistic’s distribution towards a Gumbel limit under reasonable assumptions. The important special cases of tomography and deconvolution are discussed in detail. Further, the regression case, when $T=\text{id}$ and the dictionary consists of moving windows of various sizes (scales), is included, generalizing previous results for this setting. We show that our method obeys an oracle optimality, that is, it attains the same asymptotic power as a single-scale testing procedure at the correct scale. Simulations support our theory and we illustrate the potential of the method as an inferential tool for imaging. As a particular application, we discuss super-resolution microscopy and analyze experimental STED data to locate single DNA origami.
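The regression special case ($T=\text{id}$, moving-window dictionary) admits a short sketch of the scanning idea: standardize each window mean and subtract a scale-dependent penalty so that windows of all sizes are comparable. The penalty form and threshold below are illustrative assumptions in the spirit of, not identical to, the paper's calibration:

```python
import numpy as np

rng = np.random.default_rng(5)

def multiscale_scan(y, scales):
    # For each window length h, scan standardized window means and subtract a
    # scale-dependent penalty so that small and large scales are comparable.
    n = len(y)
    hits = []
    for h in scales:
        pen = np.sqrt(2.0 * np.log(n / h))        # penalizes the many small windows
        means = np.convolve(y, np.ones(h) / h, mode="valid")
        stats = np.sqrt(h) * np.abs(means) - pen  # standardized, penalized statistic
        for i in np.flatnonzero(stats > 1.5):     # illustrative threshold
            hits.append((i, i + h))               # window flagged as active
    return hits

n = 500
y = rng.normal(size=n)
y[200:220] += 2.0                                 # one active component
hits = multiscale_scan(y, scales=[5, 10, 20, 40])
```

In the full method the threshold comes from the Gaussian approximation of the sup-statistic (with its Gumbel limit), and for $T\neq\text{id}$ the windows are replaced by the transformed dictionary elements $(T^{*})^{-1}(\varphi)$.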
</p>projecteuclid.org/euclid.aos/1536631284_20180910220136Mon, 10 Sep 2018 22:01 EDT

Slope meets Lasso: Improved oracle bounds and optimality
https://projecteuclid.org/euclid.aos/1536631285
<strong>Pierre C. Bellec</strong>, <strong>Guillaume Lecué</strong>, <strong>Alexandre B. Tsybakov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3603--3642.</p><p><strong>Abstract:</strong><br/>
We show that two polynomial time methods, a Lasso estimator with adaptively chosen tuning parameter and a Slope estimator, adaptively achieve the minimax prediction and $\ell_{2}$ estimation rate $(s/n)\log(p/s)$ in high-dimensional linear regression on the class of $s$-sparse vectors in $\mathbb{R}^{p}$. This is done under the Restricted Eigenvalue (RE) condition for the Lasso and under a slightly more constraining assumption on the design for the Slope. The main results have the form of sharp oracle inequalities accounting for the model misspecification error. The minimax optimal bounds are also obtained for the $\ell_{q}$ estimation errors with $1\le q\le2$ when the model is well specified. The results are nonasymptotic, and hold both in probability and in expectation. The assumptions that we impose on the design are satisfied with high probability for a large class of random matrices with independent and possibly anisotropically distributed rows. We give a comparative analysis of conditions, under which oracle bounds for the Lasso and Slope estimators can be obtained. In particular, we show that several known conditions, such as the RE condition and the sparse eigenvalue condition, are equivalent if the $\ell_{2}$-norms of regressors are uniformly bounded.
</p>projecteuclid.org/euclid.aos/1536631285_20180910220136Mon, 10 Sep 2018 22:01 EDTUniformly valid post-regularization confidence regions for many functional parameters in z-estimation frameworkhttps://projecteuclid.org/euclid.aos/1536631286<strong>Alexandre Belloni</strong>, <strong>Victor Chernozhukov</strong>, <strong>Denis Chetverikov</strong>, <strong>Ying Wei</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3643--3675.</p><p><strong>Abstract:</strong><br/>
In this paper, we develop procedures to construct simultaneous confidence bands for ${\tilde{p}}$ potentially infinite-dimensional parameters after model selection for general moment condition models where ${\tilde{p}}$ is potentially much larger than the sample size of available data, $n$. This allows us to cover settings with functional response data where each of the ${\tilde{p}}$ parameters is a function. The procedure is based on the construction of score functions that approximately satisfy the Neyman orthogonality condition. The proposed simultaneous confidence bands rely on uniform central limit theorems for high-dimensional vectors (and not on Donsker arguments as we allow for ${{\tilde{p}}\gg n}$). To construct the bands, we employ a multiplier bootstrap procedure which is computationally efficient as it only involves resampling the estimated score functions (and does not require resolving the high-dimensional optimization problems). We formally apply the general theory to inference on the regression coefficient process in the distribution regression model with a logistic link, where two implementations are analyzed in detail. Simulations and an application to real data are provided to help illustrate the applicability of the results.
</p>projecteuclid.org/euclid.aos/1536631286_20180910220136Mon, 10 Sep 2018 22:01 EDTLocal asymptotic equivalence of pure states ensembles and quantum Gaussian white noisehttps://projecteuclid.org/euclid.aos/1536631287<strong>Cristina Butucea</strong>, <strong>Mădălin Guţă</strong>, <strong>Michael Nussbaum</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3676--3706.</p><p><strong>Abstract:</strong><br/>
Quantum technology is increasingly relying on specialised statistical inference methods for analysing quantum measurement data. This motivates the development of “quantum statistics”, a field that is shaping up at the overlap of quantum physics and “classical” statistics. One of the less investigated topics to date is that of statistical inference for infinite-dimensional quantum systems, which can be seen as the quantum counterpart of nonparametric statistics. In this paper, we analyse the asymptotic theory of quantum statistical models consisting of ensembles of quantum systems which are identically prepared in a pure state. In the limit of large ensembles, we establish the local asymptotic equivalence (LAE) of this i.i.d. model to a quantum Gaussian white noise model. We use the LAE result in order to establish minimax rates for the estimation of pure states belonging to Hermite–Sobolev classes of wave functions. Moreover, for quadratic functional estimation of the same states we note an elbow effect in the rates, whereas for testing a pure state a sharp parametric rate is attained over the nonparametric Hermite–Sobolev class.
</p>projecteuclid.org/euclid.aos/1536631287_20180910220136Mon, 10 Sep 2018 22:01 EDTExtremal quantile treatment effectshttps://projecteuclid.org/euclid.aos/1536631288<strong>Yichong Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3707--3740.</p><p><strong>Abstract:</strong><br/>
This paper establishes an asymptotic theory and inference method for quantile treatment effect estimators when the quantile index is close to or equal to zero. Such quantile treatment effects are of interest in many applications, such as the effect of maternal smoking on an infant’s adverse birth outcomes. When the quantile index is close to zero, the sparsity of data jeopardizes conventional asymptotic theory and bootstrap inference. When the quantile index is zero, there are no existing inference methods directly applicable in the treatment effect context. This paper addresses both of these issues by proposing new inference methods that are shown to be asymptotically valid as well as having adequate finite sample properties.
</p>projecteuclid.org/euclid.aos/1536631288_20180910220136Mon, 10 Sep 2018 22:01 EDTOptimal maximin $L_{1}$-distance Latin hypercube designs based on good lattice point designshttps://projecteuclid.org/euclid.aos/1536631289<strong>Lin Wang</strong>, <strong>Qian Xiao</strong>, <strong>Hongquan Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3741--3766.</p><p><strong>Abstract:</strong><br/>
Maximin distance Latin hypercube designs are commonly used for computer experiments, but the construction of such designs is challenging. We construct a series of maximin Latin hypercube designs via Williams transformations of good lattice point designs. Some constructed designs are optimal under the maximin $L_{1}$-distance criterion, while others are asymptotically optimal. Moreover, these designs are also shown to have small pairwise correlations between columns.
</p>projecteuclid.org/euclid.aos/1536631289_20180910220136Mon, 10 Sep 2018 22:01 EDTRho-estimators revisited: General theory and applicationshttps://projecteuclid.org/euclid.aos/1536631290<strong>Yannick Baraud</strong>, <strong>Lucien Birgé</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3767--3804.</p><p><strong>Abstract:</strong><br/>
Following Baraud, Birgé and Sart [ Invent. Math. 207 (2017) 425–517], we pursue our attempt to design a robust universal estimator of the joint distribution of $n$ independent (but not necessarily i.i.d.) observations for a Hellinger-type loss. Given such observations with an unknown joint distribution $\mathbf{P}$ and a dominated model $\mathscr{Q}$ for $\mathbf{P}$, we build an estimator $\widehat{\mathbf{P}}$ based on $\mathscr{Q}$ (a $\rho$-estimator) and measure its risk by a Hellinger-type distance. When $\mathbf{P}$ does belong to the model, this risk is bounded by some quantity which relies on the local complexity of the model in a vicinity of $\mathbf{P}$. In most situations, this bound corresponds to the minimax risk over the model (up to a possible logarithmic factor). When $\mathbf{P}$ does not belong to the model, its risk involves an additional bias term proportional to the distance between $\mathbf{P}$ and $\mathscr{Q}$, whatever the true distribution $\mathbf{P}$. From this point of view, this new version of $\rho$-estimators improves upon the previous one described in Baraud, Birgé and Sart [ Invent. Math. 207 (2017) 425–517] which required that $\mathbf{P}$ be absolutely continuous with respect to some known reference measure. Further improvements have been made compared to the former construction. In particular, it provides a very general treatment of the regression framework with random design as well as a computationally tractable procedure for aggregating estimators. We also give some conditions for the maximum likelihood estimator to be a $\rho$-estimator. Finally, we consider the situation where the statistician has at her or his disposal many different models and we build a penalized version of the $\rho$-estimator for model selection and adaptation purposes. In the regression setting, this penalized estimator not only allows one to estimate the regression function but also the distribution of the errors.
</p>projecteuclid.org/euclid.aos/1536631290_20180910220136Mon, 10 Sep 2018 22:01 EDTThink globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embeddinghttps://projecteuclid.org/euclid.aos/1536631291<strong>Hau-Tieng Wu</strong>, <strong>Nan Wu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3805--3837.</p><p><strong>Abstract:</strong><br/>
Since its introduction in 2000, Locally Linear Embedding (LLE) has been widely applied in data science. We provide an asymptotic analysis of LLE under the manifold setup. We show that for a general manifold, asymptotically we may not obtain the Laplace–Beltrami operator, and the result may depend on nonuniform sampling unless a correct regularization is chosen. We also derive the corresponding kernel function, which indicates that LLE is not a Markov process. A comparison with other commonly applied nonlinear algorithms, particularly a diffusion map, is provided and its relationship with locally linear regression is also discussed.
</p>projecteuclid.org/euclid.aos/1536631291_20180910220136Mon, 10 Sep 2018 22:01 EDTNonparametric covariate-adjusted response-adaptive design based on a functional urn modelhttps://projecteuclid.org/euclid.aos/1536631292<strong>Giacomo Aletti</strong>, <strong>Andrea Ghiglietti</strong>, <strong>William F. Rosenberger</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 46, Number 6B, 3838--3866.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a general class of covariate-adjusted response-adaptive (CARA) designs based on a new functional urn model. We prove strong consistency concerning the functional urn proportion and the proportion of subjects assigned to the treatment groups, in the whole study and for each covariate profile, allowing the distribution of the responses conditioned on covariates to be estimated nonparametrically. In addition, we establish joint central limit theorems for the above quantities and the sufficient statistics of features of interest, which allow one to construct procedures for inference on the conditional response distributions. These results are then applied to typical situations concerning Gaussian and binary responses.
</p>projecteuclid.org/euclid.aos/1536631292_20180910220136Mon, 10 Sep 2018 22:01 EDTFunctional data analysis by matrix completionhttps://projecteuclid.org/euclid.aos/1543568580<strong>Marie-Hélène Descary</strong>, <strong>Victor M. Panaretos</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 1--38.</p><p><strong>Abstract:</strong><br/>
Functional data analyses typically proceed by smoothing, followed by functional PCA. This paradigm implicitly assumes that rough variation is due to nuisance noise. Nevertheless, relevant functional features such as time-localised or short scale fluctuations may indeed be rough relative to the global scale, but still smooth at shorter scales. These may be confounded with the global smooth components of variation by the smoothing and PCA, potentially distorting the parsimony and interpretability of the analysis. The goal of this paper is to investigate how both smooth and rough variations can be recovered on the basis of discretely observed functional data. Assuming that a functional datum arises as the sum of two uncorrelated components, one smooth and one rough, we develop identifiability conditions for the recovery of the two corresponding covariance operators. The key insight is that they should possess complementary forms of parsimony: one smooth and finite rank (large scale), and the other banded and potentially infinite rank (small scale). Our conditions elucidate the precise interplay between rank, bandwidth and grid resolution. Under these conditions, we show that the recovery problem is equivalent to rank-constrained matrix completion, and exploit this to construct estimators of the two covariances, without assuming knowledge of the true bandwidth or rank; we study their asymptotic behaviour, and then use them to recover the smooth and rough components of each functional datum by best linear prediction. As a result, we effectively produce separate functional PCAs for smooth and rough variation.
</p>projecteuclid.org/euclid.aos/1543568580_20181130040326Fri, 30 Nov 2018 04:03 ESTBayesian fractional posteriorshttps://projecteuclid.org/euclid.aos/1543568581<strong>Anirban Bhattacharya</strong>, <strong>Debdeep Pati</strong>, <strong>Yun Yang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 39--66.</p><p><strong>Abstract:</strong><br/>
We consider the fractional posterior distribution that is obtained by updating a prior distribution via Bayes theorem with a fractional likelihood function, a usual likelihood function raised to a fractional power. First, we analyze the contraction property of the fractional posterior in a general misspecified framework. Our contraction results only require a prior mass condition on a certain Kullback–Leibler (KL) neighborhood of the true parameter (or the KL divergence minimizer in the misspecified case), and obviate constructions of test functions and sieves commonly used in the literature for analyzing the contraction property of a regular posterior. We show through a counterexample that some condition controlling the complexity of the parameter space is necessary for the regular posterior to contract, rendering additional flexibility on the choice of the prior for the fractional posterior. Second, we derive a novel Bayesian oracle inequality based on a PAC-Bayes inequality in misspecified models. Our derivation reveals several advantages of averaging based Bayesian procedures over optimization based frequentist procedures. As an application of the Bayesian oracle inequality, we derive a sharp oracle inequality in multivariate convex regression problems. We also illustrate the theory in Gaussian process regression and density estimation problems.
</p>projecteuclid.org/euclid.aos/1543568581_20181130040326Fri, 30 Nov 2018 04:03 ESTDistribution theory for hierarchical processeshttps://projecteuclid.org/euclid.aos/1543568582<strong>Federico Camerlenghi</strong>, <strong>Antonio Lijoi</strong>, <strong>Peter Orbanz</strong>, <strong>Igor Prünster</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 67--92.</p><p><strong>Abstract:</strong><br/>
Hierarchies of discrete probability measures are remarkably popular as nonparametric priors in applications, arguably due to two key properties: (i) they naturally represent multiple heterogeneous populations; (ii) they produce ties across populations, resulting in a shrinkage property often described as “sharing of information.” In this paper, we establish a distribution theory for hierarchical random measures that are generated via normalization, thus encompassing both the hierarchical Dirichlet and hierarchical Pitman–Yor processes. These results provide a probabilistic characterization of the induced (partially exchangeable) partition structure, including the distribution and the asymptotics of the number of partition sets, and a complete posterior characterization. They are obtained by representing hierarchical processes in terms of completely random measures, and by applying a novel technique for deriving the associated distributions. Moreover, they also serve as building blocks for new simulation algorithms, and we derive marginal and conditional algorithms for Bayesian inference.
</p>projecteuclid.org/euclid.aos/1543568582_20181130040326Fri, 30 Nov 2018 04:03 ESTAdaptive estimation of the sparsity in the Gaussian vector modelhttps://projecteuclid.org/euclid.aos/1543568583<strong>Alexandra Carpentier</strong>, <strong>Nicolas Verzelen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 93--126.</p><p><strong>Abstract:</strong><br/>
Consider the Gaussian vector model with mean value $\theta$. We study the twin problems of estimating the number $\Vert \theta \Vert_{0}$ of nonzero components of $\theta$ and testing whether $\Vert \theta \Vert_{0}$ is smaller than some value. For testing, we establish the minimax separation distances for this model and introduce a minimax adaptive test. Extensions to the case of unknown variance are also discussed. Rewriting the estimation of $\Vert \theta \Vert_{0}$ as a multiple testing problem of all hypotheses $\{\Vert \theta \Vert_{0}\leq q\}$, we both derive a new way of assessing the optimality of a sparsity estimator and exhibit such an optimal procedure. This general approach provides a roadmap for estimating the complexity of the signal in various statistical models.
</p>projecteuclid.org/euclid.aos/1543568583_20181130040326Fri, 30 Nov 2018 04:03 ESTApproximate optimal designs for multivariate polynomial regressionhttps://projecteuclid.org/euclid.aos/1543568584<strong>Yohann De Castro</strong>, <strong>Fabrice Gamboa</strong>, <strong>Didier Henrion</strong>, <strong>Roxana Hess</strong>, <strong>Jean-Bernard Lasserre</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 127--155.</p><p><strong>Abstract:</strong><br/>
We introduce a new approach aiming at computing approximate optimal designs for multivariate polynomial regressions on compact (semialgebraic) design spaces. We use the moment-sum-of-squares hierarchy of semidefinite programming problems to solve numerically the approximate optimal design problem. The geometry of the design is recovered via semidefinite programming duality theory. This article shows that the hierarchy converges to the approximate optimal design as the order of the hierarchy increases. Furthermore, we provide a dual certificate ensuring finite convergence of the hierarchy and showing that the approximate optimal design can be computed numerically with our method. As a byproduct, we revisit the equivalence theorem of the experimental design theory: it is linked to the Christoffel polynomial and it characterizes finite convergence of the moment-sum-of-square hierarchies.
</p>projecteuclid.org/euclid.aos/1543568584_20181130040326Fri, 30 Nov 2018 04:03 ESTEfficient estimation of integrated volatility functionals via multiscale jackknifehttps://projecteuclid.org/euclid.aos/1543568585<strong>Jia Li</strong>, <strong>Yunxiao Liu</strong>, <strong>Dacheng Xiu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 156--176.</p><p><strong>Abstract:</strong><br/>
We propose semiparametrically efficient estimators for general integrated volatility functionals of multivariate semimartingale processes. A plug-in method that uses nonparametric estimates of spot volatilities is known to induce high-order biases that need to be corrected to obey a central limit theorem. Such bias terms arise from boundary effects, the diffusive and jump movements of stochastic volatility and the sampling error from the nonparametric spot volatility estimation. We propose a novel jackknife method for bias correction. The jackknife estimator is simply formed as a linear combination of a few uncorrected estimators associated with different local window sizes used in the estimation of spot volatility. We show theoretically that our estimator is asymptotically mixed Gaussian, semiparametrically efficient, and more robust to the choice of local windows. To facilitate practical use, we introduce a simulation-based estimator of the asymptotic variance, so that our inference is derivative-free, and hence is convenient to implement.
</p>projecteuclid.org/euclid.aos/1543568585_20181130040326Fri, 30 Nov 2018 04:03 ESTNonasymptotic rates for manifold, tangent space and curvature estimationhttps://projecteuclid.org/euclid.aos/1543568586<strong>Eddie Aamari</strong>, <strong>Clément Levrard</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 177--204.</p><p><strong>Abstract:</strong><br/>
Given a noisy sample from a submanifold $M\subset\mathbb{R}^{D}$, we derive optimal rates for the estimation of tangent spaces $T_{X}M$, the second fundamental form $\mathit{II}_{X}^{M}$ and the submanifold $M$. After motivating their study, we introduce a quantitative class of $\mathcal{C}^{k}$-submanifolds in analogy with Hölder classes. The proposed estimators are based on local polynomials and allow one to deal simultaneously with the three problems at stake. Minimax lower bounds are derived using a conditional version of Assouad’s lemma when the base point $X$ is random.
</p>projecteuclid.org/euclid.aos/1543568586_20181130040326Fri, 30 Nov 2018 04:03 ESTNonparametric testing for multiple survival functions with noninferiority marginshttps://projecteuclid.org/euclid.aos/1543568587<strong>Hsin-wen Chang</strong>, <strong>Ian W. McKeague</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 205--232.</p><p><strong>Abstract:</strong><br/>
New nonparametric tests for the ordering of multiple survival functions are developed with the possibility of right censorship taken into account. The motivation comes from noninferiority trials with multiple treatments. The proposed tests are based on nonparametric likelihood ratio statistics, which are known to provide more powerful tests than Wald-type procedures, but in this setting have only been studied for pairs of survival functions or in the absence of censoring. We introduce a novel type of pool adjacent violator algorithm that leads to a complete solution of the problem. The limit distributions can be expressed as weighted sums of squares involving projections of certain Gaussian processes onto the given ordered alternative. A simulation study shows that the new procedures have superior power to a competing combined-pairwise Cox model approach. We illustrate the proposed methods using data from a three-arm noninferiority trial.
</p>projecteuclid.org/euclid.aos/1543568587_20181130040326Fri, 30 Nov 2018 04:03 ESTOracle inequalities and adaptive estimation in the convolution structure density modelhttps://projecteuclid.org/euclid.aos/1543568588<strong>O. V. Lepski</strong>, <strong>T. Willer</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 233--287.</p><p><strong>Abstract:</strong><br/>
We study the problem of nonparametric estimation under $\mathbb{L}_{p}$-loss, $p\in[1,\infty)$, in the framework of the convolution structure density model on $\mathbb{R}^{d}$. This observation scheme is a generalization of two classical statistical models, namely density estimation under direct and indirect observations. The original pointwise selection rule from a family of “kernel-type” estimators is proposed. For the selected estimator, we prove an $\mathbb{L}_{p}$-norm oracle inequality and several of its consequences. Next, the problem of adaptive minimax estimation under $\mathbb{L}_{p}$-loss over the scale of anisotropic Nikol’skii classes is addressed. We fully characterize the behavior of the minimax risk for different relationships between regularity parameters and norm indexes in the definitions of the functional class and of the risk. We prove that the proposed selection rule leads to the construction of an optimally or nearly optimally (up to logarithmic factors) adaptive estimator.
</p>projecteuclid.org/euclid.aos/1543568588_20181130040326Fri, 30 Nov 2018 04:03 ESTEfficient multivariate entropy estimation via $k$-nearest neighbour distanceshttps://projecteuclid.org/euclid.aos/1543568589<strong>Thomas B. Berrett</strong>, <strong>Richard J. Samworth</strong>, <strong>Ming Yuan</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 288--318.</p><p><strong>Abstract:</strong><br/>
Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko [ Probl. Inform. Transm. 23 (1987), 95–101], based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^{d}$. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness, while the original unweighted estimator is typically only efficient when $d\leq 3$. In addition to the new estimator proposed and theoretical understanding provided, our results facilitate the construction of asymptotically valid confidence intervals for the entropy of asymptotically minimal width.
</p>projecteuclid.org/euclid.aos/1543568589_20181130040326Fri, 30 Nov 2018 04:03 ESTPosterior graph selection and estimation consistency for high-dimensional Bayesian DAG modelshttps://projecteuclid.org/euclid.aos/1543568590<strong>Xuan Cao</strong>, <strong>Kshitij Khare</strong>, <strong>Malay Ghosh</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 319--348.</p><p><strong>Abstract:</strong><br/>
Covariance estimation and selection for high-dimensional multivariate datasets is a fundamental problem in modern statistics. Gaussian directed acyclic graph (DAG) models are a popular class of models used for this purpose. Gaussian DAG models introduce sparsity in the Cholesky factor of the inverse covariance matrix, and the sparsity pattern in turn corresponds to specific conditional independence assumptions on the underlying variables. A variety of priors have been developed in recent years for Bayesian inference in DAG models, yet crucial convergence and sparsity selection properties for these models have not been thoroughly investigated. Most of these priors are adaptations/generalizations of the Wishart distribution in the DAG context. In this paper, we consider a flexible and general class of these “DAG-Wishart” priors with multiple shape parameters. Under mild regularity assumptions, we establish strong graph selection consistency and posterior convergence rates for estimation when the number of variables $p$ is allowed to grow at an appropriate subexponential rate with the sample size $n$.
</p>projecteuclid.org/euclid.aos/1543568590_20181130040326Fri, 30 Nov 2018 04:03 ESTLocally adaptive confidence bandshttps://projecteuclid.org/euclid.aos/1543568591<strong>Tim Patschkowski</strong>, <strong>Angelika Rohde</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 349--381.</p><p><strong>Abstract:</strong><br/>
We develop honest and locally adaptive confidence bands for probability densities. They provide substantially improved confidence statements in case of inhomogeneous smoothness, and are easily implemented and visualized. The article contributes conceptual work on locally adaptive inference, as a straightforward modification of the global setting imposes severe obstacles for statistical purposes. Among others, we introduce a statistical notion of local Hölder regularity and prove a correspondingly strong version of local adaptivity. We substantially relax the straightforward localization of the self-similarity condition in order not to rule out prototypical densities. The set of densities permanently excluded from the consideration is shown to be pathological in a mathematically rigorous sense. On a technical level, the crucial component for the verification of honesty is the identification of an asymptotically least favorable stationary case by means of Slepian’s comparison inequality.
</p>projecteuclid.org/euclid.aos/1543568591_20181130040326Fri, 30 Nov 2018 04:03 ESTAsymptotic distribution-free change-point detection for multivariate and non-Euclidean datahttps://projecteuclid.org/euclid.aos/1543568592<strong>Lynna Chu</strong>, <strong>Hao Chen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 382--414.</p><p><strong>Abstract:</strong><br/>
We consider the testing and estimation of change-points, locations where the distribution abruptly changes, in a sequence of multivariate or non-Euclidean observations. We study a nonparametric framework that utilizes similarity information among observations, which can be applied to various data types as long as an informative similarity measure on the sample space can be defined. The existing approach along this line has low power and/or biased estimates for change-points under some common scenarios. We address these problems by considering new tests based on similarity information. Simulation studies show that the new approaches exhibit substantial improvements in detecting and estimating change-points. In addition, under some mild conditions, the new test statistics are asymptotically distribution-free under the null hypothesis of no change. Analytic $p$-value approximations to the significance of the new test statistics for the single change-point alternative and changed interval alternative are derived, making the new approaches easy off-the-shelf tools for large datasets. The new approaches are illustrated in an analysis of New York taxi data.
</p>projecteuclid.org/euclid.aos/1543568592_20181130040326Fri, 30 Nov 2018 04:03 ESTStatistics on the Stiefel manifold: Theory and applicationshttps://projecteuclid.org/euclid.aos/1543568593<strong>Rudrasis Chakraborty</strong>, <strong>Baba C. Vemuri</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 415--438.</p><p><strong>Abstract:</strong><br/>
A Stiefel manifold of the compact type is often encountered in many fields of engineering including signal and image processing, machine learning, numerical optimization and others. The Stiefel manifold is a Riemannian homogeneous space but not a symmetric space. In previous work, researchers have defined probability distributions on symmetric spaces and performed statistical analysis of data residing in these spaces. In this paper, we present original work involving definition of Gaussian distributions on a homogeneous space and show that the maximum-likelihood estimate of the location parameter of a Gaussian distribution on the homogeneous space yields the Fréchet mean (FM) of the samples drawn from this distribution. Further, we present an algorithm to sample from the Gaussian distribution on the Stiefel manifold and recursively compute the FM of these samples. We also prove the weak consistency of this recursive FM estimator. Several synthetic and real data experiments are then presented, demonstrating the superior computational performance of this estimator over the gradient descent based nonrecursive counterpart as well as the stochastic gradient descent based method prevalent in literature.
</p>projecteuclid.org/euclid.aos/1543568593_20181130040326Fri, 30 Nov 2018 04:03 ESTGoodness-of-fit tests for the functional linear model based on randomly projected empirical processeshttps://projecteuclid.org/euclid.aos/1543568594<strong>Juan A. Cuesta-Albertos</strong>, <strong>Eduardo García-Portugués</strong>, <strong>Manuel Febrero-Bande</strong>, <strong>Wenceslao González-Manteiga</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 439--467.</p><p><strong>Abstract:</strong><br/>
We consider marked empirical processes indexed by a randomly projected functional covariate to construct goodness-of-fit tests for the functional linear model with scalar response. The test statistics are built from continuous functionals over the projected process, resulting in computationally efficient tests that exhibit root-$n$ convergence rates and circumvent the curse of dimensionality. The weak convergence of the empirical process is obtained conditionally on a random direction, whilst the almost-sure equivalence between testing for significance on the original and on the projected functional covariate is proved. The computation of the test in practice involves calibration by wild bootstrap resampling and the combination of several $p$-values, arising from different projections, by means of the false discovery rate method. The finite-sample properties of the tests are illustrated in a simulation study for a variety of linear models, underlying processes, and alternatives. The software provided implements the tests and allows the replication of simulations and data applications.
</p>projecteuclid.org/euclid.aos/1543568594_20181130040326Fri, 30 Nov 2018 04:03 ESTConvolved subsampling estimation with applications to block bootstraphttps://projecteuclid.org/euclid.aos/1543568595<strong>Johannes Tewes</strong>, <strong>Dimitris N. Politis</strong>, <strong>Daniel J. Nordman</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 468--496.</p><p><strong>Abstract:</strong><br/>
The block bootstrap approximates sampling distributions from dependent data by resampling data blocks. A fundamental problem is establishing its consistency for the distribution of a sample mean, as a prototypical statistic. We use a structural relationship with subsampling to characterize the bootstrap in a new and general manner. While subsampling and block bootstrap differ, the block bootstrap distribution of a sample mean equals that of a $k$-fold self-convolution of a subsampling distribution. Motivated by this, we provide simple necessary and sufficient conditions for a convolved subsampling estimator to produce a normal limit that matches the target of bootstrap estimation. These conditions may be linked to consistency properties of an original subsampling distribution, which are often obtainable under minimal assumptions. Through several examples, the results are shown to validate the block bootstrap for means under significantly weakened assumptions in many existing (and some new) dependence settings, which also addresses a standing conjecture of Politis, Romano and Wolf [ Subsampling (1999) Springer]. Beyond sample means, convolved subsampling may not match the block bootstrap, but instead provides an alternative resampling estimator that may be of interest. Under minimal dependence conditions, results also broadly establish convolved subsampling for general statistics having normal limits.
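The convolution identity above can be sketched numerically: a block-bootstrap replicate of the sample mean is the average of $k$ resampled block means, i.e., a draw from the $k$-fold self-convolution of the subsampling distribution of block means. The AR(1)-style data, block length and seed below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative AR(1)-style dependent series (not from the paper)
n, b = 1000, 20                 # sample size and block length
k = n // b                      # blocks per bootstrap series
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()

# Subsampling distribution: means of all overlapping length-b blocks
block_means = np.array([x[i:i + b].mean() for i in range(n - b + 1)])

# A block-bootstrap replicate of the sample mean is the average of k
# resampled block means, i.e., a draw from the k-fold self-convolution
# of the subsampling distribution
B = 2000
boot_means = block_means[
    rng.integers(0, len(block_means), size=(B, k))
].mean(axis=1)
boot_se = boot_means.std()      # bootstrap standard error of the mean
```

Here `boot_se` is the usual block-bootstrap standard error, obtained without ever resampling whole series explicitly.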
</p>projecteuclid.org/euclid.aos/1543568595_20181130040326Fri, 30 Nov 2018 04:03 ESTFeature elimination in kernel machines in moderately high dimensionshttps://projecteuclid.org/euclid.aos/1543568596<strong>Sayan Dasgupta</strong>, <strong>Yair Goldberg</strong>, <strong>Michael R. Kosorok</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 497--526.</p><p><strong>Abstract:</strong><br/>
We develop an approach for feature elimination in statistical learning with kernel machines, based on recursive elimination of features. We present theoretical properties of this method and show that it is uniformly consistent in finding the correct feature space under certain generalized assumptions. We present a few case studies to show that the assumptions are met in most practical situations and present simulation results to demonstrate the performance of the proposed approach.
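A minimal sketch of the recursive-elimination idea, with a linear ridge fit standing in for the paper's kernel machines (the data, penalty and stopping rule are illustrative assumptions, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 300, 10
X = rng.normal(size=(n, p))
# Only features 0 and 1 carry signal (illustrative data)
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Recursive elimination: refit the learner and drop the feature with
# the smallest coefficient magnitude, until two features remain
active = list(range(p))
while len(active) > 2:
    Xa = X[:, active]
    w = np.linalg.solve(Xa.T @ Xa + 1e-3 * np.eye(len(active)), Xa.T @ y)
    active.pop(int(np.argmin(np.abs(w))))

selected = sorted(active)
```

With a strong signal and mild noise, the two informative features survive the elimination.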
</p>projecteuclid.org/euclid.aos/1543568596_20181130040326Fri, 30 Nov 2018 04:03 ESTHigh-dimensional covariance matrices in elliptical distributions with application to spherical testhttps://projecteuclid.org/euclid.aos/1543568597<strong>Jiang Hu</strong>, <strong>Weiming Li</strong>, <strong>Zhi Liu</strong>, <strong>Wang Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 527--555.</p><p><strong>Abstract:</strong><br/>
This paper discusses fluctuations of linear spectral statistics of high-dimensional sample covariance matrices when the underlying population follows an elliptical distribution. Such populations often possess high-order correlations among their coordinates, which have a great impact on the asymptotic behavior of linear spectral statistics. Taking this kind of dependency into consideration, we establish a new central limit theorem for the linear spectral statistics of a class of elliptical populations. This general theoretical result has wide applications and, as an example, it is then applied to test the sphericity of elliptical populations.
</p>projecteuclid.org/euclid.aos/1543568597_20181130040326Fri, 30 Nov 2018 04:03 ESTA critical threshold for design effects in network samplinghttps://projecteuclid.org/euclid.aos/1543568598<strong>Karl Rohe</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 556--582.</p><p><strong>Abstract:</strong><br/>
Web crawling, snowball sampling, and respondent-driven sampling (RDS) are three types of network sampling techniques used to contact individuals in hard-to-reach populations. This paper studies these procedures as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree (instead of a chain) allows for the sampled units to refer multiple future units into the sample.
In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is some value $\operatorname{DE}$, then constructing an estimator from the novel design makes the variance of the estimator $\operatorname{DE}$ times greater than it would be under a simple random sample with the same sample size $n$. Under certain assumptions on the referral tree, the design effect of network sampling has a critical threshold that is a function of the referral rate $m$ and the clustering structure in the social network, represented by the second eigenvalue of the Markov transition matrix, $\lambda_{2}$. If $m<1/\lambda_{2}^{2}$, then the design effect is finite (i.e., the standard estimator is $\sqrt{n}$-consistent). However, if $m>1/\lambda_{2}^{2}$, then the design effect grows with $n$ (i.e., the standard estimator is no longer $\sqrt{n}$-consistent). Past this critical threshold, the standard error of the estimator converges at the slower rate of $n^{\log_{m}\lambda_{2}}$. The Markov model allows for nodes to be resampled; computational results show that the findings hold under without-replacement sampling. To estimate confidence intervals that adapt to the correct level of uncertainty, a novel resampling procedure is proposed. Computational experiments compare this procedure to previous techniques.
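The critical threshold above is computable directly from a network's random-walk transition matrix. A sketch on a toy two-community graph (the adjacency matrix is invented for illustration):

```python
import numpy as np

# Toy two-community network (illustrative adjacency, not real data)
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

# Transition matrix of the simple random walk on the network
P = A / A.sum(axis=1, keepdims=True)

# Second-largest eigenvalue modulus lambda_2 of P
mods = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
lam2 = mods[1]

# Critical referral rate: for m below 1/lambda_2^2 the design effect
# stays finite; above it, the standard estimator loses root-n consistency
m_crit = 1.0 / lam2 ** 2
```

The more clustered the network (the closer $\lambda_{2}$ is to 1), the smaller the referral rate $m$ that can be tolerated.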
</p>projecteuclid.org/euclid.aos/1543568598_20181130040326Fri, 30 Nov 2018 04:03 ESTPermutation $p$-value approximation via generalized Stolarsky invariancehttps://projecteuclid.org/euclid.aos/1543568599<strong>Hera Y. He</strong>, <strong>Kinjal Basu</strong>, <strong>Qingyuan Zhao</strong>, <strong>Art B. Owen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 583--611.</p><p><strong>Abstract:</strong><br/>
It is common for genomic data analysis to use $p$-values from a large number of permutation tests. The multiplicity of tests may require very tiny $p$-values in order to reject any null hypotheses and the common practice of using randomly sampled permutations then becomes very expensive. We propose an inexpensive approximation to $p$-values for two-sample linear test statistics, derived from Stolarsky’s invariance principle. The method creates a geometrically derived reference set of approximate $p$-values for each hypothesis. The average of that set is used as a point estimate $\hat{p}$ and our generalization of the invariance principle allows us to compute the variance of the $p$-values in that set. We find that in cases where the point estimate is small, the variance is a modest multiple of the square of that point estimate, yielding a relative error property similar to that of saddlepoint approximations. On a Parkinson’s disease data set, the new approximation is faster and more accurate than the saddlepoint approximation. We also obtain a simple probabilistic explanation of Stolarsky’s invariance principle.
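For context, the expensive baseline that such approximations replace is Monte Carlo permutation sampling of a two-sample linear statistic. A minimal sketch (the data and statistic are illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(5)
# Two groups of 30 observations with a mean shift (illustrative data)
x = np.concatenate([rng.normal(0.8, 1, 30), rng.normal(0.0, 1, 30)])
labels = np.array([1] * 30 + [0] * 30)

def stat(lab):
    # two-sample linear statistic: difference of group means
    return x[lab == 1].mean() - x[lab == 0].mean()

obs = stat(labels)
B = 5000
perm = np.array([stat(rng.permutation(labels)) for _ in range(B)])

# Monte Carlo permutation p-value with the usual +1 correction
p_hat = (1 + np.sum(perm >= obs)) / (B + 1)
```

Note the floor of $1/(B+1)$ on `p_hat`: resolving the very tiny $p$-values needed after multiplicity correction would require enormous $B$, which is exactly the cost the paper's approximation avoids.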
</p>projecteuclid.org/euclid.aos/1543568599_20181130040326Fri, 30 Nov 2018 04:03 ESTCanonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank casehttps://projecteuclid.org/euclid.aos/1543568600<strong>Zhigang Bao</strong>, <strong>Jiang Hu</strong>, <strong>Guangming Pan</strong>, <strong>Wang Zhou</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 612--640.</p><p><strong>Abstract:</strong><br/>
Consider a Gaussian vector $\mathbf{z}=(\mathbf{x}',\mathbf{y}')'$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$, respectively. With $n$ independent observations of $\mathbf{z}$, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$, from the perspective of the canonical correlation analysis. We investigate the high-dimensional case: both $p$ and $q$ are proportional to the sample size $n$. Denote by $\Sigma_{uv}$ the population cross-covariance matrix of random vectors $\mathbf{u}$ and $\mathbf{v}$, and denote by $S_{uv}$ the sample counterpart. The canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are known as the square roots of the nonzero eigenvalues of the canonical correlation matrix $\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$. In this paper, we focus on the case that $\Sigma_{xy}$ is of finite rank $k$, that is, there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_{1}\geq\cdots\geq r_{k}>0$. We study the sample counterparts of $r_{i},i=1,\ldots,k$, that is, the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$, denoted by $\lambda_{1}\geq\cdots\geq\lambda_{k}$. We show that there exists a threshold $r_{c}\in(0,1)$, such that for each $i\in\{1,\ldots,k\}$, when $r_{i}\leq r_{c}$, $\lambda_{i}$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_{+}$. When $r_{i}>r_{c}$, $\lambda_{i}$ possesses an almost sure limit in $(d_{+},1]$, from which we can recover $r_{i}$’s in turn, thus providing an estimate of the latter in the high-dimensional scenario. We also obtain the limiting distribution of $\lambda_{i}$’s under appropriate normalization. Specifically, $\lambda_{i}$ exhibits Gaussian-type fluctuations if $r_{i}>r_{c}$, and follows a Tracy–Widom distribution if $r_{i}<r_{c}$.
Some applications of our results are also discussed.
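The sample quantities in question are easy to form directly. A sketch with a rank-one cross-covariance, so that $k=1$ (the shared-factor model below is an illustrative assumption, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 2000, 3, 2

# One shared factor => rank-one cross-covariance between x and y
f = rng.normal(size=n)
x = 0.8 * f[:, None] + rng.normal(size=(n, p))
y = 0.8 * f[:, None] + rng.normal(size=(n, q))

xc, yc = x - x.mean(0), y - y.mean(0)
Sxx, Syy = xc.T @ xc / n, yc.T @ yc / n
Sxy = xc.T @ yc / n

# Squared sample canonical correlations: eigenvalues of
# Sxx^{-1} Sxy Syy^{-1} Syx
M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
lam = np.sort(np.linalg.eigvals(M).real)[::-1]
```

With a single spike, `lam[0]` estimates the squared canonical correlation $r_{1}$ while the remaining eigenvalues reflect pure noise; the paper characterizes how this picture changes when $p,q$ grow proportionally to $n$.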
</p>projecteuclid.org/euclid.aos/1543568600_20181130040326Fri, 30 Nov 2018 04:03 ESTUniform projection designshttps://projecteuclid.org/euclid.aos/1543568601<strong>Fasheng Sun</strong>, <strong>Yaping Wang</strong>, <strong>Hongquan Xu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 1, 641--661.</p><p><strong>Abstract:</strong><br/>
Efficient designs are in high demand in practice for both computer and physical experiments. Existing designs (such as maximin distance designs and uniform designs) may have bad low-dimensional projections, which is undesirable when only a few factors are active. We propose a new design criterion, called uniform projection criterion, by focusing on projection uniformity. Uniform projection designs generated under the new criterion scatter points uniformly in all dimensions and have good space-filling properties in terms of distance, uniformity and orthogonality. We show that the new criterion is a function of the pairwise $L_{1}$-distances between the rows, so that the new criterion can be computed at no more cost than a design criterion that ignores projection properties. We develop some theoretical results and show that maximin $L_{1}$-equidistant designs are uniform projection designs. In addition, a class of asymptotically optimal uniform projection designs based on good lattice point sets are constructed. We further illustrate an application of uniform projection designs via a multidrug combination experiment.
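The pairwise $L_{1}$ distances that drive the criterion are cheap to compute. A sketch on a tiny invented design (the exact functional form of the uniform projection criterion is given in the paper and is not reproduced here):

```python
import numpy as np

# A tiny 4-run, 3-factor design with levels {0, 1, 2} (illustrative)
D = np.array([
    [0, 1, 2],
    [1, 2, 0],
    [2, 0, 1],
    [0, 2, 1],
])

# Pairwise L1 (Manhattan) distances between design rows; the uniform
# projection criterion is a function of exactly these distances
dist = np.abs(D[:, None, :] - D[None, :, :]).sum(axis=2)

iu = np.triu_indices(len(D), k=1)
min_l1 = dist[iu].min()     # maximin-L1 designs maximize this value
```

Because the criterion depends on the design only through `dist`, evaluating it costs no more than a distance-based criterion that ignores projections, which is the computational point made in the abstract.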
</p>projecteuclid.org/euclid.aos/1543568601_20181130040326Fri, 30 Nov 2018 04:03 ESTComputation of maximum likelihood estimates in cyclic structural equation modelshttps://projecteuclid.org/euclid.aos/1547197234<strong>Mathias Drton</strong>, <strong>Christopher Fox</strong>, <strong>Y. Samuel Wang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 663--690.</p><p><strong>Abstract:</strong><br/>
Software for computation of maximum likelihood estimates in linear structural equation models typically employs general techniques from nonlinear optimization, such as quasi-Newton methods. In practice, careful tuning of initial values is often required to avoid convergence issues. As an alternative approach, we propose a block-coordinate descent method that cycles through the considered variables, updating only the parameters related to a given variable in each step. We show that the resulting block update problems can be solved in closed form even when the structural equation model comprises feedback cycles. Furthermore, we give a characterization of the models for which the block-coordinate descent algorithm is well defined, meaning that for generic data and starting values all block optimization problems admit a unique solution. For the characterization, we represent each model by its mixed graph (also known as path diagram), which leads to criteria that can be checked in time that is polynomial in the number of considered variables.
</p>projecteuclid.org/euclid.aos/1547197234_20190111040129Fri, 11 Jan 2019 04:01 ESTFréchet regression for random objects with Euclidean predictorshttps://projecteuclid.org/euclid.aos/1547197235<strong>Alexander Petersen</strong>, <strong>Hans-Georg Müller</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 691--719.</p><p><strong>Abstract:</strong><br/>
Increasingly, statisticians are faced with the task of analyzing complex data that are non-Euclidean and specifically do not lie in a vector space. To address the need for statistical methods for such data, we introduce the concept of Fréchet regression. This is a general approach to regression when responses are complex random objects in a metric space and predictors are in $\mathcal{R}^{p}$, achieved by extending the classical concept of a Fréchet mean to the notion of a conditional Fréchet mean. We develop generalized versions of both global least squares regression and local weighted least squares smoothing. The target quantities are appropriately defined population versions of global and local regression for response objects in a metric space. We derive asymptotic rates of convergence for the corresponding fitted regressions using observed data to the population targets under suitable regularity conditions by applying empirical process methods. For the special case of random objects that reside in a Hilbert space, such as regression models with vector predictors and functional data as responses, we obtain a limit distribution. The proposed methods have broad applicability. Illustrative examples include responses that consist of probability distributions and correlation matrices, and we demonstrate both global and local Fréchet regression for demographic and brain imaging data. Local Fréchet regression is also illustrated via a simulation with response data which lie on the sphere.
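The unconditional Fréchet mean that the paper extends to regression can be illustrated by brute force on the unit circle with geodesic distance (the sample angles and grid resolution are illustrative choices):

```python
import numpy as np

# Sample of angles on the unit circle; 6.2 is just below 2*pi, so it
# lies close to the other points across the cut at 0 (illustrative)
angles = np.array([0.1, 0.2, 0.3, 6.2])

def geo(a, b):
    # geodesic (arc-length) distance on the circle
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

# Brute-force Frechet mean: minimize the sum of squared geodesic
# distances over a fine grid of candidate angles
grid = np.linspace(0, 2 * np.pi, 10000, endpoint=False)
cost = np.array([np.sum(geo(g, angles) ** 2) for g in grid])
fmean = grid[np.argmin(cost)]
```

Unlike the naive average of the raw angles (pulled toward $\pi$ by the point near $2\pi$), the Fréchet mean respects the circle's geometry; Fréchet regression replaces this global minimizer with a covariate-weighted one.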
</p>projecteuclid.org/euclid.aos/1547197235_20190111040129Fri, 11 Jan 2019 04:01 ESTDivide and conquer in nonstandard problems and the super-efficiency phenomenonhttps://projecteuclid.org/euclid.aos/1547197236<strong>Moulinath Banerjee</strong>, <strong>Cécile Durot</strong>, <strong>Bodhisattva Sen</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 720--757.</p><p><strong>Abstract:</strong><br/>
We study how the divide and conquer principle works in nonstandard problems where rates of convergence are typically slower than $\sqrt{n}$ and limit distributions are non-Gaussian, and provide a detailed treatment for a variety of important and well-studied problems involving nonparametric estimation of a monotone function. We find that for a fixed model, the pooled estimator, obtained by averaging nonstandard estimates across mutually exclusive subsamples, outperforms the nonstandard monotonicity-constrained (global) estimator based on the entire sample in the sense of pointwise estimation of the function. We also show that, under appropriate conditions, if the number of subsamples is allowed to increase at appropriate rates, the pooled estimator is asymptotically normally distributed with a variance that is empirically estimable from the subsample-level estimates. Further, in the context of monotone regression, we show that this gain in efficiency under a fixed model comes at a price—the pooled estimator’s performance, in a uniform sense (maximal risk) over a class of models, worsens as the number of subsamples increases, leading to a version of the super-efficiency phenomenon. In the process, we develop analytical results for the order of the bias in isotonic regression, which are of independent interest.
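A sketch of the pooled estimator in monotone regression: fit isotonic regressions via the pool-adjacent-violators algorithm (PAVA) on disjoint subsamples and average the fits at a point. The data, subsample count and evaluation point are illustrative assumptions:

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: nondecreasing least-squares fit."""
    levels = []                      # blocks of (value, weight)
    for v in np.asarray(y, dtype=float):
        levels.append([v, 1])
        # merge adjacent blocks while monotonicity is violated
        while len(levels) > 1 and levels[-2][0] > levels[-1][0]:
            v2, w2 = levels.pop()
            v1, w1 = levels.pop()
            levels.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2])
    return np.repeat([v for v, _ in levels], [w for _, w in levels])

rng = np.random.default_rng(2)
n, k = 800, 4                        # sample size, number of subsamples
x = rng.uniform(0, 1, n)
y = x ** 2 + rng.normal(scale=0.1, size=n)   # monotone truth x^2

# Pooled estimator: average the isotonic fits from k disjoint
# subsamples at the point x = 0.5 (true value 0.25)
fits = []
for part in np.array_split(rng.permutation(n), k):
    order = np.argsort(x[part])
    xs, ys = x[part][order], y[part][order]
    fit = pava(ys)
    fits.append(fit[np.searchsorted(xs, 0.5)])
pooled = float(np.mean(fits))
```

For a fixed model this averaging smooths out the cube-root-rate fluctuations of each subsample fit; the abstract's caution is that the same averaging degrades maximal risk as $k$ grows.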
</p>projecteuclid.org/euclid.aos/1547197236_20190111040129Fri, 11 Jan 2019 04:01 ESTRank verification for exponential familieshttps://projecteuclid.org/euclid.aos/1547197237<strong>Kenneth Hung</strong>, <strong>William Fithian</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 758--782.</p><p><strong>Abstract:</strong><br/>
Many statistical experiments involve comparing multiple population groups. For example, a public opinion poll may ask which of several political candidates commands the most support; a social scientific survey may report the most common of several responses to a question; or, a clinical trial may compare binary patient outcomes under several treatment conditions to determine the most effective treatment. Having observed the “winner” (largest observed response) in a noisy experiment, it is natural to ask whether that candidate, survey response or treatment is actually the “best” (stochastically largest response). This article concerns the problem of rank verification —post hoc significance tests of whether the orderings discovered in the data reflect the population ranks. For exponential family models, we show under mild conditions that an unadjusted two-tailed pairwise test comparing the first two-order statistics (i.e., comparing the “winner” to the “runner-up”) is a valid test of whether the winner is truly the best. We extend our analysis to provide equally simple procedures to obtain lower confidence bounds on the gap between the winning population and the others, and to verify ranks beyond the first.
</p>projecteuclid.org/euclid.aos/1547197237_20190111040129Fri, 11 Jan 2019 04:01 ESTSub-Gaussian estimators of the mean of a random vectorhttps://projecteuclid.org/euclid.aos/1547197238<strong>Gábor Lugosi</strong>, <strong>Shahar Mendelson</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 783--794.</p><p><strong>Abstract:</strong><br/>
We study the problem of estimating the mean of a random vector $X$ given a sample of $N$ independent, identically distributed points. We introduce a new estimator that achieves a purely sub-Gaussian performance under the only condition that the second moment of $X$ exists. The estimator is based on a novel concept of a multivariate median.
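A simpler relative of such estimators is the coordinatewise median-of-means (the paper's construction uses a genuinely multivariate notion of median; the sketch below only conveys the grouping-and-median idea):

```python
import numpy as np

def median_of_means(X, k, seed=0):
    """Coordinatewise median-of-means: split the sample into k groups,
    average within each group, then take the coordinatewise median of
    the k group means. (A simpler relative of the paper's estimator.)"""
    idx = np.random.default_rng(seed).permutation(len(X))
    group_means = np.array(
        [X[g].mean(axis=0) for g in np.array_split(idx, k)]
    )
    return np.median(group_means, axis=0)

# Heavy-tailed sample with finite variance (df > 2); true mean (0, 0)
rng = np.random.default_rng(3)
X = rng.standard_t(df=2.5, size=(5000, 2))
est = median_of_means(X, k=50)
```

The median step tames the heavy tails that would inflate the plain sample mean, which is the intuition behind sub-Gaussian performance under only a second-moment condition.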
</p>projecteuclid.org/euclid.aos/1547197238_20190111040129Fri, 11 Jan 2019 04:01 ESTCombinatorial inference for graphical modelshttps://projecteuclid.org/euclid.aos/1547197239<strong>Matey Neykov</strong>, <strong>Junwei Lu</strong>, <strong>Han Liu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 795--827.</p><p><strong>Abstract:</strong><br/>
We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference where the main interest is point estimation or parameter testing, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of certain size, or the maximum degree of the graph. To begin with, we study the information-theoretic limits of a large family of combinatorial inference problems. We propose new concepts including structural packing and buffer entropies to characterize how the complexity of combinatorial graph structures impacts the corresponding minimax lower bounds. On the other hand, we propose a family of novel and practical structural testing algorithms to match the lower bounds. We provide numerical results on both synthetic graphical models and brain networks to illustrate the usefulness of these proposed methods.
</p>projecteuclid.org/euclid.aos/1547197239_20190111040129Fri, 11 Jan 2019 04:01 ESTEstimation and prediction using generalized Wendland covariance functions under fixed domain asymptoticshttps://projecteuclid.org/euclid.aos/1547197240<strong>Moreno Bevilacqua</strong>, <strong>Tarik Faouzi</strong>, <strong>Reinhard Furrer</strong>, <strong>Emilio Porcu</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 828--856.</p><p><strong>Abstract:</strong><br/>
We study estimation and prediction of Gaussian random fields with covariance models belonging to the generalized Wendland (GW) class, under fixed domain asymptotics. As in the Matérn case, this class allows for a continuous parameterization of the smoothness of the underlying Gaussian random field, while additionally being compactly supported. The paper is divided into three parts: first, we characterize the equivalence of two Gaussian measures with GW covariance functions, and we provide sufficient conditions for the equivalence of two Gaussian measures with Matérn and GW covariance functions. In the second part, we establish strong consistency and the asymptotic distribution of the maximum likelihood estimator of the microergodic parameter associated with the GW covariance model, under fixed domain asymptotics. The third part elucidates the consequences of our results in terms of the (misspecified) best linear unbiased predictor, under fixed domain asymptotics. Our findings are illustrated through two simulation studies: the first compares the finite-sample behavior of the maximum likelihood estimator of the microergodic parameter with its asymptotic distribution. The second compares the finite-sample behavior of the prediction and its associated mean square error when using two equivalent Gaussian measures with Matérn and GW covariance models, using covariance tapering as a benchmark.
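A concrete member of the class is the classical Wendland function $(1-r)_{+}^{4}(4r+1)$, compactly supported and valid as a covariance in up to three dimensions; the GW class generalizes such functions with a continuous smoothness parameter. A sketch checking positive semidefiniteness on a grid (the range parameter is an illustrative choice):

```python
import numpy as np

def wendland_c2(r):
    """Classical Wendland covariance (1-r)_+^4 (4r+1): compactly
    supported on [0, 1] and positive definite in dimension <= 3."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1, (1 - r) ** 4 * (4 * r + 1), 0.0)

# Covariance matrix on a 1-d grid with range parameter 0.5; check
# positive semidefiniteness numerically
s = np.linspace(0, 2, 40)
C = wendland_c2(np.abs(s[:, None] - s[None, :]) / 0.5)
min_eig = np.linalg.eigvalsh(C).min()
```

The compact support makes `C` banded, which is what yields the sparse linear algebra advantages that motivate using GW models instead of the globally supported Matérn.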
</p>projecteuclid.org/euclid.aos/1547197240_20190111040129Fri, 11 Jan 2019 04:01 ESTChebyshev polynomials, moment matching, and optimal estimation of the unseenhttps://projecteuclid.org/euclid.aos/1547197241<strong>Yihong Wu</strong>, <strong>Pengkun Yang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 857--883.</p><p><strong>Abstract:</strong><br/>
We consider the problem of estimating the support size of a discrete distribution whose minimum nonzero mass is at least $\frac{1}{k}$. Under the independent sampling model, we show that the sample complexity, that is, the minimal sample size to achieve an additive error of $\varepsilon k$ with probability at least 0.1 is within universal constant factors of $\frac{k}{\log k}\log^{2}\frac{1}{\varepsilon }$, which improves the state-of-the-art result of $\frac{k}{\varepsilon^{2}\log k}$ in [In Advances in Neural Information Processing Systems (2013) 2157–2165]. Similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties, which can be evaluated in $O(n+\log^{2}k)$ time and attains the sample complexity within constant factors. The superiority of the proposed estimator in terms of accuracy, computational efficiency and scalability is demonstrated in a variety of synthetic and real datasets.
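For context, the naive plug-in estimator that such procedures improve upon simply counts distinct observed symbols and can only undercount. A sketch under a uniform distribution with minimum mass $1/k$ (the parameters are illustrative; the paper's Chebyshev-polynomial estimator is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(4)
k = 1000                              # true support size
p = np.full(k, 1.0 / k)               # uniform: minimum nonzero mass 1/k
n = 2000                              # far fewer samples than k log k
sample = rng.choice(k, size=n, p=p)

# Plug-in estimate: number of distinct observed symbols. It can only
# undercount; polynomial-based estimators correct for unseen symbols.
plug_in = len(np.unique(sample))
```

With $n=2000$ draws from $k=1000$ symbols, roughly $k(1-e^{-n/k})\approx 865$ symbols are seen, so the plug-in misses a substantial unseen fraction that the linear Chebyshev-based estimator is designed to recover.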
</p>projecteuclid.org/euclid.aos/1547197241_20190111040129Fri, 11 Jan 2019 04:01 ESTPartial least squares prediction in high-dimensional regressionhttps://projecteuclid.org/euclid.aos/1547197242<strong>R. Dennis Cook</strong>, <strong>Liliana Forzani</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 884--908.</p><p><strong>Abstract:</strong><br/>
We study the asymptotic behavior of predictions from partial least squares (PLS) regression as the sample size and number of predictors diverge in various alignments. We show that there is a range of regression scenarios where PLS predictions have the usual root-$n$ convergence rate, even when the sample size is substantially smaller than the number of predictors, and an even wider range where the rate is slower but may still produce practically useful results. We show also that PLS predictions achieve their best asymptotic behavior in abundant regressions where many predictors contribute information about the response. Their asymptotic behavior tends to be undesirable in sparse regressions where few predictors contribute information about the response.
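A one-component PLS fit can be sketched in a few lines: the weight vector is proportional to the covariance $X^{\top}y$, and the response is then regressed on the resulting score. The abundant-signal data below, with every predictor loading on the response, are an illustrative assumption matching the regime the abstract describes:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 100, 200                       # more predictors than samples
beta = np.ones(p) / np.sqrt(p)        # "abundant": every predictor loads
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=0.5, size=n)

Xc, yc = X - X.mean(0), y - y.mean()

# One-component PLS: weight vector proportional to cov(X, y), then an
# ordinary least-squares fit of y on the resulting score
w = Xc.T @ yc
t = Xc @ w
coef = (t @ yc) / (t @ t)
y_hat = y.mean() + coef * t

resid_var = float(np.mean((y - y_hat) ** 2))
```

Even with $p=200>n=100$, the single score captures much of the signal because every predictor contributes, which is the abundant-regression setting in which PLS predictions behave best.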
</p>projecteuclid.org/euclid.aos/1547197242_20190111040129Fri, 11 Jan 2019 04:01 ESTSignal aliasing in Gaussian random fields for experiments with qualitative factorshttps://projecteuclid.org/euclid.aos/1547197243<strong>Ming-Chung Chang</strong>, <strong>Shao-Wei Cheng</strong>, <strong>Ching-Shui Cheng</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 909--935.</p><p><strong>Abstract:</strong><br/>
Signal aliasing is an inevitable consequence of using fractional factorial designs. Unlike linear models with fixed factorial effects, for Gaussian random field models advocated in some Bayesian design and computer experiment literature, the issue of signal aliasing has not received comparable attention. In the present article, this issue is tackled for experiments with qualitative factors. The signals in a Gaussian random field can be characterized by the random effects identified from the covariance function. The aliasing severity of the signals is determined by two key elements: (i) the aliasing pattern, which depends only on the chosen design, and (ii) the effect priority, which is related to the variances of the random effects and depends on the model parameters. We first apply this framework to study the signal-aliasing problem for regular fractional factorial designs. For general factorial designs including nonregular ones, we propose an aliasing severity index to quantify the severity of signal aliasing. We also observe that the aliasing severity index is highly correlated with the prediction variance.
</p>projecteuclid.org/euclid.aos/1547197243_20190111040129Fri, 11 Jan 2019 04:01 ESTCross: Efficient low-rank tensor completionhttps://projecteuclid.org/euclid.aos/1547197244<strong>Anru Zhang</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 936--964.</p><p><strong>Abstract:</strong><br/>
The completion of tensors, or high-order arrays, has attracted significant attention in recent research. Current literature on tensor completion primarily focuses on recovery from a set of uniformly randomly measured entries, and the required number of measurements to achieve recovery is not guaranteed to be optimal. In addition, the implementation of some previous methods is NP-hard. In this article, we propose a framework for low-rank tensor completion via a novel tensor measurement scheme that we name Cross. The proposed procedure is efficient and easy to implement. In particular, we show that a third-order tensor of Tucker rank-$(r_{1},r_{2},r_{3})$ in $p_{1}$-by-$p_{2}$-by-$p_{3}$ dimensional space can be recovered from as few as $r_{1}r_{2}r_{3}+r_{1}(p_{1}-r_{1})+r_{2}(p_{2}-r_{2})+r_{3}(p_{3}-r_{3})$ noiseless measurements, which matches the sample complexity lower bound. In the case of noisy measurements, we also develop a theoretical upper bound and the matching minimax lower bound for the recovery error over certain classes of low-rank tensors for the proposed procedure. The results can be further extended to fourth- or higher-order tensors. Simulation studies show that the method performs well under a variety of settings. Finally, the procedure is illustrated through a real dataset in neuroimaging.
</p>projecteuclid.org/euclid.aos/1547197244_20190111040129Fri, 11 Jan 2019 04:01 ESTCovariate balancing propensity score by tailored loss functionshttps://projecteuclid.org/euclid.aos/1547197245<strong>Qingyuan Zhao</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 965--993.</p><p><strong>Abstract:</strong><br/>
In observational studies, propensity scores are commonly estimated by maximum likelihood but may fail to balance high-dimensional pretreatment covariates even after specification search. We introduce a general framework that unifies and generalizes several recent proposals to improve covariate balance when designing an observational study. Instead of the likelihood function, we propose to optimize special loss functions—covariate balancing scoring rules (CBSR)—to estimate the propensity score. A CBSR is uniquely determined by the link function in the GLM and the estimand (a weighted average treatment effect). We show CBSR does not lose asymptotic efficiency in estimating the weighted average treatment effect compared to the Bernoulli likelihood, but CBSR is much more robust in finite samples. Borrowing tools developed in statistical learning, we propose practical strategies to balance covariate functions in rich function classes. This is useful to estimate the maximum bias of the inverse probability weighting (IPW) estimators and construct honest confidence intervals in finite samples. Lastly, we provide several numerical examples to demonstrate the tradeoff of bias and variance in the IPW-type estimators and the tradeoff in balancing different function classes of the covariates.
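The IPW estimator and the covariate-balance property discussed above can be sketched as follows, using the true propensity score for simplicity (the simulated design is an illustrative assumption and this is the standard IPW setup the paper studies, not its CBSR method):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
z = rng.normal(size=n)                       # pretreatment covariate
e = 1 / (1 + np.exp(-z))                     # true propensity score
t = rng.binomial(1, e)                       # treatment assignment
y = 2.0 * t + z + rng.normal(size=n)         # outcome; true ATE = 2

# IPW (Horvitz-Thompson) estimate of the average treatment effect
ate_ipw = np.mean(t * y / e - (1 - t) * y / (1 - e))

# Covariate balance: propensity-weighted means of z in the treated and
# control groups should roughly agree (both estimate E[z])
bal_t = np.sum(t * z / e) / np.sum(t / e)
bal_c = np.sum((1 - t) * z / (1 - e)) / np.sum((1 - t) / (1 - e))
```

When the propensity model is estimated rather than known, it is exactly this balance of covariate functions that the covariate balancing scoring rules target directly, instead of maximizing the Bernoulli likelihood.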
</p>projecteuclid.org/euclid.aos/1547197245_20190111040129Fri, 11 Jan 2019 04:01 ESTThe geometry of hypothesis testing over convex cones: Generalized likelihood ratio tests and minimax radiihttps://projecteuclid.org/euclid.aos/1547197246<strong>Yuting Wei</strong>, <strong>Martin J. Wainwright</strong>, <strong>Adityanand Guntuboyina</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 994--1024.</p><p><strong>Abstract:</strong><br/>
We consider a compound testing problem within the Gaussian sequence model in which the null and alternative are specified by a pair of closed, convex cones. Such cone testing problems arise in various applications, including detection of treatment effects, trend detection in econometrics, signal detection in radar processing and shape-constrained inference in nonparametric statistics. We provide a sharp characterization of the GLRT testing radius up to a universal multiplicative constant in terms of the geometric structure of the underlying convex cones. When applied to concrete examples, this result reveals some interesting phenomena that do not arise in the analogous problems of estimation under convex constraints. In particular, in contrast to estimation error, the testing error no longer depends purely on the problem complexity via a volume-based measure (such as metric entropy or Gaussian complexity); other geometric properties of the cones also play an important role. In order to address the issue of optimality, we prove information-theoretic lower bounds for the minimax testing radius, again in terms of geometric quantities. Our general theorems are illustrated by examples including the cases of monotone and orthant cones, and involve some results of independent interest.
</p>projecteuclid.org/euclid.aos/1547197246_20190111040129Fri, 11 Jan 2019 04:01 ESTNonparametric implied Lévy densitieshttps://projecteuclid.org/euclid.aos/1547197247<strong>Likuan Qin</strong>, <strong>Viktor Todorov</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1025--1060.</p><p><strong>Abstract:</strong><br/>
This paper develops a nonparametric estimator for the Lévy density of an asset price following an Itô semimartingale, implied by short-maturity options. The asymptotic setup is one in which the time to maturity of the available options decreases, the mesh of the available strike grid shrinks and the strike range expands. The estimation is based on aggregating the observed option data into nonparametric estimates of the conditional characteristic function of the return distribution, the derivatives of which allow one to infer the Fourier transform of a known transform of the Lévy density in a way that is robust to the level of the unknown diffusive volatility of the asset price. The Lévy density estimate is then constructed via Fourier inversion. We derive an asymptotic bound for the integrated squared error of the estimator in the general case, as well as its probability limit in the special Lévy case. We further show rate optimality of our Lévy density estimator in a minimax sense. An empirical application to market index options reveals relative stability of the left tail decay during high and low volatility periods.
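The full estimator requires option panels; as a toy illustration of just the Fourier-inversion step, the sketch below recovers a density value from the empirical characteristic function of simulated returns, with a hard frequency truncation standing in for the paper's regularization. The sample, the grid and the cutoff are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)   # stand-in for observed returns

# Empirical characteristic function on a truncated frequency grid; the
# truncation |u| <= 4 crudely regularizes the ill-posed inversion.
u = np.linspace(-4.0, 4.0, 401)
ecf = np.array([np.exp(1j * ui * x).mean() for ui in u])

# Fourier inversion at a point: f(x0) = (1/(2*pi)) * int exp(-i*u*x0) ecf(u) du,
# approximated by a Riemann sum over the truncated grid.
x0 = 0.0
du = u[1] - u[0]
f_hat = float(np.real(np.sum(np.exp(-1j * u * x0) * ecf) * du) / (2 * np.pi))
print(f_hat)   # the true N(0,1) density at 0 is about 0.399
```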
</p>projecteuclid.org/euclid.aos/1547197247_20190111040129Fri, 11 Jan 2019 04:01 ESTOn model selection from a finite family of possibly misspecified time series modelshttps://projecteuclid.org/euclid.aos/1547197248<strong>Hsiang-Ling Hsu</strong>, <strong>Ching-Kang Ing</strong>, <strong>Howell Tong</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1061--1087.</p><p><strong>Abstract:</strong><br/>
Consider finite parametric time series models. “I have $n$ observations and $k$ models; which model should I choose on the basis of the data alone?” is a frequently asked question in many practical situations. This poses the key problem of selecting a model from a collection of candidate models, none of which is necessarily the true data generating process (DGP). Although existing literature on model selection is vast, there is a serious lacuna in that the above problem does not seem to have received much attention. In fact, existing model selection criteria have avoided addressing the above problem directly, either by assuming that the true DGP is included among the candidate models and aiming at choosing this DGP, or by assuming that the true DGP can be asymptotically approximated by an increasing sequence of candidate models and aiming at choosing the candidate having the best predictive capability in some asymptotic sense. In this article, we propose a misspecification-resistant information criterion (MRIC) to address the key problem directly. We first prove the asymptotic efficiency of MRIC whether the true DGP is among the candidates or not, within the fixed-dimensional framework. We then extend this result to the high-dimensional case in which the number of candidate variables is much larger than the sample size. In particular, we show that MRIC can be used in conjunction with a high-dimensional model selection method to select the (asymptotically) best predictive model across several high-dimensional misspecified time series models.
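The MRIC formula is not reproduced in this abstract. In the same spirit, the hedged toy below compares linear AR candidates, none of which is the true DGP (here a threshold autoregression), by held-out one-step squared prediction error; it illustrates the setting, not the criterion itself.

```python
import numpy as np

rng = np.random.default_rng(3)

# DGP: a nonlinear (threshold) autoregression, outside the candidate set.
T = 3000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * abs(y[t - 1]) - 0.3 * y[t - 1] + rng.normal()

def lag_matrix(z, p):
    # Design matrix with intercept and lags 1..p, aligned with targets z[p:].
    X = np.column_stack([z[p - j - 1 : len(z) - j - 1] for j in range(p)])
    return np.column_stack([np.ones(len(X)), X])

def fit_ar(z, p):
    # Least-squares fit of a linear AR(p) model (a misspecified candidate).
    coef, *_ = np.linalg.lstsq(lag_matrix(z, p), z[p:], rcond=None)
    return coef

def one_step_mse(z, coef, p):
    resid = z[p:] - lag_matrix(z, p) @ coef
    return float(np.mean(resid ** 2))

train, test = y[:2000], y[2000:]
mse = {p: one_step_mse(test, fit_ar(train, p), p) for p in (1, 2, 3)}
best = min(mse, key=mse.get)
print(mse, best)
```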
</p>projecteuclid.org/euclid.aos/1547197248_20190111040129Fri, 11 Jan 2019 04:01 ESTEstimating the algorithmic variance of randomized ensembles via the bootstraphttps://projecteuclid.org/euclid.aos/1547197249<strong>Miles E. Lopes</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1088--1112.</p><p><strong>Abstract:</strong><br/>
Although the methods of bagging and random forests are some of the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are not many theoretical guarantees for deciding when an ensemble is “large enough”—so that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of “algorithmic variance” (i.e., the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests and related methods in the context of classification. To be specific, suppose the training dataset is fixed, and let the random variable $\mathrm{ERR}_{t}$ denote the prediction error of a randomized ensemble of size $t$. Working under a “first-order model” for randomized ensembles, we prove that the centered law of $\mathrm{ERR}_{t}$ can be consistently approximated via the proposed method as $t\to\infty$. Meanwhile, the computational cost of the method is quite modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of $\mathrm{ERR}_{t}$ are negligible.
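A minimal sketch of the core idea, under assumed inputs: hold the training data fixed, resample the $t$ fitted ensemble members with replacement, and recompute the voting error to approximate the algorithmic fluctuations of $\mathrm{ERR}_{t}$. Member predictions are simulated here rather than produced by a real learner, and the extrapolation step of the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(4)
t, m = 200, 1000          # ensemble size, test-set size

# Simulated 0/1 predictions of t randomized base classifiers on m test
# points; each point has its own per-member error rate (varying difficulty).
y_true = rng.binomial(1, 0.5, size=m)
flip_prob = rng.uniform(0.1, 0.45, size=m)
member_preds = np.where(rng.random((t, m)) < flip_prob, 1 - y_true, y_true)

def vote_error(preds, y):
    maj = (preds.mean(axis=0) > 0.5).astype(int)
    return float(np.mean(maj != y))

err_t = vote_error(member_preds, y_true)

# Bootstrap over the t members (training data stays fixed): resample member
# indices with replacement and recompute the majority-vote error.
boot = np.array([
    vote_error(member_preds[rng.integers(0, t, size=t)], y_true)
    for _ in range(500)
])
algo_sd = boot.std()      # estimate of the algorithmic std of ERR_t
print(err_t, algo_sd)
```

Because only the members' cached predictions are resampled, no base classifier is ever refit, which is what keeps the cost modest.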
</p>projecteuclid.org/euclid.aos/1547197249_20190111040129Fri, 11 Jan 2019 04:01 ESTEfficient nonparametric Bayesian inference for $X$-ray transformshttps://projecteuclid.org/euclid.aos/1547197250<strong>François Monard</strong>, <strong>Richard Nickl</strong>, <strong>Gabriel P. Paternain</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1113--1147.</p><p><strong>Abstract:</strong><br/>
We consider the statistical inverse problem of recovering a function $f:M\to \mathbb{R}$, where $M$ is a smooth compact Riemannian manifold with boundary, from measurements of general $X$-ray transforms $I_{a}(f)$ of $f$, corrupted by additive Gaussian noise. For $M$ equal to the unit disk with “flat” geometry and $a=0$ this reduces to the standard Radon transform, but our general setting allows for anisotropic media $M$ and can further model local “attenuation” effects—both highly relevant in practical imaging problems such as SPECT tomography. We study a nonparametric Bayesian inference method based on standard Gaussian process priors for $f$. The posterior reconstruction of $f$ corresponds to a Tikhonov regulariser with a reproducing kernel Hilbert space norm penalty that does not require the calculation of the singular value decomposition of the forward operator $I_{a}$. We prove Bernstein–von Mises theorems for a large family of one-dimensional linear functionals of $f$, and they entail that posterior-based inferences such as credible sets are valid and optimal from a frequentist point of view. In particular we derive the asymptotic distribution of smooth linear functionals of the Tikhonov regulariser, which attains the semiparametric information lower bound. The proofs rely on an invertibility result for the “Fisher information” operator $I_{a}^{*}I_{a}$ between suitable function spaces, a result of independent interest that relies on techniques from microlocal analysis. We illustrate the performance of the proposed method via simulations in various settings.
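The computational point about avoiding an SVD can be illustrated in a generic discretized linear inverse problem. Here a cumulative-integration matrix stands in for the forward operator $I_{a}$, and a plain squared-norm penalty stands in for the RKHS penalty; both are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 60
s = np.linspace(0, 1, d)
f_true = np.sin(2 * np.pi * s)

# A discretized smoothing forward operator (cumulative integration) stands
# in for the X-ray transform; the data are y = A f + noise.
A = np.tril(np.ones((d, d))) / d
y = A @ f_true + 0.01 * rng.normal(size=d)

# Posterior mean under a Gaussian prior = Tikhonov regularizer: solve
# (A^T A + lam I) f = A^T y directly -- no SVD of A is required.
lam = 1e-3
f_hat = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)
rel_err = np.linalg.norm(f_hat - f_true) / np.linalg.norm(f_true)
print(rel_err)
```

The unregularized inversion would amplify the noise by the reciprocal of the smallest singular value; the single linear solve above sidesteps both that instability and the SVD computation.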
</p>projecteuclid.org/euclid.aos/1547197250_20190111040129Fri, 11 Jan 2019 04:01 ESTGeneralized random forestshttps://projecteuclid.org/euclid.aos/1547197251<strong>Susan Athey</strong>, <strong>Julie Tibshirani</strong>, <strong>Stefan Wager</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1148--1178.</p><p><strong>Abstract:</strong><br/>
We propose generalized random forests, a method for nonparametric statistical estimation based on random forests (Breiman [ Mach. Learn. 45 (2001) 5–32]) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: nonparametric quantile regression, conditional average partial effect estimation and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
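A hedged caricature of the estimating-equation view: given forest weights $\alpha_i(x)$, the $\tau$-quantile target solves $\sum_i \alpha_i(x)(\tau - 1\{Y_i \le \theta\}) = 0$, that is, a weighted quantile. Below, randomized axis-aligned partitions stand in for actual trees; this is not the grf algorithm, only the weighting-plus-moment-equation structure it builds on.

```python
import numpy as np

rng = np.random.default_rng(6)
n, B, tau, h = 2000, 200, 0.5, 0.25
x = rng.uniform(-1, 1, size=n)
y = 2.0 * x + rng.normal(size=n)      # conditional median of y given x is 2x

def forest_weights(x, x0, B, h, rng):
    # Toy "forest": each tree is a randomly offset equal-width partition of
    # the x-axis; training points in the same cell as x0 share the tree's
    # weight equally, mimicking adaptive nearest-neighbor weights.
    w = np.zeros(len(x))
    for _ in range(B):
        off = rng.uniform(0, h)
        cells = np.floor((x - off) / h)
        leaf = cells == np.floor((x0 - off) / h)
        w[leaf] += 1.0 / leaf.sum()
    return w / B

def weighted_quantile(y, w, tau):
    # Solves sum_i w_i * (tau - 1{y_i <= theta}) = 0, the quantile moment.
    order = np.argsort(y)
    cw = np.cumsum(w[order])
    return float(y[order][np.searchsorted(cw, tau * cw[-1])])

x0 = 0.5
w = forest_weights(x, x0, B, h, rng)
theta_hat = weighted_quantile(y, w, tau)
print(theta_hat)   # should be close to the conditional median 2 * x0 = 1.0
```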
</p>projecteuclid.org/euclid.aos/1547197251_20190111040129Fri, 11 Jan 2019 04:01 ESTA classification criterion for definitive screening designshttps://projecteuclid.org/euclid.aos/1547197252<strong>Eric D. Schoen</strong>, <strong>Pieter T. Eendebak</strong>, <strong>Peter Goos</strong>. <p><strong>Source: </strong>The Annals of Statistics, Volume 47, Number 2, 1179--1202.</p><p><strong>Abstract:</strong><br/>
A conference design is a rectangular matrix with orthogonal columns, one zero in each column, at most one zero in each row and $-1$’s and $+1$’s elsewhere. A definitive screening design can be constructed by folding over a conference design and adding a row vector of zeroes. We prove that, for a given even number of rows, there is just one isomorphism class for conference designs with two or three columns. Next, we derive all isomorphism classes for conference designs with four columns. Based on our results, we propose a classification criterion for definitive screening designs founded on projections into four factors. We illustrate the potential of the criterion by studying designs with 24 and 82 factors.
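The fold-over construction can be checked directly in numpy. The particular 4-column conference design below is one convenient choice for illustration, not necessarily the isomorphism-class representative studied in the paper.

```python
import numpy as np

# A conference design with 4 columns: orthogonal columns, one zero in each
# column, at most one zero in each row, and -1's and +1's elsewhere.
C = np.array([
    [ 0,  1,  1,  1],
    [-1,  0,  1, -1],
    [-1, -1,  0,  1],
    [-1,  1, -1,  0],
])

# Fold over (stack C on -C) and append a row of zeroes to obtain a
# definitive screening design with 2*4 + 1 = 9 runs.
D = np.vstack([C, -C, np.zeros((1, 4), dtype=int)])
print(D.T @ D)   # 6 * identity: the columns of D are mutually orthogonal
```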
</p>projecteuclid.org/euclid.aos/1547197252_20190111040129Fri, 11 Jan 2019 04:01 EST