Electronic Journal of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.ejs
The latest articles from Electronic Journal of Statistics on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Fri, 03 Jun 2011 09:20 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
The bias and skewness of M-estimators in regression
http://projecteuclid.org/euclid.ejs/1262876992
<strong>Christopher Withers</strong>, <strong>Saralees Nadarajah</strong><p><strong>Source: </strong>Electron. J. Statist., Volume 4, 1--14.</p><p><strong>Abstract:</strong><br/>
We consider M-estimation of a regression model with a nuisance parameter and a vector of other parameters. The unknown distribution of the residuals is not assumed to be normal or symmetric. Simple and easily estimated formulas are given for the dominant terms of the bias and skewness of the parameter estimates. For the linear model these are proportional to the skewness of the ‘independent’ variables. For a nonlinear model, its linear component plays the role of these independent variables, and a second term must be added proportional to the covariance of its linear and quadratic components. For the least squares estimate with normal errors this term was derived by Box [1]. We also consider the effect of a large number of parameters, and the case of random independent variables.
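As a concrete illustration of M-estimation (a generic sketch only; the bias and skewness corrections derived in the paper are not implemented), here is a Huber-type M-estimator fitted by iteratively reweighted least squares. The tuning constant `c = 1.345` is the conventional choice for 95% efficiency under Gaussian errors:

```python
import numpy as np

def m_estimate(X, y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of regression coefficients via
    iteratively reweighted least squares (IRLS)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting value
    beta_new = beta
    for _ in range(max_iter):
        r = y - X @ beta
        # robust residual scale: normalized median absolute deviation
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        a = np.abs(r / s)
        w = np.minimum(1.0, c / np.maximum(a, 1e-12))  # Huber weights psi(r)/r
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # weighted normal equations
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new
```

On data with a gross outlier, the downweighting keeps the fit close to the bulk of the sample, which is the setting where the non-normal, asymmetric residual distributions considered in the abstract matter.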
</p>
Thu, 05 Aug 2010 15:41 EDT

Simultaneous variable selection and smoothing for high-dimensional function-on-scalar regression
https://projecteuclid.org/euclid.ejs/1545382951
<strong>Alice Parodi</strong>, <strong>Matthew Reimherr</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4602--4639.</p><p><strong>Abstract:</strong><br/>
We present a new methodology, called FLAME, which simultaneously selects important predictors and produces smooth estimates in a function-on-scalar linear model with a large number of scalar predictors. Our framework applies quite generally by viewing the functional outcomes as elements of an arbitrary real separable Hilbert space. To select important predictors while also producing smooth parameter estimates, we utilize operators to define subspaces that are imbued with certain desirable properties as determined by the practitioner and the setting, such as smoothness or periodicity. In special cases one can show that these subspaces correspond to Reproducing Kernel Hilbert Spaces; however, our methodology applies more broadly. We provide a very fast algorithm for computing the estimators, which is based on a functional coordinate descent, and an R package, flm, whose backend is written in C++. Asymptotic properties of the estimators are developed and simulations are provided to illustrate the advantages of FLAME over existing methods, both in terms of statistical performance and computational efficiency. We conclude with an application to childhood asthma, where we find a potentially important genetic mutation that was not selected by previous functional data based methods.
</p>
Fri, 21 Dec 2018 04:03 EST

Exact and efficient inference for partial Bayes problems
https://projecteuclid.org/euclid.ejs/1545382952
<strong>Yixuan Qiu</strong>, <strong>Lingsong Zhang</strong>, <strong>Chuanhai Liu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4640--4668.</p><p><strong>Abstract:</strong><br/>
Bayesian methods are useful for statistical inference. However, real-world problems can be challenging for Bayesian methods when the data analyst has only limited prior knowledge. In this paper we consider a class of problems, called partial Bayes problems, in which the prior information is only partially available. Taking the recently proposed inferential model approach, we develop a general inference framework for partial Bayes problems, and derive both exact and efficient solutions. In addition to the theoretical investigation, numerical results and real applications are used to demonstrate the superior performance of the proposed method.
</p>
Fri, 21 Dec 2018 04:03 EST

Fast learning rate of non-sparse multiple kernel learning and optimal regularization strategies
https://projecteuclid.org/euclid.ejs/1531468825
<strong>Taiji Suzuki</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2141--2192.</p><p><strong>Abstract:</strong><br/>
In this paper, we give a new generalization error bound of Multiple Kernel Learning (MKL) for a general class of regularizations, and discuss what kind of regularization gives a favorable predictive accuracy. Our main target in this paper is dense-type regularizations including $\ell_{p}$-MKL. Numerical experiments have shown that sparse regularization does not necessarily perform well compared with dense-type regularizations. Motivated by this fact, this paper gives a general theoretical tool to derive fast learning rates of MKL that is applicable to arbitrary mixed-norm-type regularizations in a unifying manner. This enables us to compare the generalization performances of various types of regularizations. As a consequence, we observe that the homogeneity of the complexities of candidate reproducing kernel Hilbert spaces (RKHSs) affects which regularization strategy ($\ell_{1}$ or dense) is preferred. In fact, in homogeneous complexity settings where the complexities of all RKHSs are equal, $\ell_{1}$-regularization is optimal among all isotropic norms. On the other hand, in inhomogeneous complexity settings, dense-type regularizations can show a better learning rate than sparse $\ell_{1}$-regularization. We also show that our learning rate achieves the minimax lower bound in homogeneous complexity settings.
</p>
Fri, 21 Dec 2018 22:11 EST

Model-free envelope dimension selection
https://projecteuclid.org/euclid.ejs/1531814505
<strong>Xin Zhang</strong>, <strong>Qing Mai</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2193--2216.</p><p><strong>Abstract:</strong><br/>
An envelope is a targeted dimension reduction subspace for simultaneously achieving dimension reduction and improving parameter estimation efficiency. While many envelope methods have been proposed in recent years, all envelope methods hinge on the knowledge of a key hyperparameter, the structural dimension of the envelope. How to estimate the envelope dimension consistently is of substantial interest from both theoretical and practical aspects. Moreover, very recent advances in the literature have generalized the envelope as a model-free method, which makes selecting the envelope dimension even more challenging. Likelihood-based approaches such as information criteria and likelihood-ratio tests either cannot be directly applied or have no theoretical justification. To address this critical issue of dimension selection, we propose two unified approaches – called FG and 1D selections – for determining the envelope dimension that can be applied to any envelope models and methods. The two model-free selection approaches are based on two different envelope optimization procedures: the full Grassmannian (FG) optimization and the 1D algorithm [11], and are shown to be capable of correctly identifying the structural dimension with probability tending to 1 under mild moment conditions as the sample size increases. While the FG selection unifies and generalizes the BIC and modified BIC approaches existing in the literature, and hence provides theoretical justification for them under weak moment conditions in a model-free context, the 1D selection is computationally more stable and efficient in finite samples. Extensive simulations and a real data analysis demonstrate the superb performance of our proposals.
</p>
Fri, 21 Dec 2018 22:11 EST

Prediction of dynamical time series using kernel based regression and smooth splines
https://projecteuclid.org/euclid.ejs/1532333003
<strong>Raymundo Navarrete</strong>, <strong>Divakar Viswanath</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2217--2237.</p><p><strong>Abstract:</strong><br/>
Prediction of dynamical time series with additive noise using support vector machines or kernel based regression is consistent for certain classes of discrete dynamical systems. Consistency implies that these methods are effective at computing the expected value of a point at a future time given the present coordinates. However, the present coordinates themselves are noisy, and therefore, these methods are not necessarily effective at removing noise. In this article, we consider denoising and prediction as separate problems for flows, as opposed to discrete time dynamical systems, and show that the use of smooth splines is more effective at removing noise. Combination of smooth splines and kernel based regression yields predictors that are more accurate on benchmarks typically by a factor of 2 or more. We prove that kernel based regression in combination with smooth splines converges to the exact predictor for time series extracted from any compact invariant set of any sufficiently smooth flow. As a consequence of convergence, one can find examples where the combination of kernel based regression with smooth splines is superior by even a factor of $100$. The predictors that we analyze and compute operate on delay coordinate data and not the full state vector, which is typically not observable.
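The denoising step can be illustrated with a discrete analogue of a smoothing spline, the Whittaker smoother (an assumption made here for a self-contained sketch; the paper works with smoothing splines proper): minimize a residual sum of squares plus a roughness penalty on second differences.

```python
import numpy as np

def smooth(y, lam):
    """Whittaker smoother: argmin_z ||y - z||^2 + lam * ||D2 z||^2,
    where D2 is the second-difference operator on an equispaced grid.
    This is a discrete stand-in for a cubic smoothing spline."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)  # (n-2) x n second-difference matrix
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
```

Applied to a noisy sampled trajectory, the smoothed series (rather than the raw observations) would then supply the delay coordinates for a kernel-based predictor, in the spirit of the two-stage scheme described above.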
</p>
Fri, 21 Dec 2018 22:11 EST

Confidence intervals for linear unbiased estimators under constrained dependence
https://projecteuclid.org/euclid.ejs/1532333004
<strong>Peter M. Aronow</strong>, <strong>Forrest W. Crawford</strong>, <strong>José R. Zubizarreta</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2238--2252.</p><p><strong>Abstract:</strong><br/>
We propose an approach for conducting inference for linear unbiased estimators applied to dependent outcomes given constraints on their independence relations, in the form of a dependency graph. We establish the consistency of an oracle variance estimator when a dependency graph is known, along with an associated central limit theorem. We derive an integer linear program for finding an upper bound for the estimated variance when a dependency graph is unknown, but topological or degree-based constraints are available on one such graph. We develop alternative bounds, including a closed-form bound, under an additional homoskedasticity assumption. We establish a basis for Wald-type confidence intervals that are guaranteed to have asymptotically conservative coverage.
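A generic Cauchy-Schwarz bound conveys the flavor of the homoskedastic closed-form bound (this is a simplified sketch, not the paper's estimator): if dependence is confined to the edges of a known dependency graph and each outcome has variance $\sigma^2$, then $|\mathrm{Cov}(Y_i,Y_j)|\le \sigma^2$ for adjacent $i,j$, so $\mathrm{Var}(\sum_i w_iY_i)\le \sigma^2\sum_{(i,j)\in G\text{ or }i=j}|w_iw_j|$.

```python
import numpy as np

def variance_upper_bound(weights, sigma2, adjacency):
    """Conservative upper bound on Var(sum_i w_i Y_i) when all
    covariances vanish off the dependency graph and each outcome
    has variance sigma2 (homoskedastic case)."""
    w = np.asarray(weights, dtype=float)
    # include the diagonal: each Y_i is trivially "dependent" on itself
    A = np.asarray(adjacency, dtype=bool) | np.eye(len(w), dtype=bool)
    return sigma2 * np.abs(np.outer(w, w))[A].sum()
```

With an empty graph this reduces to the usual independent-case variance $\sigma^2\sum_i w_i^2$, and with a complete graph it degrades to $\sigma^2(\sum_i|w_i|)^2$, showing how denser constraint graphs widen the resulting Wald-type intervals.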
</p>
Fri, 21 Dec 2018 22:11 EST

Upper and lower risk bounds for estimating the Wasserstein barycenter of random measures on the real line
https://projecteuclid.org/euclid.ejs/1532333005
<strong>Jérémie Bigot</strong>, <strong>Raúl Gouet</strong>, <strong>Thierry Klein</strong>, <strong>Alfredo López</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2253--2289.</p><p><strong>Abstract:</strong><br/>
This paper is focused on the statistical analysis of probability measures $\boldsymbol{\nu }_{1},\ldots ,\boldsymbol{\nu }_{n}$ on ${\mathbb{R}}$ that can be viewed as independent realizations of an underlying stochastic process. We consider the situation of practical importance where the random measures $\boldsymbol{\nu }_{i}$ are absolutely continuous with densities $\boldsymbol{f}_{i}$ that are not directly observable. In this case, instead of the densities, we have access to datasets of real random variables $(X_{i,j})_{1\leq i\leq n;\;1\leq j\leq p_{i}}$ organized in the form of $n$ experimental units, such that $X_{i,1},\ldots ,X_{i,p_{i}}$ are iid observations sampled from a random measure $\boldsymbol{\nu }_{i}$ for each $1\leq i\leq n$. In this setting, we focus on first-order statistics methods for estimating, from such data, a meaningful structural mean measure. For the purpose of taking into account phase and amplitude variations in the observations, we argue that the notion of Wasserstein barycenter is a relevant tool. The main contribution of this paper is to characterize the rate of convergence of a (possibly smoothed) empirical Wasserstein barycenter towards its population counterpart in the asymptotic setting where both $n$ and $\min_{1\leq i\leq n}p_{i}$ may go to infinity. The optimality of this procedure is discussed from the minimax point of view with respect to the Wasserstein metric. We also highlight the connection between our approach and the curve registration problem in statistics. Some numerical experiments are used to illustrate the results of the paper on the convergence rate of empirical Wasserstein barycenters.
</p>
Fri, 21 Dec 2018 22:11 EST

Exchangeable trait allocations
https://projecteuclid.org/euclid.ejs/1532484331
<strong>Trevor Campbell</strong>, <strong>Diana Cai</strong>, <strong>Tamara Broderick</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2290--2322.</p><p><strong>Abstract:</strong><br/>
Trait allocations are a class of combinatorial structures in which data may belong to multiple groups and may have different levels of belonging in each group. Often the data are also exchangeable, i.e., their joint distribution is invariant to reordering. In clustering—a special case of trait allocation—exchangeability implies the existence of both a de Finetti representation and an exchangeable partition probability function (EPPF), distributional representations useful for computational and theoretical purposes. In this work, we develop the analogous de Finetti representation and exchangeable trait probability function (ETPF) for trait allocations, along with a characterization of all trait allocations with an ETPF. Unlike previous feature allocation characterizations, our proofs fully capture single-occurrence “dust” groups. We further introduce a novel constrained version of the ETPF that we use to establish an intuitive connection between the probability functions for clustering, feature allocations, and trait allocations. As an application of our general theory, we characterize the distribution of all edge-exchangeable graphs, a class of recently-developed models that captures realistic sparse graph sequences.
</p>
Fri, 21 Dec 2018 22:11 EST

Non-parametric estimation of time varying AR(1)–processes with local stationarity and periodicity
https://projecteuclid.org/euclid.ejs/1532484332
<strong>Jean-Marc Bardet</strong>, <strong>Paul Doukhan</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2323--2354.</p><p><strong>Abstract:</strong><br/>
Extending the ideas of [7], this paper aims at providing a kernel based non-parametric estimation of a new class of time varying AR(1) processes $(X_{t})$, with local stationarity and periodic features (with a known period $T$), inducing the definition $X_{t}=a_{t}(t/nT)X_{t-1}+\xi_{t}$ for $t\in \mathbb{N}$ and with $a_{t+T}\equiv a_{t}$. Central limit theorems are established for kernel estimators $\widehat{a}_{s}(u)$ reaching classical minimax rates and only requiring low order moment conditions of the white noise $(\xi_{t})_{t}$ up to the second order.
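A rough sketch of such a kernel estimator in the non-periodic special case ($T=1$, so $X_t = a(t/n)X_{t-1} + \xi_t$; the periodic indexing $a_t$ of the paper is omitted for simplicity):

```python
import numpy as np

def tvar1_hat(X, u, h):
    """Kernel (Nadaraya-Watson-type) estimate of the autoregression
    function a(u) in X_t = a(t/n) X_{t-1} + xi_t: a locally weighted
    lag-1 regression with a Gaussian kernel of bandwidth h."""
    n = len(X)
    t = np.arange(1, n) / n                        # rescaled time of each transition
    K = np.exp(-0.5 * ((t - u) / h) ** 2)          # kernel weights around u
    return np.sum(K * X[1:] * X[:-1]) / np.sum(K * X[:-1] ** 2)
```

Shrinking the bandwidth $h$ as $n$ grows (localizing in rescaled time) is what makes the locally stationary formulation estimable, at the usual non-parametric bias-variance trade-off.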
</p>
Fri, 21 Dec 2018 22:11 EST

Scalable methods for Bayesian selective inference
https://projecteuclid.org/euclid.ejs/1532484333
<strong>Snigdha Panigrahi</strong>, <strong>Jonathan Taylor</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2355--2400.</p><p><strong>Abstract:</strong><br/>
Modeled along the truncated approach in [20], selection-adjusted inference in a Bayesian regime is based on a selective posterior. Such a posterior is determined jointly by a generative model imposed on the data and the selection event that enforces a truncation on the assumed law. The effective difference between the selective posterior and the usual Bayesian framework is reflected in the use of a truncated likelihood. The normalizer of the truncated law in the adjusted framework is the probability of the selection event; this typically lacks a closed-form expression, leading to a computational bottleneck in sampling from such a posterior. The current work provides an optimization problem that approximates the otherwise intractable selective posterior and leads to scalable methods that give valid post-selective Bayesian inference. The selection procedures are posed as data queries that solve a randomized version of a convex learning program, which has the advantage of preserving more left-over information for inference.
We propose a randomization scheme under which the approximating optimization has separable constraints that result in a partially separable objective in lower dimensions for many commonly used selective queries. We show that the proposed optimization gives a valid exponential rate of decay for the selection probability on a large deviation scale under a Gaussian randomization scheme. On the implementation side, we offer a primal-dual method to solve the optimization problem leading to an approximate posterior; this allows us to exploit the usual merits of a Bayesian machinery in both low and high dimensional regimes when the underlying signal is effectively sparse. We show that the adjusted estimates empirically demonstrate better frequentist properties in comparison to the unadjusted estimates based on the usual posterior, when applied to a wide range of constrained, convex data queries.
</p>
Fri, 21 Dec 2018 22:11 EST

Asymptotic minimum scoring rule prediction
https://projecteuclid.org/euclid.ejs/1532484334
<strong>Federica Giummolè</strong>, <strong>Valentina Mameli</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2401--2429.</p><p><strong>Abstract:</strong><br/>
Most of the methods nowadays employed in forecast problems are based on scoring rules. There is a divergence function associated to each scoring rule, that can be used as a measure of discrepancy between probability distributions. This approach is commonly used in the literature for comparing two competing predictive distributions on the basis of their relative expected divergence from the true distribution.
In this paper we focus on the use of scoring rules as a tool for finding predictive distributions for an unknown of interest. The proposed predictive distributions are asymptotic modifications of the estimative solutions, obtained by minimizing the expected divergence related to a general scoring rule.
The asymptotic properties of such predictive distributions are strictly related to the geometry induced by the considered divergence on a regular parametric model. In particular, the existence of a global optimal predictive distribution is guaranteed for invariant divergences, whose local behaviour is similar to well known $\alpha $-divergences.
We show that a wide class of divergences obtained from weighted scoring rules share invariance properties with $\alpha $-divergences. For weighted scoring rules it is thus possible to obtain a global solution to the prediction problem. Unfortunately, the divergences associated to many widely used scoring rules are not invariant. Still for these cases we provide a locally optimal predictive distribution, within a specified parametric model.
</p>
Fri, 21 Dec 2018 22:11 EST

On the role of the overall effect in exponential families
https://projecteuclid.org/euclid.ejs/1532484335
<strong>Anna Klimova</strong>, <strong>Tamás Rudas</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2430--2453.</p><p><strong>Abstract:</strong><br/>
This paper compares exponential families of discrete probability distributions with and without the normalizing constant (or overall effect). The latter setup, in which the exponential family is curved, is particularly relevant when the sample space is an incomplete Cartesian product or when it is very large, so that the computational burden is significant. The lack or presence of the overall effect has a fundamental impact on the properties of the exponential family. When the overall effect is added, the family becomes the smallest regular exponential family containing the curved one. The procedure is related to the homogenization of an inhomogeneous variety discussed in algebraic geometry, of which a statistical interpretation is given as an augmentation of the sample space. The changes in the kernel basis representation when the overall effect is included or removed are derived. The geometry of maximum likelihood estimates, also allowing zero observed frequencies, is described with and without the overall effect, and various algorithms are compared. The importance of the results is illustrated by an example from cell biology, showing that routinely including the overall effect leads to estimates which are not in the model intended by the researchers.
</p>
Fri, 21 Dec 2018 22:11 EST

A new design strategy for hypothesis testing under response adaptive randomization
https://projecteuclid.org/euclid.ejs/1532484336
<strong>Alessandro Baldi Antognini</strong>, <strong>Alessandro Vagheggini</strong>, <strong>Maroussa Zagoraiou</strong>, <strong>Marco Novelli</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2454--2481.</p><p><strong>Abstract:</strong><br/>
This paper provides a new design strategy for response adaptive randomization in normal response trials aimed at testing the superiority of one of two available treatments. In particular, we introduce a new test statistic based on the treatment allocation proportion ensuing from the adoption of a suitable response adaptive randomization rule, which could be more efficient and uniformly more powerful than the classical Wald test. We analyze the conditions under which the suggested strategy, derived by matching an asymptotically best response adaptive procedure and a suitably chosen target allocation, could induce a monotonically increasing power that discriminates with high precision the chosen alternatives. Moreover, we introduce and analyze new classes of targets aimed at maximizing the power of the new statistical test, showing both analytically and via simulations i) how the power function of the suggested test increases as the ethical skew of the chosen target grows, namely overcoming the usual trade-off between ethics and inference, and ii) the substantial gain of inferential precision ensured by the proposed approach.
</p>
Fri, 21 Dec 2018 22:11 EST

Wasserstein and total variation distance between marginals of Lévy processes
https://projecteuclid.org/euclid.ejs/1532657104
<strong>Ester Mariucci</strong>, <strong>Markus Reiß</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2482--2514.</p><p><strong>Abstract:</strong><br/>
We present upper bounds for the Wasserstein distance of order $p$ between the marginals of Lévy processes, including Gaussian approximations for jumps of infinite activity. Using the convolution structure, we further derive upper bounds for the total variation distance between the marginals of Lévy processes. Connections to other metrics like Zolotarev and Toscani-Fourier distances are established. The theory is illustrated by concrete examples and an application to statistical lower bounds.
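For one-dimensional empirical measures the Wasserstein-$p$ distance itself is cheap to compute, since the optimal coupling on the line matches sorted order statistics. A minimal sketch (independent of the Lévy structure above, but handy for experimenting with the metrics discussed):

```python
import numpy as np

def wasserstein_1d(x, y, p=1.0):
    """Wasserstein-p distance between the empirical laws of two
    equal-size 1-D samples: the optimal monotone coupling pairs
    the i-th order statistic of x with the i-th of y."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)
```

For a pure location shift the distance equals the shift for every $p$, which is one reason Wasserstein metrics are convenient for comparing marginals that differ mainly in drift.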
</p>
Fri, 21 Dec 2018 22:11 EST

A noninformative Bayesian approach for selecting a good post-stratification
https://projecteuclid.org/euclid.ejs/1532678418
<strong>Patrick Zimmerman</strong>, <strong>Glen Meeden</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2515--2536.</p><p><strong>Abstract:</strong><br/>
In the standard design approach to survey sampling, prior information is often used to stratify the population of interest. A good choice of the strata can yield significant improvement in the resulting estimator. However, if there are several possible ways to stratify the population, it might not be clear which is best. Here we assume that before the sample is taken a limited number of possible stratifications have been defined. We propose an objective Bayesian approach that allows one to consider these several different possible stratifications simultaneously. Given the sample, the posterior distribution will assign more weight to the good stratifications and less to the others. Empirical results suggest that the resulting estimator will typically be almost as good as the estimator based on the best stratification and better than the estimator which does not use stratification. It will also have a sensible estimate of precision.
</p>
Fri, 21 Dec 2018 22:11 EST

On kernel methods for covariates that are rankings
https://projecteuclid.org/euclid.ejs/1534233701
<strong>Horia Mania</strong>, <strong>Aaditya Ramdas</strong>, <strong>Martin J. Wainwright</strong>, <strong>Michael I. Jordan</strong>, <strong>Benjamin Recht</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2537--2577.</p><p><strong>Abstract:</strong><br/>
Permutation-valued features arise in a variety of applications, either in a direct way when preferences are elicited over a collection of items, or in an indirect way when numerical ratings are converted to a ranking. To date, there has been relatively limited study of regression, classification, and testing problems based on permutation-valued features, as opposed to permutation-valued responses. This paper studies the use of reproducing kernel Hilbert space methods for learning from permutation-valued features. These methods embed the rankings into an implicitly defined function space, and allow for efficient estimation of regression and test functions in this richer space. We characterize both the feature spaces and spectral properties associated with two kernels for rankings, the Kendall and Mallows kernels. Using tools from representation theory, we explain the limited expressive power of the Kendall kernel by characterizing its degenerate spectrum, and in sharp contrast, we prove that the Mallows kernel is universal and characteristic. We also introduce families of polynomial kernels that interpolate between the Kendall (degree one) and Mallows (infinite degree) kernels. We show the practical effectiveness of our methods via applications to Eurobarometer survey data as well as a Movielens ratings dataset.
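The Kendall kernel mentioned above has a simple direct form: the normalized count of concordant minus discordant item pairs between two rankings, which lies in $[-1,1]$. A naive $O(n^2)$ sketch:

```python
import numpy as np
from itertools import combinations

def kendall_kernel(sigma, tau):
    """Kendall kernel between two rankings (rank vectors over the
    same n items): fraction of concordant minus discordant pairs."""
    sigma, tau = np.asarray(sigma), np.asarray(tau)
    n = len(sigma)
    total = 0.0
    for i, j in combinations(range(n), 2):
        # +1 if the pair (i, j) is ordered the same way in both rankings
        total += np.sign(sigma[i] - sigma[j]) * np.sign(tau[i] - tau[j])
    return total / (n * (n - 1) / 2)
```

An $O(n\log n)$ merge-sort implementation exists for larger $n$; the quadratic version above is only meant to make the definition concrete.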
</p>
Fri, 21 Dec 2018 22:11 EST

Assessing the multivariate normal approximation of the maximum likelihood estimator from high-dimensional, heterogeneous data
https://projecteuclid.org/euclid.ejs/1543568429
<strong>Andreas Anastasiou</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3794--3828.</p><p><strong>Abstract:</strong><br/>
The asymptotic normality of the maximum likelihood estimator (MLE) under regularity conditions is a cornerstone of statistical theory. In this paper, we give explicit upper bounds on the distributional distance between the distribution of the MLE of a vector parameter, and the multivariate normal distribution. We work with possibly high-dimensional, independent but not necessarily identically distributed random vectors. In addition, we obtain upper bounds in cases where the MLE cannot be expressed analytically.
</p>
Fri, 21 Dec 2018 22:11 EST

Gaussian process bandits with adaptive discretization
https://projecteuclid.org/euclid.ejs/1543892564
<strong>Shubhanshu Shekhar</strong>, <strong>Tara Javidi</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3829--3874.</p><p><strong>Abstract:</strong><br/>
In this paper, the problem of maximizing a black-box function $f:\mathcal{X}\to \mathbb{R}$ is studied in the Bayesian framework with a Gaussian Process prior. In particular, a new algorithm for this problem is proposed, and high probability bounds on its simple and cumulative regret are established. The query point selection rule in most existing methods involves an exhaustive search over an increasingly fine sequence of uniform discretizations of $\mathcal{X}$. The proposed algorithm, in contrast, adaptively refines $\mathcal{X}$ which leads to a lower computational complexity, particularly when $\mathcal{X}$ is a subset of a high dimensional Euclidean space. In addition to the computational gains, sufficient conditions are identified under which the regret bounds of the new algorithm improve upon the known results. Finally, an extension of the algorithm to the case of contextual bandits is proposed, and high probability bounds on the contextual regret are presented.
</p>
Fri, 21 Dec 2018 22:11 EST

Generalized subsampling procedure for non-stationary time series
https://projecteuclid.org/euclid.ejs/1543979029
<strong>Łukasz Lenart</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3875--3907.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a generalization of the subsampling procedure for non-stationary time series. The proposed generalization is simply related to the usual subsampling procedure. We formulate the sufficient conditions for the consistency of such a generalization. These sufficient conditions are a generalization of those presented for the usual subsampling procedure for non-stationary time series. Finally, we demonstrate the consistency of the generalized subsampling procedure for the Fourier coefficient in mean expansion of Almost Periodically Correlated time series.
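The usual subsampling procedure being generalized can be sketched as follows: evaluate the statistic of interest on all overlapping length-$b$ blocks of the series (blocks preserve the dependence structure), and use the resulting empirical distribution for inference. This is only the generic scheme, not the paper's generalization:

```python
import numpy as np

def subsample_distribution(x, b, stat=np.mean):
    """Subsampling distribution of a statistic for a time series:
    the statistic evaluated on every contiguous block of length b."""
    x = np.asarray(x)
    n = len(x)
    return np.array([stat(x[i:i + b]) for i in range(n - b + 1)])
```

Rescaled appropriately, the spread of these block statistics approximates the sampling variability of the full-sample statistic, which is what the consistency conditions in the paper are designed to guarantee in the non-stationary case.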
</p>
Fri, 21 Dec 2018 22:11 EST

Heterogeneity adjustment with applications to graphical model inference
https://projecteuclid.org/euclid.ejs/1543979030
<strong>Jianqing Fan</strong>, <strong>Han Liu</strong>, <strong>Weichen Wang</strong>, <strong>Ziwei Zhu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3908--3952.</p><p><strong>Abstract:</strong><br/>
Heterogeneity is an unwanted source of variation when analyzing aggregated datasets from multiple sources. Though different methods have been proposed for heterogeneity adjustment, no systematic theory exists to justify these methods. In this work, we propose a generic framework named ALPHA (short for Adaptive Low-rank Principal Heterogeneity Adjustment) to model, estimate, and adjust heterogeneity from the original data. Once the heterogeneity is adjusted, we are able to remove the batch effects and to enhance the inferential power by aggregating the homogeneous residuals from multiple sources. Under a pervasive assumption that the latent heterogeneity factors simultaneously affect a fraction of observed variables, we provide a rigorous theory to justify the proposed framework. Our framework also allows the incorporation of informative covariates and appeals to the ‘Blessing of Dimensionality’. As an illustrative application of this generic framework, we consider a problem of estimating a high-dimensional precision matrix for graphical model inference based on multiple datasets. We also provide thorough numerical studies on both synthetic datasets and a brain imaging dataset to demonstrate the efficacy of the developed theory and methods.
</p>projecteuclid.org/euclid.ejs/1543979030_20181221221108Fri, 21 Dec 2018 22:11 ESTEmpirical Bayes analysis of spike and slab posterior distributionshttps://projecteuclid.org/euclid.ejs/1544238109<strong>Ismaël Castillo</strong>, <strong>Romain Mismer</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 3953--4001.</p><p><strong>Abstract:</strong><br/>
In the sparse normal means model, convergence of the Bayesian posterior distribution associated with spike and slab prior distributions is considered. The key sparsity hyperparameter is calibrated via marginal maximum likelihood empirical Bayes. The plug-in posterior squared–$L^{2}$ norm is shown to converge at the minimax rate for the Euclidean norm for appropriate choices of spike and slab distributions. Possible choices include standard spike and slab with heavy-tailed slab, and the spike and slab LASSO of Ročková and George with heavy-tailed slab. Surprisingly, the popular Laplace slab is shown to lead to a suboptimal rate for the empirical Bayes posterior itself. This provides a striking example where convergence of aspects of the empirical Bayes posterior such as the posterior mean or median does not entail convergence of the complete empirical Bayes posterior itself.
</p>projecteuclid.org/euclid.ejs/1544238109_20181221221108Fri, 21 Dec 2018 22:11 ESTAnalysis of a mode clustering diagramhttps://projecteuclid.org/euclid.ejs/1545123625<strong>Isabella Verdinelli</strong>, <strong>Larry Wasserman</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4288--4312.</p><p><strong>Abstract:</strong><br/>
Mode-based clustering methods define clusters in terms of the modes of a density estimate. The most common mode-based method is mean shift clustering, which defines clusters to be the basins of attraction of the modes. Specifically, the gradient of the density defines a flow which is estimated using a gradient ascent algorithm. Rodriguez and Laio (2014) introduced a new method that is faster and simpler than mean shift clustering. Furthermore, they define a clustering diagram that provides a simple, two-dimensional summary of the clustering information. We study the statistical properties of this diagram and we propose some improvements and extensions. In particular, we show a connection between the diagram and robust linear regression.
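The diagram of Rodriguez and Laio (2014) can be sketched in a few lines: for each point one computes a local density and the distance to the nearest point of higher density, and cluster centers stand out as points where both quantities are large. A minimal NumPy sketch on toy two-blob data (the cutoff `d_c` and the data are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def decision_diagram(X, d_c):
    """Rodriguez-Laio clustering diagram: for each point, a local density
    rho (number of neighbors within d_c) and delta, the distance to the
    nearest point of higher density (density ties broken by index; the
    single densest point gets the largest distance instead)."""
    n = len(X)
    # pairwise Euclidean distances
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    rho = (D < d_c).sum(1) - 1                  # exclude the point itself
    order = np.arange(n)
    delta = np.empty(n)
    for i in range(n):
        higher = (rho > rho[i]) | ((rho == rho[i]) & (order < i))
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return rho, delta

# toy data: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (50, 2)), rng.normal(3, 0.2, (50, 2))])
rho, delta = decision_diagram(X, d_c=0.5)
# cluster centers stand out with both large rho and large delta
centers = np.argsort(rho * delta)[-2:]
```

Plotting `delta` against `rho` gives the two-dimensional summary studied in the paper; here the two points with the largest product `rho * delta` recover one center per blob.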
</p>projecteuclid.org/euclid.ejs/1545123625_20181221221108Fri, 21 Dec 2018 22:11 ESTBandwidth selection for kernel density estimators of multivariate level sets and highest density regionshttps://projecteuclid.org/euclid.ejs/1545123626<strong>Charles R. Doss</strong>, <strong>Guangwei Weng</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4313--4376.</p><p><strong>Abstract:</strong><br/>
We consider bandwidth matrix selection for kernel density estimators of density level sets in $\mathbb{R} ^{d}$, $d\ge 2$. We also consider estimation of highest density regions, which differs from estimating level sets in that one specifies the probability content of the set rather than specifying the level directly. This complicates the problem. Bandwidth selection for KDEs is well studied, but the goal of most methods is to minimize a global loss function for the density or its derivatives. The loss we consider here is instead the measure of the symmetric difference of the true set and estimated set. We derive an asymptotic approximation to the corresponding risk. The approximation depends on unknown quantities which can be estimated, and the approximation can then be minimized to yield a choice of bandwidth, which we show in simulations performs well. We provide an R package lsbs for implementing our procedure.
</p>projecteuclid.org/euclid.ejs/1545123626_20181221221108Fri, 21 Dec 2018 22:11 ESTPeriodic dynamic factor models: estimation approaches and applicationshttps://projecteuclid.org/euclid.ejs/1545123627<strong>Changryong Baek</strong>, <strong>Richard A. Davis</strong>, <strong>Vladas Pipiras</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4377--4411.</p><p><strong>Abstract:</strong><br/>
A periodic dynamic factor model (PDFM) is introduced as a dynamic factor modeling approach to multivariate time series data exhibiting cyclical behavior and, in particular, periodic dependence structure. In the PDFM, the loading matrices are allowed to depend on the “season” and the factors are assumed to follow a periodic vector autoregressive (PVAR) model. Estimation of the loading matrices and the underlying PVAR model is studied. A simulation study is presented to assess the performance of the introduced estimation procedures, and applications to several real data sets are provided.
</p>projecteuclid.org/euclid.ejs/1545123627_20181221221108Fri, 21 Dec 2018 22:11 ESTConvergence analysis of the block Gibbs sampler for Bayesian probit linear mixed models with improper priorshttps://projecteuclid.org/euclid.ejs/1545123629<strong>Xin Wang</strong>, <strong>Vivekananda Roy</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4412--4439.</p><p><strong>Abstract:</strong><br/>
In this article, we consider Markov chain Monte Carlo (MCMC) algorithms for exploring the intractable posterior density associated with Bayesian probit linear mixed models under improper priors on the regression coefficients and variance components. In particular, we construct a two-block Gibbs sampler using data augmentation (DA) techniques. Furthermore, we prove geometric ergodicity of the Gibbs sampler, which is the foundation for building central limit theorems for MCMC based estimators and subsequent inferences. The conditions for geometric convergence are similar to those guaranteeing posterior propriety. We also provide conditions for the propriety of posterior distributions with a general link function when the design matrices take commonly observed forms. In general, the Haar parameter expansion for DA (PX-DA) algorithm is an improvement of the DA algorithm and has been shown to be theoretically at least as good as the DA algorithm. Here we construct a Haar PX-DA algorithm, which has essentially the same computational cost as the two-block Gibbs sampler.
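The data-augmentation step at the core of such samplers goes back to Albert and Chib. A minimal sketch for a plain probit regression with a flat prior on the coefficients (the mixed-model random effects and improper variance-component priors of the paper are omitted, and the data are simulated for illustration) alternates a truncated-normal draw of latent variables with a normal draw of the coefficients:

```python
import numpy as np
from scipy.stats import truncnorm

# Simulated probit data (assumed for illustration only)
rng = np.random.default_rng(1)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)
chol = np.linalg.cholesky(XtX_inv)
beta = np.zeros(p)
draws = []
for it in range(1500):
    # block 1: latent z_i ~ N(x_i' beta, 1), truncated by the sign of y_i
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)   # z > 0 when y = 1
    hi = np.where(y == 1, np.inf, -mu)    # z < 0 when y = 0
    z = mu + truncnorm.rvs(lo, hi, random_state=rng)
    # block 2: beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}) under a flat prior
    beta = XtX_inv @ (X.T @ z) + chol @ rng.normal(size=p)
    if it >= 500:
        draws.append(beta)
post_mean = np.mean(draws, axis=0)
```

The two conditional draws above are the "two blocks" in the simplified setting; the paper's sampler additionally handles random effects and improper priors, for which posterior propriety must be verified.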
</p>projecteuclid.org/euclid.ejs/1545123629_20181221221108Fri, 21 Dec 2018 22:11 ESTConsistent change-point detection with kernelshttps://projecteuclid.org/euclid.ejs/1545123630<strong>Damien Garreau</strong>, <strong>Sylvain Arlot</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4440--4486.</p><p><strong>Abstract:</strong><br/>
In this paper we study the kernel change-point algorithm (KCP) proposed by Arlot, Celisse and Harchaoui [5], which aims at locating an unknown number of change-points in the distribution of a sequence of independent data taking values in an arbitrary set. The change-points are selected by model selection with a penalized kernel empirical criterion. We provide a non-asymptotic result showing that, with high probability, the KCP procedure retrieves the correct number of change-points, provided that the constant in the penalty is well-chosen; in addition, KCP estimates the change-point locations at the optimal rate. As a consequence, when using a characteristic kernel, KCP detects all kinds of change in the distribution (not only changes in the mean or the variance), and it is able to do so for complex structured data (not necessarily in $\mathbb{R}^{d}$). Most of the analysis is conducted assuming that the kernel is bounded; part of the results can be extended when we only assume a finite second-order moment. We also demonstrate KCP on both synthetic and real data.
</p>projecteuclid.org/euclid.ejs/1545123630_20181221221108Fri, 21 Dec 2018 22:11 ESTBayesian classification of multiclass functional datahttps://projecteuclid.org/euclid.ejs/1545448229<strong>Xiuqi Li</strong>, <strong>Subhashis Ghosal</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4669--4696.</p><p><strong>Abstract:</strong><br/>
We propose a Bayesian approach to estimating parameters in multiclass functional models. Unordered multinomial probit, ordered multinomial probit and multinomial logistic models are considered. We use finite random series priors based on a suitable basis such as B-splines in these three multinomial models, and classify the functional data using the Bayes rule. We average over models based on the marginal likelihood estimated from Markov Chain Monte Carlo (MCMC) output. Posterior contraction rates for the three multinomial models are computed. We also consider Bayesian linear and quadratic discriminant analyses on the multivariate data obtained by applying a functional principal component technique on the original functional data. A simulation study is conducted to compare these methods on different types of data. We also apply these methods to a phoneme dataset.
</p>projecteuclid.org/euclid.ejs/1545448229_20181221221108Fri, 21 Dec 2018 22:11 ESTEstimating a network from multiple noisy realizationshttps://projecteuclid.org/euclid.ejs/1545448230<strong>Can M. Le</strong>, <strong>Keith Levin</strong>, <strong>Elizaveta Levina</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 4697--4740.</p><p><strong>Abstract:</strong><br/>
Complex interactions between entities are often represented as edges in a network. In practice, the network is often constructed from noisy measurements and inevitably contains some errors. In this paper we consider the problem of estimating a network from multiple noisy observations where edges of the original network are recorded with both false positives and false negatives. This problem is motivated by neuroimaging applications where brain networks of a group of patients with a particular brain condition could be viewed as noisy versions of an unobserved true network corresponding to the disease. The key to optimally leveraging these multiple observations is to take advantage of network structure, and here we focus on the case where the true network contains communities. Communities are common in real networks in general, and in particular are believed to be present in brain networks. Under a community structure assumption on the truth, we derive an efficient method to estimate the noise levels and the original network, with theoretical guarantees on the convergence of our estimates. We show on synthetic networks that the performance of our method is close to an oracle method using the true parameter values, and apply our method to fMRI brain data, demonstrating that it constructs stable and plausible estimates of the population network.
</p>projecteuclid.org/euclid.ejs/1545448230_20181221221108Fri, 21 Dec 2018 22:11 ESTLinear regression with sparsely permuted datahttps://projecteuclid.org/euclid.ejs/1546570940<strong>Martin Slawski</strong>, <strong>Emanuel Ben-David</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 1--36.</p><p><strong>Abstract:</strong><br/>
In regression analysis of multivariate data, it is tacitly assumed that response and predictor variables in each observed response-predictor pair correspond to the same entity or unit. In this paper, we consider the situation of “permuted data” in which this basic correspondence has been lost. Several recent papers have considered this situation without further assumptions on the underlying permutation. In applications, the latter is often known to have additional structure that can be leveraged. Specifically, we herein consider the common scenario of “sparsely permuted data” in which only a small fraction of the data is affected by a mismatch between response and predictors. However, an adverse effect already observed for sparsely permuted data is that the least squares estimator as well as other estimators not accounting for such partial mismatch are inconsistent. One approach studied in detail herein is to treat permuted data as outliers, which motivates the use of robust regression formulations to estimate the regression parameter. The resulting estimate can subsequently be used to recover the permutation. A notable benefit of the proposed approach is its computational simplicity given the general lack of procedures for the above problem that are both statistically sound and computationally appealing.
</p>projecteuclid.org/euclid.ejs/1546570940_20190103220223Thu, 03 Jan 2019 22:02 ESTConvergence rates of latent topic models under relaxed identifiability conditionshttps://projecteuclid.org/euclid.ejs/1546570941<strong>Yining Wang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 37--66.</p><p><strong>Abstract:</strong><br/>
In this paper we study the frequentist convergence rate for Latent Dirichlet Allocation (Blei, Ng and Jordan, 2003) topic models. We show that the maximum likelihood estimator converges to one of the finitely many equivalent parameters in the Wasserstein distance at a rate of $n^{-1/4}$ without assuming separability or non-degeneracy of the underlying topics and/or the existence of more than three words per document, thus generalizing the previous works of Anandkumar et al. (2012, 2014) from an information-theoretical perspective. We also show that the $n^{-1/4}$ convergence rate is optimal in the worst case.
</p>projecteuclid.org/euclid.ejs/1546570941_20190103220223Thu, 03 Jan 2019 22:02 ESTGeneralised additive dependency inflated models including aggregated covariateshttps://projecteuclid.org/euclid.ejs/1546570942<strong>Young K. Lee</strong>, <strong>Enno Mammen</strong>, <strong>Jens P. Nielsen</strong>, <strong>Byeong U. Park</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 67--93.</p><p><strong>Abstract:</strong><br/>
Let us assume that $X$, $Y$ and $U$ are observed and that the conditional mean of $U$ given $X$ and $Y$ can be expressed via an additive dependency of $X$, $\lambda(X)Y$ and $X+Y$ for some unspecified function $\lambda$. This structured regression model can be transferred to a hazard model or a density model when applied on some appropriate grid, and has important forecasting applications via structured marker dependent hazards models or structured density models including age-period-cohort relationships. The structured regression model is also important when the severity of the dependent variable has a complicated dependency on waiting times $X$, $Y$ and the total waiting time $X+Y$. If the conditional mean of $U$ approximates a density, the regression model can be used to analyse the age-period-cohort model, even when exposure data are not available. If the conditional mean of $U$ approximates a marker dependent hazard, the regression model introduces new relevant age-period-cohort time scale interdependencies in understanding longevity. A direct use of the regression relationship introduced in this paper is the estimation of the severity of outstanding liabilities in non-life insurance companies. The technical approach taken is to use B-splines to capture the underlying one-dimensional unspecified functions. It is shown via finite sample simulation studies and an application for forecasting future asbestos related deaths in the UK that the B-spline approach works well in practice. Special consideration has been given to ensure identifiability of all models considered.
</p>projecteuclid.org/euclid.ejs/1546570942_20190103220223Thu, 03 Jan 2019 22:02 ESTExact adaptive confidence intervals for linear regression coefficientshttps://projecteuclid.org/euclid.ejs/1546570943<strong>Peter Hoff</strong>, <strong>Chaoyu Yu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 94--119.</p><p><strong>Abstract:</strong><br/>
We propose an adaptive confidence interval procedure (CIP) for the coefficients in the normal linear regression model. This procedure has a frequentist coverage rate that is constant as a function of the model parameters, yet provides smaller intervals than the usual interval procedure, on average across regression coefficients. The proposed procedure is obtained by defining a class of CIPs that all have exact $1-\alpha $ frequentist coverage, and then selecting from this class the procedure that minimizes a prior expected interval width. We describe an adaptive approach for estimating the prior distribution from the data, so that the potential risk of a poorly specified prior is reduced. The resulting adaptive confidence intervals maintain exact non-asymptotic $1-\alpha $ coverage if two conditions are met: that the design matrix is full rank (which will be known) and that the errors are normally distributed (which can be checked empirically). No assumptions on the unknown parameters are necessary to maintain exact coverage. Additionally, in a “$p$ growing with $n$” asymptotic scenario, this adaptive FAB procedure is asymptotically Bayes-optimal among $1-\alpha $ frequentist CIPs.
</p>projecteuclid.org/euclid.ejs/1546570943_20190103220223Thu, 03 Jan 2019 22:02 ESTAuxiliary information: the raking-ratio empirical processhttps://projecteuclid.org/euclid.ejs/1546570944<strong>Mickael Albertus</strong>, <strong>Philippe Berthet</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 120--165.</p><p><strong>Abstract:</strong><br/>
We study the empirical measure associated to a sample of size $n$ and modified by $N$ iterations of the raking-ratio method. This empirical measure is adjusted to match the true probability of sets in a finite partition that changes at each step. We establish asymptotic properties of the raking-ratio empirical process indexed by functions as $n\rightarrow +\infty $, for $N$ fixed. We study nonasymptotic properties by using a Gaussian approximation which yields uniform Berry-Esseen type bounds depending on $n$ and $N$, and provides estimates of the uniform quadratic risk reduction. A closed-form expression of the limiting covariance matrices is derived as $N\rightarrow +\infty $. In the two-way contingency table case the limiting process has a simple explicit formula.
</p>projecteuclid.org/euclid.ejs/1546570944_20190103220223Thu, 03 Jan 2019 22:02 ESTTrace class Markov chains for the Normal-Gamma Bayesian shrinkage modelhttps://projecteuclid.org/euclid.ejs/1547607848<strong>Liyuan Zhang</strong>, <strong>Kshitij Khare</strong>, <strong>Zeren Xing</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 166--207.</p><p><strong>Abstract:</strong><br/>
High-dimensional data, where the number of variables exceeds or is comparable to the sample size, is now pervasive in many scientific applications. In recent years, Bayesian shrinkage models have been developed as effective and computationally feasible tools to analyze such data, especially in the context of linear regression. In this paper, we focus on the Normal-Gamma shrinkage model developed by Griffin and Brown [7]. This model subsumes the popular Bayesian lasso model, and a three-block Gibbs sampling algorithm to sample from the resulting intractable posterior distribution has been developed in [7]. We consider an alternative two-block Gibbs sampling algorithm, and rigorously demonstrate its advantage over the three-block sampler by comparing specific spectral properties. In particular, we show that the Markov operator corresponding to the two-block sampler is trace class (and hence Hilbert-Schmidt), whereas the operator corresponding to the three-block sampler is not even Hilbert-Schmidt. The trace class property for the two-block sampler implies geometric convergence for the associated Markov chain, which justifies the use of Markov chain CLTs to obtain practical error bounds for MCMC based estimates. Additionally, it facilitates theoretical comparisons of the two-block sampler with sandwich algorithms which aim to improve performance by inserting inexpensive extra steps in between the two conditional draws of the two-block sampler.
</p>projecteuclid.org/euclid.ejs/1547607848_20190115220419Tue, 15 Jan 2019 22:04 ESTDetection of sparse mixtures: higher criticism and scan statistichttps://projecteuclid.org/euclid.ejs/1547607852<strong>Ery Arias-Castro</strong>, <strong>Andrew Ying</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 208--230.</p><p><strong>Abstract:</strong><br/>
We consider the problem of detecting a sparse mixture as studied by Ingster (1997) and Donoho and Jin (2004). We consider a wide array of base distributions. In particular, we study the situation when the base distribution has polynomial tails, a situation that has not received much attention in the literature. Perhaps surprisingly, we find that in the context of such a power-law distribution, the higher criticism does not achieve the detection boundary. However, the scan statistic does.
</p>projecteuclid.org/euclid.ejs/1547607852_20190115220419Tue, 15 Jan 2019 22:04 ESTImportance sampling the union of rare events with an application to power systems analysishttps://projecteuclid.org/euclid.ejs/1548817590<strong>Art B. Owen</strong>, <strong>Yury Maximov</strong>, <strong>Michael Chertkov</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 231--254.</p><p><strong>Abstract:</strong><br/>
We consider importance sampling to estimate the probability $\mu$ of a union of $J$ rare events $H_{j}$ defined by a random variable $\boldsymbol{x}$. The sampler we study has been used in spatial statistics, genomics and combinatorics going back at least to Karp and Luby (1983). It works by sampling one event at random, then sampling $\boldsymbol{x}$ conditionally on that event happening, and it constructs an unbiased estimate of $\mu$ by multiplying an inverse moment of the number of occurring events by the union bound. We prove some variance bounds for this sampler. For a sample size of $n$, it has a variance no larger than $\mu(\bar{\mu}-\mu)/n$ where $\bar{\mu}$ is the union bound. It also has a coefficient of variation no larger than $\sqrt{(J+J^{-1}-2)/(4n)}$ regardless of the overlap pattern among the $J$ events. Our motivating problem comes from power system reliability, where the phase differences between connected nodes have a joint Gaussian distribution and the $J$ rare events arise from unacceptably large phase differences. In the grid reliability problem, even events defined by $5772$ constraints in $326$ dimensions, with probability below $10^{-22}$, are estimated with a coefficient of variation of about $0.0024$ with only $n=10{,}000$ sample values.
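The sampler described above can be sketched on a toy Gaussian problem (assumed here purely for illustration: $\boldsymbol{x}\sim N(0,I_{2})$ with rare events $H_{j}=\{x_{j}>t\}$), where conditional sampling given an event reduces to inverse-CDF sampling of a normal tail:

```python
import numpy as np
from scipy.stats import norm

# Union-of-rare-events importance sampler in the Karp-Luby style:
# sample an event index, sample x conditionally on that event, and
# multiply the union bound by the mean of 1/S, where S counts events.
rng = np.random.default_rng(0)
t, J, n = 3.0, 2, 100_000
p = norm.sf(t)                         # P(H_j); identical for both events here
mu_bar = J * p                         # union bound

# event index chosen proportional to P(H_j) (uniform here, by symmetry)
j = rng.integers(J, size=n)
x = rng.normal(size=(n, J))
u = rng.random(n)
x[np.arange(n), j] = norm.isf(u * p)   # tail draw: guarantees x_j > t
S = (x > t).sum(axis=1)                # number of events that occur (>= 1)
est = mu_bar * np.mean(1.0 / S)

exact = 2 * p - p**2                   # inclusion-exclusion for J = 2
```

With $J=2$ the paper's coefficient-of-variation bound $\sqrt{(J+J^{-1}-2)/(4n)}$ is about $0.0011$ at $n=10^{5}$, so `est` lands very close to `exact` regardless of how the two events overlap.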
</p>projecteuclid.org/euclid.ejs/1548817590_20190129220635Tue, 29 Jan 2019 22:06 ESTEstimation of spectral functionals for Lévy-driven continuous-time linear models with tapered datahttps://projecteuclid.org/euclid.ejs/1548817591<strong>Mamikon S. Ginovyan</strong>, <strong>Artur A. Sahakyan</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 255--283.</p><p><strong>Abstract:</strong><br/>
The paper is concerned with the nonparametric statistical estimation of linear spectral functionals for Lévy-driven continuous-time stationary linear models with tapered data. As an estimator for the unknown functional we consider the averaged tapered periodogram. We analyze the bias of the estimator and obtain sufficient conditions assuring the proper rate of convergence of the bias to zero, necessary for asymptotic normality of the estimator. We prove a central limit theorem for a suitably normalized stochastic process generated by a tapered Toeplitz type quadratic functional of the model. As a consequence of these results we obtain the asymptotic normality of our estimator.
</p>projecteuclid.org/euclid.ejs/1548817591_20190129220635Tue, 29 Jan 2019 22:06 ESTFast Bayesian variable selection for high dimensional linear models: Marginal solo spike and slab priorshttps://projecteuclid.org/euclid.ejs/1549335678<strong>Su Chen</strong>, <strong>Stephen G. Walker</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 284--309.</p><p><strong>Abstract:</strong><br/>
This paper presents a method for fast Bayesian variable selection in the normal linear regression model with high dimensional data. A novel approach is adopted in which an explicit posterior probability for including a covariate is obtained. The method is sequential but not order-dependent: covariates are dealt with one by one, and a spike and slab prior is only assigned to the coefficient under investigation. We adopt the well-known spike and slab Gaussian priors with a sample size dependent variance, which achieves strong selection consistency for marginal posterior probabilities even when the number of covariates grows almost exponentially with sample size. Numerical illustrations are presented where it is shown that the new approach provides essentially equivalent results to the standard spike and slab priors, i.e. the same marginal posterior probabilities of the coefficients being nonzero, which are estimated via Gibbs sampling. Hence, we obtain the same results via the direct calculation of $p$ probabilities, compared to a stochastic search over a space of $2^{p}$ elements. Our procedure only requires $p$ probabilities to be calculated, which can be done exactly, so parallel computation is feasible when $p$ is large.
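For a single covariate with a point-mass spike, a Gaussian slab and known noise variance, the posterior inclusion probability has a closed form via a Bayes factor. The sketch below is a simplified one-covariate version of such a marginal calculation (the paper's sequential handling of the remaining covariates is omitted, and the hyperparameter values `sigma2`, `tau2`, `w` are assumptions):

```python
import numpy as np

def inclusion_prob(y, x, sigma2=1.0, tau2=1.0, w=0.5):
    """Posterior probability that a single coefficient is nonzero under a
    point-mass spike at 0 and a N(0, tau2) Gaussian slab, with known noise
    variance sigma2 and prior inclusion probability w."""
    s, xy = x @ x, x @ y
    # log Bayes factor of slab vs spike, via the Sherman-Morrison identity
    # applied to the marginal likelihood N(y; 0, sigma2*I + tau2*x x')
    log_bf = -0.5 * np.log1p(tau2 * s / sigma2) \
             + tau2 * xy**2 / (2 * sigma2 * (sigma2 + tau2 * s))
    return 1.0 / (1.0 + (1 - w) / w * np.exp(-log_bf))

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
signal = inclusion_prob(2.0 * x + rng.normal(size=n), x)   # true beta = 2
null = inclusion_prob(rng.normal(size=n), x)               # true beta = 0
```

Each such probability is an exact one-dimensional computation, which is what makes evaluating $p$ of them (possibly in parallel) cheap compared with searching over $2^{p}$ models.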
</p>projecteuclid.org/euclid.ejs/1549335678_20190204220140Mon, 04 Feb 2019 22:01 ESTWeak dependence and GMM estimation of supOU and mixed moving average processeshttps://projecteuclid.org/euclid.ejs/1549681240<strong>Imma Valentina Curato</strong>, <strong>Robert Stelzer</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 310--360.</p><p><strong>Abstract:</strong><br/>
We consider a mixed moving average (MMA) process $X$ driven by a Lévy basis and prove that it is weakly dependent with rates computable in terms of the moving average kernel and the characteristic quadruple of the Lévy basis. Using this property, we show conditions ensuring that sample mean and autocovariances of $X$ have a limiting normal distribution. We extend these results to stochastic volatility models and then investigate a Generalized Method of Moments estimator for the supOU process and the supOU stochastic volatility model after choosing a suitable distribution for the mean reversion parameter. For these estimators, we analyze the asymptotic behavior in detail.
</p>projecteuclid.org/euclid.ejs/1549681240_20190208220053Fri, 08 Feb 2019 22:00 ESTOptimal designs for regression with spherical datahttps://projecteuclid.org/euclid.ejs/1549681241<strong>Holger Dette</strong>, <strong>Maria Konstantinou</strong>, <strong>Kirsten Schorning</strong>, <strong>Josua Gösmann</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 361--390.</p><p><strong>Abstract:</strong><br/>
In this paper optimal designs for regression problems with spherical predictors of arbitrary dimension are considered. Our work is motivated by applications in materials science, where crystallographic textures such as the misorientation distribution or the grain boundary distribution (depending on a four-dimensional spherical predictor) are represented by series of hyperspherical harmonics, which are estimated from experimental or simulated data.
For this type of estimation problems we explicitly determine optimal designs with respect to the $\Phi _{p}$-criteria introduced by Kiefer (1974) and a class of orthogonally invariant information criteria recently introduced in the literature. In particular, we show that the uniform distribution on the $m$-dimensional sphere is optimal and construct discrete and implementable designs with the same information matrices as the continuous optimal designs. Finally, we illustrate the advantages of the new designs for series estimation by hyperspherical harmonics, which are symmetric with respect to the first and second crystallographic point group.
</p>projecteuclid.org/euclid.ejs/1549681241_20190208220053Fri, 08 Feb 2019 22:00 ESTAdditive partially linear models for massive heterogeneous datahttps://projecteuclid.org/euclid.ejs/1549681242<strong>Binhuan Wang</strong>, <strong>Yixin Fang</strong>, <strong>Heng Lian</strong>, <strong>Hua Liang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 391--431.</p><p><strong>Abstract:</strong><br/>
We consider an additive partially linear framework for modelling massive heterogeneous data. The major goal is to extract multiple common features simultaneously across all sub-populations while exploring heterogeneity of each sub-population. We propose an aggregation type of estimators for the commonality parameters that possess the asymptotic optimal bounds and the asymptotic distributions as if there were no heterogeneity. This oracle result holds when the number of sub-populations does not grow too fast and the tuning parameters are selected carefully. A plug-in estimator for the heterogeneity parameter is further constructed, and shown to possess the asymptotic distribution as if the commonality information were available. Furthermore, we develop a heterogeneity test for the linear components and a homogeneity test for the non-linear components accordingly. The performance of the proposed methods is evaluated via simulation studies and an application to the Medicare Provider Utilization and Payment data.
</p>projecteuclid.org/euclid.ejs/1549681242_20190208220053Fri, 08 Feb 2019 22:00 ESTMonte Carlo modified profile likelihood in models for clustered datahttps://projecteuclid.org/euclid.ejs/1549962031<strong>Claudia Di Caterina</strong>, <strong>Giuliana Cortese</strong>, <strong>Nicola Sartori</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 432--464.</p><p><strong>Abstract:</strong><br/>
The main focus of the analysts who deal with clustered data is usually not on the clustering variables, and hence the group-specific parameters are treated as nuisance. If a fixed effects formulation is preferred and the total number of clusters is large relative to the single-group sizes, classical frequentist techniques relying on the profile likelihood are often misleading. The use of alternative tools, such as modifications to the profile likelihood or integrated likelihoods, for making accurate inference on a parameter of interest can be complicated by the presence of nonstandard modelling and/or sampling assumptions. We show here how to employ Monte Carlo simulation in order to approximate the modified profile likelihood in some of these unconventional frameworks. The proposed solution is widely applicable and is shown to retain the usual properties of the modified profile likelihood. The approach is examined in two instances particularly relevant in applications, i.e. missing-data models and survival models with unspecified censoring distribution. The effectiveness of the proposed solution is validated via simulation studies and two clinical trial applications.
</p>projecteuclid.org/euclid.ejs/1549962031_20190212040038Tue, 12 Feb 2019 04:00 ESTQuery-dependent ranking and its asymptotic propertieshttps://projecteuclid.org/euclid.ejs/1549962032<strong>Ben Dai</strong>, <strong>Junhui Wang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 465--488.</p><p><strong>Abstract:</strong><br/>
Ranking, also known as learning to rank in the machine learning community, aims to rank a number of items based on their relevance to a specific query. In the literature, most ranking methods use a uniform ranking function to evaluate the relevance, which completely ignores the heterogeneity among queries. To admit different ranking functions for various queries, a general $U$-process formulation for query-dependent ranking is developed. It allows one to incorporate neighborhood structure among queries via various forms of smoothing weights to improve the ranking performance. One of its salient features is its capability of producing reasonable rankings for novel queries that are absent from the training set, which is commonly encountered in practice but often neglected in the literature. The proposed method is implemented via an inexact alternating direction method of multipliers (ADMM) for each query in parallel. Its asymptotic risk bound is established, showing that it achieves desirable ranking accuracy at a fast rate for any query, including novel ones. Furthermore, simulated examples and a real application to the Yahoo! challenge dataset also support the advantage of the query-dependent ranking method against existing competitors.
</p>projecteuclid.org/euclid.ejs/1549962032_20190212040038Tue, 12 Feb 2019 04:00 ESTNon-marginal decisions: A novel Bayesian multiple testing procedurehttps://projecteuclid.org/euclid.ejs/1550134833<strong>Noirrit Kiran Chandra</strong>, <strong>Sourabh Bhattacharya</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 489--535.</p><p><strong>Abstract:</strong><br/>
In this paper, we consider the problem of multiple testing when the hypotheses are dependent. In most of the existing literature, either Bayesian or non-Bayesian, the decision rules focus mainly on the validity of the test procedure rather than actually exploiting the dependence to increase efficiency. Moreover, the decisions regarding different hypotheses are marginal, in the sense that they do not depend upon each other directly. However, in realistic situations the hypotheses are usually dependent, and it is therefore desirable that the decisions regarding them be taken jointly.
In this article, we develop a novel Bayesian multiple testing procedure that coherently takes this requirement into consideration. Our method, which is based on new notions of error and non-error terms, substantially enhances efficiency by judicious exploitation of the dependence structure among the hypotheses. We show that our method minimizes the posterior expected loss associated with an additive “0-1” loss function; we also prove theoretical results on the relevant error probabilities, establishing the coherence and usefulness of our method. The optimal decision configuration is not available in closed form and we propose an efficient simulated annealing algorithm for the purpose of optimization, which is also generically applicable to binary optimization problems.
Extensive simulation studies indicate that in dependent situations our method performs significantly better than some popular existing multiple testing methods, in terms of accuracy and power control. Moreover, application of our ideas to a real spatial data set on radionuclide concentrations in the Rongelap Islands yielded insightful results.
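The generic simulated annealing step for binary decision configurations can be sketched as follows; the single-coordinate-flip proposal and logarithmic cooling schedule below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def anneal_binary(loss, d, n_iter=5000, t0=1.0, seed=0):
    """Simulated annealing over {0,1}^d: propose flipping one coordinate
    at a time; accept uphill moves with probability exp(-delta / T)."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, d)
    cur_loss = loss(x)
    best, best_loss = x.copy(), cur_loss
    for t in range(1, n_iter + 1):
        j = rng.integers(d)
        y = x.copy()
        y[j] ^= 1                          # flip one coordinate
        new_loss = loss(y)
        temp = t0 / np.log(t + 1.0)        # logarithmic cooling schedule
        if new_loss <= cur_loss or rng.random() < np.exp(-(new_loss - cur_loss) / temp):
            x, cur_loss = y, new_loss
            if cur_loss < best_loss:       # track the best state visited
                best, best_loss = x.copy(), cur_loss
    return best, best_loss
```

Any binary decision configuration with a computable posterior expected loss can be plugged in as `loss`.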
</p>projecteuclid.org/euclid.ejs/1550134833_20190214040043Thu, 14 Feb 2019 04:00 ESTLipschitz-Killing curvatures of excursion sets for two-dimensional random fieldshttps://projecteuclid.org/euclid.ejs/1550134834<strong>Hermine Biermé</strong>, <strong>Elena Di Bernardino</strong>, <strong>Céline Duval</strong>, <strong>Anne Estrade</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 536--581.</p><p><strong>Abstract:</strong><br/>
In the present paper we study three geometric characteristics of the excursion sets of a two-dimensional stationary isotropic random field. First, we show that these characteristics can be estimated without bias if the field satisfies a kinematic formula; this is, for instance, the case for fields given by a function of smooth Gaussian fields, or for some shot noise fields. Using the proposed estimators of these geometric characteristics, we describe inference procedures for the parameters of the field. An extensive simulation study illustrates the performance of each estimator. We then use the Euler characteristic estimator to build a test of whether a given field is Gaussian, against various alternatives. The test is based on sparse information, namely the excursion sets at two different levels of the field to be tested. Finally, the proposed test is adapted to an applied case: synthesized 2D digital mammograms.
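For intuition, the Euler characteristic of a discretized excursion set (the binary mask `field > u`) can be computed by viewing each foreground pixel as a closed unit square and counting vertices, edges and faces; this is a minimal illustrative sketch, not the estimator analyzed in the paper:

```python
import numpy as np

def euler_characteristic(mask):
    """Euler characteristic of a binary image: each foreground pixel is
    a closed unit square, and chi = V - E + F over the resulting complex."""
    verts, edges = set(), set()
    faces = 0
    ys, xs = np.nonzero(mask)
    for i, j in zip(ys, xs):
        faces += 1
        for di, dj in ((0, 0), (0, 1), (1, 0), (1, 1)):
            verts.add((i + di, j + dj))        # four corners of the pixel
        edges.add(('h', i, j)); edges.add(('h', i + 1, j))   # top, bottom
        edges.add(('v', i, j)); edges.add(('v', i, j + 1))   # left, right
    return len(verts) - len(edges) + faces
```

A filled block gives 1 (one connected component, no holes); removing an interior pixel creates a hole and drops the value to 0.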
</p>projecteuclid.org/euclid.ejs/1550134834_20190214040043Thu, 14 Feb 2019 04:00 ESTGeneralized M-estimators for high-dimensional Tobit I modelshttps://projecteuclid.org/euclid.ejs/1550286094<strong>Jelena Bradic</strong>, <strong>Jiaqi Guo</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 582--645.</p><p><strong>Abstract:</strong><br/>
This paper develops robust confidence intervals for high-dimensional and left-censored regression. Type-I censored regression models, where a competing event makes the variable of interest unobservable, are extremely common in practice. In this paper, we develop smoothed estimating equations that are adaptive to the censoring level and robust to misspecification of the error distribution. We propose a unified class of robust estimators, including one-step Mallows, Schweppe, and Hill-Ryan estimators, that are adaptive to the left-censored observations. In the ultra-high-dimensional setting, where the dimensionality can grow exponentially with the sample size, we show that as long as the preliminary estimator converges faster than $n^{-1/4}$, the one-step estimators inherit the asymptotic distribution of the fully iterated version. Moreover, we show that the size of the residuals in the Bahadur representation matches that of pure linear models; that is, the effects of censoring disappear asymptotically. Simulation studies demonstrate that our method is adaptive to the censoring level and to asymmetry in the error distribution, and does not lose efficiency when the errors come from symmetric distributions.
</p>projecteuclid.org/euclid.ejs/1550286094_20190215220146Fri, 15 Feb 2019 22:01 ESTContraction and uniform convergence of isotonic regressionhttps://projecteuclid.org/euclid.ejs/1550286095<strong>Fan Yang</strong>, <strong>Rina Foygel Barber</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 646--677.</p><p><strong>Abstract:</strong><br/>
We consider the problem of isotonic regression, where the underlying signal $x$ is assumed to satisfy a monotonicity constraint, that is, $x$ lies in the cone $\{x\in \mathbb{R}^{n}:x_{1}\leq \dots\leq x_{n}\}$. We study the isotonic projection operator (projection onto this cone), and find a necessary and sufficient condition characterizing all norms with respect to which this projection is contractive. This enables a simple and non-asymptotic analysis of the convergence properties of isotonic regression, yielding uniform confidence bands that adapt to the local Lipschitz properties of the signal.
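The isotonic projection itself can be computed exactly by the classical pool-adjacent-violators algorithm (PAVA); the following standard sketch (not code from the paper) returns the Euclidean projection of $y$ onto the monotone cone:

```python
import numpy as np

def pava(y, w=None):
    """Pool Adjacent Violators: weighted Euclidean projection of y onto
    the monotone cone {x : x_1 <= ... <= x_n}."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    vals, wts, cnts = [], [], []       # current blocks: value, weight, length
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); cnts.append(1)
        # merge adjacent blocks while they violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / (wts[-2] + wts[-1])
            w2, c2 = wts[-2] + wts[-1], cnts[-2] + cnts[-1]
            vals[-2:] = [v]; wts[-2:] = [w2]; cnts[-2:] = [c2]
    return np.repeat(vals, cnts)       # expand block means back to length n
```

The output is always nondecreasing and, with equal weights, preserves the total sum of $y$ (a consequence of projecting onto a cone containing the constants).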
</p>projecteuclid.org/euclid.ejs/1550286095_20190215220146Fri, 15 Feb 2019 22:01 ESTSpectral clustering in the dynamic stochastic block modelhttps://projecteuclid.org/euclid.ejs/1550286096<strong>Marianna Pensky</strong>, <strong>Teng Zhang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 678--709.</p><p><strong>Abstract:</strong><br/>
In the present paper, we study a Dynamic Stochastic Block Model (DSBM) under the assumptions that the connection probabilities, as functions of time, are smooth and that at most $s$ nodes can switch their class memberships between consecutive time points. We estimate the edge probability tensor by a kernel-type procedure and extract the group memberships of the nodes by spectral clustering. The procedure is computationally viable and adaptive to the unknown smoothness of the functional connection probabilities, to the rate $s$ of membership switching, and to the unknown number of clusters. In addition, it comes with non-asymptotic guarantees for the precision of estimation and clustering.
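A stripped-down version of the smoothing-plus-spectral-clustering idea can be sketched as follows; the boxcar window, the restriction to two communities, and the sign split of the second eigenvector (in place of $k$-means on several eigenvectors) are all simplifying assumptions of this sketch, not the paper's procedure:

```python
import numpy as np

def dsbm_spectral(adj_seq, t, h):
    """Two-community sketch: average the adjacency snapshots in a boxcar
    window of half-width h around time t, then split nodes by the sign of
    the second (by eigenvalue magnitude) eigenvector."""
    T = len(adj_seq)
    lo, hi = max(0, t - h), min(T, t + h + 1)
    A_bar = np.mean(adj_seq[lo:hi], axis=0)   # smoothed edge-probability estimate
    vals, vecs = np.linalg.eigh(A_bar)
    order = np.argsort(-np.abs(vals))         # sort eigenvalues by magnitude
    v2 = vecs[:, order[1]]                    # leading nontrivial eigenvector
    return (v2 > 0).astype(int)               # sign pattern = community labels
```

On a noiseless planted two-block sequence the sign split recovers the communities exactly (up to label swap).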
</p>projecteuclid.org/euclid.ejs/1550286096_20190215220146Fri, 15 Feb 2019 22:01 ESTIsotonic regression meets LASSOhttps://projecteuclid.org/euclid.ejs/1550632213<strong>Matey Neykov</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 13, Number 1, 710--746.</p><p><strong>Abstract:</strong><br/>
This paper studies a two-step procedure for monotone increasing additive single-index models with Gaussian designs. The proposed procedure is simple, easy to implement with existing software, and consists of consecutively applying the LASSO and isotonic regression. Aside from formalizing this procedure, we provide theoretical guarantees on its performance: 1) we show that the procedure controls the in-sample squared error; 2) we demonstrate that it can be used to predict new observations, by showing that the absolute prediction error can be controlled with high probability. Our bounds reveal a tradeoff between two rates: the minimax rate for high-dimensional estimation under quadratic loss, and the minimax nonparametric rate for estimating a monotone increasing function.
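A toy sketch of the two-step idea, with an ISTA proximal-gradient solver standing in for the LASSO and the $O(n^{2})$ max-min formula standing in for isotonic regression (both generic, assumed components, not the paper's implementation):

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """LASSO for (1/2n)||y - Xb||^2 + lam*||b||_1 via ISTA."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2 * n   # 1/L for the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        g = X.T @ (X @ b - y) / n                # gradient step ...
        z = b - step * g
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # ... then soft-threshold
    return b

def isotonic_minimax(y):
    """Isotonic regression by the max-min formula (quadratic time, fine for a sketch)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    cs = np.concatenate([[0.0], np.cumsum(y)])
    out = np.empty(n)
    for i in range(n):
        out[i] = max(
            min((cs[k + 1] - cs[j]) / (k + 1 - j) for k in range(i, n))
            for j in range(i + 1)
        )
    return out

def two_step(X, y, lam):
    """Step 1: LASSO for the linear index; step 2: monotone fit along it."""
    b = lasso_ista(X, y, lam)
    idx = np.argsort(X @ b)                      # order observations by fitted index
    fit = np.empty_like(y, dtype=float)
    fit[idx] = isotonic_minimax(y[idx])
    return b, fit
```

For an orthonormal design the LASSO step reduces to soft-thresholding, which makes the solver easy to sanity-check.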
</p>projecteuclid.org/euclid.ejs/1550632213_20190219221028Tue, 19 Feb 2019 22:10 EST