Electronic Journal of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.ejs
The latest articles from Electronic Journal of Statistics on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Fri, 03 Jun 2011 09:20 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
The bias and skewness of M-estimators in regression
http://projecteuclid.org/euclid.ejs/1262876992
<strong>Christopher Withers</strong>, <strong>Saralees Nadarajah</strong><p><strong>Source: </strong>Electron. J. Statist., Volume 4, 1--14.</p><p><strong>Abstract:</strong><br/>
We consider M-estimation of a regression model with a nuisance parameter and a vector of other parameters. The unknown distribution of the residuals is not assumed to be normal or symmetric. Simple and easily estimated formulas are given for the dominant terms of the bias and skewness of the parameter estimates. For the linear model these are proportional to the skewness of the ‘independent’ variables. For a nonlinear model, its linear component plays the role of these independent variables, and a second term must be added proportional to the covariance of its linear and quadratic components. For the least squares estimate with normal errors this term was derived by Box [1]. We also consider the effect of a large number of parameters, and the case of random independent variables.
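As a concrete illustration of the kind of estimator the abstract discusses (not the authors' own formulas), here is a minimal sketch of Huber M-estimation of a linear model via iteratively reweighted least squares, with skewed non-normal residuals; the design, residual distribution, and tuning constant are all illustrative assumptions:

```python
import numpy as np

# Illustrative sketch only: Huber M-estimation of a linear model via
# iteratively reweighted least squares (IRLS). The design, skewed residual
# distribution, and tuning constant c are assumptions for the demo.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))
beta_true = np.array([1.0, -2.0])
eps = rng.exponential(1.0, size=n) - 1.0   # skewed, mean-zero residuals
y = x @ beta_true + eps

def huber_weights(r, c=1.345):
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

beta = np.linalg.lstsq(x, y, rcond=None)[0]     # OLS starting value
for _ in range(50):
    w = huber_weights(y - x @ beta)
    xw = x * w[:, None]
    beta = np.linalg.solve(x.T @ xw, xw.T @ y)  # weighted normal equations
```

Even with skewed residuals, the slope estimates here are close to the truth; the paper's contribution is quantifying the dominant bias and skewness terms of such estimates.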
</p>

Normalizing constants of log-concave densities
https://projecteuclid.org/euclid.ejs/1520240451
<strong>Nicolas Brosse</strong>, <strong>Alain Durmus</strong>, <strong>Éric Moulines</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 851--889.</p><p><strong>Abstract:</strong><br/>
We derive explicit bounds for the computation of normalizing constants $Z$ for log-concave densities $\pi =\mathrm{e}^{-U}/Z$ w.r.t. the Lebesgue measure on $\mathbb{R}^{d}$. Our approach relies on a Gaussian annealing combined with recent and precise bounds on the Unadjusted Langevin Algorithm [15]. Polynomial bounds in the dimension $d$ are obtained with an exponent that depends on the assumptions made on $U$. The algorithm also provides a theoretically grounded choice of the annealing sequence of variances. A numerical experiment supports our findings. Results of independent interest on the mean squared error of the empirical average of locally Lipschitz functions are established.
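A minimal sketch of the Unadjusted Langevin Algorithm that the paper's annealing scheme builds on, run for the toy log-concave target $U(x)=\|x\|^{2}/2$ (standard Gaussian); the step size and run length are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch only: the Unadjusted Langevin Algorithm (ULA) iterates
# x_{k+1} = x_k - gamma * grad_U(x_k) + sqrt(2*gamma) * N(0, I),
# here for the toy target U(x) = ||x||^2 / 2 (standard Gaussian).
rng = np.random.default_rng(1)
d, gamma, n_iter, burn_in = 2, 0.05, 50000, 5000

def grad_U(x):
    return x  # gradient of U(x) = ||x||^2 / 2

x = np.zeros(d)
samples = []
for k in range(n_iter):
    x = x - gamma * grad_U(x) + np.sqrt(2 * gamma) * rng.normal(size=d)
    if k >= burn_in:
        samples.append(x.copy())
samples = np.asarray(samples)
emp_var = samples.var(axis=0)  # near 1 per coordinate, up to O(gamma) bias
```

The stationary law of ULA is biased at order of the step size, which is why the paper's bounds on the algorithm's accuracy matter for estimating $Z$.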
</p>

Estimation of the asymptotic variance of univariate and multivariate random fields and statistical inference
https://projecteuclid.org/euclid.ejs/1520326826
<strong>Annabel Prause</strong>, <strong>Ansgar Steland</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 890--940.</p><p><strong>Abstract:</strong><br/>
Correlated random fields are a common way to model dependence structures in high-dimensional data, especially for data collected in imaging. One important parameter characterizing the degree of dependence is the asymptotic variance, which adds up all autocovariances in the temporal and spatial domain. In particular, it arises in the standardization of test statistics based on partial sums of random fields, so the construction of tests requires its estimation. In this paper we propose consistent estimators for this parameter for strictly stationary $\varphi $-mixing random fields with arbitrary dimension of the domain and taking values in a Euclidean space of arbitrary dimension, thus allowing for multivariate random fields. We establish consistency, provide central limit theorems and show that distributional approximations of related test statistics based on sample autocovariances of random fields can be obtained by the subsampling approach.
As in applications the spatial-temporal correlations are often quite local, so that a large number of autocovariances vanish or are negligible, we also investigate a thresholding approach where sample autocovariances of small magnitude are omitted. Extensive simulation studies show that the proposed estimators work well in practice and, when used to standardize image test statistics, can provide highly accurate image testing procedures. With automated applications on a big-data scale in mind, as arising in data science problems, these examinations also cover the proposed data-adaptive procedures for selecting method parameters.
</p>

Least tail-trimmed absolute deviation estimation for autoregressions with infinite/finite variance
https://projecteuclid.org/euclid.ejs/1520413266
<strong>Rongning Wu</strong>, <strong>Yunwei Cui</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 941--959.</p><p><strong>Abstract:</strong><br/>
We propose least tail-trimmed absolute deviation estimation for autoregressive processes with infinite/finite variance. We explore the large sample properties of the resulting estimator and establish its asymptotic normality. Moreover, we study convergence rates of the estimator under different moment settings and show that it attains a super-$\sqrt{n}$ convergence rate when the innovation variance is infinite. Simulation studies are carried out to examine the finite-sample performance of the proposed method and that of relevant statistical inferences. A real example is also presented.
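A minimal sketch of plain least absolute deviation estimation of an AR(1) coefficient by grid search, with infinite-variance innovations; the paper's estimator additionally tail-trims extreme observations, a step omitted here, and all data choices are illustrative:

```python
import numpy as np

# Illustrative sketch only: plain least absolute deviation (LAD) estimation
# of an AR(1) coefficient by grid search. The tail-trimming step of the
# paper's estimator is omitted; data and grid are assumptions for the demo.
rng = np.random.default_rng(3)
n, phi_true = 2000, 0.6
eps = rng.standard_t(df=1.5, size=n)      # infinite-variance innovations
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + eps[t]

grid = np.linspace(-0.99, 0.99, 397)
loss = np.array([np.abs(x[1:] - phi * x[:-1]).sum() for phi in grid])
phi_hat = float(grid[np.argmin(loss)])    # LAD estimate of phi
```

With infinite innovation variance, absolute-deviation criteria remain well behaved where least squares degrades, which is the regime where the paper establishes super-$\sqrt{n}$ rates.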
</p>

Supervised dimensionality reduction via distance correlation maximization
https://projecteuclid.org/euclid.ejs/1520586206
<strong>Praneeth Vepakomma</strong>, <strong>Chetan Tonde</strong>, <strong>Ahmed Elgammal</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 960--984.</p><p><strong>Abstract:</strong><br/>
In our work, we propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called Statistical Distance Correlation (Székely et al., 2007). We propose an objective which is free of distributional assumptions on regression variables and regression model assumptions. Our proposed formulation is based on learning a low-dimensional feature representation $\mathbf{z}$, which maximizes the squared sum of Distance Correlations between low-dimensional features $\mathbf{z}$ and response $y$, and also between features $\mathbf{z}$ and covariates $\mathbf{x}$. We propose a novel algorithm to optimize our proposed objective using the Generalized Minimization Maximization method of Parizi et al. (2015). We show superior empirical results on multiple datasets, demonstrating the effectiveness of our proposed approach over several relevant state-of-the-art supervised dimensionality reduction methods.
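A minimal sketch of the sample distance correlation of Székely et al. (2007), the dependency criterion the objective above maximizes, computed from double-centered pairwise distance matrices; the demo data are illustrative:

```python
import numpy as np

# Illustrative sketch only: sample distance correlation (Székely et al., 2007)
# from double-centered pairwise distance matrices. It equals 1 for exact
# affine relations and is near 0 for independent samples.
def _dist_matrix(z):
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    return np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))

def dcor(x, y):
    a, b = _dist_matrix(x), _dist_matrix(y)
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(8)
x = rng.normal(size=200)
z = rng.normal(size=200)
dcor_linear = dcor(x, 2 * x + 1.0)  # exactly 1 for an affine relation
dcor_indep = dcor(x, z)             # small for independent samples
```

Unlike Pearson correlation, distance correlation detects arbitrary (including nonlinear) dependence, which is why it is a natural criterion for assumption-free supervised dimensionality reduction.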
</p>

Ridge regression for the functional concurrent model
https://projecteuclid.org/euclid.ejs/1521079461
<strong>Tito Manrique</strong>, <strong>Christophe Crambes</strong>, <strong>Nadine Hilgert</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 985--1018.</p><p><strong>Abstract:</strong><br/>
The aim of this paper is to propose estimators of the unknown functional coefficients in the Functional Concurrent Model (FCM). We extend the Ridge Regression method developed in the classical linear case to the functional data framework. Two distinct penalized estimators are obtained: one with a constant regularization parameter and the other with a functional one. We prove the probability convergence of these estimators with rate. Then we study the practical choice of both regularization parameters. Additionally, we present some simulations that show the accuracy of these estimators despite a very low signal-to-noise ratio.
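A minimal sketch of classical (non-functional) ridge regression with a constant regularization parameter, the scalar analogue the paper extends to the functional framework; the data, noise level, and penalty value are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch only: classical ridge regression with a constant
# regularization parameter lam, the finite-dimensional analogue of the
# functional estimators in the paper. Data and lam are demo assumptions.
rng = np.random.default_rng(4)
n, p, lam = 200, 10, 1.0
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=2.0, size=n)   # low signal-to-noise

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

The penalty shrinks the coefficient norm relative to least squares, stabilizing the estimate under a low signal-to-noise ratio, which mirrors the accuracy the paper reports for its functional estimators.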
</p>

High dimensional efficiency with applications to change point tests
https://projecteuclid.org/euclid.ejs/1528941678
<strong>John A.D. Aston</strong>, <strong>Claudia Kirch</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1901--1947.</p><p><strong>Abstract:</strong><br/>
This paper rigorously introduces the asymptotic concept of high dimensional efficiency, which quantifies the detection power of different statistics in high dimensional multivariate settings. It allows for comparisons of different high dimensional methods with different null asymptotics and even different asymptotic behavior, such as extremal-type asymptotics. The concept will be used to understand the power behavior of different test statistics, as the performance will greatly depend on the assumptions made, such as sparseness or denseness of the signal. The effect of misspecification of the covariance on the power of the tests is also investigated, because in many high dimensional situations estimation of the full dependency (covariance) between the multivariate observations in the panel is either computationally or even theoretically infeasible. The theoretical quantification is accompanied by simulation results which confirm the theoretical (asymptotic) findings for surprisingly small samples. The development of this concept was motivated by, but is by no means limited to, high-dimensional change point tests. It is shown that the concept of high dimensional efficiency is indeed suitable to describe small sample power.
</p>

Feasible invertibility conditions and maximum likelihood estimation for observation-driven models
https://projecteuclid.org/euclid.ejs/1521079462
<strong>Francisco Blasques</strong>, <strong>Paolo Gorgi</strong>, <strong>Siem Jan Koopman</strong>, <strong>Olivier Wintenberger</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1019--1052.</p><p><strong>Abstract:</strong><br/>
Invertibility conditions for observation-driven time series models often fail to be guaranteed in empirical applications. As a result, the asymptotic theory of maximum likelihood and quasi-maximum likelihood estimators may be compromised. We derive considerably weaker conditions that can be used in practice to ensure the consistency of the maximum likelihood estimator for a wide class of observation-driven time series models. Our consistency results hold for both correctly specified and misspecified models. We also obtain an asymptotic test and confidence bounds for the unfeasible “true” invertibility region of the parameter space. The practical relevance of the theory is highlighted in a set of empirical examples. For instance, we derive the consistency of the maximum likelihood estimator of the Beta-$t$-GARCH model under weaker conditions than those considered in previous literature.
</p>

Exact post-selection inference for the generalized lasso path
https://projecteuclid.org/euclid.ejs/1521252212
<strong>Sangwon Hyun</strong>, <strong>Max G’Sell</strong>, <strong>Ryan J. Tibshirani</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1053--1097.</p><p><strong>Abstract:</strong><br/>
We study tools for inference conditioned on model selection events that are defined by the generalized lasso regularization path. The generalized lasso estimate is given by the solution of a penalized least squares regression problem, where the penalty is the $\ell_{1}$ norm of a matrix $D$ times the coefficient vector. The generalized lasso path collects these estimates as the penalty parameter $\lambda$ varies (from $\infty$ down to 0). Leveraging a (sequential) characterization of this path from Tibshirani and Taylor [37], and recent advances in post-selection inference from Lee et al. [22] and Tibshirani et al. [38], we develop exact hypothesis tests and confidence intervals for linear contrasts of the underlying mean vector, conditioned on any model selection event along the generalized lasso path (assuming Gaussian errors in the observations).
Our construction of inference tools holds for any penalty matrix $D$. By inspecting specific choices of $D$, we obtain post-selection tests and confidence intervals for specific cases of generalized lasso estimates, such as the fused lasso, trend filtering, and the graph fused lasso. In the fused lasso case, the underlying coordinates of the mean are assigned a linear ordering, and our framework allows us to test selectively chosen breakpoints or changepoints in these mean coordinates. This is an interesting and well-studied problem with broad applications; our framework applied to the trend filtering and graph fused lasso cases serves several applications as well. Aside from the development of selective inference tools, we describe several practical aspects of our methods, such as (valid, i.e., fully-accounted-for) post-processing of generalized lasso estimates before performing inference in order to improve power, and problem-specific visualization aids that the data analyst can use to choose linear contrasts to be tested. Many examples, from both simulated and real data sources, are presented to examine the empirical properties of our inference methods.
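A minimal sketch of the penalty matrix $D$ that specializes the generalized lasso to the 1d fused lasso: $\|D\theta \|_{1}$ is the total variation of $\theta $, so a sparse $D\theta $ corresponds to few changepoints. The vector $\theta $ below is an illustrative assumption:

```python
import numpy as np

# Illustrative sketch only: the first-difference penalty matrix D that turns
# the generalized lasso into the 1d fused lasso; ||D @ theta||_1 is the total
# variation of theta, and nonzeros of D @ theta mark changepoints.
p = 6
D = np.eye(p - 1, p, k=1) - np.eye(p - 1, p)      # row i: theta[i+1] - theta[i]
theta = np.array([0.0, 0.0, 0.0, 2.0, 2.0, 2.0])  # one changepoint
penalty = np.abs(D @ theta).sum()
n_changepoints = int(np.count_nonzero(D @ theta))
```

Other choices of $D$ (second differences, graph incidence matrices) give trend filtering and the graph fused lasso, the other cases the paper covers.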
</p>

Inference for heavy tailed stationary time series based on sliding blocks
https://projecteuclid.org/euclid.ejs/1522116040
<strong>Axel Bücher</strong>, <strong>Johan Segers</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1098--1125.</p><p><strong>Abstract:</strong><br/>
The block maxima method in extreme value theory consists of fitting an extreme value distribution to a sample of block maxima extracted from a time series. Traditionally, the maxima are taken over disjoint blocks of observations. Alternatively, the blocks can be chosen to slide through the observation period, yielding a larger number of overlapping blocks. Inference based on sliding blocks is found to be more efficient than inference based on disjoint blocks. The asymptotic variance of the maximum likelihood estimator of the Fréchet shape parameter is reduced by more than 18%. Interestingly, the amount of the efficiency gain is the same whatever the serial dependence of the underlying time series: as for disjoint blocks, the asymptotic distribution depends on the serial dependence only through the sequence of scaling constants. The findings are illustrated by simulation experiments and are applied to the estimation of high return levels of the daily log-returns of the Standard & Poor’s 500 stock market index.
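A minimal sketch of the two block-maxima schemes compared above: disjoint blocks versus sliding (overlapping) blocks; the series and block length are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch only: disjoint vs. sliding block maxima from a time
# series; series and block length r are assumptions for the demo.
rng = np.random.default_rng(2)
x = rng.standard_cauchy(1000)   # heavy-tailed series
r = 50
n_blocks = x.size // r
disjoint_max = x[:n_blocks * r].reshape(n_blocks, r).max(axis=1)
sliding_max = np.array([x[i:i + r].max() for i in range(x.size - r + 1)])
# Sliding blocks yield many more (overlapping) maxima from the same data,
# which is the source of the efficiency gain discussed in the paper.
```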
</p>

A strong converse bound for multiple hypothesis testing, with applications to high-dimensional estimation
https://projecteuclid.org/euclid.ejs/1522116041
<strong>Ramji Venkataramanan</strong>, <strong>Oliver Johnson</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1126--1149.</p><p><strong>Abstract:</strong><br/>
In statistical inference problems, we wish to obtain lower bounds on the minimax risk, that is to bound the performance of any possible estimator. A standard technique to do this involves the use of Fano’s inequality. However, recent work in an information-theoretic setting has shown that an argument based on binary hypothesis testing gives tighter converse results (error lower bounds) than Fano for channel coding problems. We adapt this technique to the statistical setting, and argue that Fano’s inequality can always be replaced by this approach to obtain tighter lower bounds that can be easily computed and are asymptotically sharp. We illustrate our technique in three applications: density estimation, active learning of a binary classifier, and compressed sensing, obtaining tighter risk lower bounds in each case.
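For reference, a minimal sketch of the classical Fano bound that the paper's binary-testing argument tightens: for an $M$-ary test, $P_{\mathrm{err}}\ge 1-(I+\log 2)/\log M$; the numbers plugged in are illustrative:

```python
import numpy as np

# Illustrative sketch only: Fano's classical lower bound on the error
# probability of an M-ary test, P_err >= 1 - (I + log 2) / log M, clipped
# at 0. The mutual information value and M are demo assumptions.
def fano_lower_bound(mutual_info, M):
    return max(0.0, 1.0 - (mutual_info + np.log(2)) / np.log(M))

bound = fano_lower_bound(mutual_info=1.0, M=16)   # about 0.389
```

The paper's point is that for many problems this bound is loose, and a strong converse based on binary hypothesis testing gives tighter, still easily computed lower bounds.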
</p>

Supervised multiway factorization
https://projecteuclid.org/euclid.ejs/1522116042
<strong>Eric F. Lock</strong>, <strong>Gen Li</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1150--1180.</p><p><strong>Abstract:</strong><br/>
We describe a probabilistic PARAFAC/CANDECOMP (CP) factorization for multiway (i.e., tensor) data that incorporates auxiliary covariates, SupCP . SupCP generalizes the supervised singular value decomposition (SupSVD) for vector-valued observations, to allow for observations that have the form of a matrix or higher-order array. Such data are increasingly encountered in biomedical research and other fields. We use a novel likelihood-based latent variable representation of the CP factorization, in which the latent variables are informed by additional covariates. We give conditions for identifiability, and develop an EM algorithm for simultaneous estimation of all model parameters. SupCP can be used for dimension reduction, capturing latent structures that are more accurate and interpretable due to covariate supervision. Moreover, SupCP specifies a full probability distribution for a multiway data observation with given covariate values, which can be used for predictive modeling. We conduct comprehensive simulations to evaluate the SupCP algorithm. We apply it to a facial image database with facial descriptors (e.g., smiling / not smiling) as covariates, and to a study of amino acid fluorescence. Software is available at https://github.com/lockEF/SupCP.
</p>

An MM algorithm for estimation of a two component semiparametric density mixture with a known component
https://projecteuclid.org/euclid.ejs/1522224150
<strong>Zhou Shen</strong>, <strong>Michael Levine</strong>, <strong>Zuofeng Shang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1181--1209.</p><p><strong>Abstract:</strong><br/>
We consider a semiparametric mixture of two univariate density functions where one of them is known while the weight and the other function are unknown. We do not assume any additional structure on the unknown density function. For this mixture model, we derive a new sufficient identifiability condition and pinpoint a specific class of distributions describing the unknown component for which this condition is mostly satisfied. We also suggest a novel approach to estimation of this model that is based on an idea of applying a maximum smoothed likelihood to what would otherwise have been an ill-posed problem. We introduce an iterative MM (Majorization-Minimization) algorithm that estimates all of the model parameters. We establish that the algorithm possesses a descent property with respect to a log-likelihood objective functional and prove that the algorithm, indeed, converges. Finally, we also illustrate the performance of our algorithm in a simulation study and apply it to a real dataset.
</p>

Convex and non-convex regularization methods for spatial point processes intensity estimation
https://projecteuclid.org/euclid.ejs/1522288952
<strong>Achmad Choiruddin</strong>, <strong>Jean-François Coeurjolly</strong>, <strong>Frédérique Letué</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1210--1255.</p><p><strong>Abstract:</strong><br/>
This paper deals with feature selection procedures for spatial point processes intensity estimation. We consider regularized versions of estimating equations based on Campbell theorem. In particular, we consider two classical functions: the Poisson likelihood and the logistic regression likelihood. We provide general conditions on the spatial point processes and on penalty functions which ensure oracle property, consistency, and asymptotic normality under the increasing domain setting. We discuss the numerical implementation and assess finite sample properties in simulation studies. Finally, an application to tropical forestry datasets illustrates the use of the proposed method.
</p>

Fast adaptive estimation of log-additive exponential models in Kullback-Leibler divergence
https://projecteuclid.org/euclid.ejs/1522828871
<strong>Cristina Butucea</strong>, <strong>Jean-François Delmas</strong>, <strong>Anne Dutfoy</strong>, <strong>Richard Fischer</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1256--1298.</p><p><strong>Abstract:</strong><br/>
We study the problem of nonparametric estimation of probability density functions (pdf) with a product form on the domain $\triangle =\{(x_{1},\ldots ,x_{d})\in{\mathbb{R}} ^{d},0\leq x_{1}\leq \dots\leq x_{d}\leq 1\}$. Such pdf’s appear in the random truncation model as the joint pdf of the observations. They are also obtained as maximum entropy distributions of order statistics with given marginals. We propose an estimation method based on the approximation of the logarithm of the density by a carefully chosen family of basis functions. We show that the method achieves a fast convergence rate in probability with respect to the Kullback-Leibler divergence for pdf’s whose logarithm belongs to a Sobolev function class with known regularity. In the case when the regularity is unknown, we propose an estimation procedure using convex aggregation of the log-densities to obtain adaptability. The performance of this method is illustrated in a simulation study.
</p>

Conditional kernel density estimation for some incomplete data models
https://projecteuclid.org/euclid.ejs/1524881058
<strong>Ting Yan</strong>, <strong>Liangqiang Qu</strong>, <strong>Zhaohai Li</strong>, <strong>Ao Yuan</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1299--1329.</p><p><strong>Abstract:</strong><br/>
A class of density estimators based on observed incomplete data is proposed. The method is to use a conditional kernel, defined as the expectation of a given kernel for the complete data conditioning on the observed data, to construct the density estimator. We study such kernel density estimators for several commonly used incomplete data models and establish their basic asymptotic properties. Some characteristics different from the classical kernel estimators are discovered. For instance, the asymptotic results of the proposed estimator do not depend on the choice of the kernel $k(\cdot )$. A simulation study is conducted to evaluate the performance of the estimator and compare it with some existing methods.
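For contrast with the conditional-kernel construction, a minimal sketch of the ordinary complete-data Gaussian kernel density estimator; the paper replaces the kernel by its conditional expectation given the incomplete observation, which requires a concrete incompleteness model. Data and bandwidth rule below are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch only: an ordinary (complete-data) Gaussian kernel
# density estimator. Sample and bandwidth rule are demo assumptions.
rng = np.random.default_rng(5)
x = rng.normal(size=1000)
h = 1.06 * x.std() * x.size ** (-1 / 5)   # Silverman's rule of thumb

def kde(t, data, h):
    u = (t[:, None] - data[None, :]) / h
    return (np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)).mean(axis=1) / h

grid = np.linspace(-3.0, 3.0, 61)
f_hat = kde(grid, x, h)   # estimate of the standard normal density
```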
</p>

Bayesian nonparametric estimation of survival functions with multiple-samples information
https://projecteuclid.org/euclid.ejs/1525334453
<strong>Alan Riva Palacio</strong>, <strong>Fabrizio Leisen</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1330--1357.</p><p><strong>Abstract:</strong><br/>
In many real problems, dependence structures more general than exchangeability are required. For instance, in some settings partial exchangeability is a more reasonable assumption. For this reason, vectors of dependent Bayesian nonparametric priors have recently gained popularity. They provide flexible models which are tractable from a computational and theoretical point of view. In this paper, we focus on their use for estimating multivariate survival functions. Our model extends the work of Epifani and Lijoi (2010) to an arbitrary dimension and allows us to model the dependence among survival times of different groups of observations. Theoretical results about the posterior behaviour of the underlying dependent vector of completely random measures are provided. The performance of the model is tested on a simulated dataset arising from a distributional Clayton copula.
</p>

Bayesian inference for spectral projectors of the covariance matrix
https://projecteuclid.org/euclid.ejs/1529308884
<strong>Igor Silin</strong>, <strong>Vladimir Spokoiny</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1948--1987.</p><p><strong>Abstract:</strong><br/>
Let $X_{1},\ldots ,X_{n}$ be an i.i.d. sample in $\mathbb{R}^{p}$ with zero mean and the covariance matrix ${\boldsymbol{\varSigma }^{*}}$. The classical PCA approach recovers the projector $\boldsymbol{P}^{*}_{\mathcal{J}}$ onto the principal eigenspace of ${\boldsymbol{\varSigma }^{*}}$ by its empirical counterpart $\widehat{\boldsymbol{P}}_{\mathcal{J}}$. The recent paper [24] investigated the asymptotic distribution of the Frobenius distance between the projectors $\|\widehat{\boldsymbol{P}}_{\mathcal{J}}-\boldsymbol{P}^{*}_{\mathcal{J}}\|_{2}$, while [27] offered a bootstrap procedure to measure uncertainty in recovering this subspace $\boldsymbol{P}^{*}_{\mathcal{J}}$ even in a finite sample setup. The present paper considers this problem from a Bayesian perspective and suggests using the credible sets of the pseudo-posterior distribution on the space of covariance matrices induced by the conjugated Inverse Wishart prior as sharp confidence sets. This yields a numerically efficient procedure. Moreover, we theoretically justify this method and derive finite sample bounds on the corresponding coverage probability. Contrary to [24, 27], the obtained results are valid for non-Gaussian data: the main assumption that we impose is the concentration of the sample covariance $\widehat{\boldsymbol{\varSigma }}$ in a vicinity of ${\boldsymbol{\varSigma }^{*}}$. Numerical simulations illustrate good performance of the proposed procedure even on non-Gaussian data in a rather challenging regime.
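A minimal sketch of the quantities in play: the empirical spectral projector onto the top eigenspace of the sample covariance and its Frobenius distance to the true projector; dimension, spectrum, and eigenspace size $J$ are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch only: empirical spectral projector P_hat onto the top
# eigenspace of the sample covariance, and the Frobenius distance to the
# true projector. Dimension, spectrum, and J are demo assumptions.
rng = np.random.default_rng(6)
n, p, J = 2000, 5, 2
evals = np.array([5.0, 4.0, 1.0, 0.5, 0.1])
X = rng.normal(size=(n, p)) * np.sqrt(evals)   # true covariance diag(evals)

Sigma_hat = X.T @ X / n
w, V = np.linalg.eigh(Sigma_hat)               # ascending eigenvalues
P_hat = V[:, -J:] @ V[:, -J:].T                # empirical projector
P_true = np.diag([1.0, 1.0, 0.0, 0.0, 0.0])
frob_dist = np.linalg.norm(P_hat - P_true)     # small when the gap is large
```

The distance is small here because the eigengap between the second and third eigenvalues is large relative to the sampling error; quantifying the uncertainty of exactly this distance is what the paper's credible sets are for.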
</p>

Selection by partitioning the solution paths
https://projecteuclid.org/euclid.ejs/1529308885
<strong>Yang Liu</strong>, <strong>Peng Wang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1988--2017.</p><p><strong>Abstract:</strong><br/>
The performance of penalized likelihood approaches depends profoundly on the selection of the tuning parameter; however, there is no commonly agreed-upon criterion for choosing the tuning parameter. Moreover, penalized likelihood estimation based on a single value of the tuning parameter suffers from several drawbacks. This article introduces a novel approach for feature selection based on the entire solution paths rather than the choice of a single tuning parameter, which significantly improves the accuracy of the selection. Moreover, the approach allows for feature selection using ridge or other strictly convex penalties. The key idea is to classify variables as relevant or irrelevant at each tuning parameter and then to select all of the variables which have been classified as relevant at least once. We establish the theoretical properties of the method, which requires significantly weaker conditions than existing methods in the literature. We also illustrate the advantages of the proposed approach with simulation studies and a data example.
</p>

Common price and volatility jumps in noisy high-frequency data
https://projecteuclid.org/euclid.ejs/1529308886
<strong>Markus Bibinger</strong>, <strong>Lars Winkelmann</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 2018--2073.</p><p><strong>Abstract:</strong><br/>
We introduce a statistical test for simultaneous jumps in the price of a financial asset and its volatility process. The proposed test is based on high-frequency data and is robust to market microstructure frictions. For the test, local estimators of volatility jumps at price jump arrival times are designed using a nonparametric spectral estimator of the spot volatility process. A simulation study and an empirical example with NASDAQ order book data demonstrate the practicability of the proposed methods and highlight the important role played by price volatility co-jumps.
</p>

Change detection via affine and quadratic detectors
https://projecteuclid.org/euclid.ejs/1514970025
<strong>Yang Cao</strong>, <strong>Arkadi Nemirovski</strong>, <strong>Yao Xie</strong>, <strong>Vincent Guigues</strong>, <strong>Anatoli Juditsky</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1--57.</p><p><strong>Abstract:</strong><br/>
The goal of the paper is to develop a specific application of the convex optimization based hypothesis testing techniques developed in A. Juditsky, A. Nemirovski, “Hypothesis testing via affine detectors,” Electronic Journal of Statistics 10 :2204–2242, 2016. Namely, we consider the Change Detection problem as follows: observing one by one noisy observations of outputs of a discrete-time linear dynamical system, we intend to decide, in a sequential fashion, on the null hypothesis that the input to the system is a nuisance, vs. the alternative that the input is a “nontrivial signal,” with both the nuisances and the nontrivial signals modeled as inputs belonging to finite unions of some given convex sets. Assuming the observation noises are zero mean sub-Gaussian, we develop “computation-friendly” sequential decision rules and demonstrate that in our context these rules are provably near-optimal.
</p>

Confidence intervals for the means of the selected populations
https://projecteuclid.org/euclid.ejs/1515142842
<strong>Claudio Fuentes</strong>, <strong>George Casella</strong>, <strong>Martin T. Wells</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 58--79.</p><p><strong>Abstract:</strong><br/>
Consider an experiment in which $p$ independent populations $\pi_{i}$ with corresponding unknown means $\theta_{i}$ are available, and suppose that for every $1\leq i\leq p$, we can obtain a sample $X_{i1},\ldots,X_{in}$ from $\pi_{i}$. In this context, researchers are sometimes interested in selecting the populations that yield the largest sample means as a result of the experiment, and then estimate the corresponding population means $\theta_{i}$. In this paper, we present a frequentist approach to the problem and discuss how to construct simultaneous confidence intervals for the means of the $k$ selected populations, assuming that the populations $\pi_{i}$ are independent and normally distributed with a common variance $\sigma^{2}$. The method, based on the minimization of the coverage probability, obtains confidence intervals that attain the nominal coverage probability for any $p$ and $k$, taking into account the selection procedure.
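A minimal sketch of the selection step described above: keep the $k$ populations with the largest sample means. With all true means equal, the naive means of the selected populations overshoot (selection bias), which is what valid post-selection intervals must absorb; the sizes and seed are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch only: select the k populations with the largest sample
# means. All true means are equal here, so any apparent advantage of the
# selected populations is pure selection bias. Sizes are demo assumptions.
rng = np.random.default_rng(7)
p_pop, n, k = 20, 10, 3
theta = np.zeros(p_pop)                          # all true means equal to 0
X = rng.normal(loc=theta, scale=1.0, size=(n, p_pop))
xbar = X.mean(axis=0)
selected = np.argsort(xbar)[-k:]                 # indices of k largest means
naive_estimates = xbar[selected]                 # biased upward by selection
```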
</p>

Uniformly valid confidence sets based on the Lasso
https://projecteuclid.org/euclid.ejs/1526284830
<strong>Karl Ewald</strong>, <strong>Ulrike Schneider</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1358--1387.</p><p><strong>Abstract:</strong><br/>
In a linear regression model of fixed dimension $p\leq n$, we construct confidence regions for the unknown parameter vector based on the Lasso estimator that uniformly and exactly hold the prescribed coverage probability in finite samples as well as in an asymptotic setup. We thereby quantify estimation uncertainty as well as the “post-model selection error” of this estimator. More concretely, in finite samples with Gaussian errors and asymptotically in the case where the Lasso estimator is tuned to perform conservative model selection, we derive exact formulas for computing the minimal coverage probability over the entire parameter space for a large class of shapes for the confidence sets, thus enabling the construction of valid confidence regions based on the Lasso estimator in these settings. The choice of shape for the confidence sets and comparison with the confidence ellipse based on the least-squares estimator is also discussed. Moreover, in the case where the Lasso estimator is tuned to enable consistent model selection, we give a simple confidence region with minimal coverage probability converging to one. Finally, we also treat the case of unknown error variance and present some ideas for extensions.
</p>projecteuclid.org/euclid.ejs/1526284830_20180621040108Thu, 21 Jun 2018 04:01 EDTA two stage $k$-monotone B-spline regression estimator: Uniform Lipschitz property and optimal convergence ratehttps://projecteuclid.org/euclid.ejs/1526544023<strong>Teresa M. Lebair</strong>, <strong>Jinglai Shen</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1388--1428.</p><p><strong>Abstract:</strong><br/>
This paper considers $k$-monotone estimation and the related asymptotic performance analysis over a suitable Hölder class for general $k$. A novel two stage $k$-monotone B-spline estimator is proposed: in the first stage, an unconstrained estimator with optimal asymptotic performance is considered; in the second stage, a $k$-monotone B-spline estimator is constructed (roughly) by projecting the unconstrained estimator onto a cone of $k$-monotone splines. To study the asymptotic performance of the second stage estimator under the sup-norm and other risks, a critical uniform Lipschitz property for the $k$-monotone B-spline estimator is established under the $\ell_{\infty }$-norm. This property uniformly bounds the Lipschitz constants associated with the mapping from a (weighted) first stage input vector to the B-spline coefficients of the second stage $k$-monotone estimator, independent of the sample size and the number of knots. This result is then exploited to analyze the second stage estimator performance and develop convergence rates under the sup-norm, pointwise, and $L_{p}$-norm (with $p\in [1,\infty )$) risks. By employing recent results in $k$-monotone estimation minimax lower bound theory, we show that these convergence rates are optimal.
</p>projecteuclid.org/euclid.ejs/1526544023_20180621040108Thu, 21 Jun 2018 04:01 EDTHigh-dimensional robust precision matrix estimation: Cellwise corruption under $\epsilon $-contaminationhttps://projecteuclid.org/euclid.ejs/1526630484<strong>Po-Ling Loh</strong>, <strong>Xin Lu Tan</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1429--1467.</p><p><strong>Abstract:</strong><br/>
We analyze the statistical consistency of robust estimators for precision matrices in high dimensions. We focus on a contamination mechanism acting cellwise on the data matrix. The estimators we analyze are formed by plugging appropriately chosen robust covariance matrix estimators into the graphical Lasso and CLIME. Such estimators were recently proposed in the robust statistics literature, but only analyzed mathematically from the point of view of the breakdown point. This paper provides complementary high-dimensional error bounds for the precision matrix estimators that reveal the interplay between the dimensionality of the problem and the degree of contamination permitted in the observed distribution. We also show that although the graphical Lasso and CLIME estimators perform equally well from the point of view of statistical consistency, the breakdown property of the graphical Lasso is superior to that of CLIME. We discuss implications of our work for problems involving graphical model estimation when the uncontaminated data follow a multivariate normal distribution, and the goal is to estimate the support of the population-level precision matrix. Our error bounds do not make any assumptions about the contaminating distribution and allow for a nonvanishing fraction of cellwise contamination.
</p>projecteuclid.org/euclid.ejs/1526630484_20180621040108Thu, 21 Jun 2018 04:01 EDTDimension reduction-based significance testing in nonparametric regressionhttps://projecteuclid.org/euclid.ejs/1526695233<strong>Xuehu Zhu</strong>, <strong>Lixing Zhu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1468--1506.</p><p><strong>Abstract:</strong><br/>
A dimension reduction-based adaptive-to-model test is proposed for the significance of a subset of covariates in the context of a nonparametric regression model. Unlike existing locally smoothing significance tests, the new test behaves as if the number of covariates were just that under the null hypothesis, and it can detect local alternative hypotheses distinct from the null at a rate that depends only on the number of covariates under the null hypothesis. Thus, the curse of dimensionality is largely alleviated when nonparametric estimation is inevitably required. In cases where there are many insignificant covariates, the new test improves significantly over existing locally smoothing tests in both significance level maintenance and power enhancement. Simulation studies and a real data analysis are conducted to examine the finite sample performance of the proposed test.
</p>projecteuclid.org/euclid.ejs/1526695233_20180621040108Thu, 21 Jun 2018 04:01 EDTSlice inverse regression with score functionshttps://projecteuclid.org/euclid.ejs/1526889626<strong>Dmitry Babichev</strong>, <strong>Francis Bach</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1507--1543.</p><p><strong>Abstract:</strong><br/>
We consider non-linear regression problems where we assume that the response depends non-linearly on a linear projection of the covariates. We propose score function extensions to sliced inverse regression problems, both for first-order and second-order score functions. We show that they provably improve estimation in the population case over the non-sliced versions and we study finite sample estimators and their consistency given the exact score functions. We also propose to learn the score function as well, in two steps, i.e., first learning the score function and then learning the effective dimension reduction space, or directly, by solving a convex optimization problem regularized by the nuclear norm. We illustrate our results on a series of experiments.
</p>projecteuclid.org/euclid.ejs/1526889626_20180621040108Thu, 21 Jun 2018 04:01 EDTAn extended empirical saddlepoint approximation for intractable likelihoodshttps://projecteuclid.org/euclid.ejs/1527300140<strong>Matteo Fasiolo</strong>, <strong>Simon N. Wood</strong>, <strong>Florian Hartig</strong>, <strong>Mark V. Bravington</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1544--1578.</p><p><strong>Abstract:</strong><br/>
The challenges posed by complex stochastic models used in computational ecology, biology and genetics have stimulated the development of approximate approaches to statistical inference. Here we focus on Synthetic Likelihood (SL), a procedure that reduces the observed and simulated data to a set of summary statistics, and quantifies the discrepancy between them through a synthetic likelihood function. SL requires little tuning, but it relies on the approximate normality of the summary statistics. We relax this assumption by proposing a novel, more flexible, density estimator: the Extended Empirical Saddlepoint approximation. In addition to proving the consistency of SL, under either the new or the Gaussian density estimator, we illustrate the method using three examples. One of these is a complex individual-based forest model for which SL offers one of the few practical possibilities for statistical inference. The examples show that the new density estimator is able to capture large departures from normality, while being scalable to high dimensions, and this in turn leads to more accurate parameter estimates, relative to the Gaussian alternative. The new density estimator is implemented by the esaddle R package, which is freely available on the Comprehensive R Archive Network (CRAN).
</p>projecteuclid.org/euclid.ejs/1527300140_20180621040108Thu, 21 Jun 2018 04:01 EDTModified sequential change point procedures based on estimating functionshttps://projecteuclid.org/euclid.ejs/1527300141<strong>Claudia Kirch</strong>, <strong>Silke Weber</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1579--1613.</p><p><strong>Abstract:</strong><br/>
A large class of sequential change point tests is based on estimating functions where estimation is computationally efficient as (possibly numeric) optimization is restricted to an initial estimation. This includes examples as diverse as mean changes, linear or non-linear autoregressive and binary models. While the standard cumulative-sum-detector (CUSUM) has recently been considered in this general setup, we consider several modifications that have faster detection rates, in particular if changes occur late in the monitoring period. More precisely, we use three different types of detector statistics based on partial sums of a monitoring function, namely the modified moving-sum-statistic (mMOSUM), Page’s cumulative-sum-statistic (Page-CUSUM) and the standard moving-sum-statistic (MOSUM). The statistics only differ in the number of observations included in the partial sum. The mMOSUM uses a bandwidth parameter which multiplicatively scales the lower bound of the moving sum. The MOSUM uses a constant bandwidth parameter, while Page-CUSUM chooses the maximum over all possible lower bounds for the partial sums. So far, the first two schemes have only been studied in a linear model, and the MOSUM only for a mean change.
We develop the asymptotics under the null hypothesis and alternatives under mild regularity conditions for each test statistic, which include the existing theory but also many new examples. In a simulation study we compare all four types of test procedures in terms of their size, power and run length. Additionally we illustrate their behavior by applications to exchange rate data as well as the Boston homicide data.
</p>projecteuclid.org/euclid.ejs/1527300141_20180621040108Thu, 21 Jun 2018 04:01 EDTOn penalized estimation for dynamical systems with small noisehttps://projecteuclid.org/euclid.ejs/1527300142<strong>Alessandro De Gregorio</strong>, <strong>Stefano Maria Iacus</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1614--1630.</p><p><strong>Abstract:</strong><br/>
We consider a dynamical system with small noise for which the drift is parametrized by a finite dimensional parameter. For this model, we consider minimum distance estimation from continuous time observations under an $l^{p}$-penalty imposed on the parameters in the spirit of the Lasso approach, with the aim of simultaneous estimation and model selection. We study the consistency and the asymptotic distribution of these Lasso-type estimators for different values of $p$. For $p=1,$ we also consider the adaptive version of the Lasso estimator and establish its oracle properties.
</p>projecteuclid.org/euclid.ejs/1527300142_20180621040108Thu, 21 Jun 2018 04:01 EDTBayesian pairwise estimation under dependent informative samplinghttps://projecteuclid.org/euclid.ejs/1527300143<strong>Matthew R. Williams</strong>, <strong>Terrance D. Savitsky</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1631--1661.</p><p><strong>Abstract:</strong><br/>
An informative sampling design leads to the selection of units whose inclusion probabilities are correlated with the response variable of interest. Inference under the population model performed on the resulting observed sample, without adjustment, will be biased for the population generative model. One approach that produces asymptotically unbiased inference employs marginal inclusion probabilities to form sampling weights used to exponentiate each likelihood contribution of a pseudo likelihood used to form a pseudo posterior distribution. Conditions for posterior consistency restrict applicable sampling designs to those under which pairwise inclusion dependencies asymptotically limit to $0$. There are many sampling designs excluded by this restriction; for example, a multi-stage design that samples individuals within households. Viewing each household as a population, the dependence among individuals does not attenuate. We propose a more targeted approach in this paper for inference focused on pairs of individuals or sampled units; for example, the substance use of one spouse in a shared household, conditioned on the substance use of the other spouse. We formulate the pseudo likelihood with weights based on pairwise or second order probabilities and demonstrate consistency, removing the requirement for asymptotic independence and replacing it with restrictions on higher order selection probabilities. Our approach provides a nearly automated estimation procedure applicable to any model specified by the data analyst. We demonstrate our method on the National Survey on Drug Use and Health.
</p>projecteuclid.org/euclid.ejs/1527300143_20180621040108Thu, 21 Jun 2018 04:01 EDTHeritability estimation in case-control studieshttps://projecteuclid.org/euclid.ejs/1527559245<strong>Anna Bonnet</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1662--1716.</p><p><strong>Abstract:</strong><br/>
In the field of genetics, the concept of heritability refers to the proportion of variations of a biological trait or disease that can be explained by genetic factors. Quantifying the heritability of a disease is a fundamental challenge in human genetics, especially when the causes are plural and not clearly identified. Although the literature regarding heritability estimation for binary traits is less rich than for quantitative traits, several methods have been proposed to estimate the heritability of complex diseases. However, to the best of our knowledge, the existing methods are not supported by theoretical grounds. Moreover, most of the methodologies do not take into account a major specificity of the data coming from medical studies, which is the oversampling of the number of patients compared to controls. We propose in this paper to investigate the theoretical properties of the Phenotype Correlation Genotype Correlation (PCGC) regression developed by Golan, Lander and Rosset (2014), which is one of the major techniques used in statistical genetics and which is very efficient in practice, despite the oversampling of patients. Our main result is the proof of the consistency of this estimator, under several assumptions that we will state and discuss. We also provide a numerical study to compare two approximations leading to two heritability estimators.
</p>projecteuclid.org/euclid.ejs/1527559245_20180621040108Thu, 21 Jun 2018 04:01 EDTA deconvolution path for mixtureshttps://projecteuclid.org/euclid.ejs/1527559246<strong>Oscar-Hernan Madrid-Padilla</strong>, <strong>Nicholas G. Polson</strong>, <strong>James Scott</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 1717--1751.</p><p><strong>Abstract:</strong><br/>
We propose a class of estimators for deconvolution in mixture models based on a simple two-step “bin-and-smooth” procedure applied to histogram counts. The method is both statistically and computationally efficient: by exploiting recent advances in convex optimization, we are able to provide a full deconvolution path that shows the estimate for the mixing distribution across a range of plausible degrees of smoothness, at far less cost than a full Bayesian analysis. This enables practitioners to conduct a sensitivity analysis with minimal effort. This is especially important for applied data analysis, given the ill-posed nature of the deconvolution problem. Our results establish the favorable theoretical properties of our estimator and show that it offers state-of-the-art performance when compared to benchmark methods across a range of scenarios.
</p>projecteuclid.org/euclid.ejs/1527559246_20180621040108Thu, 21 Jun 2018 04:01 EDTHigh-dimensional inference for personalized treatment decisionhttps://projecteuclid.org/euclid.ejs/1529568040<strong>X. Jessie Jeng</strong>, <strong>Wenbin Lu</strong>, <strong>Huimin Peng</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 1, 2074--2089.</p><p><strong>Abstract:</strong><br/>
Recent developments in statistical methodology for personalized treatment decisions have utilized high-dimensional regression to take into account a large number of patients’ covariates and have described personalized treatment decisions through interactions between treatment and covariates. While a subset of interaction terms can be obtained by existing variable selection methods to indicate covariates relevant for making treatment decisions, the results often lack statistical interpretation. This paper proposes an asymptotically unbiased estimator, based on the Lasso solution, for the interaction coefficients. We derive the limiting distribution of the estimator when the baseline function of the regression model is unknown and possibly misspecified. Confidence intervals and p-values are derived to infer the effects of the patients’ covariates on treatment decisions. We confirm the accuracy of the proposed method and its robustness against a misspecified baseline function in simulation, and apply the method to the STAR∗D study of major depressive disorder.
</p>projecteuclid.org/euclid.ejs/1529568040_20180621040108Thu, 21 Jun 2018 04:01 EDTMeasuring distributional asymmetry with Wasserstein distance and Rademacher symmetrizationhttps://projecteuclid.org/euclid.ejs/1531468822<strong>Adam B. Kashlak</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2091--2113.</p><p><strong>Abstract:</strong><br/>
We propose an improved version of the ubiquitous symmetrization inequality, making use of the Wasserstein distance between a measure and its reflection in order to quantify the asymmetry of the given measure. An empirical bound on this asymmetric correction term is derived through a bootstrap procedure and shown to give tighter results in practical settings than the original uncorrected inequality. Lastly, a wide range of applications are detailed including testing for data symmetry, constructing nonasymptotic high dimensional confidence sets, bounding the variance of an empirical process, and improving constants in Nemirovski style inequalities for Banach space valued random variables.
</p>projecteuclid.org/euclid.ejs/1531468822_20180713040028Fri, 13 Jul 2018 04:00 EDTPrincipal quantile regression for sufficient dimension reduction with heteroscedasticityhttps://projecteuclid.org/euclid.ejs/1531468823<strong>Chong Wang</strong>, <strong>Seung Jun Shin</strong>, <strong>Yichao Wu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2114--2140.</p><p><strong>Abstract:</strong><br/>
Sufficient dimension reduction (SDR) is a successful tool for reducing data dimensionality without stringent model assumptions. In practice, data often display heteroscedasticity, which is of scientific importance in general but frequently overlooked since a primary goal of most existing statistical methods is to identify conditional mean relationships among variables. In this article, we propose a new SDR method called principal quantile regression (PQR) that efficiently tackles heteroscedasticity. PQR can naturally be extended to a nonlinear version via the kernel trick. Asymptotic properties are established and an efficient solution path-based algorithm is provided. Numerical examples based on both simulated and real data demonstrate the PQR’s advantageous performance over existing SDR methods. PQR still performs very competitively even for the case without heteroscedasticity.
</p>projecteuclid.org/euclid.ejs/1531468823_20180713040028Fri, 13 Jul 2018 04:00 EDTFast learning rate of non-sparse multiple kernel learning and optimal regularization strategieshttps://projecteuclid.org/euclid.ejs/1531468825<strong>Taiji Suzuki</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2141--2192.</p><p><strong>Abstract:</strong><br/>
In this paper, we give a new generalization error bound for Multiple Kernel Learning (MKL) for a general class of regularizations, and discuss what kind of regularization gives favorable predictive accuracy. Our main target in this paper is dense-type regularizations, including $\ell_{p}$-MKL. Numerical experiments have shown that sparse regularization does not necessarily perform well compared with dense-type regularizations. Motivated by this fact, this paper gives a general theoretical tool for deriving fast learning rates of MKL that is applicable to arbitrary mixed-norm-type regularizations in a unifying manner. This enables us to compare the generalization performances of various types of regularizations. As a consequence, we observe that the homogeneity of the complexities of candidate reproducing kernel Hilbert spaces (RKHSs) affects which regularization strategy ($\ell_{1}$ or dense) is preferred. In fact, in homogeneous complexity settings, where the complexities of all RKHSs are the same, $\ell_{1}$-regularization is optimal among all isotropic norms. On the other hand, in inhomogeneous complexity settings, dense-type regularizations can show a better learning rate than sparse $\ell_{1}$-regularization. We also show that our learning rate achieves the minimax lower bound in homogeneous complexity settings.
</p>projecteuclid.org/euclid.ejs/1531468825_20180713040028Fri, 13 Jul 2018 04:00 EDTModel-free envelope dimension selectionhttps://projecteuclid.org/euclid.ejs/1531814505<strong>Xin Zhang</strong>, <strong>Qing Mai</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2193--2216.</p><p><strong>Abstract:</strong><br/>
An envelope is a targeted dimension reduction subspace for simultaneously achieving dimension reduction and improving parameter estimation efficiency. While many envelope methods have been proposed in recent years, all envelope methods hinge on the knowledge of a key hyperparameter, the structural dimension of the envelope. How to estimate the envelope dimension consistently is of substantial interest from both theoretical and practical aspects. Moreover, very recent advances in the literature have generalized the envelope as a model-free method, which makes selecting the envelope dimension even more challenging. Likelihood-based approaches such as information criteria and likelihood-ratio tests either cannot be directly applied or have no theoretical justification. To address this critical issue of dimension selection, we propose two unified approaches – called FG and 1D selections – for determining the envelope dimension that can be applied to any envelope models and methods. The two model-free selection approaches are based on two different envelope optimization procedures: the full Grassmannian (FG) optimization and the 1D algorithm [11], and are shown to be capable of correctly identifying the structural dimension with a probability tending to 1 under mild moment conditions as the sample size increases. While the FG selection unifies and generalizes the BIC and modified BIC approaches that exist in the literature, and hence provides theoretical justification for them under weak moment conditions in a model-free context, the 1D selection is computationally more stable and efficient in finite samples. Extensive simulations and a real data analysis demonstrate the superb performance of our proposals.
</p>projecteuclid.org/euclid.ejs/1531814505_20180717040149Tue, 17 Jul 2018 04:01 EDTPrediction of dynamical time series using kernel based regression and smooth splineshttps://projecteuclid.org/euclid.ejs/1532333003<strong>Raymundo Navarrete</strong>, <strong>Divakar Viswanath</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2217--2237.</p><p><strong>Abstract:</strong><br/>
Prediction of dynamical time series with additive noise using support vector machines or kernel based regression is consistent for certain classes of discrete dynamical systems. Consistency implies that these methods are effective at computing the expected value of a point at a future time given the present coordinates. However, the present coordinates themselves are noisy, and therefore, these methods are not necessarily effective at removing noise. In this article, we consider denoising and prediction as separate problems for flows, as opposed to discrete time dynamical systems, and show that the use of smooth splines is more effective at removing noise. Combination of smooth splines and kernel based regression yields predictors that are more accurate on benchmarks typically by a factor of 2 or more. We prove that kernel based regression in combination with smooth splines converges to the exact predictor for time series extracted from any compact invariant set of any sufficiently smooth flow. As a consequence of convergence, one can find examples where the combination of kernel based regression with smooth splines is superior by even a factor of $100$. The predictors that we analyze and compute operate on delay coordinate data and not the full state vector, which is typically not observable.
</p>projecteuclid.org/euclid.ejs/1532333003_20180723040326Mon, 23 Jul 2018 04:03 EDTConfidence intervals for linear unbiased estimators under constrained dependencehttps://projecteuclid.org/euclid.ejs/1532333004<strong>Peter M. Aronow</strong>, <strong>Forrest W. Crawford</strong>, <strong>José R. Zubizarreta</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2238--2252.</p><p><strong>Abstract:</strong><br/>
We propose an approach for conducting inference for linear unbiased estimators applied to dependent outcomes given constraints on their independence relations, in the form of a dependency graph. We establish the consistency of an oracle variance estimator when a dependency graph is known, along with an associated central limit theorem. We derive an integer linear program for finding an upper bound for the estimated variance when a dependency graph is unknown, but topological or degree-based constraints are available on one such graph. We develop alternative bounds, including a closed-form bound, under an additional homoskedasticity assumption. We establish a basis for Wald-type confidence intervals that are guaranteed to have asymptotically conservative coverage.
</p>projecteuclid.org/euclid.ejs/1532333004_20180723040326Mon, 23 Jul 2018 04:03 EDTUpper and lower risk bounds for estimating the Wasserstein barycenter of random measures on the real linehttps://projecteuclid.org/euclid.ejs/1532333005<strong>Jérémie Bigot</strong>, <strong>Raúl Gouet</strong>, <strong>Thierry Klein</strong>, <strong>Alfredo López</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2253--2289.</p><p><strong>Abstract:</strong><br/>
This paper is focused on the statistical analysis of probability measures $\boldsymbol{\nu }_{1},\ldots ,\boldsymbol{\nu }_{n}$ on ${\mathbb{R}}$ that can be viewed as independent realizations of an underlying stochastic process. We consider the situation of practical importance where the random measures $\boldsymbol{\nu }_{i}$ are absolutely continuous with densities $\boldsymbol{f}_{i}$ that are not directly observable. In this case, instead of the densities, we have access to datasets of real random variables $(X_{i,j})_{1\leq i\leq n;\;1\leq j\leq p_{i}}$ organized in the form of $n$ experimental units, such that $X_{i,1},\ldots ,X_{i,p_{i}}$ are iid observations sampled from a random measure $\boldsymbol{\nu }_{i}$ for each $1\leq i\leq n$. In this setting, we focus on first-order statistics methods for estimating, from such data, a meaningful structural mean measure. For the purpose of taking into account phase and amplitude variations in the observations, we argue that the notion of Wasserstein barycenter is a relevant tool. The main contribution of this paper is to characterize the rate of convergence of a (possibly smoothed) empirical Wasserstein barycenter towards its population counterpart in the asymptotic setting where both $n$ and $\min_{1\leq i\leq n}p_{i}$ may go to infinity. The optimality of this procedure is discussed from the minimax point of view with respect to the Wasserstein metric. We also highlight the connection between our approach and the curve registration problem in statistics. Some numerical experiments are used to illustrate the results of the paper on the convergence rate of empirical Wasserstein barycenters.
</p>projecteuclid.org/euclid.ejs/1532333005_20180723040326Mon, 23 Jul 2018 04:03 EDTExchangeable trait allocationshttps://projecteuclid.org/euclid.ejs/1532484331<strong>Trevor Campbell</strong>, <strong>Diana Cai</strong>, <strong>Tamara Broderick</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2290--2322.</p><p><strong>Abstract:</strong><br/>
Trait allocations are a class of combinatorial structures in which data may belong to multiple groups and may have different levels of belonging in each group. Often the data are also exchangeable, i.e., their joint distribution is invariant to reordering. In clustering—a special case of trait allocation—exchangeability implies the existence of both a de Finetti representation and an exchangeable partition probability function (EPPF), distributional representations useful for computational and theoretical purposes. In this work, we develop the analogous de Finetti representation and exchangeable trait probability function (ETPF) for trait allocations, along with a characterization of all trait allocations with an ETPF. Unlike previous feature allocation characterizations, our proofs fully capture single-occurrence “dust” groups. We further introduce a novel constrained version of the ETPF that we use to establish an intuitive connection between the probability functions for clustering, feature allocations, and trait allocations. As an application of our general theory, we characterize the distribution of all edge-exchangeable graphs, a class of recently-developed models that captures realistic sparse graph sequences.
</p>projecteuclid.org/euclid.ejs/1532484331_20180724220542Tue, 24 Jul 2018 22:05 EDTNon-parametric estimation of time varying AR(1)–processes with local stationarity and periodicityhttps://projecteuclid.org/euclid.ejs/1532484332<strong>Jean-Marc Bardet</strong>, <strong>Paul Doukhan</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2323--2354.</p><p><strong>Abstract:</strong><br/>
Extending the ideas of [7], this paper aims at providing a kernel based non-parametric estimation of a new class of time varying AR(1) processes $(X_{t})$, with local stationarity and periodic features (with a known period $T$), inducing the definition $X_{t}=a_{t}(t/nT)X_{t-1}+\xi_{t}$ for $t\in \mathbb{N}$ and with $a_{t+T}\equiv a_{t}$. Central limit theorems are established for kernel estimators $\widehat{a}_{s}(u)$ reaching classical minimax rates and only requiring low order moment conditions of the white noise $(\xi_{t})_{t}$ up to the second order.
</p>projecteuclid.org/euclid.ejs/1532484332_20180724220542Tue, 24 Jul 2018 22:05 EDTScalable methods for Bayesian selective inferencehttps://projecteuclid.org/euclid.ejs/1532484333<strong>Snigdha Panigrahi</strong>, <strong>Jonathan Taylor</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2355--2400.</p><p><strong>Abstract:</strong><br/>
Modeled along the truncated approach in [20], selection-adjusted inference in a Bayesian regime is based on a selective posterior. Such a posterior is determined jointly by a generative model imposed on data and the selection event that enforces a truncation on the assumed law. The effective difference between the selective posterior and the usual Bayesian framework is reflected in the use of a truncated likelihood. The normalizer of the truncated law in the adjusted framework is the probability of the selection event; this typically lacks a closed form expression, leading to the computational bottleneck in sampling from such a posterior. The current work provides an optimization problem that approximates the otherwise intractable selective posterior and leads to scalable methods that give valid post-selective Bayesian inference. The selection procedures are posed as data-queries that solve a randomized version of a convex learning program, which has the advantage of preserving more left-over information for inference.
We propose a randomization scheme under which the approximating optimization has separable constraints that result in a partially separable objective in lower dimensions for many commonly used selective queries. We show that the proposed optimization gives a valid exponential rate of decay for the selection probability on a large deviation scale under a Gaussian randomization scheme. On the implementation side, we offer a primal-dual method to solve the optimization problem leading to an approximate posterior; this allows us to exploit the usual merits of a Bayesian machinery in both low and high dimensional regimes when the underlying signal is effectively sparse. We show that the adjusted estimates empirically demonstrate better frequentist properties in comparison to the unadjusted estimates based on the usual posterior, when applied to a wide range of constrained, convex data queries.
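A toy one-parameter illustration of the truncated-likelihood idea (not the paper's method or data): suppose $X\sim N(\theta ,1)$ is reported only when $X&gt;c$, so the selective likelihood divides the plain likelihood by the selection probability $P_{\theta }(X&gt;c)$, which shifts the posterior relative to the unadjusted one:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Toy numbers for illustration: threshold c, observed value x_obs, flat prior on theta
c, x_obs = 1.5, 2.0
theta = np.linspace(-4.0, 6.0, 2001)
d = theta[1] - theta[0]

lik = np.exp(-0.5 * (x_obs - theta) ** 2)        # plain Gaussian likelihood
sel_prob = 1.0 - np.vectorize(Phi)(c - theta)    # P_theta(X > c), the normalizer
sel_lik = lik / sel_prob                         # truncated (selective) likelihood

post = lik / (lik.sum() * d)                     # unadjusted posterior density
sel_post = sel_lik / (sel_lik.sum() * d)         # selective posterior density

unadjusted_mean = (theta * post).sum() * d       # ~ x_obs under the flat prior
adjusted_mean = (theta * sel_post).sum() * d     # pulled below x_obs
```

Dividing by $P_{\theta }(X&gt;c)$ penalizes values of $\theta $ that make selection likely, so the adjusted posterior mean falls below the unadjusted one.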
</p>projecteuclid.org/euclid.ejs/1532484333_20180724220542Tue, 24 Jul 2018 22:05 EDTAsymptotic minimum scoring rule predictionhttps://projecteuclid.org/euclid.ejs/1532484334<strong>Federica Giummolè</strong>, <strong>Valentina Mameli</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2401--2429.</p><p><strong>Abstract:</strong><br/>
Most methods currently employed in forecasting problems are based on scoring rules. Each scoring rule has an associated divergence function, which can be used as a measure of discrepancy between probability distributions. This approach is commonly used in the literature to compare two competing predictive distributions on the basis of their relative expected divergence from the true distribution.
In this paper we focus on the use of scoring rules as a tool for finding predictive distributions for an unknown quantity of interest. The proposed predictive distributions are asymptotic modifications of the estimative solutions, obtained by minimizing the expected divergence related to a general scoring rule.
The asymptotic properties of such predictive distributions are closely related to the geometry induced by the considered divergence on a regular parametric model. In particular, the existence of a globally optimal predictive distribution is guaranteed for invariant divergences, whose local behaviour is similar to that of the well-known $\alpha $-divergences.
We show that a wide class of divergences obtained from weighted scoring rules share invariance properties with $\alpha $-divergences. For weighted scoring rules it is thus possible to obtain a global solution to the prediction problem. Unfortunately, the divergences associated with many widely used scoring rules are not invariant. Still, for these cases we provide a locally optimal predictive distribution within a specified parametric model.
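As a concrete illustration of the scoring-rule/divergence correspondence (an example, not taken from the paper): for a categorical forecast, the divergence induced by the logarithmic score is the Kullback-Leibler divergence, and the one induced by the Brier score is the squared Euclidean distance:

```python
import numpy as np

def log_score(p, y):
    """Logarithmic scoring rule S(p, y) = -log p(y) (negatively oriented)."""
    return -np.log(p[y])

def brier_score(p, y):
    """Brier (quadratic) scoring rule for a categorical forecast p."""
    e = np.zeros_like(p)
    e[y] = 1.0
    return np.sum((p - e) ** 2)

def divergence(q, p, score):
    """Divergence associated with a proper scoring rule:
    d(q, p) = E_q[S(p, Y)] - E_q[S(q, Y)] >= 0, with equality iff p = q."""
    exp_p = sum(q[y] * score(p, y) for y in range(len(q)))
    exp_q = sum(q[y] * score(q, y) for y in range(len(q)))
    return exp_p - exp_q

q = np.array([0.6, 0.3, 0.1])   # "true" distribution (made-up numbers)
p = np.array([0.5, 0.3, 0.2])   # forecast
```

Here `divergence(q, p, log_score)` equals the KL divergence `np.sum(q * np.log(q / p))`, while `divergence(q, p, brier_score)` equals `np.sum((p - q) ** 2)`.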
</p>projecteuclid.org/euclid.ejs/1532484334_20180724220542Tue, 24 Jul 2018 22:05 EDTOn the role of the overall effect in exponential familieshttps://projecteuclid.org/euclid.ejs/1532484335<strong>Anna Klimova</strong>, <strong>Tamás Rudas</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2430--2453.</p><p><strong>Abstract:</strong><br/>
This paper compares exponential families of discrete probability distributions with and without the normalizing constant (or overall effect). The latter setup, in which the exponential family is curved, is particularly relevant when the sample space is an incomplete Cartesian product, or when it is so large that the computational burden is significant. The lack or presence of the overall effect has a fundamental impact on the properties of the exponential family. When the overall effect is added, the family becomes the smallest regular exponential family containing the curved one. The procedure is related to the homogenization of an inhomogeneous variety discussed in algebraic geometry, of which a statistical interpretation is given as an augmentation of the sample space. The changes in the kernel basis representation when the overall effect is included or removed are derived. The geometry of maximum likelihood estimates, also allowing zero observed frequencies, is described with and without the overall effect, and various algorithms are compared. The importance of the results is illustrated by an example from cell biology, showing that routinely including the overall effect leads to estimates that are not in the model intended by the researchers.
</p>projecteuclid.org/euclid.ejs/1532484335_20180724220542Tue, 24 Jul 2018 22:05 EDTA new design strategy for hypothesis testing under response adaptive randomizationhttps://projecteuclid.org/euclid.ejs/1532484336<strong>Alessandro Baldi Antognini</strong>, <strong>Alessandro Vagheggini</strong>, <strong>Maroussa Zagoraiou</strong>, <strong>Marco Novelli</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2454--2481.</p><p><strong>Abstract:</strong><br/>
This paper provides a new design strategy for response-adaptive randomization in normal response trials aimed at testing the superiority of one of two available treatments. In particular, we introduce a new test statistic based on the treatment allocation proportion ensuing from the adoption of a suitable response-adaptive randomization rule, which can be more efficient and uniformly more powerful than the classical Wald test. We analyze the conditions under which the suggested strategy, derived by matching an asymptotically best response-adaptive procedure with a suitably chosen target allocation, induces a monotonically increasing power function that discriminates the chosen alternatives with high precision. Moreover, we introduce and analyze new classes of targets aimed at maximizing the power of the new statistical test, showing both analytically and via simulations i) how the power function of the suggested test increases as the ethical skew of the chosen target grows, thereby overcoming the usual trade-off between ethics and inference, and ii) the substantial gain in inferential precision ensured by the proposed approach.
</p>projecteuclid.org/euclid.ejs/1532484336_20180724220542Tue, 24 Jul 2018 22:05 EDTWasserstein and total variation distance between marginals of Lévy processeshttps://projecteuclid.org/euclid.ejs/1532657104<strong>Ester Mariucci</strong>, <strong>Markus Reiß</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2482--2514.</p><p><strong>Abstract:</strong><br/>
We present upper bounds for the Wasserstein distance of order $p$ between the marginals of Lévy processes, including Gaussian approximations for jumps of infinite activity. Using the convolution structure, we further derive upper bounds for the total variation distance between the marginals of Lévy processes. Connections to other metrics like Zolotarev and Toscani-Fourier distances are established. The theory is illustrated by concrete examples and an application to statistical lower bounds.
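In one dimension the order-$p$ Wasserstein distance reduces to an integral of quantile differences, so for equal-size samples the optimal coupling simply matches order statistics. A small numerical illustration (the Gaussian marginals below are an example chosen for this sketch, not the paper's setting):

```python
import numpy as np

def wasserstein_p(x, y, p=2):
    """Empirical W_p between two equal-size one-dimensional samples:
    the optimal coupling pairs the sorted values (order statistics)."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
# marginals at time t = 1 of two Brownian motions with different volatilities
x = rng.normal(0.0, 1.0, size=100_000)
y = rng.normal(0.0, 1.5, size=100_000)
# for centered Gaussians N(0, s1^2) and N(0, s2^2) one has W_2 = |s1 - s2| = 0.5 here
w2 = wasserstein_p(x, y, p=2)
```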
</p>projecteuclid.org/euclid.ejs/1532657104_20180726220515Thu, 26 Jul 2018 22:05 EDTA noninformative Bayesian approach for selecting a good post-stratificationhttps://projecteuclid.org/euclid.ejs/1532678418<strong>Patrick Zimmerman</strong>, <strong>Glen Meeden</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2515--2536.</p><p><strong>Abstract:</strong><br/>
In the standard design approach to survey sampling, prior information is often used to stratify the population of interest. A good choice of strata can yield significant improvement in the resulting estimator. However, if there are several possible ways to stratify the population, it may not be clear which is best. Here we assume that, before the sample is taken, a limited number of possible stratifications have been defined. We propose an objective Bayesian approach that allows one to consider these several possible stratifications simultaneously. Given the sample, the posterior distribution assigns more weight to the good stratifications and less to the others. Empirical results suggest that the resulting estimator is typically almost as good as the estimator based on the best stratification and better than the estimator that does not use stratification. It also has a sensible estimate of precision.
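The gain from a good stratification can be seen in the classical post-stratified estimator, which the paper's approach averages over several candidates; the population below is invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population where the stratification explains most of the variance
sizes = np.array([5000, 3000, 2000])          # known stratum sizes
means = np.array([10.0, 20.0, 40.0])          # stratum means (made up)
pop = np.concatenate([rng.normal(m, 2.0, n) for m, n in zip(means, sizes)])
strata = np.repeat(np.arange(3), sizes)

# simple random sample, then post-stratify using the known stratum sizes
idx = rng.choice(len(pop), size=500, replace=False)
y, h = pop[idx], strata[idx]

post_stratified = sum(
    sizes[k] / sizes.sum() * y[h == k].mean() for k in range(3)
)
plain_mean = y.mean()       # unstratified sample mean, for comparison
true_mean = pop.mean()      # ~ 19.0 for these stratum sizes and means
```

Because the within-stratum variance is much smaller than the between-stratum variance, the post-stratified estimate concentrates tightly around the population mean.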
</p>projecteuclid.org/euclid.ejs/1532678418_20180727040030Fri, 27 Jul 2018 04:00 EDTOn kernel methods for covariates that are rankingshttps://projecteuclid.org/euclid.ejs/1534233701<strong>Horia Mania</strong>, <strong>Aaditya Ramdas</strong>, <strong>Martin J. Wainwright</strong>, <strong>Michael I. Jordan</strong>, <strong>Benjamin Recht</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 12, Number 2, 2537--2577.</p><p><strong>Abstract:</strong><br/>
Permutation-valued features arise in a variety of applications, either in a direct way, when preferences are elicited over a collection of items, or in an indirect way, when numerical ratings are converted to a ranking. To date, there has been relatively limited study of regression, classification, and testing problems based on permutation-valued features, as opposed to permutation-valued responses. This paper studies the use of reproducing kernel Hilbert space methods for learning from permutation-valued features. These methods embed the rankings into an implicitly defined function space and allow for efficient estimation of regression and test functions in this richer space. We characterize both the feature spaces and the spectral properties associated with two kernels for rankings, the Kendall and Mallows kernels. Using tools from representation theory, we explain the limited expressive power of the Kendall kernel by characterizing its degenerate spectrum and, in sharp contrast, prove that the Mallows kernel is universal and characteristic. We also introduce families of polynomial kernels that interpolate between the Kendall (degree one) and Mallows (infinite degree) kernels. We show the practical effectiveness of our methods via applications to Eurobarometer survey data as well as a MovieLens ratings dataset.
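Both kernels have simple closed forms in terms of discordant item pairs; a direct (quadratic-time) sketch, with the decay parameter of the Mallows kernel chosen arbitrarily:

```python
import itertools
import numpy as np

def n_discordant(sigma, pi):
    """Number of item pairs ranked in opposite order by the two rankings."""
    n = len(sigma)
    return sum(
        (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0
        for i, j in itertools.combinations(range(n), 2)
    )

def kendall_kernel(sigma, pi):
    """Kendall kernel: (concordant - discordant) / total pairs, i.e. Kendall's tau."""
    pairs = len(sigma) * (len(sigma) - 1) // 2
    return (pairs - 2 * n_discordant(sigma, pi)) / pairs

def mallows_kernel(sigma, pi, lam=1.0):
    """Mallows kernel: exponential decay in the number of discordant pairs."""
    return np.exp(-lam * n_discordant(sigma, pi))
```

For example, `kendall_kernel` equals 1 for identical rankings and -1 for reversed ones, while `mallows_kernel` stays strictly positive, consistent with its richer (universal) feature space.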
</p>projecteuclid.org/euclid.ejs/1534233701_20180814040157Tue, 14 Aug 2018 04:01 EDT