Electronic Journal of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.ejs
The latest articles from Electronic Journal of Statistics on Project Euclid, a site for mathematics and statistics resources.
Copyright 2010 Cornell University Library. Contact: Euclid-L@cornell.edu (Project Euclid Team)
Published: Thu, 05 Aug 2010 15:41 EDT; last build: Fri, 03 Jun 2011 09:20 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
The bias and skewness of M-estimators in regression
http://projecteuclid.org/euclid.ejs/1262876992
<strong>Christopher Withers</strong>, <strong>Saralees Nadarajah</strong><p><strong>Source: </strong>Electron. J. Statist., Volume 4, 1--14.</p><p><strong>Abstract:</strong><br/>
We consider M-estimation of a regression model with a nuisance parameter and a vector of other parameters. The unknown distribution of the residuals is not assumed to be normal or symmetric. Simple and easily estimated formulas are given for the dominant terms of the bias and skewness of the parameter estimates. For the linear model these are proportional to the skewness of the ‘independent’ variables. For a nonlinear model, its linear component plays the role of these independent variables, and a second term must be added proportional to the covariance of its linear and quadratic components. For the least squares estimate with normal errors this term was derived by Box [1]. We also consider the effect of a large number of parameters, and the case of random independent variables.
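Although the paper derives analytic formulas, the bias and skewness of such estimates can also be examined numerically. The sketch below (sizes, distributions, and the no-intercept model are illustrative choices, not the paper's) bootstraps the least-squares slope, the simplest M-estimator, under skewed errors and skewed ‘independent’ variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy regression with skewed 'independent' variables and skewed errors
# (sizes and distributions are illustrative, not from the paper)
n = 200
x = rng.exponential(1.0, n)
eps = rng.exponential(1.0, n) - 1.0        # centered but skewed residuals
y = 2.0 * x + eps

def ls_estimate(x, y):
    # least squares, the simplest M-estimator (rho(u) = u^2), no intercept
    return np.sum(x * y) / np.sum(x * x)

# bootstrap the sampling distribution of the slope estimate
B = 2000
est = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    est[b] = ls_estimate(x[idx], y[idx])

bias = est.mean() - ls_estimate(x, y)
skew = np.mean((est - est.mean()) ** 3) / est.std() ** 3
```

The bootstrap bias and skewness computed here are Monte Carlo stand-ins for the dominant-term formulas the paper estimates directly.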
</p>
Published: Thu, 05 Aug 2010 15:41 EDT

SiAM: A hybrid of single index models and additive models
http://projecteuclid.org/euclid.ejs/1496044838
<strong>Shujie Ma</strong>, <strong>Heng Lian</strong>, <strong>Hua Liang</strong>, <strong>Raymond J. Carroll</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2397--2423.</p><p><strong>Abstract:</strong><br/>
While popular, single index models and additive models have potential limitations, a fact that leads us to propose SiAM, a novel hybrid combination of these two models. We first address model identifiability under general assumptions. The result is of independent interest. We then develop an estimation procedure by using splines to approximate unknown functions and establish the asymptotic properties of the resulting estimators. Furthermore, we suggest a two-step procedure for establishing confidence bands for the nonparametric additive functions. This procedure enables us to make global inferences. Numerical experiments indicate that SiAM works well with finite sample sizes and is especially robust to model structures. That is, when the model reduces to either the single-index or the additive scenario, the estimation and inference results are comparable to those based on the true model, while when the model is misspecified, our method can be substantially superior.
</p>
Published: Mon, 19 Jun 2017 22:01 EDT

Estimation of mean form and mean form difference under elliptical laws
http://projecteuclid.org/euclid.ejs/1496131237
<strong>José A. Díaz-García</strong>, <strong>Francisco J. Caro-Lopera</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2424--2460.</p><p><strong>Abstract:</strong><br/>
The matrix variate elliptical generalization of [30] is presented in this work. The published Gaussian case is revised and modified. Then, new aspects of identifiability and consistent estimation of mean form and mean form difference are considered under elliptical laws. For example, instead of using the Euclidean distance matrix for the consistent estimates, exact formulae are derived for the moments of the matrix $\mathbf{B}=\mathbf{X}^{c}\left(\mathbf{X}^{c}\right)^{T}$; where $\mathbf{X}^{c}$ is the centered landmark matrix. Finally, a complete application in biology is provided; it includes estimation, model selection and hypothesis testing.
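The moment formulae center on the matrix $\mathbf{B}$ defined above. As a minimal numerical sketch (the landmark count and dimension are arbitrary choices, not from the paper), $\mathbf{B}$ can be formed from a centered landmark matrix and checked for the structural properties such a form matrix must have:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical configuration: N = 10 landmarks in K = 2 dimensions
N, K = 10, 2
X = rng.normal(size=(N, K))                # landmark matrix

# center the landmarks: X^c = C X with C the usual centering matrix
C = np.eye(N) - np.ones((N, N)) / N
Xc = C @ X

# B = X^c (X^c)^T, the matrix whose exact moments the paper derives
B = Xc @ Xc.T
```

By construction $\mathbf{B}$ is symmetric, has zero row sums (centering), and has rank at most $K$.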
</p>
Published: Mon, 19 Jun 2017 22:01 EDT

Multinomial and empirical likelihood under convex constraints: Directions of recession, Fenchel duality, the PP algorithm
http://projecteuclid.org/euclid.ejs/1497924056
<strong>Marian Grendár</strong>, <strong>Vladimír Špitalský</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2547--2612.</p><p><strong>Abstract:</strong><br/>
The primal problem of multinomial likelihood maximization restricted to a convex closed subset of the probability simplex is studied. A solution of this problem may assign a positive mass to an outcome with zero count. Such a solution cannot be obtained by the widely used, simplified Lagrange and Fenchel duals. Related flaws in the simplified dual problems, which arise because the recession directions are ignored, are identified and the correct Lagrange and Fenchel duals are developed.
The results permit us to specify linear sets and data such that the empirical likelihood-maximizing distribution exists and is the same as the multinomial likelihood-maximizing distribution. The multinomial likelihood ratio reaches, in general, a different conclusion than the empirical likelihood ratio.
Implications for minimum discrimination information, Lindsay geometry, compositional data analysis, bootstrap with auxiliary information, and Lagrange multiplier test, which explicitly or implicitly ignore information about the support, are discussed.
A solution of the primal problem can be obtained by the PP (perturbed primal) algorithm, that is, as the limit of a sequence of solutions of perturbed primal problems. The PP algorithm may be implemented by the simplified Lagrange or Fenchel dual.
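A minimal sketch of the perturbed-primal idea, on a hypothetical three-cell example (not from the paper): the multinomial likelihood is maximized over the simplex subset fixing the mean at 0.8, and the zero-count cell nevertheless receives positive mass, exactly the situation the simplified duals miss.

```python
import numpy as np

# hypothetical example: support {0, 1, 2} with an empty first cell,
# and the convex constraint that the mean equals 0.8
counts = np.array([0.0, 5.0, 5.0])

def solve_perturbed(t, m=100001):
    # the feasible set is one-dimensional here: choosing p1 fixes
    # p2 = (0.8 - p1)/2 through the mean constraint and p0 through the
    # simplex constraint, so a fine grid search suffices
    p1 = np.linspace(1e-6, 0.8 - 1e-6, m)
    p2 = (0.8 - p1) / 2.0
    p0 = 1.0 - p1 - p2
    n = counts + t                 # perturbed primal: pseudo-count t
    loglik = n[0] * np.log(p0) + n[1] * np.log(p1) + n[2] * np.log(p2)
    i = np.argmax(loglik)
    return np.array([p0[i], p1[i], p2[i]])

# PP algorithm idea: the primal solution is the limit of perturbed
# solutions as t -> 0; here the zero-count cell keeps positive mass 0.4
p_hat = solve_perturbed(1e-8)
```

The limit distribution is $(0.4, 0.4, 0.2)$, so the outcome never observed in the sample carries mass $0.4$.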
</p>
Published: Mon, 19 Jun 2017 22:01 EDT

Diagonal and unscaled Wald-type tests in general factorial designs
http://projecteuclid.org/euclid.ejs/1497924057
<strong>Łukasz Smaga</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2613--2646.</p><p><strong>Abstract:</strong><br/>
In this paper, asymptotic and permutation testing procedures are developed in general factorial designs without assuming homoscedasticity or a particular error distribution. The one-way layout and crossed and hierarchically nested designs are contained in our general framework. The new test statistics are modifications of the Wald-type statistic in which the weight matrix is a certain diagonal matrix. Asymptotic properties of the new solutions are also investigated. In particular, the consistency of the tests under fixed alternatives and the asymptotic validity of the permutation procedures are proved in many cases. Simulation studies show that, in the case of small sample sizes, some of the proposed methods perform comparably to, or in certain situations even better than, the Wald-type permutation test of Pauly et al. (2015). Illustrative real data examples of the use of the tests in practice are also given.
</p>
Published: Mon, 19 Jun 2017 22:01 EDT

Estimation of a discrete probability under constraint of $k$-monotonicity
http://projecteuclid.org/euclid.ejs/1483585972
<strong>Jade Giguelay</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1--49.</p><p><strong>Abstract:</strong><br/>
We propose two least-squares estimators of a discrete probability under the constraint of $k$-monotonicity and study their statistical properties. We give a characterization of these estimators based on the decomposition on a spline basis of $k$-monotone sequences. We develop an algorithm derived from the Support Reduction Algorithm and we finally present a simulation study to illustrate their properties.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Estimation of low rank density matrices by Pauli measurements
http://projecteuclid.org/euclid.ejs/1483585973
<strong>Dong Xia</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 50--77.</p><p><strong>Abstract:</strong><br/>
Density matrices are positive semi-definite Hermitian matrices with unit trace that describe the states of quantum systems. Many quantum systems of physical interest can be represented as high-dimensional low rank density matrices. A popular problem in quantum state tomography (QST) is to estimate the unknown low rank density matrix of a quantum system by conducting Pauli measurements. Our main contribution is twofold. First, we establish the minimax lower bounds in Schatten $p$-norms with $1\leq p\leq+\infty$ for the estimation of low rank density matrices by Pauli measurements. In our previous paper [14], these minimax lower bounds were proved under the trace regression model with Gaussian noise, where the noise was assumed to have common variance. In this paper, we prove these bounds under the Binomial observation model, which matches the actual model in QST.
Second, we study the Dantzig estimator (DE) for estimating the unknown low rank density matrix under the Binomial observation model by using Pauli measurements. In our previous papers [14] and [25], we studied the least squares estimator and the projection estimator, where we proved the optimal convergence rates for the least squares estimator in Schatten $p$-norms with $1\leq p\leq2$ and, under a stronger condition, the optimal convergence rates for the projection estimator in Schatten $p$-norms with $1\leq p\leq+\infty$. In this paper, we show that the results of these two distinct estimators can be simultaneously obtained by the Dantzig estimator. Moreover, better convergence rates in Schatten norm distances can be proved for the Dantzig estimator under conditions weaker than those needed in [14] and [25]. When the objective function of DE is replaced by the negative von Neumann entropy, we obtain a sharp convergence rate in Kullback-Leibler divergence.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Estimation of the global regularity of a multifractional Brownian motion
http://projecteuclid.org/euclid.ejs/1484363215
<strong>Joachim Lebovits</strong>, <strong>Mark Podolskij</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 78--98.</p><p><strong>Abstract:</strong><br/>
This paper presents a new estimator of the global regularity index of a multifractional Brownian motion. Our estimation method is based upon a ratio statistic, which compares the realized global quadratic variation of a multifractional Brownian motion at two different frequencies. We show that a logarithmic transformation of this statistic converges in probability to the minimum of the Hurst functional parameter, which is, under weak assumptions, identical to the global regularity index of the path.
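The ratio statistic can be illustrated on the simplest special case, standard Brownian motion, i.e. a multifractional Brownian motion with constant Hurst function $H\equiv0.5$ (an illustrative reduction, not the paper's general setting): comparing realized quadratic variations at two frequencies and taking a logarithmic transform recovers $H$.

```python
import numpy as np

rng = np.random.default_rng(2)

# simplest special case: standard Brownian motion, i.e. a multifractional
# Brownian motion with constant Hurst function H = 0.5
n = 2**16
dt = 1.0 / n
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])

def qv(path, lag):
    # realized quadratic variation of the path sampled every `lag` steps
    incr = path[lag::lag] - path[:-lag:lag]
    return np.sum(incr**2)

# the ratio of quadratic variations at the two frequencies behaves like
# 2**(2H - 1), so a log transform recovers the (global) Hurst index
ratio = qv(W, 2) / qv(W, 1)
H_hat = 0.5 * (np.log2(ratio) + 1.0)
```

For a path with varying Hurst function, the analogous statistic converges to the minimum of the Hurst functional parameter, as stated in the abstract.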
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Bootstrap for the second-order analysis of Poisson-sampled almost periodic processes
http://projecteuclid.org/euclid.ejs/1485162022
<strong>Dominique Dehay</strong>, <strong>Anna E. Dudek</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 99--147.</p><p><strong>Abstract:</strong><br/>
In this paper we consider a continuous almost periodically correlated process $\{X(t),t\in\mathbb{R}\}$ that is observed at the jump moments of a stationary Poisson point process $\{N(t),t\geq0\}$. The processes $\{X(t),t\in\mathbb{R}\}$ and $\{N(t),t\geq0\}$ are assumed to be independent. We define the kernel estimators of the Fourier coefficients of the autocovariance function of $X(t)$ and investigate their asymptotic properties. Moreover, we propose a bootstrap method that provides consistent pointwise and simultaneous confidence intervals for the considered coefficients. Finally, to illustrate our results we provide a simulated data example.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Parametric conditional variance estimation in location-scale models with censored data
http://projecteuclid.org/euclid.ejs/1485939611
<strong>Cédric Heuchenne</strong>, <strong>Géraldine Laurent</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 148--176.</p><p><strong>Abstract:</strong><br/>
Suppose the random vector $(X,Y)$ satisfies the regression model $Y=m(X)+\sigma (X)\varepsilon$, where $m(\cdot)=E(Y|\cdot),$ $\sigma^{2}(\cdot)=\mbox{Var}(Y|\cdot)$ belongs to some parametric class $\{\sigma _{\theta}(\cdot):\theta \in \Theta\}$ and $\varepsilon$ is independent of $X$. The response $Y$ is subject to random right censoring and the covariate $X$ is completely observed. A new estimation procedure is proposed for $\sigma _{\theta}(\cdot)$ when $m(\cdot)$ is unknown. It is based on nonlinear least squares estimation extended to conditional variance in the censored case. The consistency and asymptotic normality of the proposed estimator are established. The estimator is studied via simulations and an important application is devoted to fatigue life data analysis.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Convergence properties of Gibbs samplers for Bayesian probit regression with proper priors
http://projecteuclid.org/euclid.ejs/1485939612
<strong>Saptarshi Chakraborty</strong>, <strong>Kshitij Khare</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 177--210.</p><p><strong>Abstract:</strong><br/>
The Bayesian probit regression model (Albert and Chib [1]) is popular and widely used for binary regression. While the improper flat prior for the regression coefficients is an appropriate choice in the absence of any prior information, a proper normal prior is desirable when prior information is available or in modern high dimensional settings where the number of coefficients ($p$) is greater than the sample size ($n$). For both choices of priors, the resulting posterior density is intractable and a Data Augmentation (DA) Markov chain is used to generate approximate samples from the posterior distribution. Establishing geometric ergodicity for this DA Markov chain is important as it provides theoretical guarantees for constructing standard errors for Markov chain based estimates of posterior quantities. In this paper, we first show that in case of proper normal priors, the DA Markov chain is geometrically ergodic for all choices of the design matrix $X$, $n$ and $p$ (unlike the improper prior case, where $n\geq p$ and another condition on $X$ are required for posterior propriety itself). We also derive sufficient conditions under which the DA Markov chain is trace-class, i.e., the eigenvalues of the corresponding operator are summable. In particular, this allows us to conclude that the Haar PX-DA sandwich algorithm (obtained by inserting an inexpensive extra step in between the two steps of the DA algorithm) is strictly better than the DA algorithm in an appropriate sense.
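A minimal sketch of the Albert–Chib DA chain under a proper $N(0,I)$ prior, the setting in which geometric ergodicity is established for all $X$, $n$ and $p$ (the data sizes, true coefficients, and prior here are illustrative choices):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)

# toy binary data (sizes and true coefficients are illustrative)
n, p = 200, 2
X = rng.normal(size=(n, p))
y = (X @ np.array([1.0, -1.0]) + rng.normal(size=n) > 0).astype(float)

# proper N(0, I) prior on beta, so the posterior covariance of beta | z
# is fixed across iterations
Q = np.eye(p)                              # prior precision matrix
Sigma = np.linalg.inv(X.T @ X + Q)
L = np.linalg.cholesky(Sigma)

beta = np.zeros(p)
draws = []
for it in range(1500):
    # DA step 1: z_i | beta, y_i ~ N(x_i' beta, 1), truncated to (0, inf)
    # when y_i = 1 and to (-inf, 0) when y_i = 0
    mu = X @ beta
    lo = np.where(y == 1.0, -mu, -np.inf)  # bounds standardized by (z - mu)/1
    hi = np.where(y == 1.0, np.inf, -mu)
    z = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
    # DA step 2: beta | z ~ N(Sigma X'z, Sigma)
    beta = Sigma @ (X.T @ z) + L @ rng.normal(size=p)
    if it >= 500:
        draws.append(beta)

draws = np.asarray(draws)
post_mean = draws.mean(axis=0)
```

The Haar PX-DA sandwich variant discussed in the abstract would insert one extra inexpensive step between the two steps above.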
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Cross-calibration of probabilistic forecasts
http://projecteuclid.org/euclid.ejs/1488531637
<strong>Christof Strähl</strong>, <strong>Johanna Ziegel</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 608--639.</p><p><strong>Abstract:</strong><br/>
When providing probabilistic forecasts for uncertain future events, it is common to strive for calibrated forecasts, that is, the predictive distribution should be compatible with the observed outcomes. Often, there are several competing forecasters of different skill. We extend common notions of calibration where each forecaster is analyzed individually, to stronger notions of cross-calibration where each forecaster is analyzed with respect to the other forecasters. In particular, cross-calibration distinguishes forecasters with respect to increasing information sets. We provide diagnostic tools and statistical tests to assess cross-calibration. The methods are illustrated in simulation examples and applied to probabilistic forecasts for inflation rates by the Bank of England. Computer code and supplementary material (Strähl and Ziegel, 2017a,b) are available online.
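Cross-calibration strengthens the individual notions of calibration that the paper starts from. As a baseline illustration (a standard single-forecaster diagnostic, simpler than the cross-calibration tests developed in the paper), probabilistic calibration can be checked through uniformity of probability integral transform (PIT) values:

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(4)
T = 500
outcomes = rng.normal(size=T)            # outcomes truly follow N(0, 1)

# the ideal forecaster issues the true predictive law N(0, 1);
# a rival issues an over-dispersed N(0, 2^2)
pit_ideal = norm.cdf(outcomes, loc=0.0, scale=1.0)
pit_wide = norm.cdf(outcomes, loc=0.0, scale=2.0)

# a probabilistically calibrated forecaster has uniformly distributed
# PIT values; a Kolmogorov-Smirnov test assesses uniformity
p_ideal = kstest(pit_ideal, "uniform").pvalue
p_wide = kstest(pit_wide, "uniform").pvalue
```

The over-dispersed forecaster is decisively rejected, while the ideal one is not; cross-calibration sharpens this comparison by analyzing each forecaster relative to the others' information sets.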
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Prediction weighted maximum frequency selection
http://projecteuclid.org/euclid.ejs/1488531638
<strong>Hongmei Liu</strong>, <strong>J. Sunil Rao</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 640--681.</p><p><strong>Abstract:</strong><br/>
Shrinkage estimators that possess the ability to produce sparse solutions have become increasingly important to the analysis of today’s complex datasets. Examples include the LASSO, the Elastic-Net and their adaptive counterparts. Estimation of penalty parameters still presents difficulties, however. While variable selection consistent procedures have been developed, their finite sample performance can often be less than satisfactory. We develop a new strategy for variable selection using the adaptive LASSO and adaptive Elastic-Net estimators with $p_{n}$ diverging. The basic idea is first to use the trace paths of their LARS solutions to bootstrap estimates of maximum frequency (MF) models conditioned on dimension. Conditioning on dimension effectively mitigates overfitting; to deal with underfitting, these MFs are then prediction-weighted, and it is shown that not only can consistent model selection be achieved, but attractive convergence rates can be as well, leading to excellent finite sample performance. Detailed numerical studies are carried out on both simulated and real datasets. Extensions to the class of generalized linear models are also detailed.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Adaptive wavelet multivariate regression with errors in variables
http://projecteuclid.org/euclid.ejs/1488964114
<strong>Michaël Chichignoud</strong>, <strong>Van Ha Hoang</strong>, <strong>Thanh Mai Pham Ngoc</strong>, <strong>Vincent Rivoirard</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 682--724.</p><p><strong>Abstract:</strong><br/>
In the multidimensional setting, we consider the errors-in-variables model. We aim at estimating the unknown nonparametric multivariate regression function with errors in the covariates. We devise an adaptive estimator based on projection kernels on wavelets and a deconvolution operator. We propose an automatic and fully data-driven procedure to select the wavelet resolution level. We obtain an oracle inequality and optimal rates of convergence over anisotropic Hölder classes. Our theoretical results are illustrated by some simulations.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Some properties of the autoregressive-aided block bootstrap
http://projecteuclid.org/euclid.ejs/1488964115
<strong>Tobias Niebuhr</strong>, <strong>Jens-Peter Kreiss</strong>, <strong>Efstathios Paparoditis</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 725--751.</p><p><strong>Abstract:</strong><br/>
We investigate properties of a hybrid bootstrap procedure for general, strictly stationary sequences, called the autoregressive-aided block bootstrap, which combines a parametric autoregressive bootstrap with a nonparametric moving block bootstrap. The autoregressive-aided block bootstrap consists of two main steps, namely an autoregressive model fit and an ensuing (moving) block resampling of residuals. The linear parametric model fit prewhitens the time series so that the dependence structure of the remaining residuals gets closer to that of a white noise sequence, while the moving block bootstrap applied to these residuals captures nonlinear features that are not taken into account by the linear autoregressive fit. We establish validity of the autoregressive-aided block bootstrap for the important class of statistics known as generalized means, which includes many commonly used statistics in time series analysis as special cases. Numerical investigations show that the hybrid bootstrap procedure considered in this paper performs quite well: it performs as well as, and in many cases better than, the ordinary moving block bootstrap, and it is robust against mis-specification of the autoregressive order, a substantial advantage over the autoregressive bootstrap.
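The two main steps can be sketched on a toy AR(1) series (the model, sizes, and block length are illustrative choices; the paper's procedure applies to general strictly stationary sequences and higher autoregressive orders):

```python
import numpy as np

rng = np.random.default_rng(5)

# a toy strictly stationary series: AR(1) with heavy-tailed innovations
T = 512
innov = rng.standard_t(df=6, size=T)
x = np.empty(T)
x[0] = 0.0
for t in range(1, T):
    x[t] = 0.6 * x[t - 1] + innov[t]

# step 1: parametric AR(1) fit to prewhiten the series
phi_hat = np.corrcoef(x[:-1], x[1:])[0, 1]
resid = x[1:] - phi_hat * x[:-1]
resid = resid - resid.mean()

# step 2: moving block bootstrap of the residuals, then reconstruction
# through the fitted autoregression
def ar_aided_replicate(resid, phi, block_len=16):
    n = resid.size
    starts = rng.integers(0, n - block_len + 1, size=n // block_len + 1)
    boot = np.concatenate([resid[s:s + block_len] for s in starts])[:n]
    xb = np.empty(n + 1)
    xb[0] = 0.0
    for t in range(1, n + 1):
        xb[t] = phi * xb[t - 1] + boot[t - 1]
    return xb

# bootstrap distribution of the sample mean (a simple generalized mean)
means = np.array([ar_aided_replicate(resid, phi_hat).mean()
                  for _ in range(500)])
```

Block resampling of the prewhitened residuals is what preserves nonlinear dependence the AR fit leaves behind.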
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Optimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators
http://projecteuclid.org/euclid.ejs/1489201320
<strong>Yuchen Zhang</strong>, <strong>Martin J. Wainwright</strong>, <strong>Michael I. Jordan</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 752--799.</p><p><strong>Abstract:</strong><br/>
For the problem of high-dimensional sparse linear regression, it is known that an $\ell_{0}$-based estimator can achieve a $1/n$ “fast” rate for prediction error without any conditions on the design matrix, whereas in the absence of restrictive conditions on the design matrix, popular polynomial-time methods only guarantee the $1/\sqrt{n}$ “slow” rate. In this paper, we show that the slow rate is intrinsic to a broad class of M-estimators. In particular, for estimators based on minimizing a least-squares cost function together with a (possibly nonconvex) coordinate-wise separable regularizer, there is always a “bad” local optimum such that the associated prediction error is lower bounded by a constant multiple of $1/\sqrt{n}$. For convex regularizers, this lower bound applies to all global optima. The theory is applicable to many popular estimators, including convex $\ell_{1}$-based methods as well as M-estimators based on nonconvex regularizers, including the SCAD penalty or the MCP regularizer. In addition, we show that bad local optima are very common, in that a broad class of local minimization algorithms with random initialization typically converge to a bad solution.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Model selection for the segmentation of multiparameter exponential family distributions
http://projecteuclid.org/euclid.ejs/1490666425
<strong>Alice Cleynen</strong>, <strong>Emilie Lebarbier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 800--842.</p><p><strong>Abstract:</strong><br/>
We consider the segmentation problem of univariate distributions from the exponential family with multiple parameters. In segmentation, the choice of the number of segments remains a difficult issue due to the discrete nature of the change-points. In this general exponential family distribution framework, we propose a penalized $\log$-likelihood estimator where the penalty is inspired by papers of L. Birgé and P. Massart. The resulting estimator is proved to satisfy some oracle inequalities. We then further study the particular case of categorical variables by comparing the values of the key constants when derived from the specification of our general approach and when obtained by working directly with the characteristics of this distribution. Finally, simulation studies are conducted to assess the performance of our criterion and to compare our approach to other existing methods, and an application on real data modeled using the categorical distribution is provided.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

A test of Gaussianity based on the Euler characteristic of excursion sets
http://projecteuclid.org/euclid.ejs/1490688316
<strong>Elena Di Bernardino</strong>, <strong>Anne Estrade</strong>, <strong>José R. León</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 843--890.</p><p><strong>Abstract:</strong><br/>
In the present paper, we deal with a stationary isotropic random field $X:{\mathbb{R}}^{d}\to{\mathbb{R}}$ and we assume it is partially observed through some level functionals. We aim at providing a methodology for a test of Gaussianity based on this information. More precisely, the level functionals are given by the Euler characteristic of the excursion sets above a finite number of levels. On the one hand, we study the properties of these level functionals under the hypothesis that the random field $X$ is Gaussian. In particular, we focus on the mapping that associates to any level $u$ the expected Euler characteristic of the excursion set above level $u$. On the other hand, we study the same level functionals under alternative distributions of $X$, such as chi-square, harmonic oscillator and shot noise. In order to validate our methodology, part of the work consists of numerical experiments. We generate Monte-Carlo samples of Gaussian and non-Gaussian random fields and compare, from a statistical point of view, their level functionals. Goodness-of-fit $p$-values are displayed for both cases. Simulations are performed in the one-dimensional ($d=1$) and two-dimensional ($d=2$) cases, using R.
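In the $d=1$ case, the Euler characteristic of an excursion set is simply its number of connected components, which makes the level functional easy to compute from a sampled path (a toy deterministic illustration; the paper's two-dimensional machinery is more involved):

```python
import numpy as np

def euler_characteristic_1d(x, u):
    # in dimension d = 1 the excursion set {t : x(t) >= u} is a finite
    # union of intervals, so its Euler characteristic is just the number
    # of connected components (counted here as upcrossing starts)
    above = x >= u
    starts = above & ~np.concatenate([[False], above[:-1]])
    return int(starts.sum())

# deterministic check: sin on [0, 4*pi] exceeds u = 0.5 on two arcs
t = np.linspace(0.0, 4.0 * np.pi, 4001)
ec = euler_characteristic_1d(np.sin(t), 0.5)
```

Applying this counter at several levels $u$ to a sampled field and comparing with the expected Euler characteristic curve is the basic ingredient of the test.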
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Estimating a smooth function on a large graph by Bayesian Laplacian regularisation
http://projecteuclid.org/euclid.ejs/1490688317
<strong>Alisa Kirichenko</strong>, <strong>Harry van Zanten</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 891--915.</p><p><strong>Abstract:</strong><br/>
We study a Bayesian approach to estimating a smooth function in the context of regression or classification problems on large graphs. We derive theoretical results that show how asymptotically optimal Bayesian regularisation can be achieved under an asymptotic shape assumption on the underlying graph and a smoothness condition on the target function, both formulated in terms of the graph Laplacian. The priors we study are randomly scaled Gaussians with precision operators involving the Laplacian of the graph.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Adaptive density estimation based on a mixture of Gammas
http://projecteuclid.org/euclid.ejs/1490688318
<strong>Natalia Bochkina</strong>, <strong>Judith Rousseau</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 916--962.</p><p><strong>Abstract:</strong><br/>
We consider the problem of Bayesian density estimation on the positive semiline for possibly unbounded densities. We propose a hierarchical Bayesian estimator based on the gamma mixture prior, which can be viewed as a location mixture. We study convergence rates of Bayesian density estimators based on such mixtures. We construct approximations of locally Hölder densities, and of their extension to unbounded densities, by continuous mixtures of gamma distributions, leading to approximations of such densities by finite mixtures. These results are then used to derive posterior concentration rates, with priors based on these mixture models. The rates are minimax (up to a $\log n$ term) and, since the priors are independent of the smoothness, the rates are adaptive to the smoothness.
One of the novel features of the paper is that these results hold for densities with polynomial tails. Similar results are obtained using a hierarchical Bayesian model based on the mixture of inverse gamma densities, which can be used to adaptively estimate densities with very heavy tails, including the Cauchy density.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

A note on central limit theorems for quadratic variation in case of endogenous observation times
http://projecteuclid.org/euclid.ejs/1490860813
<strong>Mathias Vetter</strong>, <strong>Tobias Zwingmann</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 963--980.</p><p><strong>Abstract:</strong><br/>
This paper is concerned with a central limit theorem for quadratic variation when observations come as exit times from a regular grid. We discuss the special case of a semimartingale with deterministic characteristics and finite activity jumps in detail and illustrate technical issues in more general situations.
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Density estimation for $\tilde{\beta}$-dependent sequences
http://projecteuclid.org/euclid.ejs/1490860814
<strong>Jérôme Dedecker</strong>, <strong>Florence Merlevède</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 981--1021.</p><p><strong>Abstract:</strong><br/>
We study the ${\mathbb{L}}^{p}$-integrated risk of some classical estimators of the density, when the observations are drawn from a strictly stationary sequence. The results apply to a large class of sequences, which can be non-mixing in the sense of Rosenblatt and long-range dependent. The main probabilistic tool is a new Rosenthal-type inequality for partial sums of $BV$ functions of the variables. As an application, we give the rates of convergence of regular histograms, when estimating the invariant density of a class of expanding maps of the unit interval with a neutral fixed point at zero. These histograms are plotted in the section devoted to the simulations.
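As an illustration of the application, the Liverani–Saussol–Vaienti map is a standard example of an expanding map of the unit interval with a neutral fixed point at zero (chosen here for concreteness; the paper's class is more general). A regular histogram along one long orbit estimates the invariant density, which blows up near the neutral fixed point:

```python
import numpy as np

def lsv(x, gamma=0.25):
    # Liverani-Saussol-Vaienti map: expanding on (0, 1] with a neutral
    # fixed point at 0; for gamma < 1 the invariant density is integrable
    # and behaves like x**(-gamma) near 0
    return x * (1.0 + (2.0 * x) ** gamma) if x <= 0.5 else 2.0 * x - 1.0

x = 0.3                       # a typical starting point
orbit = np.empty(200000)
for i in range(orbit.size):
    x = lsv(x)
    orbit[i] = x

# regular histogram estimate of the invariant density on (0, 1]
hist, edges = np.histogram(orbit, bins=50, range=(0.0, 1.0), density=True)
```

The mass piling up in the bins near zero reflects the intermittent lingering of orbits at the neutral fixed point.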
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Kernel ridge vs. principal component regression: Minimax bounds and the qualification of regularization operators
http://projecteuclid.org/euclid.ejs/1490860815
<strong>Lee H. Dicker</strong>, <strong>Dean P. Foster</strong>, <strong>Daniel Hsu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1022--1047.</p><p><strong>Abstract:</strong><br/>
Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component regression. We derive an explicit finite-sample risk bound for regularization-based estimators that simultaneously accounts for (i) the structure of the ambient function space, (ii) the regularity of the true regression function, and (iii) the adaptability (or qualification) of the regularization. A simple consequence of this upper bound is that the risk of the regularization-based estimators matches the minimax rate in a variety of settings. The general bound also illustrates how some regularization techniques are more adaptable than others to favorable regularity properties that the true regression function may possess. This, in particular, demonstrates a striking difference between kernel ridge regression and kernel principal component regression. Our theoretical results are supported by numerical experiments.
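The two named members of this regularization class can be sketched directly at the level of the kernel matrix: ridge smoothly downweights eigen-directions through the filter $\lambda_i/(\lambda_i+n\rho)$, while principal component regression hard-truncates them (the toy data, bandwidth, and tuning values below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = np.sort(rng.uniform(-1.0, 1.0, n))
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=n)

# Gaussian kernel Gram matrix (bandwidth 0.2 is an illustrative choice)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * 0.2**2))

# kernel ridge regression: smoothly downweights small eigen-directions
rho = 1e-3
fit_ridge = K @ np.linalg.solve(K + n * rho * np.eye(n), y)

# kernel principal component regression: keep only the top eigen-directions
evals, evecs = np.linalg.eigh(K)
V = evecs[:, -20:]                    # top 20 eigen-directions of K
fit_pcr = V @ (V.T @ y)               # projection of y onto their span

truth = np.sin(np.pi * x)
mse_ridge = np.mean((fit_ridge - truth) ** 2)
mse_pcr = np.mean((fit_pcr - truth) ** 2)
```

Both fits are accurate on this smooth target; the paper's point is that the two filters differ in qualification, i.e., in how much extra regularity of the truth they can exploit.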
</p>
Published: Mon, 26 Jun 2017 22:01 EDT

Estimation of false discovery proportion in multiple testing: From normal to chi-squared test statistics
http://projecteuclid.org/euclid.ejs/1490925658
<strong>Lilun Du</strong>, <strong>Chunming Zhang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1048--1091.</p><p><strong>Abstract:</strong><br/>
Multiple testing based on chi-squared test statistics is common in many scientific fields such as genomics research and brain imaging studies. However, the challenges of designing a formal testing procedure when there exists a general dependence structure across the chi-squared test statistics have not been well addressed. To address this gap, we first adopt a latent factor structure ([14]) to construct a testing framework for approximating the false discovery proportion ($\mathrm{FDP}$) for a large number of highly correlated chi-squared test statistics with a finite number of degrees of freedom $k$. The testing framework is then used to simultaneously test $k$ linear constraints in a large dimensional linear factor model with some observable and unobservable common factors; the result is a consistent estimator of the $\mathrm{FDP}$ based on the associated factor-adjusted $p$-values. The practical utility of the method is investigated through extensive simulation studies and an analysis of batch effects in a gene expression study.
</p>projecteuclid.org/euclid.ejs/1490925658_20170626220158Mon, 26 Jun 2017 22:01 EDTOptimized recentered confidence spheres for the multivariate normal meanhttp://projecteuclid.org/euclid.ejs/1493258584<strong>Waruni Abeysekera</strong>, <strong>Paul Kabaila</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1798--1826.</p><p><strong>Abstract:</strong><br/>
Casella and Hwang (1983, JASA) introduced a broad class of recentered confidence spheres for the mean $\boldsymbol{\theta}$ of a multivariate normal distribution with covariance matrix $\sigma^{2}\boldsymbol{I}$, for $\sigma^{2}$ known. Both the center and radius functions of these confidence spheres are flexible functions of the data. For the particular case of confidence spheres centered on the positive-part James-Stein estimator and with radius determined by empirical Bayes considerations, they show numerically that these confidence spheres have the desired minimum coverage probability $1-\alpha$ and dominate the usual confidence sphere in terms of scaled volume. We shift the focus from the scaled volume to the scaled expected volume of the recentered confidence sphere. Since both the coverage probability and the scaled expected volume are functions of the Euclidean norm of $\boldsymbol{\theta}$, it is feasible to optimize the performance of the recentered confidence sphere by numerically computing both the center and radius functions so as to optimize some clearly specified criterion. We suppose that we have uncertain prior information that $\boldsymbol{\theta}=\boldsymbol{0}$. This motivates us to determine the center and radius functions of the confidence sphere by numerical minimization of the scaled expected volume of the confidence sphere at $\boldsymbol{\theta}=\boldsymbol{0}$, subject to the constraints that (a) the coverage probability never falls below $1-\alpha$ and (b) the radius never exceeds the radius of the standard $1-\alpha$ confidence sphere. Our results show that, by focusing on this clearly specified criterion, significant gains in performance (in terms of this criterion) can be achieved. We also present analogous results for the much more difficult case that $\sigma^{2}$ is unknown.
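For concreteness, the positive-part James-Stein estimator on which these confidence spheres can be centered admits a one-line sketch (a minimal illustration for a single observation with known $\sigma^{2}$, not code from the paper):

```python
import numpy as np

def james_stein_plus(x, sigma2=1.0):
    """Positive-part James-Stein estimate of a p-variate normal mean
    from one observation x ~ N(theta, sigma2 * I), for p >= 3."""
    p = x.size
    shrink = 1.0 - (p - 2) * sigma2 / np.dot(x, x)
    return max(shrink, 0.0) * x               # clip the factor at zero

# For an observation close to the origin the shrinkage factor is
# clipped at zero, so the estimate collapses to the origin exactly.
print(james_stein_plus(np.array([0.5, -0.3, 0.2, 0.1])))
```

The empirical-Bayes recentering exploits exactly this behavior near $\boldsymbol{\theta}=\boldsymbol{0}$, which is where the optimized spheres of this paper gain the most.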
</p>projecteuclid.org/euclid.ejs/1493258584_20170626220158Mon, 26 Jun 2017 22:01 EDTFinite sample bounds for expected number of false rejections under martingale dependence with applications to FDRhttp://projecteuclid.org/euclid.ejs/1493345170<strong>Julia Benditkis</strong>, <strong>Arnold Janssen</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1827--1857.</p><p><strong>Abstract:</strong><br/>
Much effort has been made to improve the famous step-up procedure of Benjamini and Hochberg given by the linear critical values $\frac{i\alpha}{n}$. Gavrilov, Benjamini and Sarkar pointed out that step-down multiple testing procedures based on the critical values $\beta_{i}=\frac{i\alpha}{n+1-i(1-\alpha)}$ still control the false discovery rate (FDR) at the upper bound $\alpha$ under basic independence assumptions. Since that result no longer holds for step-up procedures, nor for step-down procedures when the p-values are dependent, a substantial discussion about the corresponding FDR has arisen in the literature. The present paper establishes finite-sample formulas and bounds for the FDR and the expected number of false rejections for multiple testing procedures using the critical values $\beta_{i}$ under martingale and reverse martingale dependence models. It is pointed out that martingale methods are natural tools for the treatment of local FDR estimators, which are closely connected to the present coefficients $\beta_{i}$. The martingale approach also yields new results and further insight for the special basic independence model.
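The step-down procedure with the Gavrilov-Benjamini-Sarkar critical values $\beta_{i}$ above can be sketched as follows (an illustrative implementation, not the authors' code):

```python
import numpy as np

def gbs_stepdown(pvals, alpha=0.05):
    """Step-down multiple testing with critical values
    beta_i = i*alpha / (n + 1 - i*(1 - alpha)); returns the indices
    of the rejected hypotheses."""
    p = np.asarray(pvals)
    n = p.size
    order = np.argsort(p)
    i = np.arange(1, n + 1)
    beta = i * alpha / (n + 1 - i * (1 - alpha))
    below = p[order] <= beta
    # Step-down: reject the k smallest p-values, where k is the largest
    # index such that ALL of the first k sorted p-values clear their beta.
    k = 0
    for ok in below:
        if not ok:
            break
        k += 1
    return np.sort(order[:k])

print(gbs_stepdown([0.001, 0.008, 0.039, 0.041, 0.27, 0.6], alpha=0.05))
```

Unlike a step-up rule, the scan stops at the first sorted p-value that exceeds its critical value, which is what makes the FDR control argument work under the stated independence assumptions.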
</p>projecteuclid.org/euclid.ejs/1493345170_20170626220158Mon, 26 Jun 2017 22:01 EDTData-driven nonlinear expectations for statistical uncertainty in decisionshttp://projecteuclid.org/euclid.ejs/1493366413<strong>Samuel N. Cohen</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1858--1889.</p><p><strong>Abstract:</strong><br/>
In stochastic decision problems, one often wants to estimate the underlying probability measure statistically, and then to use this estimate as a basis for decisions. We shall consider how the uncertainty in this estimation can be explicitly and consistently incorporated in the valuation of decisions, using the theory of nonlinear expectations.
</p>projecteuclid.org/euclid.ejs/1493366413_20170626220158Mon, 26 Jun 2017 22:01 EDTAn averaged projected Robbins-Monro algorithm for estimating the parameters of a truncated spherical distributionhttp://projecteuclid.org/euclid.ejs/1493776837<strong>Antoine Godichon-Baggioni</strong>, <strong>Bruno Portier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1890--1927.</p><p><strong>Abstract:</strong><br/>
The objective of this work is to propose a new algorithm to fit a sphere on a noisy 3D point cloud distributed around a complete or a truncated sphere. More precisely, we introduce a projected Robbins-Monro algorithm and its averaged version for estimating the center and the radius of the sphere. We give asymptotic results such as the almost sure convergence of these algorithms as well as the asymptotic normality of the averaged algorithm. Furthermore, some non-asymptotic results will be given, such as the rates of convergence in quadratic mean. Some numerical experiments show the efficiency of the proposed algorithm on simulated data for small to moderate sample sizes and for modeling an object in 3D.
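A minimal sketch of such a scheme (illustrative step sizes and projection set, not the authors' exact algorithm) performs stochastic gradient steps on the criterion $E[(\|Z-c\|-r)^{2}]$, projects each iterate onto a bounded set, and averages the iterates:

```python
import numpy as np

def fit_sphere_rm(points, c0, r0, gamma=lambda t: 2.0 / (t + 1) ** 0.66,
                  bound=100.0):
    """Averaged projected Robbins-Monro sketch for sphere fitting:
    stochastic gradient steps on E[(||Z - c|| - r)^2], with iterates
    projected onto a ball of radius `bound`, then averaged."""
    c, r = np.array(c0, float), float(r0)
    c_bar, r_bar = c.copy(), r
    for t, z in enumerate(points, start=1):
        d = np.linalg.norm(z - c)
        g = gamma(t)
        c = c - g * (d - r) * (c - z) / max(d, 1e-12)   # gradient in c
        r = r + g * (d - r)                             # gradient in r
        norm = np.linalg.norm(c)                        # projection step
        if norm > bound:
            c *= bound / norm
        r = min(max(r, 0.0), bound)
        c_bar += (c - c_bar) / (t + 1)                  # Polyak-Ruppert average
        r_bar += (r - r_bar) / (t + 1)
    return c_bar, r_bar

rng = np.random.default_rng(1)
true_c, true_r = np.array([1.0, -2.0, 0.5]), 3.0
u = rng.standard_normal((5000, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)
pts = true_c + true_r * u + 0.05 * rng.standard_normal((5000, 3))
c_hat, r_hat = fit_sphere_rm(pts, c0=np.zeros(3), r0=1.0)
```

The averaging step is what yields the improved asymptotic normality studied in the paper; the raw projected iterates alone converge more slowly.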
</p>projecteuclid.org/euclid.ejs/1493776837_20170626220158Mon, 26 Jun 2017 22:01 EDTNonparametric distribution estimation in the presence of familial correlation and censoringhttp://projecteuclid.org/euclid.ejs/1493776838<strong>Kun Xu</strong>, <strong>Yanyuan Ma</strong>, <strong>Yuanjia Wang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1928--1948.</p><p><strong>Abstract:</strong><br/>
We propose methods to estimate the distribution functions for multiple populations from mixture data that are only known to belong to a specific population with certain probabilities. The problem is motivated from kin-cohort studies collecting phenotype data in families for various diseases such as the Huntington’s disease (HD) or breast cancer. Relatives in these studies are not genotyped hence only their probabilities of carrying a known causal mutation (e.g., BRCA1 gene mutation or HD gene mutation) can be derived. In addition, phenotype observations from the same family may be correlated due to shared life style or other genes associated with disease, and the observations are subject to censoring. Our estimator does not assume any parametric form of the distributions, and does not require modeling of the correlation structure. It estimates the distributions through using the optimal base estimators and then optimally combine them. The optimality implies both estimation consistency and minimum estimation variance. Simulations and real data analysis on an HD study are performed to illustrate the improved efficiency of the proposed methods.
</p>projecteuclid.org/euclid.ejs/1493776838_20170626220158Mon, 26 Jun 2017 22:01 EDTStatistical models for cores decomposition of an undirected random graphhttp://projecteuclid.org/euclid.ejs/1494900119<strong>Vishesh Karwa</strong>, <strong>Michael J. Pelsmajer</strong>, <strong>Sonja Petrović</strong>, <strong>Despina Stasi</strong>, <strong>Dane Wilburne</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1949--1982.</p><p><strong>Abstract:</strong><br/>
The $k$-core decomposition is a widely studied summary statistic that describes a graph’s global connectivity structure. In this paper, we move beyond using $k$-core decomposition as a tool to summarize a graph and propose using $k$-core decomposition as a tool to model random graphs. We propose using the shell distribution vector, a way of summarizing the decomposition, as a sufficient statistic for a family of exponential random graph models. We study the properties and behavior of the model family, implement a Markov chain Monte Carlo algorithm for simulating graphs from the model, implement a direct sampler from the set of graphs with a given shell distribution, and explore the sampling distributions of some of the commonly used complementary statistics as good candidates for heuristic model fitting. These algorithms provide first fundamental steps necessary for solving the following problems: parameter estimation in this ERGM, extending the model to its Bayesian relative, and developing a rigorous methodology for testing goodness of fit of the model and model selection. The methods are applied to a synthetic network as well as the well-known Sampson monks dataset.
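To fix ideas, the shell distribution vector of a small graph can be computed with the standard peeling algorithm (a generic implementation of the $k$-core decomposition, not tied to the authors' samplers):

```python
from collections import defaultdict

def shell_distribution(edges, n):
    """Core numbers of nodes 0..n-1 by iterative peeling; returns the
    shell distribution vector s, where s[k] counts nodes with core
    number k."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = {u: len(adj[u]) for u in range(n)}
    core = {}
    remaining = set(range(n))
    k = 0
    while remaining:
        # Repeatedly remove nodes whose remaining degree is <= k.
        peel = [u for u in remaining if deg[u] <= k]
        while peel:
            u = peel.pop()
            if u not in remaining:
                continue
            core[u] = k
            remaining.discard(u)
            for v in adj[u]:
                if v in remaining:
                    deg[v] -= 1
                    if deg[v] <= k:
                        peel.append(v)
        k += 1
    shells = [0] * (max(core.values()) + 1)
    for u in core:
        shells[core[u]] += 1
    return shells

# A triangle with one pendant vertex: the pendant has core number 1,
# the three triangle vertices have core number 2.
print(shell_distribution([(0, 1), (1, 2), (0, 2), (2, 3)], 4))
```

This vector is exactly the sufficient statistic of the exponential random graph model family proposed in the paper.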
</p>projecteuclid.org/euclid.ejs/1494900119_20170626220158Mon, 26 Jun 2017 22:01 EDTInference for a mean-reverting stochastic process with multiple change pointshttp://projecteuclid.org/euclid.ejs/1495504915<strong>Fuqi Chen</strong>, <strong>Rogemar Mamon</strong>, <strong>Matt Davison</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2199--2257.</p><p><strong>Abstract:</strong><br/>
The use of an Ornstein-Uhlenbeck (OU) process is ubiquitous in business, economics and finance to capture various price processes and the evolution of economic indicators exhibiting mean-reverting properties. The times at which structural transitions, representing drastic changes in the economic dynamics, occur are of particular interest to policy makers, investors and financial product providers. This paper addresses the change-point problem under a generalised OU model and investigates the associated statistical inference. We propose two estimation methods to locate multiple change points and show the asymptotic properties of the estimators. An informational approach is employed in detecting the change points, and the consistency of our methods is also theoretically demonstrated. Estimation is considered under the setting where both the number and the locations of the change points are unknown. Three computing algorithms are further developed for implementation. The practical applicability of our methods is illustrated using simulated and observed financial market data.
</p>projecteuclid.org/euclid.ejs/1495504915_20170626220158Mon, 26 Jun 2017 22:01 EDTOptimal exponential bounds for aggregation of estimators for the Kullback-Leibler losshttp://projecteuclid.org/euclid.ejs/1495504916<strong>Cristina Butucea</strong>, <strong>Jean-François Delmas</strong>, <strong>Anne Dutfoy</strong>, <strong>Richard Fischer</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2258--2294.</p><p><strong>Abstract:</strong><br/>
We study the problem of aggregation of estimators with respect to the Kullback-Leibler divergence for various probabilistic models. Rather than considering a convex combination of the initial estimators $f_{1},\ldots,f_{N}$, our aggregation procedures rely on the convex combination of the logarithms of these functions. The first method is designed for probability density estimation as it gives an aggregate estimator that is also a proper density function, whereas the second method concerns spectral density estimation and has no such mass-conserving feature. We select the aggregation weights based on a penalized maximum likelihood criterion. We give sharp oracle inequalities that hold with high probability, with a remainder term that is decomposed into a bias and a variance part. We also show the optimality of the remainder terms by providing the corresponding lower bound results.
</p>projecteuclid.org/euclid.ejs/1495504916_20170626220158Mon, 26 Jun 2017 22:01 EDTChange-point tests under local alternatives for long-range dependent processeshttp://projecteuclid.org/euclid.ejs/1496217654<strong>Johannes Tewes</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2461--2498.</p><p><strong>Abstract:</strong><br/>
We consider the change-point problem for the marginal distribution of subordinated Gaussian processes that exhibit long-range dependence. The asymptotic distributions of Kolmogorov-Smirnov- and Cramér-von Mises type statistics are investigated under local alternatives. By doing so we are able to compute the asymptotic relative efficiency of the mentioned tests and the CUSUM test. In the special case of a mean-shift in Gaussian data it is always $1$. Moreover, our theory covers the scenario where the Hermite rank of the underlying process changes.
In a small simulation study, we show that the theoretical findings carry over to the finite sample performance of the tests.
</p>projecteuclid.org/euclid.ejs/1496217654_20170626220158Mon, 26 Jun 2017 22:01 EDTRecovering block-structured activations using compressive measurementshttp://projecteuclid.org/euclid.ejs/1498528883<strong>Sivaraman Balakrishnan</strong>, <strong>Mladen Kolar</strong>, <strong>Alessandro Rinaldo</strong>, <strong>Aarti Singh</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2647--2678.</p><p><strong>Abstract:</strong><br/>
We consider the problems of detection and support recovery of a contiguous block of weak activation in a large matrix, from noisy, possibly adaptively chosen, compressive (linear) measurements. We precisely characterize the tradeoffs between the various problem dimensions, the signal strength and the number of measurements required to reliably detect and recover the support of the signal, both for passive and adaptive measurement schemes. In each case, we complement algorithmic results with information-theoretic lower bounds. Analogous to the situation in the closely related problem of noisy compressed sensing, we show that for detection neither adaptivity, nor structure reduce the minimax signal strength requirement. On the other hand we show the rather surprising result that, contrary to the situation in noisy compressed sensing, the signal strength requirement to recover the support of a contiguous block-structured signal is strongly influenced by both the signal structure and the ability to choose measurements adaptively.
</p>projecteuclid.org/euclid.ejs/1498528883_20170626220158Mon, 26 Jun 2017 22:01 EDTPrediction by quantization of a conditional distributionhttp://projecteuclid.org/euclid.ejs/1498528884<strong>Jean-Michel Loubes</strong>, <strong>Bruno Pelletier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2679--2706.</p><p><strong>Abstract:</strong><br/>
Given a pair of random vectors $(X,Y)$, we consider the problem of approximating $Y$ by $\mathbf{c}(X)=\{\mathbf{c}_{1}(X),\dots ,\mathbf{c}_{M}(X)\}$, where $\mathbf{c}$ is a measurable set-valued function. We give meaning to the approximation by using the principles of vector quantization, which leads to the definition of a multifunction regression problem. The formulated problem amounts to quantizing the conditional distributions of $Y$ given $X$. We propose a nonparametric estimate of the solutions of the multifunction regression problem by combining the method of $M$-means clustering with the nonparametric smoothing technique of $k$-nearest neighbors. We provide an asymptotic analysis of the estimate and we derive a convergence rate for the excess risk of the estimate. The proposed methodology is illustrated on simulated examples and on a speed-flow traffic data set emanating from the context of road traffic forecasting.
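A toy version of this construction (scalar covariate and response, plain Lloyd iterations; the paper's estimator is more general) selects the $k$ nearest neighbors of a query point and quantizes their responses with $M$ centers:

```python
import numpy as np

def mmeans_1d(y, M, iters=50, seed=0):
    """Plain Lloyd iterations with M centers on scalar data."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(y, size=M, replace=False)
    for _ in range(iters):
        lab = np.argmin(np.abs(y[:, None] - centers[None, :]), axis=1)
        for m in range(M):
            if np.any(lab == m):               # keep center if cluster empties
                centers[m] = y[lab == m].mean()
    return np.sort(centers)

def quantized_prediction(x0, X, Y, M=2, k=50):
    """Sketch of conditional-distribution quantization: take the k
    nearest neighbors of x0 in X, then run M-means on their responses
    to get M prediction points c_1(x0), ..., c_M(x0)."""
    idx = np.argsort(np.abs(X - x0))[:k]
    return mmeans_1d(Y[idx], M)

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, 2000)
sign = rng.choice([-1.0, 1.0], size=2000)
Y = sign * (1 + X) + 0.05 * rng.standard_normal(2000)  # bimodal given X
centers = quantized_prediction(0.5, X, Y, M=2, k=100)
```

With a bimodal conditional law, the two centers land near the two conditional modes, which is exactly the information a single conditional-mean regression would average away.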
</p>projecteuclid.org/euclid.ejs/1498528884_20170626220158Mon, 26 Jun 2017 22:01 EDTAsymptotic properties of quasi-maximum likelihood estimators in observation-driven time series modelshttp://projecteuclid.org/euclid.ejs/1499133752<strong>Randal Douc</strong>, <strong>Konstantinos Fokianos</strong>, <strong>Eric Moulines</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2707--2740.</p><p><strong>Abstract:</strong><br/>
We study a general class of quasi-maximum likelihood estimators for observation-driven time series models. Our main focus is on models related to the exponential family of distributions like Poisson based models for count time series or duration models. However, the proposed approach is more general and covers a variety of time series models including the ordinary GARCH model which has been studied extensively in the literature. We provide general conditions under which quasi-maximum likelihood estimators can be analyzed for this class of time series models and we prove that these estimators are consistent and asymptotically normally distributed regardless of the true data generating process. We illustrate our results using classical examples of quasi-maximum likelihood estimation including standard GARCH models, duration models, Poisson type autoregressions and ARMA models with GARCH errors. Our contribution unifies the existing theory and gives conditions for proving consistency and asymptotic normality in a variety of situations.
</p>projecteuclid.org/euclid.ejs/1499133752_20170703220233Mon, 03 Jul 2017 22:02 EDTA Wald-type test statistic for testing linear hypothesis in logistic regression models based on minimum density power divergence estimatorhttp://projecteuclid.org/euclid.ejs/1499133753<strong>Ayanendranath Basu</strong>, <strong>Abhik Ghosh</strong>, <strong>Abhijit Mandal</strong>, <strong>Nirian Martín</strong>, <strong>Leandro Pardo</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2741--2772.</p><p><strong>Abstract:</strong><br/>
In this paper a robust version of the classical Wald test statistic for linear hypotheses in the logistic regression model is introduced and its properties are explored. We study the problem under the assumption of random covariates, although some ideas with non-random covariates are also considered. A family of robust Wald-type tests is considered here, where the minimum density power divergence estimator is used instead of the maximum likelihood estimator. We obtain the asymptotic distribution and also study the robustness properties of these Wald-type test statistics. The robustness of the tests is investigated theoretically through influence function analysis as well as through suitable practical examples. It is theoretically established that both the level and the power of the Wald-type tests are stable against contamination, while the classical Wald test breaks down in this scenario. Some classical examples are presented which numerically substantiate the theory developed. Finally, a simulation study is included to provide further confirmation of the validity of the theoretical results established in the paper.
</p>projecteuclid.org/euclid.ejs/1499133753_20170703220233Mon, 03 Jul 2017 22:02 EDTParametrically guided local quasi-likelihood with censored datahttp://projecteuclid.org/euclid.ejs/1499133754<strong>Majda Talamakrouni</strong>, <strong>Anouar El Ghouch</strong>, <strong>Ingrid Van Keilegom</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2773--2799.</p><p><strong>Abstract:</strong><br/>
It is widely pointed out in the literature that misspecification of a parametric model can lead to inconsistent estimators and wrong inference. However, even a misspecified model can provide some valuable information about the phenomena under study. This is the main idea behind the development of an approach known, in the literature, as parametrically guided nonparametric estimation. Due to its promising bias reduction property, this approach has been investigated in different frameworks such as density estimation, least squares regression and local quasi-likelihood. Our contribution is concerned with parametrically guided local quasi-likelihood estimation adapted to randomly right censored data. The generalization to censored data involves synthetic data and local linear fitting. The asymptotic properties of the guided estimator as well as its finite sample performance are studied and compared with the unguided local quasi-likelihood estimator. The results confirm the bias reduction property and show that, using an appropriate guide and an appropriate bandwidth, the proposed estimator outperforms the classical local quasi-likelihood estimator.
</p>projecteuclid.org/euclid.ejs/1499133754_20170703220233Mon, 03 Jul 2017 22:02 EDTConverting high-dimensional regression to high-dimensional conditional density estimationhttp://projecteuclid.org/euclid.ejs/1499133755<strong>Rafael Izbicki</strong>, <strong>Ann B. Lee</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2800--2831.</p><p><strong>Abstract:</strong><br/>
There is a growing demand for nonparametric conditional density estimators (CDEs) in fields such as astronomy and economics. In astronomy, for example, one can dramatically improve estimates of the parameters that dictate the evolution of the Universe by working with full conditional densities instead of regression (i.e., conditional mean) estimates. More generally, standard regression falls short in any prediction problem where the distribution of the response is more complex with multi-modality, asymmetry or heteroscedastic noise. Nevertheless, much of the work on high-dimensional inference concerns regression and classification only, whereas research on density estimation has lagged behind. Here we propose FlexCode, a fully nonparametric approach to conditional density estimation that reformulates CDE as a nonparametric orthogonal series problem where the expansion coefficients are estimated by regression. By taking such an approach, one can efficiently estimate conditional densities and not just expectations in high dimensions by drawing upon the success in high-dimensional regression. Depending on the choice of regression procedure, our method can adapt to a variety of challenging high-dimensional settings with different structures in the data (e.g., a large number of irrelevant components and nonlinear manifold structure) as well as different data types (e.g., functional data, mixed data types and sample sets). We study the theoretical and empirical performance of our proposed method, and we compare our approach with traditional conditional density estimators on simulated as well as real-world data, such as photometric galaxy data, Twitter data, and line-of-sight velocities in a galaxy cluster.
</p>projecteuclid.org/euclid.ejs/1499133755_20170703220233Mon, 03 Jul 2017 22:02 EDTError bounds for the convex loss Lasso in linear modelshttp://projecteuclid.org/euclid.ejs/1502157624<strong>Mark Hannay</strong>, <strong>Pierre-Yves Deléamont</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2832--2875.</p><p><strong>Abstract:</strong><br/>
In this paper we investigate error bounds for convex loss functions for the Lasso in linear models, by first establishing a gap in the theory with respect to the existing error bounds. Then, under the compatibility condition, we recover bounds for the absolute value estimation error and the squared prediction error under mild conditions, which appear to be far more appropriate than the existing bounds for the convex loss Lasso. Interestingly, asymptotically the only difference between the new bounds of the convex loss Lasso and the classical Lasso is a term solely depending on a well-known expression in the robust statistics literature appearing multiplicatively in the bounds. We show that this result holds whether or not the scale parameter needs to be estimated jointly with the regression coefficients. Finally, we use the ratio to optimize our bounds in terms of minimaxity.
</p>projecteuclid.org/euclid.ejs/1502157624_20170807220039Mon, 07 Aug 2017 22:00 EDTKernel estimates of nonparametric functional autoregression models and their bootstrap approximationhttp://projecteuclid.org/euclid.ejs/1502157625<strong>Tingyi Zhu</strong>, <strong>Dimitris N. Politis</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2876--2906.</p><p><strong>Abstract:</strong><br/>
This paper considers a nonparametric functional autoregression model of order one. Existing contributions addressing the problem of functional time series prediction have focused on the linear model, and the literature is rather sparse in the context of nonlinear functional time series. In our nonparametric setting, we define the functional version of the kernel estimator for the autoregressive operator and develop its asymptotic theory under the assumption of a strong mixing condition on the sample. The results are general in the sense that a high-order autoregression can be naturally written as a first-order AR model. In addition, a component-wise bootstrap procedure is proposed that can be used for estimating the distribution of the kernel estimator, and its asymptotic validity is theoretically justified. The bootstrap procedure is implemented to construct prediction regions that achieve a good coverage rate. A supporting simulation study is presented at the end to illustrate the theoretical advances in the paper.
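A bare-bones sketch of such a functional kernel predictor (Nadaraya-Watson style, with a Gaussian kernel on $L^{2}$ distances between discretized curves; the bandwidth and the toy data are illustrative assumptions) forecasts the next curve as a weighted average of observed successors:

```python
import numpy as np

def nw_functional_ar1(series, x_new, bw):
    """Kernel predictor for a functional AR(1): the forecast is a
    weighted average of the successors X_{t+1}, with weights given by
    a Gaussian kernel of the L2 distance between X_t and x_new."""
    X = np.asarray(series)                             # shape (T, n_grid)
    d = np.sqrt(((X[:-1] - x_new) ** 2).mean(axis=1))  # L2 distances
    w = np.exp(-0.5 * (d / bw) ** 2)
    w /= w.sum()
    return (w[:, None] * X[1:]).sum(axis=0)

# Toy functional time series: curves alternating between two shapes.
grid = np.linspace(0, 1, 20)
rng = np.random.default_rng(3)
curves = [np.sin(np.pi * grid) * ((-1) ** t) + 0.05 * rng.standard_normal(20)
          for t in range(100)]
pred = nw_functional_ar1(curves, curves[-1], bw=0.2)   # expect the +sin shape
```

The component-wise bootstrap of the paper would resample from the residual curves of such a fit to approximate the distribution of this estimator.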
</p>projecteuclid.org/euclid.ejs/1502157625_20170807220039Mon, 07 Aug 2017 22:00 EDTVariable selection for partially linear models via learning gradientshttp://projecteuclid.org/euclid.ejs/1502157627<strong>Lei Yang</strong>, <strong>Yixin Fang</strong>, <strong>Junhui Wang</strong>, <strong>Yongzhao Shao</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2907--2930.</p><p><strong>Abstract:</strong><br/>
Partially linear models (PLMs) are important generalizations of linear models and are very useful for analyzing high-dimensional data. Compared to linear models, the PLMs possess desirable flexibility of non-parametric regression models because they have both linear and non-linear components. Variable selection for PLMs plays an important role in practical applications and has been extensively studied with respect to the linear component. However, for the non-linear component, variable selection has been well developed only for PLMs with extra structural assumptions such as additive PLMs and generalized additive PLMs. There is currently an unmet need for variable selection methods applicable to general PLMs without structural assumptions on the non-linear component. In this paper, we propose a new variable selection method based on learning gradients for general PLMs without any assumption on the structure of the non-linear component. The proposed method utilizes the reproducing-kernel-Hilbert-space tool to learn the gradients and the group-lasso penalty to select variables. In addition, a block-coordinate descent algorithm is suggested and some theoretical properties are established including selection consistency and estimation consistency. The performance of the proposed method is further evaluated via simulation studies and illustrated using real data.
</p>projecteuclid.org/euclid.ejs/1502157627_20170807220039Mon, 07 Aug 2017 22:00 EDTCox Markov models for estimating single cell growthhttp://projecteuclid.org/euclid.ejs/1502416820<strong>Federico Bassetti</strong>, <strong>Ilenia Epifani</strong>, <strong>Lucia Ladelli</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2931--2977.</p><p><strong>Abstract:</strong><br/>
Recent experimental techniques produce thousands of single-cell growth measurements; consequently, stochastic models of growth can be validated on real data and used to understand the main mechanisms that control the cell cycle. A sequence of growing cells is usually modeled by a suitable Markov chain. In this framework, the most interesting goal is to infer the distribution of the doubling time (or of the added size) of a cell given its initial size and its elongation rate. In the literature, these distributions are described in terms of the corresponding conditional hazard function, referred to as the division hazard rate. In this work we propose a simple but effective way to estimate the division hazard by using extended Cox modeling. We investigate the convergence to the stationary distribution of the Markov chain describing the sequence of growing cells and we prove that, under reasonable conditions, the proposed estimators of the division hazard rates are asymptotically consistent. Finally, we apply our model to study some published datasets of E. coli cells.
</p>projecteuclid.org/euclid.ejs/1502416820_20170810220030Thu, 10 Aug 2017 22:00 EDTMaximum likelihood estimation for a bivariate Gaussian process under fixed domain asymptoticshttp://projecteuclid.org/euclid.ejs/1502416821<strong>Daira Velandia</strong>, <strong>François Bachoc</strong>, <strong>Moreno Bevilacqua</strong>, <strong>Xavier Gendre</strong>, <strong>Jean-Michel Loubes</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2978--3007.</p><p><strong>Abstract:</strong><br/>
We consider maximum likelihood estimation with data from a bivariate Gaussian process with a separable exponential covariance model under fixed domain asymptotics. We first characterize the equivalence of Gaussian measures under this model. Then consistency and asymptotic normality for the maximum likelihood estimator of the microergodic parameters are established. A simulation study is presented in order to compare the finite sample behavior of the maximum likelihood estimator with the given asymptotic distribution.
</p>projecteuclid.org/euclid.ejs/1502416821_20170810220030Thu, 10 Aug 2017 22:00 EDTModel selection in semiparametric expectile regressionhttp://projecteuclid.org/euclid.ejs/1502416822<strong>Elmar Spiegel</strong>, <strong>Fabian Sobotka</strong>, <strong>Thomas Kneib</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 3008--3038.</p><p><strong>Abstract:</strong><br/>
Ordinary least squares regression focuses on the expected response and strongly depends on the assumption of normally distributed errors for inferences. An approach to overcome these restrictions is expectile regression, where no distributional assumption is made but rather the whole distribution of the response is described in terms of covariates. This is similar to quantile regression, but expectiles provide a convenient generalization of the arithmetic mean while quantiles are a generalization of the median. To analyze more complex data structures where purely linear predictors are no longer sufficient, semiparametric regression methods have been introduced for both ordinary least squares and expectile regression. However, with increasing complexity of the data and the regression structure, the selection of the true covariates and their effects becomes even more important than in standard regression models. Therefore we introduce several approaches depending on selection criteria and shrinkage methods to perform model selection in semiparametric expectile regression. Moreover, we propose a joint approach for model selection based on several asymmetries simultaneously to deal with the special feature that expectile regression estimates the complete distribution of the response. Furthermore, to distinguish between linear and smooth predictors, we split nonlinear effects into the purely linear trend and the deviation from this trend. All selection methods are compared with the benchmark of functional gradient descent boosting in a simulation study and applied to determine the relevant covariates when studying childhood malnutrition in Peru.
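To illustrate the expectile idea in the simplest linear setting (an asymmetric least squares fit, not the semiparametric estimators studied here; the simulated data are an assumption), one can solve the expectile problem by iteratively reweighted least squares:

```python
import numpy as np

def expectile_reg(X, y, tau=0.5, iters=100):
    """Linear expectile regression sketch via iteratively reweighted
    least squares: minimize sum_i |tau - 1{y_i - x_i'b < 0}| (y_i - x_i'b)^2."""
    Z = np.column_stack([np.ones(len(y)), X])      # add intercept column
    b = np.linalg.lstsq(Z, y, rcond=None)[0]       # OLS start
    for _ in range(iters):
        w = np.where(y - Z @ b < 0, 1 - tau, tau)  # asymmetric weights
        b = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * y))
    return b

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 1000)
y = 2 * x + rng.standard_normal(1000)              # symmetric errors
b50 = expectile_reg(x, y, tau=0.5)                 # reduces to mean regression
b90 = expectile_reg(x, y, tau=0.9)                 # upper-tail expectile
```

At $\tau =0.5$ the weights are constant and the fit coincides with ordinary least squares; raising $\tau$ shifts the fitted line toward the upper tail of the response distribution, which is how a family of asymmetries describes the whole conditional distribution.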
</p>projecteuclid.org/euclid.ejs/1502416822_20170810220030Thu, 10 Aug 2017 22:00 EDTEstimator augmentation with applications in high-dimensional group inferencehttp://projecteuclid.org/euclid.ejs/1502697693<strong>Qing Zhou</strong>, <strong>Seunghyun Min</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 3039--3080.</p><p><strong>Abstract:</strong><br/>
To make statistical inference about a group of parameters on high-dimensional data, we develop the method of estimator augmentation for the block lasso, which is defined via block norm regularization. By augmenting a block lasso estimator $\hat{\beta }$ with the subgradient $S$ of the block norm evaluated at $\hat{\beta }$, we derive a closed-form density for the joint distribution of $(\hat{\beta },S)$ under a high-dimensional setting. This allows us to draw from an estimated sampling distribution of $\hat{\beta }$, or more generally any function of $(\hat{\beta },S)$, by Monte Carlo algorithms. We demonstrate the application of estimator augmentation in group inference with the group lasso and a de-biased group lasso constructed as a function of $(\hat{\beta },S)$. Our numerical results show that importance sampling via estimator augmentation can be orders of magnitude more efficient than parametric bootstrap in estimating tail probabilities for significance tests. This work also brings new insights into the geometry of the sample space and the solution uniqueness of the block lasso. To broaden its application, we generalize our method to a scaled block lasso, which estimates the error variance simultaneously.
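The augmented pair $(\hat{\beta },S)$ is tied together by the KKT conditions: $S$ is the subgradient of the regularizer at $\hat{\beta }$, so $S_{j}=\operatorname{sign}(\hat{\beta }_{j})$ on the active set and $|S_{j}|\le 1$ elsewhere. A minimal sketch of this relation for the ordinary ($\ell_{1}$-norm) lasso rather than the block lasso of the paper; the coordinate-descent solver `lasso_cd`, the helper names, and the toy design are illustrative assumptions, not the authors' code:

```python
import math

def soft(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_cd(X, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # partial residual excluding coordinate j
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            zj = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft(rho, lam) / zj
    return beta

def subgradient(X, y, beta, lam):
    """S = X^T (y - X beta) / (n lam), the l1-norm subgradient at beta."""
    n, p = len(X), len(X[0])
    r = [y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)]
    return [sum(X[i][j] * r[i] for i in range(n)) / (n * lam)
            for j in range(p)]
```

On an orthogonal toy design the fitted subgradient equals the sign of the active coordinate and lies strictly inside $[-1,1]$ for the zero coordinate, which is exactly the constraint region that the joint density of $(\hat{\beta },S)$ lives on.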
</p>projecteuclid.org/euclid.ejs/1502697693_20170814040150Mon, 14 Aug 2017 04:01 EDTPoincaré inequalities on intervals – application to sensitivity analysishttps://projecteuclid.org/euclid.ejs/1503626422<strong>Olivier Roustant</strong>, <strong>Franck Barthe</strong>, <strong>Bertrand Iooss</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 3081--3119.</p><p><strong>Abstract:</strong><br/>
The development of global sensitivity analysis of numerical model outputs has recently raised new issues concerning one-dimensional Poincaré inequalities. Typically, two kinds of sensitivity indices are linked by a Poincaré-type inequality, which provides upper bounds on the more interpretable index in terms of the other one, which is cheaper to compute. This allows a low-cost screening of inessential variables. The efficiency of this screening then depends heavily on the accuracy of the upper bounds in the Poincaré inequalities.
The novelty of these questions lies in the wide range of probability distributions involved, which are often truncated to intervals. After providing an overview of the existing knowledge and techniques, we add some theory on Poincaré constants on intervals, with improvements for symmetric intervals. We then exploit the spectral interpretation to compute the exact value of the Poincaré constant of any admissible distribution on a given interval. We give semi-analytical results for some common distributions (truncated exponential, triangular, truncated normal) and present a numerical method for the general case.
Finally, an application is made to a hydrological problem, showing the benefits of the new results in Poincaré inequalities to sensitivity analysis.
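For the uniform distribution on [0,1], the Poincaré constant is 1/π², attained by the first Neumann eigenfunction cos(πx), so the inequality Var(f) ≤ C ∫ f′² holds with equality there. A minimal numerical check of this classical fact only (the trapezoidal helper `trapz` is an illustrative assumption, not the paper's semi-analytical method):

```python
import math

def trapz(g, a, b, n=20000):
    """Trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return s * h

# extremal (first Neumann eigen-) function for the uniform law on [0, 1]
f = lambda x: math.cos(math.pi * x)
fp = lambda x: -math.pi * math.sin(math.pi * x)

mean = trapz(f, 0.0, 1.0)                                  # integral is 0
variance = trapz(lambda x: (f(x) - mean) ** 2, 0.0, 1.0)   # integral is 1/2
dirichlet = trapz(lambda x: fp(x) ** 2, 0.0, 1.0)          # integral is pi^2/2
ratio = variance / dirichlet   # recovers the Poincare constant 1/pi^2
```

For a non-extremal test function the ratio drops strictly below 1/π², which is what makes the bound useful for screening.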
</p>projecteuclid.org/euclid.ejs/1503626422_20170824220036Thu, 24 Aug 2017 22:00 EDTSemiparametrically efficient estimation of constrained Euclidean parametershttps://projecteuclid.org/euclid.ejs/1503626423<strong>Nanang Susyanto</strong>, <strong>Chris A. J. Klaassen</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 3120--3140.</p><p><strong>Abstract:</strong><br/>
Consider a quite arbitrary (semi)parametric model for i.i.d. observations with a Euclidean parameter of interest, and assume that an asymptotically (semi)parametrically efficient estimator of it is given. If the parameter of interest is known to lie on a general surface (the image of a continuously differentiable vector-valued function), we have a submodel in which this constrained Euclidean parameter may be rewritten in terms of a lower-dimensional Euclidean parameter of interest. An estimator of this underlying parameter is constructed from the given estimator of the original Euclidean parameter and is shown to be (semi)parametrically efficient. It is proved that the efficient score function for the underlying parameter is determined by the efficient score function for the original parameter and the Jacobian of the function defining the surface, via a chain rule for score functions. Efficient estimation of the constrained Euclidean parameter itself is considered as well.
Our general estimation method is applied to location-scale, Gaussian copula and semiparametric regression models, and to parametric models.
</p>projecteuclid.org/euclid.ejs/1503626423_20170824220036Thu, 24 Aug 2017 22:00 EDTEstimation of Kullback-Leibler losses for noisy recovery problems within the exponential familyhttps://projecteuclid.org/euclid.ejs/1503972028<strong>Charles-Alban Deledalle</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 3141--3164.</p><p><strong>Abstract:</strong><br/>
We address the question of estimating Kullback-Leibler losses rather than squared losses in recovery problems where the noise is distributed within the exponential family. Inspired by Stein's unbiased risk estimator (SURE), we exhibit conditions under which these losses can be estimated unbiasedly or with a controlled bias. Simulations on parameter selection problems in applications to image denoising and variable selection with Gamma and Poisson noises illustrate the usefulness of Kullback-Leibler losses and the proposed estimators.
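For Gaussian noise with known unit variance, Stein's identity yields the classical SURE for soft-thresholding at level t: SURE(t) = n − 2·#{i : |y_i| ≤ t} + Σ min(y_i², t²), an unbiased estimate of the ℓ₂ risk. A minimal sketch of this Gaussian special case only (the paper's contribution is the analogous estimators for Kullback-Leibler losses in the exponential family; the function name `sure_soft` is an assumption):

```python
def sure_soft(y, t):
    """Stein's unbiased risk estimate for soft-thresholding at level t,
    assuming y_i ~ N(theta_i, 1):
    SURE = n - 2 * #{|y_i| <= t} + sum(min(y_i^2, t^2))."""
    n = len(y)
    small = sum(1 for yi in y if abs(yi) <= t)        # coordinates set to zero
    clipped = sum(min(yi * yi, t * t) for yi in y)    # clipped squared data
    return n - 2 * small + clipped
```

On y = [0.5, 3.0] with t = 1 this evaluates to 2 − 2·1 + (0.25 + 1) = 1.25; minimizing SURE over t gives a data-driven threshold, and the estimate can be negative since it is only unbiased, not nonnegative.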
</p>projecteuclid.org/euclid.ejs/1503972028_20170828220053Mon, 28 Aug 2017 22:00 EDTAsymptotically minimax prediction in infinite sequence modelshttps://projecteuclid.org/euclid.ejs/1505116877<strong>Keisuke Yano</strong>, <strong>Fumiyasu Komaki</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 3165--3195.</p><p><strong>Abstract:</strong><br/>
We study asymptotically minimax predictive distributions in infinite sequence models. First, we discuss the connection between prediction in an infinite sequence model and prediction in a function model. Second, we construct an asymptotically minimax predictive distribution for the setting in which the parameter space is a known ellipsoid. We show that the Bayesian predictive distribution based on the Gaussian prior distribution is asymptotically minimax in the ellipsoid. Third, we construct an asymptotically minimax predictive distribution for any Sobolev ellipsoid. We show that the Bayesian predictive distribution based on the product of Stein’s priors is asymptotically minimax for any Sobolev ellipsoid. Finally, we present an efficient sampling method from the proposed Bayesian predictive distribution.
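In a single coordinate of a Gaussian sequence model, the Bayes predictive distribution under a N(0, τ²) prior has a closed form: with shrinkage w = τ²/(τ² + σ²), observing y ~ N(θ, σ²) gives the predictive ỹ | y ~ N(wy, σ² + wσ²). A minimal one-coordinate illustration only (the paper works in infinite sequence models with Stein-type priors; the function names are assumptions):

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predictive_pdf(y_new, y, sigma2, tau2):
    """Closed-form Bayes predictive under theta ~ N(0, tau2), y ~ N(theta, sigma2):
    y_new | y ~ N(w * y, sigma2 + w * sigma2), w = tau2 / (tau2 + sigma2)."""
    w = tau2 / (tau2 + sigma2)
    return normal_pdf(y_new, w * y, sigma2 + w * sigma2)
```

The closed form is just the Gaussian convolution of the likelihood N(ỹ; θ, σ²) with the posterior N(θ; wy, wσ²), which can be checked by numerical integration over θ.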
</p>projecteuclid.org/euclid.ejs/1505116877_20170911040129Mon, 11 Sep 2017 04:01 EDT