Electronic Journal of Statistics Articles (Project Euclid)
http://projecteuclid.org/euclid.ejs
The latest articles from Electronic Journal of Statistics on Project Euclid, a site for mathematics and statistics resources.
Language: en-us
Copyright 2010 Cornell University Library
Contact: Euclid-L@cornell.edu (Project Euclid Team)
Published: Thu, 05 Aug 2010 15:41 EDT; last build: Fri, 03 Jun 2011 09:20 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
The bias and skewness of M-estimators in regression
http://projecteuclid.org/euclid.ejs/1262876992
<strong>Christopher Withers</strong>, <strong>Saralees Nadarajah</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 4, 1--14.</p><p><strong>Abstract:</strong><br/>
We consider M-estimation of a regression model with a nuisance parameter and a vector of other parameters. The unknown distribution of the residuals is not assumed to be normal or symmetric. Simple and easily estimated formulas are given for the dominant terms of the bias and skewness of the parameter estimates. For the linear model these are proportional to the skewness of the ‘independent’ variables. For a nonlinear model, its linear component plays the role of these independent variables, and a second term must be added proportional to the covariance of its linear and quadratic components. For the least squares estimate with normal errors this term was derived by Box [1]. We also consider the effect of a large number of parameters, and the case of random independent variables.
</p>

Asymptotic behavior of the Laplacian quasi-maximum likelihood estimator of affine causal processes
http://projecteuclid.org/euclid.ejs/1488423804
<strong>Jean-Marc Bardet</strong>, <strong>Yakoub Boularouk</strong>, <strong>Khedidja Djaballah</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 452--479.</p><p><strong>Abstract:</strong><br/>
We prove the consistency and asymptotic normality of the Laplacian Quasi-Maximum Likelihood Estimator (QMLE) for a general class of causal time series including ARMA, AR($\infty$), GARCH, ARCH($\infty$), ARMA-GARCH, APARCH and ARMA-APARCH processes, among others. We notably exhibit the advantages (moment order and robustness) of this estimator compared to the classical Gaussian QMLE. Numerical simulations confirm the accuracy of this estimator.
</p>

Estimation and inference of error-prone covariate effect in the presence of confounding variables
http://projecteuclid.org/euclid.ejs/1488423805
<strong>Jianxuan Liu</strong>, <strong>Yanyuan Ma</strong>, <strong>Liping Zhu</strong>, <strong>Raymond J. Carroll</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 480--501.</p><p><strong>Abstract:</strong><br/>
We introduce a general single index semiparametric measurement error model for the case that the main covariate of interest is measured with error and modeled parametrically, and where there are many other variables also important to the modeling. We propose a semiparametric bias-correction approach to estimate the effect of the covariate of interest. The resultant estimators are shown to be root-$n$ consistent, asymptotically normal and locally efficient. Comprehensive simulations and an analysis of an empirical data set are performed to demonstrate the finite sample performance and the bias reduction of the locally efficient estimators.
</p>

A geometric approach to pairwise Bayesian alignment of functional data using importance sampling
http://projecteuclid.org/euclid.ejs/1488423806
<strong>Sebastian Kurtek</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 502--531.</p><p><strong>Abstract:</strong><br/>
We present a Bayesian model for pairwise nonlinear registration of functional data. We use the Riemannian geometry of the space of warping functions to define appropriate prior distributions and sample from the posterior using importance sampling. A simple square-root transformation is used to simplify the geometry of the space of warping functions, which allows for computation of sample statistics, such as the mean and median, and a fast implementation of a $k$-means clustering algorithm. These tools allow for efficient posterior inference, where multiple modes of the posterior distribution corresponding to multiple plausible alignments of the given functions are found. We also show pointwise 95% credible intervals to assess the uncertainty of the alignment in different clusters. We validate this model using simulations and present multiple examples on real data from different application domains including biometrics and medicine.
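The square-root transformation mentioned above has a simple computational form: a warping function $\gamma$ is represented by $\psi=\sqrt{\dot{\gamma}}$, which lies on the unit sphere in $L^{2}$, so sample statistics such as the mean can be computed by averaging on the sphere and mapping back. A minimal sketch of this construction (not the author's implementation; the grid, the extrinsic averaging step, and the example warps are illustrative assumptions):

```python
import numpy as np

def warp_mean(warps, t):
    """Extrinsic mean of warping functions via the square-root representation.

    warps: array (k, n) of increasing functions on [0, 1] with
           gamma(0) = 0 and gamma(1) = 1, evaluated on the grid t.
    """
    dt = t[1] - t[0]
    # square-root slope representation: psi = sqrt(d gamma / dt)
    psi = np.sqrt(np.maximum(np.gradient(warps, dt, axis=1), 0.0))
    # each psi has unit L2 norm, so averaging plus renormalising stays on the sphere
    psi_bar = psi.mean(axis=0)
    psi_bar /= np.sqrt(np.sum(psi_bar**2) * dt)
    # map back: gamma_bar(t) = integral_0^t psi_bar(s)^2 ds
    gamma_bar = np.concatenate(([0.0], np.cumsum((psi_bar**2)[1:] * dt)))
    return gamma_bar / gamma_bar[-1]   # enforce gamma_bar(1) = 1 exactly

t = np.linspace(0.0, 1.0, 501)
warps = np.stack([t**1.5, t**(1 / 1.5)])   # a warp and (roughly) its inverse
g = warp_mean(warps, t)
```

The mapped-back mean is again a valid warping function (increasing, with fixed endpoints), which is what makes sphere-based averaging attractive here.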
</p>

Support vector regression for right censored data
http://projecteuclid.org/euclid.ejs/1488423807
<strong>Yair Goldberg</strong>, <strong>Michael R. Kosorok</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 532--569.</p><p><strong>Abstract:</strong><br/>
We develop a unified approach for classification and regression support vector machines for the case in which the responses are subject to right censoring. We provide finite sample bounds on the generalization error of the algorithm, prove risk consistency for a wide class of probability measures, and study the associated learning rates. We apply the general methodology to estimation of the (truncated) mean, median, and quantiles, and to classification problems. We present a simulation study that demonstrates the performance of the proposed approach.
</p>

Sequential quantiles via Hermite series density estimation
http://projecteuclid.org/euclid.ejs/1488531636
<strong>Michael Stephanou</strong>, <strong>Melvin Varughese</strong>, <strong>Iain Macdonald</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 570--607.</p><p><strong>Abstract:</strong><br/>
Sequential quantile estimation refers to incorporating observations into quantile estimates in an incremental fashion thus furnishing an online estimate of one or more quantiles at any given point in time. Sequential quantile estimation is also known as online quantile estimation. This area is relevant to the analysis of data streams and to the one-pass analysis of massive data sets. Applications include network traffic and latency analysis, real time fraud detection and high frequency trading. We introduce new techniques for online quantile estimation based on Hermite series estimators in the settings of static quantile estimation and dynamic quantile estimation. In the static quantile estimation setting we apply the existing Gauss-Hermite expansion in a novel manner. In particular, we exploit the fact that Gauss-Hermite coefficients can be updated in a sequential manner. To treat dynamic quantile estimation we introduce a novel expansion with an exponentially weighted estimator for the Gauss-Hermite coefficients which we term the Exponentially Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time. In doing so we provide a solution to online distribution function and online quantile function estimation on data streams. In particular we derive an analytical expression for the CDF and prove consistency results for the CDF under certain conditions. In addition we analyse the associated quantile estimator. Simulation studies and tests on real data reveal the Gauss-Hermite based algorithms to be competitive with a leading existing algorithm.
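The core idea — that the coefficients of a Hermite series density estimate are sample means of Hermite function evaluations, and can therefore be updated one observation at a time — can be sketched as follows. This is an illustrative toy for the static setting only (the truncation at $K$ terms, the grid-based CDF inversion, and all constants are our assumptions, not the authors' EWGH algorithm):

```python
import numpy as np

def hermite_functions(x, K):
    """Orthonormal Hermite functions psi_0..psi_K evaluated at the points x."""
    psi = np.empty((K + 1, len(x)))
    psi[0] = np.pi ** -0.25 * np.exp(-x**2 / 2)
    if K >= 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for k in range(1, K):
        psi[k + 1] = np.sqrt(2.0 / (k + 1)) * x * psi[k] \
                   - np.sqrt(k / (k + 1.0)) * psi[k - 1]
    return psi

K = 8
coef = np.zeros(K + 1)          # running means of psi_k(X), i.e. estimates of E[psi_k(X)]
rng = np.random.default_rng(0)
for n, xi in enumerate(rng.standard_normal(20000), start=1):
    coef += (hermite_functions(np.array([xi]), K)[:, 0] - coef) / n  # one-pass update

# reconstruct the density on a grid and invert its CDF for any quantile on demand
grid = np.linspace(-5, 5, 2001)
dens = np.clip(coef @ hermite_functions(grid, K), 0.0, None)
cdf = np.cumsum(dens) * (grid[1] - grid[0])
cdf /= cdf[-1]
median = np.interp(0.5, cdf, grid)
```

Because only the $K+1$ running means are stored, the per-observation cost and memory are constant, which is the property that makes arbitrary quantiles available at any point of the stream.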
</p>

Test of independence for high-dimensional random vectors based on freeness in block correlation matrices
http://projecteuclid.org/euclid.ejs/1492588988
<strong>Zhigang Bao</strong>, <strong>Jiang Hu</strong>, <strong>Guangming Pan</strong>, <strong>Wang Zhou</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1527--1548.</p><p><strong>Abstract:</strong><br/>
In this paper, we are concerned with the independence test for $k$ high-dimensional sub-vectors of a normal vector, with fixed positive integer $k$. A natural high-dimensional extension of the classical sample correlation matrix, namely the block correlation matrix, is proposed for this purpose. We then construct the so-called Schott-type statistic as our test statistic, which turns out to be a particular linear spectral statistic of the block correlation matrix. Interestingly, the limiting behavior of the Schott-type statistic can be derived with the aid of Free Probability Theory and Random Matrix Theory. Specifically, we bring the so-called real second order freeness for Haar distributed orthogonal matrices, derived in Mingo and Popa (2013) [10], into the framework of this high-dimensional testing problem. Our test does not require the sample size to be larger than the total or any partial sum of the dimensions of the $k$ sub-vectors. Simulation results show that the performance of the Schott-type statistic is satisfactory, in contrast to the statistics proposed in Jiang and Yang (2013) [8] and Jiang, Bai and Zheng (2013) [7]. Real data analysis is also used to illustrate our method.
</p>

Analysis of asynchronous longitudinal data with partially linear models
http://projecteuclid.org/euclid.ejs/1492740038
<strong>Li Chen</strong>, <strong>Hongyuan Cao</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1549--1569.</p><p><strong>Abstract:</strong><br/>
We study partially linear models for asynchronous longitudinal data to incorporate nonlinear time trend effects. Local and global estimating equations are developed for estimating the parametric and nonparametric effects. We show that with a proper choice of the kernel bandwidth parameter, one can obtain consistent and asymptotically normal parameter estimates for the linear effects. Asymptotic properties of the estimated nonlinear effects are established. Extensive simulation studies provide numerical support for the theoretical findings. Data from an HIV study are used to illustrate our methodology.
</p>

Efficient block boundaries estimation in block-wise constant matrices: An application to HiC data
http://projecteuclid.org/euclid.ejs/1492826487
<strong>Vincent Brault</strong>, <strong>Julien Chiquet</strong>, <strong>Céline Lévy-Leduc</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1570--1599.</p><p><strong>Abstract:</strong><br/>
In this paper, we propose a novel model and a new methodology for estimating the location of block boundaries in a random matrix consisting of a block-wise constant matrix corrupted with white noise. Our method consists in rewriting this problem as a variable selection issue. A penalized least-squares criterion with an $\ell_{1}$-type penalty is used for dealing with this problem. Firstly, some theoretical results ensuring the consistency of our block boundary estimators are provided. Secondly, we explain how to implement our approach in a very efficient way. This implementation is available in the R package blockseg, which can be found on the Comprehensive R Archive Network. Thirdly, we provide some numerical experiments to illustrate the statistical and numerical performance of our package, as well as a thorough comparison with existing methods. Fourthly, an empirical procedure is proposed for estimating the number of blocks. Finally, our approach is applied to HiC data, which are used in molecular biology to better understand the influence of chromosomal conformation on cell functioning.
</p>

Detecting long-range dependence in non-stationary time series
http://projecteuclid.org/euclid.ejs/1493020822
<strong>Holger Dette</strong>, <strong>Philip Preuss</strong>, <strong>Kemal Sen</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1600--1659.</p><p><strong>Abstract:</strong><br/>
An important problem in time series analysis is the discrimination between non-stationarity and long-range dependence. Most of the literature considers the problem of testing specific parametric hypotheses of non-stationarity (such as a change in the mean) against long-range dependent stationary alternatives. In this paper we suggest a simple approach, which can be used to test the null hypothesis of a general non-stationary short-memory process against the alternative of a non-stationary long-memory process. The test procedure works in the spectral domain and uses a sequence of approximating tvFARIMA models to estimate the time varying long-range dependence parameter. We prove uniform consistency of this estimate and asymptotic normality of an averaged version. These results yield a simple test (based on the quantiles of the standard normal distribution), and it is demonstrated in a simulation study that, despite its semi-parametric nature, the new test outperforms the currently available methods, which are constructed to discriminate between specific parametric hypotheses of non-stationary short-range and stationary long-range dependence.
</p>

Semiparametric copula quantile regression for complete or censored data
http://projecteuclid.org/euclid.ejs/1493107294
<strong>Mickaël De Backer</strong>, <strong>Anouar El Ghouch</strong>, <strong>Ingrid Van Keilegom</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1660--1698.</p><p><strong>Abstract:</strong><br/>
When facing multivariate covariates, general semiparametric regression techniques are at hand to propose flexible models that are unexposed to the curse of dimensionality. In this work a semiparametric copula-based estimator for conditional quantiles is investigated for both complete and right-censored data. In spirit, the methodology extends the recent work of Noh, El Ghouch and Bouezmarni [34] and Noh, El Ghouch and Van Keilegom [35], as the main idea consists in appropriately defining the quantile regression in terms of a multivariate copula and marginal distributions. Prior estimation of the latter and a simple plug-in lead to an easily implementable estimator expressed, for both contexts with or without censoring, as a weighted quantile of the observed response variable. In addition, and contrary to the initial suggestion in the literature, a semiparametric estimation scheme for the multivariate copula density is studied, motivated by the possible shortcomings of a purely parametric approach and driven by the regression context. The resulting quantile regression estimator has the valuable property of being automatically monotonic across quantile levels. Additionally, the copula-based approach allows the analyst to naturally take into account common regression concerns such as interactions between covariates or possible transformations of the latter. From a theoretical perspective, asymptotic normality for both complete and censored data is obtained under classical regularity conditions. Finally, numerical examples as well as a real data application are used to illustrate the validity and finite sample performance of the proposed procedure.
</p>

Errors-in-variables models with dependent measurements
http://projecteuclid.org/euclid.ejs/1493107295
<strong>Mark Rudelson</strong>, <strong>Shuheng Zhou</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1699--1797.</p><p><strong>Abstract:</strong><br/>
Suppose that we observe $y\in\mathbb{R}^{n}$ and $X\in\mathbb{R}^{n\times m}$ in the following errors-in-variables model: \begin{eqnarray*}y&=&X_{0}\beta^{*}+\epsilon\\X&=&X_{0}+W\end{eqnarray*} where $X_{0}$ is an $n\times m$ design matrix with independent subgaussian row vectors, $\epsilon\in\mathbb{R}^{n}$ is a noise vector and $W$ is a mean zero $n\times m$ random noise matrix with independent subgaussian column vectors, independent of $X_{0}$ and $\epsilon$. This model is significantly different from those analyzed in the literature in the sense that we allow the measurement error for each covariate to be a dependent vector across its $n$ observations. Such error structures appear in the science literature when modeling the trial-to-trial fluctuations in response strength shared across a set of neurons.
Under sparsity and restricted eigenvalue type conditions, we show that one is able to recover a sparse vector $\beta^{*}\in\mathbb{R}^{m}$ from the model given a single observation matrix $X$ and the response vector $y$. We establish consistency in estimating $\beta^{*}$ and obtain the rates of convergence in the $\ell_{q}$ norm, where $q=1,2$ for the Lasso-type estimator, and $q\in [1,2]$ for a Dantzig-type conic programming estimator. We show error bounds which approach those of the regular Lasso and the Dantzig selector when the errors in $W$ tend to 0. We analyze the convergence rates of the gradient descent methods for solving the nonconvex programs and show that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to a neighborhood of the global minimizers: the size of the neighborhood is bounded by the statistical error in the $\ell_{2}$ norm. Our analysis reveals interesting connections between computational and statistical efficiency and the concentration of measure phenomenon in random matrix theory. We provide simulation evidence illuminating the theoretical predictions.
</p>

Tests of radial symmetry for multivariate copulas based on the copula characteristic function
http://projecteuclid.org/euclid.ejs/1495159235
<strong>Tarik Bahraoui</strong>, <strong>Jean-François Quessy</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2066--2096.</p><p><strong>Abstract:</strong><br/>
A new class of rank statistics is proposed to test whether the copula of a multivariate population is radially symmetric. The proposed test statistics are weighted $L_{2}$ functional distances between a nonparametric estimator of the characteristic function that one can associate to a copula and its complex conjugate. It will be shown that these statistics behave asymptotically as degenerate V-statistics of order four and that the limit distributions have expressions in terms of weighted sums of independent chi-square random variables. A suitably adapted and asymptotically valid multiplier bootstrap procedure is proposed for the computation of $p$-values. One advantage of the proposed approach is that unlike methods based on the empirical copula, the partial derivatives of the copula need not be estimated. The good properties of the tests in finite samples are shown via simulations. In particular, the superiority of the proposed tests over competing ones based on the empirical copula investigated by [6] in the bivariate case is clearly demonstrated.
</p>

Finite sample properties of tests based on prewhitened nonparametric covariance estimators
http://projecteuclid.org/euclid.ejs/1495159236
<strong>David Preinerstorfer</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2097--2167.</p><p><strong>Abstract:</strong><br/>
We analytically investigate size and power properties of a popular family of procedures for testing linear restrictions on the coefficient vector in a linear regression model with temporally dependent errors. The tests considered are autocorrelation-corrected F-type tests based on prewhitened nonparametric covariance estimators that possibly incorporate a data-dependent bandwidth parameter, e.g., estimators as considered in Andrews and Monahan (1992), Newey and West (1994), or Rho and Shao (2013). For design matrices that are generic in a measure theoretic sense we prove that these tests either suffer from extreme size distortions or from strong power deficiencies. Despite this negative result we demonstrate that a simple adjustment procedure based on artificial regressors can often resolve this problem.
</p>

Power of change-point tests for long-range dependent data
http://projecteuclid.org/euclid.ejs/1495159237
<strong>Herold Dehling</strong>, <strong>Aeneas Rooch</strong>, <strong>Murad S. Taqqu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2168--2198.</p><p><strong>Abstract:</strong><br/>
We investigate the power of the CUSUM test and the Wilcoxon change-point tests for a shift in the mean of a process with long-range dependent noise. We derive analytic formulas for the power of these tests under local alternatives. These results enable us to calculate the asymptotic relative efficiency (ARE) of the CUSUM test and the Wilcoxon change point test. We obtain the surprising result that for Gaussian data, the ARE of these two tests equals $1$, in contrast to the case of i.i.d. noise when the ARE is known to be $3/\pi$.
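For reference, the CUSUM statistic for a mean shift compares partial sums against their expected linear growth under the null. A generic sketch (the i.i.d.-style normalisation $\hat\sigma\sqrt{n}$ below is a simplification; under long-range dependence the normalisation and limit law differ, which is precisely what the paper's analysis addresses):

```python
import numpy as np

def cusum_stat(x):
    """CUSUM statistic max_k |S_k - (k/n) S_n| / (sigma_hat * sqrt(n))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.cumsum(x)
    drift = np.arange(1, n + 1) / n * s[-1]   # expected partial sums under H0
    return np.max(np.abs(s - drift)) / (x.std(ddof=1) * np.sqrt(n))

rng = np.random.default_rng(1)
noise = rng.standard_normal(500)
no_shift = cusum_stat(noise)
with_shift = cusum_stat(noise + np.r_[np.zeros(250), 2 * np.ones(250)])
```

A pronounced mean shift inflates the statistic well above its no-change level, which is the basis of the test; the Wilcoxon variant replaces the raw partial sums by rank-based ones.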
</p>

SiAM: A hybrid of single index models and additive models
http://projecteuclid.org/euclid.ejs/1496044838
<strong>Shujie Ma</strong>, <strong>Heng Lian</strong>, <strong>Hua Liang</strong>, <strong>Raymond J. Carroll</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2397--2423.</p><p><strong>Abstract:</strong><br/>
While popular, single index models and additive models have potential limitations, a fact that leads us to propose SiAM, a novel hybrid combination of these two models. We first address model identifiability under general assumptions. The result is of independent interest. We then develop an estimation procedure by using splines to approximate unknown functions and establish the asymptotic properties of the resulting estimators. Furthermore, we suggest a two-step procedure for establishing confidence bands for the nonparametric additive functions. This procedure enables us to make global inferences. Numerical experiments indicate that SiAM works well with finite sample sizes and is especially robust to model structure. That is, when the model reduces to either the single-index or the additive scenario, the estimation and inference results are comparable to those based on the true model, while when the model is misspecified, our method can be substantially superior.
</p>

Estimation of mean form and mean form difference under elliptical laws
http://projecteuclid.org/euclid.ejs/1496131237
<strong>José A. Díaz-García</strong>, <strong>Francisco J. Caro-Lopera</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2424--2460.</p><p><strong>Abstract:</strong><br/>
The matrix variate elliptical generalization of [30] is presented in this work. The published Gaussian case is revised and modified. Then, new aspects of identifiability and consistent estimation of mean form and mean form difference are considered under elliptical laws. For example, instead of using the Euclidean distance matrix for the consistent estimates, exact formulae are derived for the moments of the matrix $\mathbf{B}=\mathbf{X}^{c}\left(\mathbf{X}^{c}\right)^{T}$, where $\mathbf{X}^{c}$ is the centered landmark matrix. Finally, a complete application in biology is provided; it includes estimation, model selection and hypothesis testing.
</p>

Multinomial and empirical likelihood under convex constraints: Directions of recession, Fenchel duality, the PP algorithm
http://projecteuclid.org/euclid.ejs/1497924056
<strong>Marian Grendár</strong>, <strong>Vladimír Špitalský</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2547--2612.</p><p><strong>Abstract:</strong><br/>
The primal problem of multinomial likelihood maximization restricted to a convex closed subset of the probability simplex is studied. A solution of this problem may assign a positive mass to an outcome with zero count. Such a solution cannot be obtained by the widely used, simplified Lagrange and Fenchel duals. Related flaws in the simplified dual problems, which arise because the recession directions are ignored, are identified and the correct Lagrange and Fenchel duals are developed.
The results permit us to specify linear sets and data such that the empirical likelihood-maximizing distribution exists and is the same as the multinomial likelihood-maximizing distribution. The multinomial likelihood ratio reaches, in general, a different conclusion than the empirical likelihood ratio.
Implications for minimum discrimination information, Lindsay geometry, compositional data analysis, bootstrap with auxiliary information, and Lagrange multiplier test, which explicitly or implicitly ignore information about the support, are discussed.
A solution of the primal problem can be obtained by the PP (perturbed primal) algorithm, that is, as the limit of a sequence of solutions of perturbed primal problems. The PP algorithm may be implemented by the simplified Lagrange or Fenchel dual.
</p>

Diagonal and unscaled Wald-type tests in general factorial designs
http://projecteuclid.org/euclid.ejs/1497924057
<strong>Łukasz Smaga</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2613--2646.</p><p><strong>Abstract:</strong><br/>
In this paper, asymptotic and permutation testing procedures are developed for general factorial designs without assuming homoscedasticity or a particular error distribution. The one-way layout, crossed and hierarchically nested designs are contained in our general framework. The new test statistics are modifications of the Wald-type statistic in which the weight matrix is a certain diagonal matrix. Asymptotic properties of the new solutions are also investigated. In particular, the consistency of the tests under fixed alternatives and the asymptotic validity of the permutation procedures are proved in many cases. Simulation studies show that, in the case of small sample sizes, some of the proposed methods perform comparably to, or in certain situations even better than, the Wald-type permutation test of Pauly et al. (2015). Illustrative real data examples of the use of the tests in practice are also given.
</p>

Estimation of a discrete probability under constraint of $k$-monotonicity
http://projecteuclid.org/euclid.ejs/1483585972
<strong>Jade Giguelay</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1--49.</p><p><strong>Abstract:</strong><br/>
We propose two least-squares estimators of a discrete probability under the constraint of $k$-monotonicity and study their statistical properties. We give a characterization of these estimators based on the decomposition on a spline basis of $k$-monotone sequences. We develop an algorithm derived from the Support Reduction Algorithm and we finally present a simulation study to illustrate their properties.
</p>

Estimation of low rank density matrices by Pauli measurements
http://projecteuclid.org/euclid.ejs/1483585973
<strong>Dong Xia</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 50--77.</p><p><strong>Abstract:</strong><br/>
Density matrices are positive semi-definite Hermitian matrices with unit trace that describe the states of quantum systems. Many quantum systems of physical interest can be represented as high-dimensional low rank density matrices. A popular problem in quantum state tomography (QST) is to estimate the unknown low rank density matrix of a quantum system by conducting Pauli measurements. Our main contribution is twofold. First, we establish the minimax lower bounds in Schatten $p$-norms with $1\leq p\leq+\infty$ for low rank density matrices estimation by Pauli measurements. In our previous paper [14], these minimax lower bounds are proved under the trace regression model with Gaussian noise and the noise is assumed to have common variance. In this paper, we prove these bounds under the Binomial observation model which meets the actual model in QST.
Second, we study the Dantzig estimator (DE) for estimating the unknown low rank density matrix under the Binomial observation model by using Pauli measurements. In our previous papers [14] and [25], we studied the least squares estimator and the projection estimator, where we proved the optimal convergence rates for the least squares estimator in Schatten $p$-norms with $1\leq p\leq2$ and, under a stronger condition, the optimal convergence rates for the projection estimator in Schatten $p$-norms with $1\leq p\leq+\infty$. In this paper, we show that the results of these two distinct estimators can be simultaneously obtained by the Dantzig estimator. Moreover, better convergence rates in Schatten norm distances can be proved for Dantzig estimator under conditions weaker than those needed in [14] and [25]. When the objective function of DE is replaced by the negative von Neumann entropy, we obtain sharp convergence rate in Kullback-Leibler divergence.
</p>

Estimation of the global regularity of a multifractional Brownian motion
http://projecteuclid.org/euclid.ejs/1484363215
<strong>Joachim Lebovits</strong>, <strong>Mark Podolskij</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 78--98.</p><p><strong>Abstract:</strong><br/>
This paper presents a new estimator of the global regularity index of a multifractional Brownian motion. Our estimation method is based upon a ratio statistic, which compares the realized global quadratic variation of a multifractional Brownian motion at two different frequencies. We show that a logarithmic transformation of this statistic converges in probability to the minimum of the Hurst functional parameter, which is, under weak assumptions, identical to the global regularity index of the path.
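The change-of-frequency idea behind such ratio statistics is easy to illustrate in the constant-Hurst case: since increments at lag $2\delta$ have variance $(2\delta)^{2H}$, the ratio of realized quadratic variations at the two lags estimates $2^{2H-1}$. A toy sketch for ordinary Brownian motion (true $H=1/2$); this is the classical constant-$H$ estimator, not the authors' statistic for the multifractional, time-varying case:

```python
import numpy as np

def hurst_cof(x):
    """Change-of-frequency estimate of H from a discretely sampled path x."""
    qv1 = np.sum(np.diff(x) ** 2)            # realized quadratic variation, lag 1
    qv2 = np.sum(np.diff(x[::2]) ** 2)       # realized quadratic variation, lag 2
    return 0.5 * (np.log2(qv2 / qv1) + 1.0)  # since qv2 / qv1 -> 2^(2H - 1)

rng = np.random.default_rng(2)
bm = np.cumsum(rng.standard_normal(20001))   # Brownian motion path, true H = 0.5
h = hurst_cof(bm)
```

The logarithmic transformation of the ratio is what turns the multiplicative scaling law into an additive estimate of the regularity index, mirroring the structure of the statistic studied in the paper.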
</p>

Bootstrap for the second-order analysis of Poisson-sampled almost periodic processes
http://projecteuclid.org/euclid.ejs/1485162022
<strong>Dominique Dehay</strong>, <strong>Anna E. Dudek</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 99--147.</p><p><strong>Abstract:</strong><br/>
In this paper we consider a continuous almost periodically correlated process $\{X(t),t\in\mathbb{R}\}$ that is observed at the jump moments of a stationary Poisson point process $\{N(t),t\geq0\}$. The processes $\{X(t),t\in\mathbb{R}\}$ and $\{N(t),t\geq0\}$ are assumed to be independent. We define the kernel estimators of the Fourier coefficients of the autocovariance function of $X(t)$ and investigate their asymptotic properties. Moreover, we propose a bootstrap method that provides consistent pointwise and simultaneous confidence intervals for the considered coefficients. Finally, to illustrate our results we provide a simulated data example.
</p>projecteuclid.org/euclid.ejs/1485162022_20170626220158Mon, 26 Jun 2017 22:01 EDTParametric conditional variance estimation in location-scale models with censored datahttp://projecteuclid.org/euclid.ejs/1485939611<strong>Cédric Heuchenne</strong>, <strong>Géraldine Laurent</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 148--176.</p><p><strong>Abstract:</strong><br/>
Suppose the random vector $(X,Y)$ satisfies the regression model $Y=m(X)+\sigma (X)\varepsilon$, where $m(\cdot)=E(Y|\cdot),$ $\sigma^{2}(\cdot)=\mbox{Var}(Y|\cdot)$ belongs to some parametric class $\{\sigma _{\theta}(\cdot):\theta \in \Theta\}$ and $\varepsilon$ is independent of $X$. The response $Y$ is subject to random right censoring and the covariate $X$ is completely observed. A new estimation procedure is proposed for $\sigma _{\theta}(\cdot)$ when $m(\cdot)$ is unknown. It is based on nonlinear least squares estimation extended to conditional variance in the censored case. The consistency and asymptotic normality of the proposed estimator are established. The estimator is studied via simulations and an important application is devoted to fatigue life data analysis.
</p>projecteuclid.org/euclid.ejs/1485939611_20170626220158Mon, 26 Jun 2017 22:01 EDTConvergence properties of Gibbs samplers for Bayesian probit regression with proper priorshttp://projecteuclid.org/euclid.ejs/1485939612<strong>Saptarshi Chakraborty</strong>, <strong>Kshitij Khare</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 177--210.</p><p><strong>Abstract:</strong><br/>
The Bayesian probit regression model (Albert and Chib [1]) is popular and widely used for binary regression. While the improper flat prior for the regression coefficients is an appropriate choice in the absence of any prior information, a proper normal prior is desirable when prior information is available or in modern high dimensional settings where the number of coefficients ($p$) is greater than the sample size ($n$). For both choices of priors, the resulting posterior density is intractable and a Data Augmentation (DA) Markov chain is used to generate approximate samples from the posterior distribution. Establishing geometric ergodicity for this DA Markov chain is important as it provides theoretical guarantees for constructing standard errors for Markov chain based estimates of posterior quantities. In this paper, we first show that in the case of proper normal priors, the DA Markov chain is geometrically ergodic for all choices of the design matrix $X$, $n$ and $p$ (unlike the improper prior case, where $n\geq p$ and another condition on $X$ are required for posterior propriety itself). We also derive sufficient conditions under which the DA Markov chain is trace-class, i.e., the eigenvalues of the corresponding operator are summable. In particular, this allows us to conclude that the Haar PX-DA sandwich algorithm (obtained by inserting an inexpensive extra step in between the two steps of the DA algorithm) is strictly better than the DA algorithm in an appropriate sense.
</p>projecteuclid.org/euclid.ejs/1485939612_20170626220158Mon, 26 Jun 2017 22:01 EDTCross-calibration of probabilistic forecastshttp://projecteuclid.org/euclid.ejs/1488531637<strong>Christof Strähl</strong>, <strong>Johanna Ziegel</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 608--639.</p><p><strong>Abstract:</strong><br/>
When providing probabilistic forecasts for uncertain future events, it is common to strive for calibrated forecasts, that is, the predictive distribution should be compatible with the observed outcomes. Often, there are several competing forecasters of different skill. We extend common notions of calibration where each forecaster is analyzed individually, to stronger notions of cross-calibration where each forecaster is analyzed with respect to the other forecasters. In particular, cross-calibration distinguishes forecasters with respect to increasing information sets. We provide diagnostic tools and statistical tests to assess cross-calibration. The methods are illustrated in simulation examples and applied to probabilistic forecasts for inflation rates by the Bank of England. Computer code and supplementary material (Strähl and Ziegel, 2017a,b) are available online.
</p>projecteuclid.org/euclid.ejs/1488531637_20170626220158Mon, 26 Jun 2017 22:01 EDTPrediction weighted maximum frequency selectionhttp://projecteuclid.org/euclid.ejs/1488531638<strong>Hongmei Liu</strong>, <strong>J. Sunil Rao</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 640--681.</p><p><strong>Abstract:</strong><br/>
Shrinkage estimators that possess the ability to produce sparse solutions have become increasingly important to the analysis of today’s complex datasets. Examples include the LASSO, the Elastic-Net and their adaptive counterparts. However, estimation of penalty parameters still presents difficulties. While variable selection consistent procedures have been developed, their finite sample performance can often be less than satisfactory. We develop a new strategy for variable selection using the adaptive LASSO and adaptive Elastic-Net estimators with $p_{n}$ diverging. The basic idea first involves using the trace paths of their LARS solutions to bootstrap estimates of maximum frequency (MF) models conditioned on dimension. Conditioning on dimension effectively mitigates overfitting; to deal with underfitting, these MFs are then prediction-weighted, and it is shown that not only can consistent model selection be achieved, but attractive convergence rates can be as well, leading to excellent finite sample performance. Detailed numerical studies are carried out on both simulated and real datasets. Extensions to the class of generalized linear models are also detailed.
</p>projecteuclid.org/euclid.ejs/1488531638_20170626220158Mon, 26 Jun 2017 22:01 EDTAdaptive wavelet multivariate regression with errors in variableshttp://projecteuclid.org/euclid.ejs/1488964114<strong>Michaël Chichignoud</strong>, <strong>Van Ha Hoang</strong>, <strong>Thanh Mai Pham Ngoc</strong>, <strong>Vincent Rivoirard</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 682--724.</p><p><strong>Abstract:</strong><br/>
In the multidimensional setting, we consider the errors-in-variables model. We aim at estimating the unknown nonparametric multivariate regression function with errors in the covariates. We devise an adaptive estimator based on projection kernels on wavelets and a deconvolution operator. We propose an automatic and fully data-driven procedure to select the wavelet resolution level. We obtain an oracle inequality and optimal rates of convergence over anisotropic Hölder classes. Our theoretical results are illustrated by some simulations.
</p>projecteuclid.org/euclid.ejs/1488964114_20170626220158Mon, 26 Jun 2017 22:01 EDTSome properties of the autoregressive-aided block bootstraphttp://projecteuclid.org/euclid.ejs/1488964115<strong>Tobias Niebuhr</strong>, <strong>Jens-Peter Kreiss</strong>, <strong>Efstathios Paparoditis</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 725--751.</p><p><strong>Abstract:</strong><br/>
We investigate properties of a hybrid bootstrap procedure for general, strictly stationary sequences, called the autoregressive-aided block bootstrap, which combines a parametric autoregressive bootstrap with a nonparametric moving block bootstrap. The autoregressive-aided block bootstrap consists of two main steps, namely an autoregressive model fit and an ensuing (moving) block resampling of residuals. The linear parametric model fit prewhitens the time series so that the dependence structure of the remaining residuals gets closer to that of a white noise sequence, while the moving block bootstrap applied to these residuals captures nonlinear features that are not taken into account by the linear autoregressive fit. We establish validity of the autoregressive-aided block bootstrap for the important class of statistics known as generalized means, which includes many commonly used statistics in time series analysis as special cases. Numerical investigations show that the hybrid bootstrap procedure considered in this paper performs quite well: it behaves as well as, or in many cases outperforms, the ordinary moving block bootstrap, and it is robust against mis-specification of the autoregressive order, a substantial advantage over the autoregressive bootstrap.
</p>projecteuclid.org/euclid.ejs/1488964115_20170626220158Mon, 26 Jun 2017 22:01 EDTOptimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimatorshttp://projecteuclid.org/euclid.ejs/1489201320<strong>Yuchen Zhang</strong>, <strong>Martin J. Wainwright</strong>, <strong>Michael I. Jordan</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 752--799.</p><p><strong>Abstract:</strong><br/>
For the problem of high-dimensional sparse linear regression, it is known that an $\ell_{0}$-based estimator can achieve a $1/n$ “fast” rate for prediction error without any conditions on the design matrix, whereas in the absence of restrictive conditions on the design matrix, popular polynomial-time methods only guarantee the $1/\sqrt{n}$ “slow” rate. In this paper, we show that the slow rate is intrinsic to a broad class of M-estimators. In particular, for estimators based on minimizing a least-squares cost function together with a (possibly nonconvex) coordinate-wise separable regularizer, there is always a “bad” local optimum such that the associated prediction error is lower bounded by a constant multiple of $1/\sqrt{n}$. For convex regularizers, this lower bound applies to all global optima. The theory is applicable to many popular estimators, including convex $\ell_{1}$-based methods as well as M-estimators based on nonconvex regularizers, including the SCAD penalty or the MCP regularizer. In addition, we show that bad local optima are very common, in that a broad class of local minimization algorithms with random initialization typically converge to a bad solution.
</p>projecteuclid.org/euclid.ejs/1489201320_20170626220158Mon, 26 Jun 2017 22:01 EDTModel selection for the segmentation of multiparameter exponential family distributionshttp://projecteuclid.org/euclid.ejs/1490666425<strong>Alice Cleynen</strong>, <strong>Emilie Lebarbier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 800--842.</p><p><strong>Abstract:</strong><br/>
We consider the segmentation problem of univariate distributions from the exponential family with multiple parameters. In segmentation, the choice of the number of segments remains a difficult issue due to the discrete nature of the change-points. In this general exponential family distribution framework, we propose a penalized $\log$-likelihood estimator where the penalty is inspired by papers of L. Birgé and P. Massart. The resulting estimator is proved to satisfy some oracle inequalities. We then further study the particular case of categorical variables by comparing the values of the key constants when derived from the specification of our general approach and when obtained by working directly with the characteristics of this distribution. Finally, simulation studies are conducted to assess the performance of our criterion and to compare our approach to other existing methods, and an application on real data modeled using the categorical distribution is provided.
</p>projecteuclid.org/euclid.ejs/1490666425_20170626220158Mon, 26 Jun 2017 22:01 EDTA test of Gaussianity based on the Euler characteristic of excursion setshttp://projecteuclid.org/euclid.ejs/1490688316<strong>Elena Di Bernardino</strong>, <strong>Anne Estrade</strong>, <strong>José R. León</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 843--890.</p><p><strong>Abstract:</strong><br/>
In the present paper, we deal with a stationary isotropic random field $X:{\mathbb{R}}^{d}\to{\mathbb{R}}$ and we assume it is partially observed through some level functionals. We aim at providing a methodology for a test of Gaussianity based on this information. More precisely, the level functionals are given by the Euler characteristic of the excursion sets above a finite number of levels. On the one hand, we study the properties of these level functionals under the hypothesis that the random field $X$ is Gaussian. In particular, we focus on the mapping that associates to any level $u$ the expected Euler characteristic of the excursion set above level $u$. On the other hand, we study the same level functionals under alternative distributions of $X$, such as chi-square, harmonic oscillator and shot noise. In order to validate our methodology, part of the work consists of numerical experiments. We generate Monte-Carlo samples of Gaussian and non-Gaussian random fields and compare, from a statistical point of view, their level functionals. Goodness-of-fit $p$-values are displayed for both cases. Simulations are performed in the one-dimensional case ($d=1$) and the two-dimensional case ($d=2$), using R.
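For intuition in the one-dimensional case, the Euler characteristic of the excursion set $\{t:X(t)\geq u\}$ of a discretized path reduces to the number of its connected components. A minimal sketch under that simplification (the function name is illustrative):

```python
import numpy as np

def euler_char_1d(values, u):
    """Euler characteristic of the excursion set above level u for a path
    sampled on a regular grid: in d=1 this is the number of connected
    components, i.e. maximal runs of consecutive grid points above u."""
    above = np.asarray(values) >= u
    # A component starts wherever `above` switches from False to True,
    # plus one extra component if the path already starts above the level.
    starts = np.sum(~above[:-1] & above[1:]) + int(above[0])
    return int(starts)
```

Raising the level $u$ traces out the mapping $u\mapsto$ Euler characteristic that the test compares with its Gaussian expectation.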
</p>projecteuclid.org/euclid.ejs/1490688316_20170626220158Mon, 26 Jun 2017 22:01 EDTEstimating a smooth function on a large graph by Bayesian Laplacian regularisationhttp://projecteuclid.org/euclid.ejs/1490688317<strong>Alisa Kirichenko</strong>, <strong>Harry van Zanten</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 891--915.</p><p><strong>Abstract:</strong><br/>
We study a Bayesian approach to estimating a smooth function in the context of regression or classification problems on large graphs. We derive theoretical results that show how asymptotically optimal Bayesian regularisation can be achieved under an asymptotic shape assumption on the underlying graph and a smoothness condition on the target function, both formulated in terms of the graph Laplacian. The priors we study are randomly scaled Gaussians with precision operators involving the Laplacian of the graph.
</p>projecteuclid.org/euclid.ejs/1490688317_20170626220158Mon, 26 Jun 2017 22:01 EDTAdaptive density estimation based on a mixture of Gammashttp://projecteuclid.org/euclid.ejs/1490688318<strong>Natalia Bochkina</strong>, <strong>Judith Rousseau</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 916--962.</p><p><strong>Abstract:</strong><br/>
We consider the problem of Bayesian density estimation on the positive semiline for possibly unbounded densities. We propose a hierarchical Bayesian estimator based on the gamma mixture prior which can be viewed as a location mixture. We study convergence rates of Bayesian density estimators based on such mixtures. We construct approximations of locally Hölder densities, and of their extension to unbounded densities, by continuous mixtures of gamma distributions, leading to approximations of such densities by finite mixtures. These results are then used to derive posterior concentration rates, with priors based on these mixture models. The rates are minimax (up to a $\log n$ term) and, since the priors are independent of the smoothness, the rates are adaptive to the smoothness.
One of the novel features of the paper is that these results hold for densities with polynomial tails. Similar results are obtained using a hierarchical Bayesian model based on the mixture of inverse gamma densities, which can be used to adaptively estimate densities with very heavy tails, including the Cauchy density.
</p>projecteuclid.org/euclid.ejs/1490688318_20170626220158Mon, 26 Jun 2017 22:01 EDTA note on central limit theorems for quadratic variation in case of endogenous observation timeshttp://projecteuclid.org/euclid.ejs/1490860813<strong>Mathias Vetter</strong>, <strong>Tobias Zwingmann</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 963--980.</p><p><strong>Abstract:</strong><br/>
This paper is concerned with a central limit theorem for quadratic variation when observations come as exit times from a regular grid. We discuss the special case of a semimartingale with deterministic characteristics and finite activity jumps in detail and illustrate technical issues in more general situations.
</p>projecteuclid.org/euclid.ejs/1490860813_20170626220158Mon, 26 Jun 2017 22:01 EDTDensity estimation for $\tilde{\beta}$-dependent sequenceshttp://projecteuclid.org/euclid.ejs/1490860814<strong>Jérôme Dedecker</strong>, <strong>Florence Merlevède</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 981--1021.</p><p><strong>Abstract:</strong><br/>
We study the ${\mathbb{L}}^{p}$-integrated risk of some classical estimators of the density, when the observations are drawn from a strictly stationary sequence. The results apply to a large class of sequences, which can be non-mixing in the sense of Rosenblatt and long-range dependent. The main probabilistic tool is a new Rosenthal-type inequality for partial sums of $BV$ functions of the variables. As an application, we give the rates of convergence of regular histograms, when estimating the invariant density of a class of expanding maps of the unit interval with a neutral fixed point at zero. These histograms are plotted in the section devoted to the simulations.
</p>projecteuclid.org/euclid.ejs/1490860814_20170626220158Mon, 26 Jun 2017 22:01 EDTKernel ridge vs. principal component regression: Minimax bounds and the qualification of regularization operatorshttp://projecteuclid.org/euclid.ejs/1490860815<strong>Lee H. Dicker</strong>, <strong>Dean P. Foster</strong>, <strong>Daniel Hsu</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1022--1047.</p><p><strong>Abstract:</strong><br/>
Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component regression. We derive an explicit finite-sample risk bound for regularization-based estimators that simultaneously accounts for (i) the structure of the ambient function space, (ii) the regularity of the true regression function, and (iii) the adaptability (or qualification) of the regularization. A simple consequence of this upper bound is that the risk of the regularization-based estimators matches the minimax rate in a variety of settings. The general bound also illustrates how some regularization techniques are more adaptable than others to favorable regularity properties that the true regression function may possess. This, in particular, demonstrates a striking difference between kernel ridge regression and kernel principal component regression. Our theoretical results are supported by numerical experiments.
</p>projecteuclid.org/euclid.ejs/1490860815_20170626220158Mon, 26 Jun 2017 22:01 EDTEstimation of false discovery proportion in multiple testing: From normal to chi-squared test statisticshttp://projecteuclid.org/euclid.ejs/1490925658<strong>Lilun Du</strong>, <strong>Chunming Zhang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1048--1091.</p><p><strong>Abstract:</strong><br/>
Multiple testing based on chi-squared test statistics is common in many scientific fields such as genomics research and brain imaging studies. However, the challenges of designing a formal testing procedure when there exists a general dependence structure across the chi-squared test statistics have not been well addressed. To address this gap, we first adopt a latent factor structure ([14]) to construct a testing framework for approximating the false discovery proportion ($\mathrm{FDP}$) for a large number of highly correlated chi-squared test statistics with a finite number of degrees of freedom $k$. The testing framework is then used to simultaneously test $k$ linear constraints in a large dimensional linear factor model with some observable and unobservable common factors; the result is a consistent estimator of the $\mathrm{FDP}$ based on the associated factor-adjusted $p$-values. The practical utility of the method is investigated through extensive simulation studies and an analysis of batch effects in a gene expression study.
</p>projecteuclid.org/euclid.ejs/1490925658_20170626220158Mon, 26 Jun 2017 22:01 EDTOptimized recentered confidence spheres for the multivariate normal meanhttp://projecteuclid.org/euclid.ejs/1493258584<strong>Waruni Abeysekera</strong>, <strong>Paul Kabaila</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1798--1826.</p><p><strong>Abstract:</strong><br/>
Casella and Hwang (1983, JASA ) introduced a broad class of recentered confidence spheres for the mean $\boldsymbol{\theta}$ of a multivariate normal distribution with covariance matrix $\sigma^{2}\boldsymbol{I}$, for $\sigma^{2}$ known. Both the center and radius functions of these confidence spheres are flexible functions of the data. For the particular case of confidence spheres centered on the positive-part James-Stein estimator and with radius determined by empirical Bayes considerations, they show numerically that these confidence spheres have the desired minimum coverage probability $1-\alpha$ and dominate the usual confidence sphere in terms of scaled volume. We shift the focus from the scaled volume to the scaled expected volume of the recentered confidence sphere. Since both the coverage probability and the scaled expected volume are functions of the Euclidean norm of $\boldsymbol{\theta}$, it is feasible to optimize the performance of the recentered confidence sphere by numerically computing both the center and radius functions so as to optimize some clearly specified criterion. We suppose that we have uncertain prior information that $\boldsymbol{\theta}=\boldsymbol{0}$. This motivates us to determine the center and radius functions of the confidence sphere by numerical minimization of the scaled expected volume of the confidence sphere at $\boldsymbol{\theta}=\boldsymbol{0}$, subject to the constraints that (a) the coverage probability never falls below $1-\alpha$ and (b) the radius never exceeds the radius of the standard $1-\alpha$ confidence sphere. Our results show that, by focusing on this clearly specified criterion, significant gains in performance (in terms of this criterion) can be achieved. We also present analogous results for the much more difficult case that $\sigma^{2}$ is unknown.
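The positive-part James-Stein center used by these recentered spheres has a standard closed form; a minimal sketch for known $\sigma^{2}$:

```python
import numpy as np

def positive_part_james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimator of a p-variate normal mean
    (p >= 3): shrink the observation x toward the origin by the factor
    (1 - (p - 2) * sigma2 / ||x||^2), truncated at zero so the factor
    never becomes negative."""
    x = np.asarray(x, dtype=float)
    p = x.size
    shrink = max(0.0, 1.0 - (p - 2) * sigma2 / np.dot(x, x))
    return shrink * x
```

For example, with $p=5$, $\sigma^{2}=1$ and $x=(3,0,0,0,0)$ the shrinkage factor is $1-3/9=2/3$.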
</p>projecteuclid.org/euclid.ejs/1493258584_20170626220158Mon, 26 Jun 2017 22:01 EDTFinite sample bounds for expected number of false rejections under martingale dependence with applications to FDRhttp://projecteuclid.org/euclid.ejs/1493345170<strong>Julia Benditkis</strong>, <strong>Arnold Janssen</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1827--1857.</p><p><strong>Abstract:</strong><br/>
Much effort has been made to improve the famous step-up procedure of Benjamini and Hochberg given by linear critical values $\frac{i\alpha}{n}$. It is pointed out by Gavrilov, Benjamini and Sarkar that step-down multiple testing procedures based on the critical values $\beta_{i}=\frac{i\alpha}{n+1-i(1-\alpha)}$ still control the false discovery rate (FDR) at the upper bound $\alpha$ under basic independence assumptions. Since that result is no longer true for step-up procedures, nor for step-down procedures when the $p$-values are dependent, a substantial discussion about the corresponding FDR arose in the literature. The present paper establishes finite sample formulas and bounds for the FDR and the expected number of false rejections for multiple testing procedures using critical values $\beta_{i}$ under martingale and reverse martingale dependence models. It is pointed out that martingale methods are natural tools for the treatment of local FDR estimators, which are closely connected to the present coefficients $\beta_{i}$. The martingale approach also yields new results and further insight into the special basic independence model.
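The step-down procedure with the critical values $\beta_{i}$ quoted above is simple to state in code; a hedged sketch (the function name is illustrative, and only the critical values come from the abstract):

```python
import numpy as np

def step_down_rejections(pvals, alpha=0.05):
    """Step-down multiple testing with the Gavrilov-Benjamini-Sarkar
    critical values beta_i = i*alpha / (n + 1 - i*(1 - alpha)).
    Rejects the k smallest p-values, where k is the largest index such
    that p_(j) <= beta_j for every j <= k. Returns rejected indices."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    i = np.arange(1, n + 1)
    beta = i * alpha / (n + 1 - i * (1 - alpha))
    below = sorted_p <= beta
    # step-down: walk along the sorted p-values until the first failure
    k = 0
    for j in range(n):
        if below[j]:
            k = j + 1
        else:
            break
    return order[:k]
```

Note $\beta_{n}=\alpha$ and $\beta_{1}\approx\alpha/n$, so the thresholds interpolate between Bonferroni-like and unadjusted levels.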
</p>projecteuclid.org/euclid.ejs/1493345170_20170626220158Mon, 26 Jun 2017 22:01 EDTData-driven nonlinear expectations for statistical uncertainty in decisionshttp://projecteuclid.org/euclid.ejs/1493366413<strong>Samuel N. Cohen</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1858--1889.</p><p><strong>Abstract:</strong><br/>
In stochastic decision problems, one often wants to estimate the underlying probability measure statistically, and then to use this estimate as a basis for decisions. We shall consider how the uncertainty in this estimation can be explicitly and consistently incorporated in the valuation of decisions, using the theory of nonlinear expectations.
</p>projecteuclid.org/euclid.ejs/1493366413_20170626220158Mon, 26 Jun 2017 22:01 EDTAn averaged projected Robbins-Monro algorithm for estimating the parameters of a truncated spherical distributionhttp://projecteuclid.org/euclid.ejs/1493776837<strong>Antoine Godichon-Baggioni</strong>, <strong>Bruno Portier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1890--1927.</p><p><strong>Abstract:</strong><br/>
The objective of this work is to propose a new algorithm to fit a sphere on a noisy 3D point cloud distributed around a complete or a truncated sphere. More precisely, we introduce a projected Robbins-Monro algorithm and its averaged version for estimating the center and the radius of the sphere. We give asymptotic results such as the almost sure convergence of these algorithms as well as the asymptotic normality of the averaged algorithm. Furthermore, some non-asymptotic results will be given, such as the rates of convergence in quadratic mean. Some numerical experiments show the efficiency of the proposed algorithm on simulated data for small to moderate sample sizes and for modeling an object in 3D.
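A heavily hedged sketch of the flavor of such a recursion: a Robbins-Monro-type stochastic gradient step on the least-squares sphere-fitting criterion $E(\|X-c\|-r)^{2}$, followed by Polyak-Ruppert averaging. The step sizes, the omission of the paper's projection step, and the function name are all illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def fit_sphere_rm(points, c0, r0, gamma=0.5, a=0.66):
    """Illustrative Robbins-Monro-type recursion with averaging for
    fitting a sphere (center c, radius r) to noisy 3D points, taking one
    stochastic gradient step of (||x - c|| - r)^2 per observation with
    step sizes gamma / n**a, then returning the running averages."""
    c, r = np.asarray(c0, dtype=float), float(r0)
    c_bar, r_bar = c.copy(), r
    for n, x in enumerate(points, start=1):
        step = gamma / n ** a
        d = np.linalg.norm(x - c)
        resid = d - r                                   # signed radial error
        c = c + step * resid * (x - c) / max(d, 1e-12)  # gradient step in c
        r = r + step * resid                            # gradient step in r
        c_bar += (c - c_bar) / n                        # Polyak-Ruppert averages
        r_bar += (r - r_bar) / n
    return c_bar, r_bar
```

The averaged iterates are what the asymptotic normality results of such schemes typically concern.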
</p>projecteuclid.org/euclid.ejs/1493776837_20170626220158Mon, 26 Jun 2017 22:01 EDTNonparametric distribution estimation in the presence of familial correlation and censoringhttp://projecteuclid.org/euclid.ejs/1493776838<strong>Kun Xu</strong>, <strong>Yanyuan Ma</strong>, <strong>Yuanjia Wang</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1928--1948.</p><p><strong>Abstract:</strong><br/>
We propose methods to estimate the distribution functions for multiple populations from mixture data that are only known to belong to a specific population with certain probabilities. The problem is motivated by kin-cohort studies collecting phenotype data in families for various diseases such as Huntington’s disease (HD) or breast cancer. Relatives in these studies are not genotyped; hence, only their probabilities of carrying a known causal mutation (e.g., BRCA1 gene mutation or HD gene mutation) can be derived. In addition, phenotype observations from the same family may be correlated due to shared life style or other genes associated with disease, and the observations are subject to censoring. Our estimator does not assume any parametric form of the distributions, and does not require modeling of the correlation structure. It estimates the distributions by using optimal base estimators and then optimally combining them. The optimality implies both estimation consistency and minimum estimation variance. Simulations and real data analysis on an HD study are performed to illustrate the improved efficiency of the proposed methods.
</p>projecteuclid.org/euclid.ejs/1493776838_20170626220158Mon, 26 Jun 2017 22:01 EDTStatistical models for cores decomposition of an undirected random graphhttp://projecteuclid.org/euclid.ejs/1494900119<strong>Vishesh Karwa</strong>, <strong>Michael J. Pelsmajer</strong>, <strong>Sonja Petrović</strong>, <strong>Despina Stasi</strong>, <strong>Dane Wilburne</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 1949--1982.</p><p><strong>Abstract:</strong><br/>
The $k$-core decomposition is a widely studied summary statistic that describes a graph’s global connectivity structure. In this paper, we move beyond using $k$-core decomposition as a tool to summarize a graph and propose using $k$-core decomposition as a tool to model random graphs. We propose using the shell distribution vector, a way of summarizing the decomposition, as a sufficient statistic for a family of exponential random graph models. We study the properties and behavior of the model family, implement a Markov chain Monte Carlo algorithm for simulating graphs from the model, implement a direct sampler from the set of graphs with a given shell distribution, and explore the sampling distributions of some of the commonly used complementary statistics as good candidates for heuristic model fitting. These algorithms provide first fundamental steps necessary for solving the following problems: parameter estimation in this ERGM, extending the model to its Bayesian relative, and developing a rigorous methodology for testing goodness of fit of the model and model selection. The methods are applied to a synthetic network as well as the well-known Sampson monks dataset.
</p>projecteuclid.org/euclid.ejs/1494900119_20170626220158Mon, 26 Jun 2017 22:01 EDTInference for a mean-reverting stochastic process with multiple change pointshttp://projecteuclid.org/euclid.ejs/1495504915<strong>Fuqi Chen</strong>, <strong>Rogemar Mamon</strong>, <strong>Matt Davison</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2199--2257.</p><p><strong>Abstract:</strong><br/>
The use of an Ornstein-Uhlenbeck (OU) process is ubiquitous in business, economics and finance to capture various price processes and evolution of economic indicators exhibiting mean-reverting properties. The times at which structural transitions representing drastic changes in the economic dynamics occur are of particular interest to policy makers, investors and financial product providers. This paper addresses the change-point problem under a generalised OU model and investigates the associated statistical inference. We propose two estimation methods to locate multiple change points and show the asymptotic properties of the estimators. An informational approach is employed in detecting the change points, and the consistency of our methods is also theoretically demonstrated. Estimation is considered under the setting where both the number and location of change points are unknown. Three computing algorithms are further developed for implementation. The practical applicability of our methods is illustrated using simulated and observed financial market data.
</p>projecteuclid.org/euclid.ejs/1495504915_20170626220158Mon, 26 Jun 2017 22:01 EDTOptimal exponential bounds for aggregation of estimators for the Kullback-Leibler losshttp://projecteuclid.org/euclid.ejs/1495504916<strong>Cristina Butucea</strong>, <strong>Jean-François Delmas</strong>, <strong>Anne Dutfoy</strong>, <strong>Richard Fischer</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2258--2294.</p><p><strong>Abstract:</strong><br/>
We study the problem of aggregation of estimators with respect to the Kullback-Leibler divergence for various probabilistic models. Rather than considering a convex combination of the initial estimators $f_{1},\ldots,f_{N}$, our aggregation procedures rely on the convex combination of the logarithms of these functions. The first method is designed for probability density estimation as it gives an aggregate estimator that is also a proper density function, whereas the second method concerns spectral density estimation and has no such mass-conserving feature. We select the aggregation weights based on a penalized maximum likelihood criterion. We give sharp oracle inequalities that hold with high probability, with a remainder term that is decomposed into a bias and a variance part. We also show the optimality of the remainder terms by providing the corresponding lower bound results.
</p>projecteuclid.org/euclid.ejs/1495504916_20170626220158Mon, 26 Jun 2017 22:01 EDTChange-point tests under local alternatives for long-range dependent processeshttp://projecteuclid.org/euclid.ejs/1496217654<strong>Johannes Tewes</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2461--2498.</p><p><strong>Abstract:</strong><br/>
We consider the change-point problem for the marginal distribution of subordinated Gaussian processes that exhibit long-range dependence. The asymptotic distributions of Kolmogorov-Smirnov and Cramér-von Mises type statistics are investigated under local alternatives. This allows us to compute the asymptotic relative efficiencies of these tests and the CUSUM test; in the special case of a mean shift in Gaussian data, this efficiency is always $1$. Moreover, our theory covers the scenario where the Hermite rank of the underlying process changes.
In a small simulation study, we show that the theoretical findings carry over to the finite sample performance of the tests.
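For reference, the CUSUM statistic mentioned above can be written down in a few lines. This sketch uses the classical i.i.d.-style normalization with a naive scale estimate; under long-range dependence, the setting of this paper, the normalization and limit distribution are different, so this is only the short-memory textbook version.

```python
import numpy as np

def cusum_statistic(x):
    """Classical CUSUM statistic for a mean shift:
    max_k |S_k - (k/n) S_n| / (sigma_hat * sqrt(n)).
    Returns the statistic and the arg-max index (a change-point estimate)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.cumsum(x)
    k = np.arange(1, n + 1)
    d = np.abs(s - k / n * s[-1])
    sigma = x.std(ddof=1)   # naive scale estimate; long-range dependent data
                            # would require a different normalization
    stat = d.max() / (sigma * np.sqrt(n))
    return stat, int(d.argmax()) + 1

# A one-standard-deviation mean shift halfway through an i.i.d. Gaussian sample.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(1.0, 1.0, 300)])
stat, khat = cusum_statistic(x)
```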
</p>projecteuclid.org/euclid.ejs/1496217654_20170626220158Mon, 26 Jun 2017 22:01 EDTRecovering block-structured activations using compressive measurementshttp://projecteuclid.org/euclid.ejs/1498528883<strong>Sivaraman Balakrishnan</strong>, <strong>Mladen Kolar</strong>, <strong>Alessandro Rinaldo</strong>, <strong>Aarti Singh</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2647--2678.</p><p><strong>Abstract:</strong><br/>
We consider the problems of detection and support recovery of a contiguous block of weak activation in a large matrix from noisy, possibly adaptively chosen, compressive (linear) measurements. We precisely characterize the tradeoffs between the various problem dimensions, the signal strength and the number of measurements required to reliably detect and recover the support of the signal, for both passive and adaptive measurement schemes. In each case, we complement algorithmic results with information-theoretic lower bounds. Analogously to the situation in the closely related problem of noisy compressed sensing, we show that for detection neither adaptivity nor structure reduces the minimax signal strength requirement. On the other hand, we show the rather surprising result that, contrary to the situation in noisy compressed sensing, the signal strength required to recover the support of a contiguous block-structured signal is strongly influenced by both the signal structure and the ability to choose measurements adaptively.
</p>projecteuclid.org/euclid.ejs/1498528883_20170626220158Mon, 26 Jun 2017 22:01 EDTPrediction by quantization of a conditional distributionhttp://projecteuclid.org/euclid.ejs/1498528884<strong>Jean-Michel Loubes</strong>, <strong>Bruno Pelletier</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 1, 2679--2706.</p><p><strong>Abstract:</strong><br/>
Given a pair of random vectors $(X,Y)$, we consider the problem of approximating $Y$ by $\mathbf{c}(X)=\{\mathbf{c}_{1}(X),\dots ,\mathbf{c}_{M}(X)\}$, where $\mathbf{c}$ is a measurable set-valued function. We give meaning to the approximation by using the principles of vector quantization, which leads to the definition of a multifunction regression problem. The formulated problem amounts to quantizing the conditional distributions of $Y$ given $X$. We propose a nonparametric estimate of the solutions of the multifunction regression problem by combining the method of $M$-means clustering with the nonparametric smoothing technique of $k$-nearest neighbors. We provide an asymptotic analysis of the estimate and derive a convergence rate for its excess risk. The proposed methodology is illustrated on simulated examples and on a speed-flow traffic data set from the context of road traffic forecasting.
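The combination of $k$-nearest-neighbor smoothing with $M$-means clustering admits a very short sketch: to quantize the conditional law of $Y$ given $X=x_0$, gather the responses of the $k$ nearest covariates and run Lloyd's algorithm with $M$ centers on them. This is a toy one-dimensional illustration under assumed data, not the estimator analyzed in the paper.

```python
import numpy as np

def conditional_codebook(X, Y, x0, k=100, M=3, iters=50, seed=0):
    """M conditional centroids c_1(x0),...,c_M(x0): take the k nearest
    neighbors of x0 in X, then run Lloyd's algorithm (M-means) on their
    responses Y. A sketch of kNN smoothing plus quantization."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    idx = np.argsort(np.abs(X - x0))[:k]   # k nearest neighbors (1-D covariate)
    y = Y[idx]
    rng = np.random.default_rng(seed)
    centers = rng.choice(y, size=M, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(y[:, None] - centers[None, :]), axis=1)
        for m in range(M):
            if np.any(labels == m):
                centers[m] = y[labels == m].mean()
    return np.sort(centers)

# Bimodal conditional law: Y | X=x is N(x, 0.1) or N(-x, 0.1), each w.p. 1/2,
# so a single regression function would sit uselessly near 0 while two
# conditional centroids capture both branches.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, 5000)
sign = rng.choice([-1.0, 1.0], size=5000)
Y = sign * X + rng.normal(0.0, 0.1, 5000)
cb = conditional_codebook(X, Y, x0=1.5, k=200, M=2)
```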
</p>projecteuclid.org/euclid.ejs/1498528884_20170626220158Mon, 26 Jun 2017 22:01 EDTAsymptotic properties of quasi-maximum likelihood estimators in observation-driven time series modelshttp://projecteuclid.org/euclid.ejs/1499133752<strong>Randal Douc</strong>, <strong>Konstantinos Fokianos</strong>, <strong>Eric Moulines</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2707--2740.</p><p><strong>Abstract:</strong><br/>
We study a general class of quasi-maximum likelihood estimators for observation-driven time series models. Our main focus is on models related to the exponential family of distributions, such as Poisson-based models for count time series and duration models. However, the proposed approach is more general and covers a variety of time series models, including the ordinary GARCH model, which has been studied extensively in the literature. We provide general conditions under which quasi-maximum likelihood estimators can be analyzed for this class of time series models and prove that these estimators are consistent and asymptotically normally distributed regardless of the true data-generating process. We illustrate our results using classical examples of quasi-maximum likelihood estimation, including standard GARCH models, duration models, Poisson-type autoregressions and ARMA models with GARCH errors. Our contribution unifies the existing theory and gives conditions for proving consistency and asymptotic normality in a variety of situations.
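As a concrete instance of the Poisson-type autoregressions mentioned above, the Poisson quasi-log-likelihood of an INGARCH(1,1) model can be written down directly. This sketch only simulates a path and evaluates the criterion at two parameter vectors; it is not the paper's estimation theory, and the parameter values are illustrative.

```python
import numpy as np

def simulate_ingarch(n, omega, a, b, seed=0):
    """Poisson INGARCH(1,1): Y_t | past ~ Poisson(lambda_t) with
    lambda_t = omega + a * lambda_{t-1} + b * Y_{t-1}."""
    rng = np.random.default_rng(seed)
    lam = omega / (1 - a - b)   # start at the stationary mean
    y = np.empty(n, dtype=int)
    for t in range(n):
        y[t] = rng.poisson(lam)
        lam = omega + a * lam + b * y[t]
    return y

def poisson_qll(params, y):
    """Poisson quasi-log-likelihood sum_t (y_t log lambda_t - lambda_t),
    with lambda_t reconstructed by the same recursion."""
    omega, a, b = params
    lam = omega / (1 - a - b)
    qll = 0.0
    for t in range(len(y)):
        qll += y[t] * np.log(lam) - lam
        lam = omega + a * lam + b * y[t]
    return qll

y = simulate_ingarch(5000, omega=1.0, a=0.3, b=0.4)
# The criterion should prefer the true parameters over a misspecified vector.
qll_true = poisson_qll((1.0, 0.3, 0.4), y)
qll_bad = poisson_qll((3.0, 0.1, 0.1), y)
```

A QMLE would maximize `poisson_qll` over the parameter space; the consistency and asymptotic normality of that maximizer are exactly what the paper's conditions deliver.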
</p>projecteuclid.org/euclid.ejs/1499133752_20170703220233Mon, 03 Jul 2017 22:02 EDTA Wald-type test statistic for testing linear hypothesis in logistic regression models based on minimum density power divergence estimatorhttp://projecteuclid.org/euclid.ejs/1499133753<strong>Ayanendranath Basu</strong>, <strong>Abhik Ghosh</strong>, <strong>Abhijit Mandal</strong>, <strong>Nirian Martín</strong>, <strong>Leandro Pardo</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2741--2772.</p><p><strong>Abstract:</strong><br/>
In this paper a robust version of the classical Wald test statistic for linear hypotheses in the logistic regression model is introduced and its properties are explored. We study the problem under the assumption of random covariates, although some ideas for non-random covariates are also considered. A family of robust Wald-type tests is considered here, in which the minimum density power divergence estimator is used instead of the maximum likelihood estimator. We obtain the asymptotic distribution and also study the robustness properties of these Wald-type test statistics. The robustness of the tests is investigated theoretically through an influence function analysis as well as through suitable practical examples. It is theoretically established that both the level and the power of the Wald-type tests are stable against contamination, while the classical Wald test breaks down in this scenario. Some classical examples are presented that numerically substantiate the theory developed. Finally, a simulation study is included to provide further confirmation of the validity of the theoretical results established in the paper.
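The minimum density power divergence estimator underlying these tests minimizes, for a Bernoulli response, an objective of the form $\frac{1}{n}\sum_i\bigl[p_i^{1+\alpha}+(1-p_i)^{1+\alpha}-(1+\tfrac{1}{\alpha})\{y_i p_i^{\alpha}+(1-y_i)(1-p_i)^{\alpha}\}\bigr]$, which recovers the negative log-likelihood as $\alpha\to 0$. The sketch below writes this objective from the general density power divergence form and minimizes it by a crude grid search on clean simulated data; the tuning value $\alpha=0.5$, the sample design and the grid are all assumptions for illustration, not the authors' setup.

```python
import numpy as np

def dpd_loss(beta, x, y, alpha=0.5):
    """Density power divergence objective for logistic regression (Bernoulli):
    mean_i [ p^(1+a) + (1-p)^(1+a) - (1+1/a)*(y p^a + (1-y)(1-p)^a) ]."""
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    h = p ** (1 + alpha) + (1 - p) ** (1 + alpha)
    obs = y * p ** alpha + (1 - y) * (1 - p) ** alpha
    return float(np.mean(h - (1.0 + 1.0 / alpha) * obs))

# Clean logistic data with true (beta0, beta1) = (0.5, 1.0).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 4000)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
y = (rng.uniform(size=4000) < p_true).astype(float)

# Coarse grid search over (beta0, beta1); a real implementation would use a
# proper optimizer, but this keeps the sketch dependency-free.
grid0 = np.linspace(-1.0, 2.0, 61)
grid1 = np.linspace(-1.0, 3.0, 81)
losses = np.array([[dpd_loss((b0, b1), x, y) for b1 in grid1] for b0 in grid0])
i0, i1 = np.unravel_index(losses.argmin(), losses.shape)
beta_hat = (grid0[i0], grid1[i1])
```

The Wald-type test of the paper would then plug this estimator and its asymptotic covariance into the usual quadratic-form statistic in place of the MLE.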
</p>projecteuclid.org/euclid.ejs/1499133753_20170703220233Mon, 03 Jul 2017 22:02 EDTParametrically guided local quasi-likelihood with censored datahttp://projecteuclid.org/euclid.ejs/1499133754<strong>Majda Talamakrouni</strong>, <strong>Anouar El Ghouch</strong>, <strong>Ingrid Van Keilegom</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2773--2799.</p><p><strong>Abstract:</strong><br/>
It is widely pointed out in the literature that misspecification of a parametric model can lead to inconsistent estimators and wrong inference. However, even a misspecified model can provide some valuable information about the phenomenon under study. This is the main idea behind the development of an approach known in the literature as parametrically guided nonparametric estimation. Due to its promising bias-reduction property, this approach has been investigated in different frameworks, such as density estimation, least squares regression and local quasi-likelihood. Our contribution is concerned with parametrically guided local quasi-likelihood estimation adapted to randomly right-censored data. The generalization to censored data involves synthetic data and local linear fitting. The asymptotic properties of the guided estimator as well as its finite-sample performance are studied and compared with those of the unguided local quasi-likelihood estimator. The results confirm the bias-reduction property and show that, with an appropriate guide and an appropriate bandwidth, the proposed estimator outperforms the classical local quasi-likelihood estimator.
</p>projecteuclid.org/euclid.ejs/1499133754_20170703220233Mon, 03 Jul 2017 22:02 EDTConverting high-dimensional regression to high-dimensional conditional density estimationhttp://projecteuclid.org/euclid.ejs/1499133755<strong>Rafael Izbicki</strong>, <strong>Ann B. Lee</strong>. <p><strong>Source: </strong>Electronic Journal of Statistics, Volume 11, Number 2, 2800--2831.</p><p><strong>Abstract:</strong><br/>
There is a growing demand for nonparametric conditional density estimators (CDEs) in fields such as astronomy and economics. In astronomy, for example, one can dramatically improve estimates of the parameters that dictate the evolution of the Universe by working with full conditional densities instead of regression (i.e., conditional mean) estimates. More generally, standard regression falls short in any prediction problem where the distribution of the response is more complex with multi-modality, asymmetry or heteroscedastic noise. Nevertheless, much of the work on high-dimensional inference concerns regression and classification only, whereas research on density estimation has lagged behind. Here we propose FlexCode , a fully nonparametric approach to conditional density estimation that reformulates CDE as a non-parametric orthogonal series problem where the expansion coefficients are estimated by regression. By taking such an approach, one can efficiently estimate conditional densities and not just expectations in high dimensions by drawing upon the success in high-dimensional regression. Depending on the choice of regression procedure, our method can adapt to a variety of challenging high-dimensional settings with different structures in the data (e.g., a large number of irrelevant components and nonlinear manifold structure) as well as different data types (e.g., functional data, mixed data types and sample sets). We study the theoretical and empirical performance of our proposed method, and we compare our approach with traditional conditional density estimators on simulated as well as real-world data, such as photometric galaxy data, Twitter data, and line-of-sight velocities in a galaxy cluster.
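The orthogonal-series reformulation described above is simple enough to sketch: expand $f(y|x)=\sum_j \beta_j(x)\varphi_j(y)$ in an orthonormal basis on $[0,1]$, note that $\beta_j(x)=\mathbb{E}[\varphi_j(Y)\,|\,X=x]$, and estimate each coefficient by regressing $\varphi_j(Y_i)$ on $X_i$. The toy below uses a cosine basis and plain $k$-nearest-neighbor regression with a crude positivity correction; the actual FlexCode method supports arbitrary regression procedures and a principled choice of the truncation level, so treat this only as an illustration of the idea.

```python
import numpy as np

def cosine_basis(y, J):
    """Orthonormal cosine basis on [0,1]: phi_0 = 1, phi_j = sqrt(2) cos(pi j y)."""
    y = np.asarray(y, float)
    return np.column_stack([np.ones_like(y)] +
                           [np.sqrt(2) * np.cos(np.pi * j * y) for j in range(1, J)])

def series_cde(X, Y, x0, grid, J=15, k=200):
    """Series CDE: f(y|x) ~ sum_j beta_j(x) phi_j(y), where each coefficient
    beta_j(x) = E[phi_j(Y) | X=x] is estimated by kNN regression of phi_j(Y) on X."""
    idx = np.argsort(np.abs(np.asarray(X) - x0))[:k]
    beta = cosine_basis(np.asarray(Y)[idx], J).mean(axis=0)  # kNN estimates of beta_j(x0)
    f = cosine_basis(grid, J) @ beta
    return np.clip(f, 0.0, None)   # crude positivity correction

# Y | X=x is approximately N(x/2 + 0.25, 0.05), kept inside [0,1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 20000)
Y = np.clip(0.5 * X + rng.normal(0.0, 0.05, 20000) + 0.25, 0.0, 1.0)
grid = np.linspace(0.0, 1.0, 401)
f = series_cde(X, Y, x0=0.5, grid=grid, k=500)
peak = grid[np.argmax(f)]   # should sit near the conditional mean 0.5
```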
</p>projecteuclid.org/euclid.ejs/1499133755_20170703220233Mon, 03 Jul 2017 22:02 EDT