We consider the problem of sparsity testing in the high-dimensional linear regression model. The problem is to test whether the number of nonzero components (aka the sparsity) of the regression parameter is at most a prescribed level $k_0$. We pinpoint the minimax separation distances for this problem, which amounts to quantifying how far a $k_1$-sparse vector has to be from the set of $k_0$-sparse vectors so that a test is able to reject the null hypothesis with high probability. Two scenarios are considered. In the independent scenario, the covariates are i.i.d. normally distributed and the noise level is known. In the general scenario, both the covariance matrix of the covariates and the noise level are unknown. Although the minimax separation distances differ between these two scenarios, both depend on $k_0$ and $k_1$, which shows that for this composite-composite testing problem the sizes of both the null and the alternative hypotheses play a key role.
In this paper, we prove a family of functional inequalities interpolating between the Poincaré and the logarithmic Sobolev (standard and weighted) inequalities. The proofs rely both on entropy flows and on a condition, either with and , or with and . Consequently, the results are valid on Riemannian manifolds, which generalizes what was previously proved.
A popular class of problems in statistics deals with estimating the support of a density from n observations drawn at random from a d-dimensional distribution. In the one-dimensional case, if the support is an interval, the problem reduces to estimating its end points. In practice, an experimenter may only have access to a noisy version of the original data. Therefore, a more realistic model allows for the observations to be contaminated with additive noise.
In this paper, we consider estimation of convex bodies when the additive noise is distributed according to a multivariate Gaussian (or nearly Gaussian) distribution, although our techniques could easily be adapted to other noise distributions. Unlike standard methods in deconvolution that are implemented by thresholding a kernel density estimate, our method avoids tuning parameters and Fourier transforms altogether. We show that our estimator, computable in time, converges at a rate of in Hausdorff distance, in accordance with the polylogarithmic rates encountered in Gaussian deconvolution problems. Part of our analysis also concerns the optimality of the proposed estimator. We provide a lower bound for the minimax rate of estimation in Hausdorff distance that is .
We study the estimation of the parametric components of single and multiple index volatility models. Using the first- and second-order Stein’s identities, we develop methods for estimating the variance index in the high-dimensional setting that require only a finite moment condition, which allows for heavy-tailed data. Our approach complements the existing literature in the low-dimensional setting, while relaxing the conditions on estimation, and provides a novel approach in the high-dimensional setting. We prove that the statistical rate of convergence of our variance index estimators consists of a parametric rate and a nonparametric rate, where the latter arises from the estimation of the mean link function. However, under standard assumptions, the parametric rate dominates the rate of convergence, and our results match the minimax optimal rate for mean index estimation. Simulation results illustrate the finite sample properties of our methodology and support our theoretical conclusions.
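The second-order Stein identity behind variance index estimation can be illustrated in a low-dimensional toy setting. The sketch below makes several assumptions:

- the covariates are standard Gaussian;
- the mean function is zero;
- $\varepsilon$ is independent of $x$ with unit variance;
- the link is the hypothetical $g(t) = e^{t/2}$ in $y = g(x^\top\gamma)\,\varepsilon$.

Under these assumptions, $\mathbb{E}[y^2(xx^\top - I)] = \mathbb{E}[(g^2)''(x^\top\gamma)]\,\gamma\gamma^\top$, so $\gamma$ is the leading eigenvector of that matrix. This is not the authors' high-dimensional, heavy-tailed procedure, and it ignores the estimation of the mean link.

```python
import numpy as np

def variance_index_stein(x, y):
    """Estimate the direction gamma in y = g(x @ gamma) * eps from the sample version of
    E[y^2 (x x^T - I)] = E[(g^2)''(x @ gamma)] gamma gamma^T  (standard Gaussian x),
    returning the eigenvector with the largest absolute eigenvalue."""
    n, d = x.shape
    m = (x * (y ** 2)[:, None]).T @ x / n - np.mean(y ** 2) * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(m)
    return eigvecs[:, np.argmax(np.abs(eigvals))]

rng = np.random.default_rng(0)
n, d = 50_000, 5
gamma = np.array([1.0, -1.0, 0.0, 0.5, 0.0])
gamma /= np.linalg.norm(gamma)
x = rng.standard_normal((n, d))
# Hypothetical link g(t) = exp(t / 2), so (g^2)''(t) = exp(t) > 0 and the signal is nonzero.
y = np.exp(x @ gamma / 2) * rng.standard_normal(n)
gamma_hat = variance_index_stein(x, y)
print(abs(gamma_hat @ gamma))  # |cosine| with the truth; the sign of gamma is not identifiable
```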
We present some new and explicit error bounds for the approximation of distributions. The approximation error is quantified by the maximal density ratio of the distribution Q to be approximated and its proxy P. This non-symmetric measure is more informative than, and implies bounds for, the total variation distance.
Explicit approximation problems include, among others, hypergeometric by binomial distributions, binomial by Poisson distributions, and beta by gamma distributions. In many cases, we provide both upper and (matching) lower bounds.
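A small numerical check of the hypergeometric-by-binomial example may help. The sketch assumes finite supports and takes $Q$ hypergeometric with a binomial proxy $P$ that has the matching success probability. It computes $\rho(Q,P) = \max_k q(k)/p(k)$ and compares the total variation distance with the elementary bound $d_{TV}(Q,P) \le 1 - 1/\rho(Q,P)$. That bound follows from $q - p \le (1 - 1/\rho)\,q$ wherever $q > p$. The parameter values are arbitrary, and this bound is not necessarily the sharpest one derived in the paper.

```python
from math import comb

def hypergeom_pmf(N, K, n):
    """PMF of the number of successes in n draws without replacement from N items, K successes."""
    total = comb(N, n)
    return [comb(K, k) * comb(N - K, n - k) / total for k in range(n + 1)]

def binom_pmf(n, p):
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def max_density_ratio(q, p):
    """rho(Q, P) = max q(k) / p(k) over the support of Q (assumes p > 0 there)."""
    return max(qk / pk for qk, pk in zip(q, p) if qk > 0)

def total_variation(q, p):
    return 0.5 * sum(abs(qk - pk) for qk, pk in zip(q, p))

N, K, n = 50, 20, 5
q = hypergeom_pmf(N, K, n)   # distribution Q to be approximated
p = binom_pmf(n, K / N)      # binomial proxy P
rho = max_density_ratio(q, p)
tv = total_variation(q, p)
print(f"rho = {rho:.4f}, TV = {tv:.4f}, bound 1 - 1/rho = {1 - 1 / rho:.4f}")
```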
We provide an approximation of the hitting probability of a small sphere for the following two-dimensional process: in the x-direction it is a Brownian motion with positive constant drift, whereas in the y-direction it is a Brownian motion with drift given by a negative constant times the sign of . This process can be seen as the solution of a certain stochastic optimal control problem. It turns out that the approximating function can be expressed as the sum of a term involving a modified Bessel function and an ordinary Lebesgue integral.
In this article, we show that there exists a unique weak solution to the reflected Brownian motion with singular drift μ, where μ is a vector-valued Kato class measure on . Furthermore, we obtain Gaussian-type estimates of the transition density function of the solution.
In this paper, we study, in the Markovian case, the rate of convergence in Wasserstein distance when the solution to a BSDE is approximated by the solution to a BSDE driven by a scaled random walk, as introduced in Briand, Delyon and Mémin (Electron. Commun. Probab. 6 (2001) Art. ID 1). This is related to the approximation of solutions to semilinear second-order parabolic PDEs by solutions to their associated finite difference schemes, and to the speed of this convergence.
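The random-walk approximation can be made concrete with a small backward induction on the random-walk tree, which is a finite-difference-type recursion. The sketch makes these assumptions:

- the BSDE is driven by a one-dimensional Brownian motion;
- the terminal condition is $g(W_T)$;
- the driver is $f(t, x, y, z)$;
- the time stepping is explicit.

The function names and the explicit scheme are our own choices, not necessarily the scheme analyzed in the paper or by Briand, Delyon and Mémin.

```python
import math
from typing import Callable

Driver = Callable[[float, float, float, float], float]

def bsde_on_random_walk(g: Callable[[float], float], f: Driver,
                        T: float, n: int, x0: float = 0.0) -> float:
    """Approximate Y_0 for Y_t = g(W_T) + int_t^T f(s, W_s, Y_s, Z_s) ds - int_t^T Z_s dW_s,
    replacing W by the scaled random walk sqrt(h) * (eps_1 + ... + eps_k), eps_i = +-1
    with probability 1/2 and h = T / n (explicit backward scheme)."""
    h = T / n
    sq = math.sqrt(h)
    # Terminal layer: after n steps the walk sits at x0 + sq * (2j - n), j = 0..n.
    y = [g(x0 + sq * (2 * j - n)) for j in range(n + 1)]
    for k in range(n - 1, -1, -1):
        t = k * h
        layer = []
        for j in range(k + 1):
            x = x0 + sq * (2 * j - k)
            up, down = y[j + 1], y[j]        # values at the successors x + sq and x - sq
            cond_exp = 0.5 * (up + down)     # E[Y_{k+1} | walk at x]
            z = (up - down) / (2 * sq)       # E[Y_{k+1} * eps_{k+1} | walk at x] / sqrt(h)
            layer.append(cond_exp + h * f(t, x, cond_exp, z))
        y = layer
    return y[0]

# Zero driver: Y_0 = E[g(walk at time T)]; for g(x) = x^2 this equals n * h = T.
y0_quad = bsde_on_random_walk(lambda x: x * x, lambda t, x, y, z: 0.0, T=1.0, n=200)
# Linear driver f = -r * y with g = 1: the exact BSDE solution is Y_0 = exp(-r * T).
y0_lin = bsde_on_random_walk(lambda x: 1.0, lambda t, x, y, z: -0.05 * y, T=1.0, n=1000)
print(y0_quad, y0_lin)
```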
Consider a response variable subject to nonignorable nonresponse and a fully observed covariate vector. The purpose of our study is threefold. First, we study how to extend nonparametric sufficient dimension reduction to data with nonignorable nonresponse. Second, we use sufficient dimension reduction to search for an instrument, that is, a linear function of the covariates that is related to the response variable but can be excluded from the propensity of nonignorable nonresponse, in order to identify the unknown parameters in a semiparametric propensity and a nonparametric distribution of the response variable and covariates. Third, we establish asymptotic results for parameter estimators based on sufficient dimension reduction and instrument search, and we investigate how instrument search affects the limiting distribution of the parameter estimators. We evaluate the performance of the proposed estimators in a Monte Carlo study and illustrate our method with an application to the AIDS Clinical Trials Group Protocol 175 data.
We develop large sample theory including nonparametric confidence regions for r-dimensional ridges of probability density functions on , where . We view ridges as the intersections of level sets of some special functions. The vertical variation of the plug-in kernel estimators for these functions constrained on the ridges is used as the measure of maximal deviation for ridge estimation. Our confidence regions for the ridges are determined by the asymptotic distribution of this maximal deviation, which is established by utilizing the extreme value distribution of nonstationary χ-fields indexed by manifolds.
We consider a finite impulse response system with centered independent sub-Gaussian design covariates and noise components that are not necessarily identically distributed. We derive non-asymptotic near-optimal estimation and prediction bounds for the least squares estimator of the parameters. Our results are based on two concentration inequalities, one on the norm of sums of dependent covariate vectors and one on the singular values of their covariance operator, which are of independent interest; here the dependence arises from the time-shift structure of the time series. These results generalize the known bounds for the independent case.
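The time-shift dependence can be seen directly in the least squares setup. Each design row is a window of the past $p$ inputs, so consecutive rows share $p-1$ entries even though the inputs themselves are independent. In the sketch, the noise is independent and centered but not identically distributed. The dimensions, the Rademacher inputs and the noise scales are arbitrary choices.

```python
import numpy as np

def fir_design(u: np.ndarray, p: int) -> np.ndarray:
    """Stack the lagged input windows (u_t, u_{t-1}, ..., u_{t-p+1}) as rows.
    Consecutive rows share p - 1 entries: this is the time-shift dependence."""
    n = len(u) - p + 1
    return np.stack([u[p - 1 - i : p - 1 - i + n] for i in range(p)], axis=1)

rng = np.random.default_rng(0)
p, n = 5, 2000
theta = rng.standard_normal(p)                     # true impulse response
u = rng.choice([-1.0, 1.0], size=n + p - 1)        # centered Rademacher inputs (sub-Gaussian)
X = fir_design(u, p)
noise_scale = 0.05 + 0.1 * rng.random(n)           # independent, non-identically distributed noise
y = X @ theta + noise_scale * rng.standard_normal(n)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimator
print(np.linalg.norm(theta_hat - theta))
```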
In this paper, we show that the basic results of large deviations theory hold for general monetary risk measures that satisfy the crucial property of max-stability. A max-stable monetary risk measure fulfills a lattice homomorphism property and, under a suitable tightness condition, satisfies the Laplace Principle (LP), that is, it admits a dual representation with an affine convex conjugate. By replacing asymptotic concentration of probability with concentration of risk, we formulate a Large Deviation Principle (LDP) for max-stable monetary risk measures and show its equivalence to the LP. In particular, the special case of the asymptotic entropic risk measure corresponds to the classical Varadhan–Bryc equivalence between the LDP and the LP. The main results are illustrated by the asymptotic shortfall risk measure.
We give necessary and sufficient conditions for the existence of a phantom distribution function for a stationary random field on a regular lattice. We also introduce a less demanding notion of a directional phantom distribution, with a potentially broader range of applicability. This approach leads to sectorial limit properties, a phenomenon well known in limit theorems for random fields. An example of a stationary Gaussian random field is provided, showing that the two notions do not coincide. Criteria for the existence of the corresponding notions of the extremal index and the sectorial extremal index are also given.
We consider an N by N real or complex generalized Wigner matrix , whose entries are independent centered random variables with uniformly bounded moments. We assume that the variance profile, , satisfies , for all and for all with some constant . We establish Gaussian fluctuations for the linear eigenvalue statistics of on global scales, as well as on all mesoscopic scales up to the spectral edges, with the expectation and variance formulated in terms of the variance profile. We subsequently obtain the universal mesoscopic central limit theorems for the linear eigenvalue statistics inside the bulk and at the edges, respectively.
We study the fluctuations, as , of the Wishart matrix associated to a random matrix with non-Gaussian entries. We analyze the limiting behavior in distribution of in two situations: when the entries of are independent elements of a Wiener chaos of arbitrary order and when the entries are partially correlated and belong to the second Wiener chaos. In the first case, we show that the (suitably normalized) Wishart matrix converges in distribution to a Gaussian matrix while in the correlated case, we obtain its convergence in law to a diagonal non-Gaussian matrix. In both cases, we derive the rate of convergence in the Wasserstein distance via Malliavin calculus and analysis on Wiener space.
We develop theory and methodology for the problem of nonparametric registration of functional data that have been subjected to random deformation (warping) of their time scale. The separation of this phase variation (“horizontal” variation) from the amplitude variation (“vertical” variation) is crucial in order to properly conduct further analyses, which otherwise can be severely distorted. We determine precise nonparametric conditions under which the two forms of variation are identifiable. These show that identifiability depends delicately on the underlying rank. By means of several counterexamples, we demonstrate that our conditions are sharp if one wishes to work in a genuinely nonparametric setup; in doing so, we caution that popular remedies such as structural assumptions or roughness penalties can easily fail. We then propose a nonparametric registration method based on a “local variation measure”, the main element in elucidating identifiability. A key advantage of the method is that it is free of any tuning or penalisation parameters regulating the amount of alignment, thus circumventing the problem of over/under-registration often encountered in practice. We provide asymptotic theory for the resulting estimators under the identifiable regime, but also under mild departures from identifiability, quantifying the resulting bias in terms of the amplitude variation’s spectral gap.
Consider the following distribution dependent SDE:
$$\mathrm{d}X_t = b(t, X_t, \mu_{X_t})\,\mathrm{d}t + \sigma(t, X_t, \mu_{X_t})\,\mathrm{d}W_t,$$
where $\mu_{X_t}$ stands for the distribution of $X_t$ and $W$ is a standard Brownian motion. In this paper, for non-degenerate σ, we show the strong well-posedness of the above SDE under some integrability assumptions in the spatial variable and Lipschitz continuity of b and σ in μ. In particular, we extend the results of Krylov–Röckner (Probab. Theory Related Fields 131 (2005) 154–196) to the distribution dependent case.
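Distribution dependent SDEs are commonly simulated with an interacting particle system. The distribution of the solution, which enters the coefficients b and σ through μ, is replaced by the empirical measure of many simulated copies, and each copy is advanced with an Euler–Maruyama step. This is a standard numerical device, not part of the paper's well-posedness analysis. The coefficients below are hypothetical: the drift pulls towards the mean of the current law, and the diffusion coefficient is one.

```python
import numpy as np

def particle_euler(x0, b, sigma, T, steps, seed=0):
    """Euler-Maruyama for N interacting particles; the law in b(t, x, mu) and
    sigma(t, x, mu) is replaced by the empirical sample (the particle positions)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    h = T / steps
    for k in range(steps):
        t = k * h
        drift = b(t, x, x)          # third argument: current empirical sample of the law
        diffusion = sigma(t, x, x)
        x = x + drift * h + diffusion * np.sqrt(h) * rng.standard_normal(x.shape)
    return x

# Hypothetical coefficients: mean reversion towards the mean of the current law.
b = lambda t, x, mu: -(x - mu.mean())
sigma = lambda t, x, mu: np.ones_like(x)
x_T = particle_euler(np.zeros(5000), b, sigma, T=5.0, steps=500)
print(x_T.mean(), x_T.var())  # the law's mean stays near 0; its variance approaches 1/2
```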
We study the problem of detecting whether an inhomogeneous random graph contains a planted community. Specifically, we observe a single realization of a graph. Under the null hypothesis, this graph is a sample from an inhomogeneous random graph, whereas under the alternative, there exists a small subgraph where the edge probabilities are increased by a multiplicative scaling factor. We present a scan test that is able to detect the presence of such a planted community, even when this community is very small and the underlying graph is inhomogeneous. We also derive an information theoretic lower bound for this problem which shows that in some regimes the scan test is almost asymptotically optimal. We illustrate our results through examples and numerical experiments.
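A scan test searches over candidate vertex subsets for an excess of edges. The sketch below is a simplified, hypothetical version. It assumes the null edge probabilities $p_{ij}$ are known and scores each subset of size $k$ by its standardized excess edge count. The search is exhaustive, which is feasible only for very small graphs. The paper's scan statistic and its calibration may differ.

```python
import math
from itertools import combinations

def scan_statistic(adj, prob, k):
    """Maximize sum_{i<j in S} (A_ij - p_ij) / sqrt(sum_{i<j in S} p_ij (1 - p_ij))
    over all vertex subsets S of size k; return (best_score, best_subset)."""
    n = len(adj)
    best_score, best_subset = -math.inf, None
    for S in combinations(range(n), k):
        pairs = list(combinations(S, 2))
        excess = sum(adj[i][j] - prob[i][j] for i, j in pairs)
        var = sum(prob[i][j] * (1 - prob[i][j]) for i, j in pairs)
        score = excess / math.sqrt(var)
        if score > best_score:
            best_score, best_subset = score, S
    return best_score, best_subset

n = 10
w = [0.5 + 0.1 * i for i in range(n)]                # hypothetical node weights (inhomogeneity)
prob = [[0.2 * w[i] * w[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
adj = [[0] * n for _ in range(n)]
for i, j in combinations(range(4), 2):               # plant a community on nodes 0..3
    adj[i][j] = adj[j][i] = 1
score, subset = scan_statistic(adj, prob, k=4)
print(subset, round(score, 2))                       # subset is (0, 1, 2, 3)
```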
In this paper, we study a class of two sample test statistics based on inter-point distances in the high dimensional and low/medium sample size setting. Our test statistics include the well-known energy distance and maximum mean discrepancy with Gaussian and Laplacian kernels, and the critical values are obtained via permutations. We show that all these tests are inconsistent when the two high dimensional distributions correspond to the same marginal distributions but differ in other aspects of the distributions. The tests based on energy distance and maximum mean discrepancy mainly target the differences between marginal means and variances, whereas the test based on -distance can capture the difference in marginal distributions. Our theory sheds new light on the limitation of inter-point distance based tests, the impact of different distance metrics, and the behavior of permutation tests in high dimension. Some simulation results and a real data illustration are also presented to corroborate our theoretical findings.
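The permutation mechanism shared by these tests can be sketched for the energy distance. The sketch assumes a V-statistic estimate with Euclidean distances and a fixed number of random permutations. Kernel MMD statistics with Gaussian or Laplacian kernels would plug into the same `permutation_test` function through its `stat` argument. The example uses a shift in the marginal means, the kind of difference these tests mainly target. Dimensions, sample sizes and the shift are arbitrary.

```python
import numpy as np

def energy_statistic(x, y):
    """V-statistic estimate of the energy distance 2E|X-Y| - E|X-X'| - E|Y-Y'|."""
    def mean_dist(a, b):
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()
    return 2 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)

def permutation_test(x, y, stat=energy_statistic, n_perm=199, seed=0):
    """Return (observed statistic, permutation p-value) using random relabelings."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = len(x)
    observed = stat(x, y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if stat(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.standard_normal((30, 200))
y = rng.standard_normal((30, 200)) + 0.5   # marginal means differ
stat, p_value = permutation_test(x, y)
print(stat, p_value)
```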
This paper discusses predictive densities under the Kullback–Leibler loss for high-dimensional Poisson sequence models under sparsity constraints. Sparsity in count data implies zero-inflation. We present a class of Bayes predictive densities that attain asymptotic minimaxity in sparse Poisson sequence models. We also show that our class, with a plug-in estimator of the unknown sparsity level, is adaptive in the asymptotically minimax sense. For applications, we extend our results to settings with quasi-sparsity and with missing-completely-at-random observations. Simulation studies as well as an application to real data illustrate the efficiency of the proposed Bayes predictive densities.
Drees and Rootzén (Ann. Statist. 38 (2010) 2145–2186) have established limit theorems for a general class of empirical processes of statistics that are useful for the extreme value analysis of time series, but do not apply to statistics of sliding blocks, including so-called runs estimators. We generalize these results to empirical processes that cover both the class considered by Drees and Rootzén (2010) and processes of sliding blocks statistics. Using this approach, one can analyze different types of statistics in a unified framework. We show that statistics based on sliding blocks are asymptotically normal with an asymptotic variance that, under rather mild conditions, is smaller than or equal to the asymptotic variance of the corresponding estimator based on disjoint blocks. Finally, the general theory is applied to three well-known estimators of the extremal index. It turns out that they all have the same limit distribution, a fact that has so far been overlooked in the literature.
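The abstract does not name the three extremal index estimators. The sketch below assumes the commonly used disjoint blocks, sliding blocks and runs estimators, in simple ratio forms; the exact normalizations in the paper may differ. It also assumes at least one exceedance of the threshold `u`. The toy series consists of clusters of two consecutive exceedances, so all three ratios equal 1/2, the reciprocal of the mean cluster size.

```python
def disjoint_blocks(x, u, b):
    """Disjoint length-b blocks with an exceedance of u, divided by the number of exceedances."""
    n_blocks = len(x) // b
    k = sum(max(x[j * b:(j + 1) * b]) > u for j in range(n_blocks))
    n_exc = sum(v > u for v in x[:n_blocks * b])
    return k / n_exc

def sliding_blocks(x, u, b):
    """Same ratio, with the block count averaged over all len(x) - b + 1 sliding blocks
    and rescaled to the len(x) / b blocks of a disjoint partition."""
    n = len(x)
    k = sum(max(x[i:i + b]) > u for i in range(n - b + 1))
    n_exc = sum(v > u for v in x)
    return k * (n / b) / (n - b + 1) / n_exc

def runs(x, u, r):
    """Exceedances followed by r non-exceedances (cluster ends), divided by the number of exceedances."""
    ends = sum(x[t] > u and all(v <= u for v in x[t + 1:t + r + 1]) for t in range(len(x) - r))
    n_exc = sum(v > u for v in x)
    return ends / n_exc

x = [5.0, 5.0, 0.0, 0.0, 0.0] * 20   # clusters of two exceedances of u = 1
print(disjoint_blocks(x, 1.0, 5), sliding_blocks(x, 1.0, 5), runs(x, 1.0, 1))
```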
In this paper, we study the Dvoretzky covering problem with non-uniformly distributed centers. When the probability law of the centers is absolutely continuous w.r.t. Lebesgue measure and satisfies a regularity condition on the set of essential infimum points, we give a necessary and sufficient condition for covering the circle. When the lengths of covering intervals are of the form , we give a necessary and sufficient condition for covering the circle, without imposing any regularity on the density function.
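Whether a given finite family of arcs covers the circle can be checked with a sort-and-sweep. The sketch assumes arcs centred at the drawn points on the circle of circumference 1 and treats arcs as half-open. The Beta(2, 2) law of the centers and the lengths $0.5/k$ are hypothetical. A finite truncation like this cannot settle the covering question for an infinite sequence, which is what the paper's conditions address.

```python
import random

def covers_circle(centers, lengths):
    """True iff the half-open arcs of the given lengths centred at the given points
    cover the circle R/Z of circumference 1."""
    if any(length >= 1.0 for length in lengths):
        return True
    pieces = []
    for c, length in zip(centers, lengths):
        s = (c - length / 2) % 1.0     # left endpoint in [0, 1)
        e = s + length
        if e <= 1.0:
            pieces.append((s, e))
        else:                          # the arc wraps around 0: split it in two
            pieces.append((s, 1.0))
            pieces.append((0.0, e - 1.0))
    pieces.sort()
    reach = 0.0                        # [0, reach) is covered so far
    for s, e in pieces:
        if s > reach:
            return False               # the gap [reach, s) is uncovered
        reach = max(reach, e)
    return reach >= 1.0

rng = random.Random(0)
n = 5000
centers = [rng.betavariate(2.0, 2.0) for _ in range(n)]  # hypothetical non-uniform law of centers
lengths = [0.5 / k for k in range(1, n + 1)]             # hypothetical lengths l_k = 0.5 / k
print(covers_circle(centers, lengths))
```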
Consider a spectrally positive Lévy process Z with log-Laplace exponent Ψ and a positive continuous function R on . We investigate the entrance from infinity of the process X obtained by changing time in Z with the inverse of the additive functional . We provide a necessary and sufficient condition for infinity to be an entrance boundary of the process X. Under this condition, the process can start from infinity and we study its speed of coming down from infinity. When the Lévy process has a negative drift , sufficient conditions over R and Ψ are found for the process to come down from infinity along the deterministic function solution to with . If as , , and R is regularly varying at ∞ with index , the process comes down from infinity and we find a renormalisation in law of its running infimum at small times.
In this paper, we study precise large deviations for the partial sums of a stationary sequence with a subexponential marginal distribution. Our main focus is on distributions that have either a regularly varying or a lognormal-type tail. We apply the results to derive limit theory for the maxima of the entries of large sample covariance matrices.
We derive bounds on the scope for a confidence band to adapt to the unknown regularity of a nonparametric function that is observed with noise, such as a regression function or density, under the self-similarity condition proposed by Giné and Nickl (Ann. Statist. 38 (2010) 1122–1170). We find that adaptation can only be achieved up to a term that depends on the choice of the constant used to define self-similarity, and that this term becomes arbitrarily large for conservative choices of the self-similarity constant. We construct a confidence band that achieves this bound, up to a constant term that does not depend on the self-similarity constant. Our results suggest that care must be taken in choosing and interpreting the constant that defines self-similarity, since the dependence of adaptive confidence bands on this constant cannot be made to disappear asymptotically.
The article determines the asymptotic shape of the extremal clusters in stationary regularly varying random fields. To deduce this result, we present a general framework for the Poisson approximation of point processes on Polish spaces which appears to be of independent interest. We further introduce a novel and convenient concept of anchoring of the extremal clusters for regularly varying sequences and fields. Together with the Poissonian approximation theory, this allows for a concise description of the limiting behavior of random fields in this setting. We apply this theory to shed entirely new light on the classical problem of evaluating local alignments of biological sequences.
In the extreme value analysis of stationary regularly varying time series, tail array sums form a broad class of statistics suitable for analyzing extremal behavior. This class includes, for example, the Hill estimator and estimators of the extremogram and the tail dependence coefficient.
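As an example of a tail array sum, the Hill estimator can be written as a ratio of two sums over exceedances of a high threshold $u$: $\sum_t \log(X_t/u)\,\mathbf{1}\{X_t > u\}$ divided by $\sum_t \mathbf{1}\{X_t > u\}$. The sketch takes $u$ as the $(k+1)$-th largest observation. For simplicity, it uses i.i.d. Pareto data with tail index 2 (target value $1/2$) rather than a dependent time series. The sample size and $k$ are arbitrary.

```python
import math
import random

def hill_tail_array_sum(x, k):
    """Hill estimator as a ratio of tail array sums,
    sum_t log(X_t / u) 1{X_t > u} / sum_t 1{X_t > u}, with u the (k+1)-th largest value."""
    u = sorted(x, reverse=True)[k]
    num = sum(math.log(v / u) for v in x if v > u)
    den = sum(1 for v in x if v > u)
    return num / den

rng = random.Random(42)
x = [rng.paretovariate(2.0) for _ in range(100_000)]  # Pareto tail index 2, extreme value index 1/2
gamma_hat = hill_tail_array_sum(x, k=5000)
print(gamma_hat)
```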
General asymptotic theory for tail array sums has been developed by Rootzén et al. (Ann. Appl. Probab. 8 (1998) 868–885) under mixing conditions and by Kulik et al. (Stochastic Process. Appl. 129 (2019) 4209–4238) for functions of geometrically ergodic Markov chains. A more general framework of cluster functionals is presented in Drees and Rootzén (Ann. Statist. 38 (2010) 2145–2186).
However, the resulting limiting distributions turn out to be very complex and cumbersome to estimate, as they usually depend on the whole extremal dependence structure of the time series. Hence, a suitable bootstrap procedure is desirable, but consistency results for bootstrapping tail array sums are scarce. In this paper, following Drees (2015), we consider a multiplier block bootstrap to estimate the limiting distribution of tail array sums. We prove that, conditionally on the data, an appropriately constructed multiplier block bootstrap statistic converges to the correct limiting distribution. In contrast, an apparently natural but naïve application of the multiplier block bootstrap scheme does not yield the correct limit.
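The basic multiplier block bootstrap mechanism is easiest to see for the simplest statistic, a sample mean. The sketch assumes disjoint blocks and Gaussian multipliers, and applies them to blockwise-centered block sums. It is not the corrected construction for tail array sums developed in the paper, which per the abstract differs from a naïve application. The AR(1) data and the block length are arbitrary choices; for this model the long-run standard deviation of $\sqrt{n}\,\bar X$ is $1/(1-\phi) = 2$.

```python
import numpy as np

def multiplier_block_bootstrap_mean(x, block_len, n_boot=500, seed=0):
    """Bootstrap replicates T* = n^{-1/2} * sum_j xi_j * (S_j - block_len * mean(x)),
    where S_j are sums over disjoint blocks and xi_j are i.i.d. N(0, 1) multipliers."""
    rng = np.random.default_rng(seed)
    n_blocks = len(x) // block_len
    x = np.asarray(x[:n_blocks * block_len], dtype=float)
    n = len(x)
    centered = x.reshape(n_blocks, block_len).sum(axis=1) - block_len * x.mean()
    xi = rng.standard_normal((n_boot, n_blocks))
    return xi @ centered / np.sqrt(n)

rng = np.random.default_rng(1)
n, phi = 10_000, 0.5
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - phi ** 2)   # start from the stationary distribution
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]
reps = multiplier_block_bootstrap_mean(x, block_len=50)
print(reps.std())
```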
In simulations, we provide numerical evidence of our theoretical findings and illustrate the superiority of the proposed multiplier block bootstrap over some obvious competitors. The proposed bootstrap scheme proves to be computationally efficient in comparison to other approaches.