This paper studies schemes to de-bias the Lasso in sparse linear regression with Gaussian design, where the goal is to estimate and construct confidence intervals for a low-dimensional projection of the unknown coefficient vector in a preconceived direction. Our analysis reveals that previously analyzed proposals to de-bias the Lasso require a modification in order to enjoy nominal coverage and asymptotic efficiency over the full range of sparsity levels. This modification takes the form of a degrees-of-freedom adjustment that accounts for the dimension of the model selected by the Lasso. The degrees-of-freedom adjustment (a) preserves the success of de-biasing methodologies in regimes where previous proposals were successful, and (b) repairs the nominal coverage and provides efficiency in regimes where previous proposals produce spurious inferences and provably fail to achieve the nominal coverage. Hence our theoretical and simulation results call for the implementation of this degrees-of-freedom adjustment in de-biasing methodologies.
Consider the number of nonzero coefficients of the true coefficient vector, and let Σ denote the population Gram matrix. The unadjusted de-biasing scheme may fail to achieve the nominal coverage once the sparsity exceeds a certain threshold, even when Σ is known. If Σ is unknown, the degrees-of-freedom adjustment grants efficiency for the contrast in a general direction under a suitable sparsity condition. The resulting dependence on the sparsity is optimal and closes a gap between previous upper and lower bounds. Our construction of the estimated score vector provides a novel methodology to handle dense directions.
Beyond the degrees-of-freedom adjustment, our proof techniques yield a sharp error bound for the Lasso, which is of independent interest.
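The adjustment described above can be sketched numerically. The following minimal Python example is not the authors' implementation: it assumes an isotropic Gaussian design so that the score vector for a coordinate can simply be taken as the corresponding column of X, and the Lasso is solved by plain ISTA. It contrasts the shrunken Lasso estimate of one coordinate with its de-biased, degrees-of-freedom-adjusted version.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=3000):
    """Minimise (1/(2n))||y - Xb||^2 + lam*||b||_1 by ISTA."""
    n, p = X.shape
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        b = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return b

def debias(X, y, b, j, df_adjust=True):
    """De-biased estimate of coordinate j. With the degrees-of-freedom
    adjustment, the residual correction is rescaled by n/(n - |S_hat|),
    where S_hat is the support selected by the Lasso."""
    n = X.shape[0]
    s_hat = np.count_nonzero(b)
    z = X[:, j]                      # score vector (isotropic-design sketch)
    scale = n / (n - s_hat) if df_adjust else 1.0
    return b[j] + scale * z @ (y - X @ b) / (z @ z)

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[0] = 2.0
y = X @ beta + 0.5 * rng.standard_normal(n)
b_hat = lasso_ista(X, y, lam=0.15)
theta = debias(X, y, b_hat, j=0)
```

The de-biased coordinate removes the soft-thresholding shrinkage of the raw Lasso estimate; the degrees-of-freedom factor matters most when the selected model is large relative to n.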
A model-free bootstrap procedure for a general class of stationary time series is introduced. The theoretical framework is established, showing asymptotic validity of bootstrap confidence intervals for many statistics of interest. In addition, asymptotic validity of one-step ahead bootstrap prediction intervals is also demonstrated. Finite-sample experiments are conducted to empirically confirm the performance of the new method, and to compare with popular methods such as the block bootstrap and the autoregressive (AR)-sieve bootstrap.
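As a point of comparison, the block bootstrap mentioned above is easy to sketch. The example below is an illustrative Python sketch, not the paper's model-free procedure: it builds a circular block-bootstrap percentile interval for the mean of a simulated AR(1) series (the block length 20 and AR coefficient 0.5 are arbitrary choices).

```python
import numpy as np

def circular_block_resample(x, block_len, rng):
    """Draw one circular block-bootstrap resample of the series x."""
    n = len(x)
    starts = rng.integers(0, n, size=int(np.ceil(n / block_len)))
    idx = np.concatenate([(s + np.arange(block_len)) % n for s in starts])
    return x[idx[:n]]

rng = np.random.default_rng(1)
# AR(1) series: x_t = 0.5 * x_{t-1} + eps_t
n = 500
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + eps[t]

r = circular_block_resample(x, 20, rng)
boot_means = np.array([circular_block_resample(x, 20, rng).mean()
                       for _ in range(500)])
lo, hi = np.quantile(boot_means, [0.025, 0.975])
```

Resampling whole blocks preserves the short-range dependence of the series, which is what gives the interval its asymptotic validity for dependent data.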
We are interested in reconstructing the initial condition of a non-linear partial differential equation (PDE), namely the Fokker-Planck equation, from the observation of a Dyson Brownian motion at a given time. The Fokker-Planck equation describes the evolution of electrostatic repulsive particle systems, and can be seen as the large particle limit of correctly renormalized Dyson Brownian motions. The solution of the Fokker-Planck equation can be written as the free convolution of the initial condition and the semi-circular distribution. We propose a nonparametric estimator for the initial condition obtained by performing the free deconvolution via the subordination functions method. This statistical estimator is original as it involves the resolution of a fixed point equation, and a classical deconvolution by a Cauchy distribution. This is due to the fact that, in free probability, the analogue of the Fourier transform is the R-transform, related to the Cauchy transform. In past literature, there has been a focus on the estimation of the initial conditions of linear PDEs such as the heat equation, but to the best of our knowledge, this is the first time that the problem is tackled for a non-linear PDE. The convergence of the estimator is proved and the integrated mean square error is computed, providing rates of convergence similar to the ones known for nonparametric deconvolution methods. Finally, a simulation study illustrates the good performance of our estimator.
On the basis of independent and identically distributed bivariate random vectors, where the components are categorical and continuous variables, respectively, the related concomitants, also called induced order statistics, are considered. The main theoretical result is a functional central limit theorem for the empirical process of the concomitants in a triangular array setting. A natural application is hypothesis testing. An independence test and a two-sample test are investigated in detail. The fairly general setting enables limit results under local alternatives and bootstrap samples. For the comparison with existing tests from the literature, simulation studies are conducted. The empirical results obtained confirm the theoretical findings.
It was shown by the authors that two one-dimensional probability measures in the convex order admit a martingale coupling with respect to which the expected coupling cost is smaller than twice their W1-distance (Wasserstein distance with index 1). We showed that replacing the cost and the distance with their ρ-th power counterparts does not lead to a finite multiplicative constant. We show here that a finite constant is recovered when the distance is replaced by the product of the Wasserstein distance of index ρ and the centred ρ-th moment of the second marginal raised to an appropriate power. Then we study the generalisation of this new martingale Wasserstein inequality to higher dimension.
Component-wise MCMC algorithms, including Gibbs and conditional Metropolis-Hastings samplers, are commonly used for sampling from multivariate probability distributions. A long-standing question regarding Gibbs algorithms is whether a deterministic-scan (systematic-scan) sampler converges faster than its random-scan counterpart. We answer this question when the samplers involve two components by establishing an exact quantitative relationship between the convergence rates of the two samplers. The relationship shows that the deterministic-scan sampler converges faster. We also establish qualitative relations among the convergence rates of two-component Gibbs samplers and some conditional Metropolis-Hastings variants. For instance, it is shown that if some two-component conditional Metropolis-Hastings samplers are geometrically ergodic, then so are the associated Gibbs samplers.
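The two-component comparison can be illustrated on a toy target. The sketch below (illustrative Python; a bivariate normal target with correlation ρ = 0.8 is an arbitrary choice) runs both scan orders; the deterministic scan exhibits lower lag-1 autocorrelation, consistent with its faster convergence.

```python
import numpy as np

def gibbs_bivariate_normal(n_samples, rho, scan="deterministic", seed=0):
    """Two-component Gibbs sampler for N(0, [[1, rho], [rho, 1]]).
    Full conditionals: X | Y=y ~ N(rho*y, 1 - rho^2), and symmetrically."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    sd = np.sqrt(1.0 - rho**2)
    out = np.empty((n_samples, 2))
    for t in range(n_samples):
        if scan == "deterministic":           # systematic scan: x then y
            x = rho * y + sd * rng.standard_normal()
            y = rho * x + sd * rng.standard_normal()
        else:                                 # random scan: one coordinate
            if rng.random() < 0.5:
                x = rho * y + sd * rng.standard_normal()
            else:
                y = rho * x + sd * rng.standard_normal()
        out[t] = (x, y)
    return out

s_det = gibbs_bivariate_normal(50_000, 0.8, "deterministic", seed=1)
s_rand = gibbs_bivariate_normal(50_000, 0.8, "random", seed=2)
ac = lambda s: np.corrcoef(s[:-1, 0], s[1:, 0])[0, 1]
```

For this target the first-coordinate lag-1 autocorrelation is about ρ² for the deterministic scan but roughly 0.5 + 0.5ρ³ for the random scan, since the latter leaves each coordinate unchanged half the time.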
The empirical eigenvalue distribution of the elliptic random matrix ensemble tends to the uniform measure on an ellipse in the complex plane as its dimension tends to infinity. We show this convergence on all mesoscopic scales slightly above the typical eigenvalue spacing in the bulk spectrum with an optimal convergence rate. As a corollary we obtain complete delocalisation for the corresponding eigenvectors in any basis.
In this paper, we propose a model selection method for tree tensor networks in an empirical risk minimization framework and analyze its performance over a wide range of smoothness classes. Tree tensor networks, or tree-based tensor formats, are prominent model classes for the approximation of high-dimensional functions in numerical analysis and data science. They correspond to sum-product neural networks with sparse connectivity associated with a dimension partition tree T, widths given by a tuple r of tensor ranks, and multilinear activation functions (or units). The approximation power of these model classes has been proved to be optimal (or near to optimal) for classical smoothness classes. However, in an empirical risk minimization framework with a limited number of observations, the dimension tree T and ranks r should be selected carefully to balance estimation and approximation errors. In this paper, we propose a complexity-based model selection strategy à la Barron, Birgé, Massart. Given a family of model classes associated with different trees, ranks, tensor product feature spaces and sparsity patterns for sparse tensor networks, a model is selected by minimizing a penalized empirical risk, with a penalty depending on the complexity of the model class. After deriving bounds on the metric entropy of tree tensor networks with bounded parameters, we deduce a form of the penalty from bounds on suprema of empirical processes. This choice of penalty yields a risk bound for the predictor associated with the selected model. In a least-squares setting, after deriving fast rates of convergence of the risk, we show that the proposed strategy is (near to) minimax adaptive to a wide range of smoothness classes including Sobolev and Besov spaces (with isotropic, anisotropic or mixed dominating smoothness) and analytic functions. We discuss the role of sparsity of the tensor network in obtaining optimal performance in several regimes.
In practice, the amplitude of the penalty is calibrated with a slope heuristics method. Numerical experiments in a least-squares regression setting illustrate the performance of the strategy for the approximation of multivariate functions and univariate functions identified with tensors by tensorization (quantization).
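The penalized selection principle itself is generic. The toy Python sketch below applies it to polynomial models rather than tensor networks, an illustrative stand-in; the penalty amplitude is fixed arbitrarily here, whereas the paper calibrates it by a slope heuristics.

```python
import numpy as np

def select_model(x, y, max_degree, pen_per_dim):
    """Choose the polynomial degree minimizing
    empirical risk + penalty proportional to model dimension."""
    best_d, best_crit = None, np.inf
    for d in range(max_degree + 1):
        coefs = np.polyfit(x, y, d)
        risk = np.mean((np.polyval(coefs, x) - y) ** 2)   # empirical risk
        crit = risk + pen_per_dim * (d + 1)               # penalized risk
        if crit < best_crit:
            best_d, best_crit = d, crit
    return best_d

rng = np.random.default_rng(0)
x = rng.random(300)
y = 1.0 + 2.0 * x + 3.0 * x**2 + 0.1 * rng.standard_normal(300)
best = select_model(x, y, max_degree=5, pen_per_dim=0.02)
```

The empirical risk alone always prefers the largest model; the complexity penalty restores the balance between approximation and estimation error that the selection strategy is designed to achieve.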
We consider a stochastic differential equation and its Euler-Maruyama (EM) scheme; under appropriate conditions, both admit a unique invariant measure, denoted by π and π_η respectively (η is the step size of the EM scheme). We construct an empirical measure of the EM scheme as a statistic of π_η, and use Stein’s method developed in Fang, Shao and Xu (Probab. Theory Related Fields 174 (2019) 945–979) to prove a central limit theorem for it. The proof of the self-normalized Cramér-type moderate deviation (SNCMD) is based on a standard decomposition of the Markov chain, splitting the statistic into a martingale difference series sum and a negligible remainder. We handle the martingale part by the time-change technique for martingales, and prove that the remainder is exponentially negligible by concentration inequalities, which are of independent interest. Moreover, we show that SNCMD holds in a range of the same order as that of the classical results in Shao (J. Theoret. Probab. 12 (1999) 385–398) and Jing, Shao and Wang (Ann. Probab. 31 (2003) 2167–2215).
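For intuition, the two invariant measures can be simulated in the simplest case. The sketch below is illustrative only: it takes an Ornstein-Uhlenbeck equation dX = -X dt + √2 dW, whose invariant measure π is standard normal, runs the EM chain, and forms the empirical measure whose moments approximate those of π_η (itself close to π for small η).

```python
import numpy as np

def em_chain(n_steps, eta, seed=0):
    """Euler-Maruyama scheme for dX = -X dt + sqrt(2) dW."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = 0.0
    noise = np.sqrt(2.0 * eta) * rng.standard_normal(n_steps - 1)
    for k in range(n_steps - 1):
        x[k + 1] = x[k] * (1.0 - eta) + noise[k]
    return x

chain = em_chain(200_000, eta=0.01)
emp = chain[20_000:]   # discard burn-in; empirical measure approximating pi_eta
```

For this linear example the EM invariant measure is exactly Gaussian with variance 1/(1 - η/2), so the discretization bias in the second moment is of order η, visible only at much larger step sizes.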
We consider anchored Gaussian ℓ-simplices in the d-dimensional Euclidean space, that is, simplices with one fixed vertex and the remaining vertices randomly sampled from the d-variate standard normal distribution. We determine the distribution of the measure of such simplices for any d, any ℓ, and any anchor point y, which is of interest, e.g., when studying the asymptotic behaviour of U-statistics based on such simplex measures. We provide two proofs of the results. The first one is short but is not self-contained as it crucially relies on a technical result for non-central Wishart distributions. The second one is a simple and self-contained proof, that also provides some geometric insight on the results. Quite nicely, variations on this second argument reveal intriguing distributional identities on products of central and non-central chi-square distributions with Beta-distributed non-centrality parameters. We independently establish these distributional identities by making use of Mellin transforms. Beyond the aforementioned use to study the asymptotic behaviour of some U-statistics, our results do find natural applications in the context of robust location estimation, as we illustrate by considering a class of simplex-based multivariate medians that contains the celebrated spatial median and Oja median as special cases. Throughout, our results are confirmed by numerical experiments.
Regression models in which the observed features and the response depend, jointly, on a lower-dimensional, unobserved, latent vector Z are popular in a large array of applications, and are mainly used for predicting a response from correlated features. In contrast, methodology and theory for inference on the regression coefficient β relating the response Y to Z are scarce, since the unobservable factor Z is typically hard to interpret. Furthermore, the determination of the asymptotic variance of an estimator of β is a long-standing problem, with solutions known only in a few particular cases.
To address some of these outstanding questions, we develop inferential tools for β in a class of factor regression models in which the observed features are signed mixtures of the latent factors. The model specifications are practically desirable in a large array of applications, render the components of Z interpretable, and are sufficient for parameter identifiability.
Without assuming that the number of latent factors K or the structure of the mixture is known in advance, we construct computationally efficient estimators of β, along with estimators of other important model parameters. We benchmark the rate of convergence of the estimator of β by first establishing its minimax lower bound, and show that our proposed estimator is minimax-rate adaptive. Our main contribution is the provision of a unified analysis of the component-wise Gaussian asymptotic distribution of the estimator and, especially, the derivation of a closed-form expression for its asymptotic variance, together with consistent variance estimators. The resulting inferential tools can be used when both K and p are independent of the sample size n, and also when both, or either, p and K vary with n. This complements the only asymptotic normality results obtained for a particular case of the model under consideration, in a particular asymptotic regime, but without a variance estimate.
As an application, we provide, within our model specifications, a statistical platform for inference in regression on latent cluster centers, thereby increasing the scope of our theoretical results.
We benchmark the newly developed methodology on a recently collected data set for the study of the effectiveness of a new SIV vaccine. Our analysis enables the determination of the top latent antibody-centric mechanisms associated with the vaccine response.
Expectiles induce a law-invariant, coherent and elicitable risk measure that has received substantial attention in actuarial and financial risk management contexts. A number of recent papers have focused on the behaviour and estimation of extreme expectile-based risk measures and their potential for risk assessment was highlighted in financial and actuarial real data applications. Joint inference of several extreme expectiles has however been left untouched; in fact, even the inference about a marginal extreme expectile turns out to be a difficult problem in finite samples, even though an accurate idea of estimation uncertainty is crucial for the construction of confidence intervals in applications to risk management. We investigate the joint estimation of extreme marginal expectiles of a random vector with heavy-tailed marginal distributions, in a general extremal dependence model. We use these results to derive corrected confidence regions for extreme expectiles, as well as a test for the equality of tail expectiles. The methods are showcased in a finite-sample simulation study and on real financial data.
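For reference, a sample expectile solves an asymmetric least-squares first-order condition, which yields a simple fixed-point iteration. This is an illustrative sketch of the empirical expectile only; the extreme-expectile estimators studied in the paper add tail extrapolation on top of such sample quantities.

```python
import numpy as np

def expectile(x, tau, n_iter=100):
    """Sample tau-expectile via the asymmetric least-squares fixed point:
    e = sum(w_i * x_i) / sum(w_i), with w_i = tau if x_i > e else 1 - tau."""
    e = np.mean(x)
    for _ in range(n_iter):
        w = np.where(x > e, tau, 1.0 - tau)
        e = np.sum(w * x) / np.sum(w)
    return e

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
```

At τ = 1/2 the weights are constant and the expectile reduces to the sample mean; as τ increases, the expectile moves monotonically into the right tail, which is what makes expectile levels close to 1 natural tail risk measures.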
We establish asymptotic normality results for estimation of the block probability matrix B in stochastic blockmodel graphs using spectral embedding when the average degree grows at a suitable rate in n, the number of vertices. As a corollary, we show that when B is of full rank, estimates of B obtained from spectral embedding are asymptotically efficient. When B is singular, the estimates obtained from spectral embedding can have smaller mean square error than those obtained from maximizing the log-likelihood under no rank assumption, and furthermore, can be almost as efficient as the true MLE that assumes the rank of B is known. Our results indicate, in the context of stochastic blockmodel graphs, that spectral embedding is not just computationally tractable, but that the resulting estimates are also admissible, even when compared to the purportedly optimal but computationally intractable maximum likelihood estimation under no rank assumption.
We consider the problem of sampling from a strongly log-concave density and prove an information-theoretic lower bound on the number of stochastic gradient queries of the log density that are needed. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information-theoretic limit for all these algorithms.
We show that for every algorithm, there exists a well-conditioned strongly log-concave target density for which the distribution of points generated by the algorithm would be at least ε away from the target in total variation distance if the number of gradient queries falls below a threshold depending on ε, the dimension and the variance of the stochastic gradient. Our lower bound follows by combining the notion of Le Cam deficiency, routinely used in the comparison of statistical experiments, with standard information-theoretic tools used in lower bounding Bayes risk functions. To the best of our knowledge, our results provide the first nontrivial dimension-dependent lower bound for this problem.
We consider Bayesian inference for a monotone density on the unit interval and study the resulting asymptotic properties. We consider a “projection-posterior” approach, where we construct a prior on density functions through random histograms without imposing the monotonicity constraint, but induce a random distribution by projecting a sample from the posterior onto the space of monotone functions. The approach retains posterior conjugacy, yielding explicit expressions that are extremely useful for studying asymptotic properties. We show that the projection-posterior contracts at the optimal rate. We then construct a consistent test based on the posterior distribution for testing the hypothesis of monotonicity. Finally, we obtain the limiting coverage of a projection-posterior credible interval for the value of the function at an interior point. Interestingly, the limiting coverage turns out to be higher than the nominal credibility level, the opposite of the undercoverage phenomenon observed in smoothness regimes. Moreover, we show that a recalibration method using a lower credibility level gives the intended limiting coverage. We also discuss extensions of the obtained results for densities on the half-line. We conduct a simulation study to demonstrate the accuracy of the asymptotic results in finite samples.
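The projection step can be sketched concretely. Below, a conjugate (Dirichlet) draw of histogram heights is projected onto nonincreasing step functions by the pool-adjacent-violators algorithm. This is an illustrative Python sketch of the projection-posterior idea only; the number of bins, the prior weights and the truncated-exponential data are arbitrary choices.

```python
import numpy as np

def pava_nonincreasing(y):
    """L2 projection of y onto nonincreasing sequences
    (pool-adjacent-violators, equal weights)."""
    vals, wts, lens = [], [], []
    for v in map(float, y):
        vals.append(v); wts.append(1.0); lens.append(1)
        while len(vals) > 1 and vals[-2] < vals[-1]:   # violation: merge blocks
            v2, w2, n2 = vals.pop(), wts.pop(), lens.pop()
            v1, w1, n1 = vals.pop(), wts.pop(), lens.pop()
            w = w1 + w2
            vals.append((w1 * v1 + w2 * v2) / w)
            wts.append(w); lens.append(n1 + n2)
    return np.concatenate([np.full(n, v) for v, n in zip(vals, lens)])

rng = np.random.default_rng(0)
m = 20                                       # histogram bins on [0, 1]
data = rng.exponential(0.3, size=500)
data = data[data < 1.0]                      # decreasing density on [0, 1]
counts = np.histogram(data, bins=m, range=(0.0, 1.0))[0]
post_draw = rng.dirichlet(counts + 1.0) * m  # conjugate posterior heights
projected = pava_nonincreasing(post_draw)    # one projection-posterior draw
```

The projection preserves total mass (the pooled blocks keep their weighted means), so each projected draw is again a density, now monotone.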
The paper deals with the problem of nonparametric estimation of the Lp-norm of a probability density from independent observations. The unknown density is assumed to belong to a ball in the anisotropic Nikolskii space. We adopt the minimax approach and derive lower bounds on the minimax risk. In particular, we demonstrate that the accuracy of estimation procedures essentially depends on whether p is an integer or not. Moreover, we develop a general technique for the derivation of lower bounds on the minimax risk in problems of estimating nonlinear functionals. The proposed technique is applicable for a broad class of nonlinear functionals, and it is used to derive the lower bounds in the Lp-norm estimation problem.
In this paper we develop rate-optimal estimation procedures for the problem of estimating the Lp-norm of a probability density from independent observations. The density is assumed to belong to a ball in the anisotropic Nikolskii space. We adopt the minimax approach and construct rate-optimal estimators in the case of integer p. We demonstrate that, depending on the parameters of the Nikolskii class and the norm index p, the minimax rates of convergence may vary from inconsistency to the parametric rate of estimation. The results in this paper complement the minimax lower bounds derived in the companion paper (Goldenshluger and Lepski (2020)).
We consider the closeness testing problem for discrete distributions. The goal is to distinguish whether two samples are drawn from the same unspecified distribution, or whether their respective distributions are separated in -norm. In this paper, we focus on adapting the rate to the shape of the underlying distributions, i.e. we consider a local minimax setting. We provide, to the best of our knowledge, the first local minimax rate for the separation distance up to logarithmic factors, together with a test that achieves it. In view of the rate, closeness testing turns out to be substantially harder than the related one-sample testing problem over a wide range of cases.
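A standard collision-style statistic for this problem is easy to sketch. The Python below is illustrative only: the paper's locally minimax test is more refined, and the statistic is usually analyzed under Poissonized sampling, while fixed multinomial samples are used here.

```python
import numpy as np

def closeness_stat(x_counts, y_counts):
    """Collision-style closeness statistic: under Poissonized sampling its
    expectation is proportional to ||p - q||_2^2 and equals 0 when p = q."""
    d = x_counts - y_counts
    return np.sum(d * d - x_counts - y_counts)

rng = np.random.default_rng(0)
k, m = 20, 2000
p = np.full(k, 1.0 / k)                      # uniform
q = np.full(k, 0.5 / (k - 1)); q[0] = 0.5    # far from uniform
t_same = closeness_stat(rng.multinomial(m, p), rng.multinomial(m, p))
t_diff = closeness_stat(rng.multinomial(m, p), rng.multinomial(m, q))
```

Subtracting the counts in each term removes the sampling-noise contribution to the squared difference, which is what makes the statistic centred at zero under the null.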
One of the equivalent formulations of the Kadison–Singer problem, which was resolved in 2013 by Marcus, Spielman and Srivastava, is the “paving conjecture”. In this paper, we first extend this result to real stable polynomials. We prove that for every multi-affine real stable polynomial satisfying a simple condition, it is possible to partition its set of variables into a small number of subsets such that the “restriction” of the polynomial to each subset has small roots. Then, we derive a probabilistic interpretation of this result. We show that there exists a partition of the underlying space of every strongly Rayleigh process into a small number of sets such that the restriction of the point process to each set has “almost independent” points. This result implies that the dependence structure of strongly Rayleigh processes is constrained—a phenomenon that is to be expected in negatively dependent measures. To prove this result, we introduce and study the notion of a kernel polynomial for strongly Rayleigh processes. This notion is a natural generalization of the kernel of determinantal processes. We also derive an entropy bound for strongly Rayleigh processes in terms of the roots of the kernel polynomial, which is of interest in its own right.
Gram-type matrices and their spectral decomposition are of central importance for numerous problems in statistics, applied mathematics, physics, and machine learning. In this paper, we carefully study the non-asymptotic properties of the spectral decomposition of large Gram-type matrices when the data are not necessarily independent. Specifically, we derive exponential tail bounds for the deviation of the eigenvectors of the right Gram matrix from their population counterparts, as well as Berry-Esseen type bounds for these deviations. We also obtain non-asymptotic tail bounds for the ratio between the eigenvalues of the left Gram matrix, namely the sample covariance matrix, and their population counterparts, regardless of the size of the data matrix. The documented non-asymptotic properties are further demonstrated in a suite of applications, including the non-asymptotic characterization of the estimated number of latent factors in factor models and related machine learning problems, the estimation and forecasting of high-dimensional time series, the spectral properties of large sample covariance matrices such as perturbation bounds and inference on spectral projectors, and low-rank matrix denoising using dependent data.
In this study, we develop an asymptotic theory of nonparametric regression for locally stationary random fields (LSRFs) observed at irregularly spaced locations. We first derive the uniform convergence rate of general kernel estimators, followed by the asymptotic normality of an estimator for the mean function of the model. Moreover, we consider additive models to avoid the curse of dimensionality arising from the dependence of the convergence rate of the estimators on the number of covariates. Subsequently, we derive the uniform convergence rate and joint asymptotic normality of the estimators for the additive functions. We also introduce approximately dependent random fields to provide examples of LSRFs. We find that these random fields include a wide class of Lévy-driven moving average random fields.
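In the simplest stationary baseline case, the general kernel estimators reduce to familiar objects. A minimal Nadaraya-Watson sketch follows (illustrative Python with an iid univariate design rather than an irregularly observed random field; the sine regression function and bandwidth are arbitrary choices).

```python
import numpy as np

def nadaraya_watson(x_obs, y_obs, x_eval, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E[Y | X = x]."""
    d2 = (x_eval[:, None] - x_obs[None, :]) ** 2
    K = np.exp(-0.5 * d2 / h ** 2)                 # kernel weights
    return (K @ y_obs) / K.sum(axis=1)             # locally weighted average

rng = np.random.default_rng(0)
n = 2000
x = rng.random(n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)
fit = nadaraya_watson(x, y, np.array([0.25, 0.75]), h=0.03)
```

The bandwidth h controls the bias-variance trade-off that drives the uniform convergence rates; under dependence and irregular spacing the same estimator is used, but the variance analysis changes.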
We show that a Cramér–Wold device holds for infinite divisibility of -valued distributions, i.e. that the distribution of a -valued random vector X is infinitely divisible if and only if the distribution of is infinitely divisible for all , and that this in turn is equivalent to infinite divisibility of the distribution of for all . A key tool for proving this is a Lévy–Khintchine type representation with a signed Lévy measure for the characteristic function of a -valued distribution, provided the characteristic function is zero-free.
We investigate the problem of deriving adaptive posterior rates of contraction on balls in density estimation. Although it is known that log-density priors can achieve optimal rates when the true density is sufficiently smooth, adaptive rates had yet to be proven. Here we establish that the so-called spike-and-slab prior can achieve adaptive and optimal posterior contraction rates. Along the way, we prove a generic contraction result for log-density priors with independent wavelet coefficients. Interestingly, our approach differs from previous works on contraction and is reminiscent of the classical test-based approach used in Bayesian nonparametrics. Moreover, we require no lower bound on the smoothness of the true density, although the rates deteriorate by an extra factor in the case of low smoothness.
By embedding a Markov-modulated random recurrence equation in continuous time, we derive the Markov-modulated generalized Ornstein-Uhlenbeck process. This process turns out to be the unique solution of a stochastic differential equation driven by a bivariate Markov-additive process. We present this stochastic differential equation as well as its solution explicitly in terms of the driving Markov-additive process. Moreover, we give necessary and sufficient conditions for strict stationarity of the Markov-modulated generalized Ornstein-Uhlenbeck process, and prove that its stationary distribution is given by the distribution of a certain exponential functional of Markov-additive processes. Finally, we propose a Markov-modulated risk model with investment that generalizes Paulsen’s risk process to a Markov-switching environment, and derive a formula for the ruin probability in this risk model.
We study the hydrodynamic and hydrostatic limits of the one-dimensional open symmetric inclusion process with slow boundary. Depending on the value of the parameter tuning the interaction rate of the bulk of the system with the boundary, we obtain a linear heat equation with either Dirichlet, Robin or Neumann boundary conditions as hydrodynamic equation. In our approach, we combine duality and first-second class particle techniques to reduce the scaling limit of the inclusion process to the limiting behavior of a single, non-interacting, particle.
We study the problem of empirical minimization for variance-type functionals over functional classes. Sharp non-asymptotic bounds for the excess variance are derived under mild conditions. In particular, it is shown that, under some restrictions on the functional class, fast convergence rates can be achieved, including, under additional assumptions, the optimal non-parametric rates for expressive classes in the non-Donsker regime. Our main applications include variance reduction and optimal control.
We study an extension of the so-called defective Galton-Watson processes obtained by allowing the offspring distribution to change over the generations. Thus, in these processes, the individuals reproduce independently of one another and in accordance with a possibly defective offspring distribution depending on the generation. Moreover, the defect of the offspring distribution at generation n represents the probability that the process hits an absorbing state Δ at that generation. We focus on the asymptotic behaviour of these processes. We establish the almost sure convergence of the process to a limit random variable, and we provide two characterisations of the duality extinction-absorption at Δ. We also state some results on the absorption time and the properties of the process conditioned upon its non-absorption, some of which require us to introduce the notion of defective branching trees in varying environment.
We study the independent alignment percolation model on the hypercubic lattice introduced by Beaton, Grimmett and Holmes. It is a model for random intersecting line segments defined as follows. First the sites are independently declared occupied with probability p and vacant otherwise. Conditional on the configuration of occupied vertices, consider the set of all line segments that are parallel to the coordinate axes, whose extremes are occupied vertices and that do not traverse any other occupied vertex. Declare independently the segments in this set open with probability λ and closed otherwise. All the edges that lie on open segments are also declared open, giving rise to a bond percolation model. We show that for every relevant value of p the critical value for λ satisfies a nontrivial bound, completing the proof that the phase transition is non-trivial over the whole interval of p. We also show that the critical curve is continuous at the endpoint of this interval.
We introduce a new approach to the spectral equivalence of Gaussian processes and fields, based on the methods of operator theory in Hilbert space. Besides several new results including identities in law of quadratic norms for integrated and multiply integrated Gaussian random functions we give an application to goodness-of-fit testing.
In the standard Bayesian framework data are assumed to be generated by a distribution parametrized by θ in a parameter space Θ, over which a prior distribution π is given. A Bayesian statistician quantifies the belief that the true parameter is θ by its posterior probability given the observed data. We investigate the behavior of this posterior belief when the data are generated under some parameter that may or may not coincide with θ. Starting from stochastic orders, specifically likelihood ratio dominance, that obtain for the resulting distributions of posteriors, we consider monotonicity properties of the posterior probabilities as a function of the sample size when data arrive sequentially. While the θ-posterior is monotonically increasing in expectation (i.e., it is a submartingale) when the data are generated under that same θ, it need not be monotonically decreasing in general, not even in terms of its overall expectation, when the data are generated under a different parameter. In fact, it may keep going up and down many times, even in simple cases such as iid coin tosses. We obtain precise asymptotic rates when the data come from the wide class of exponential families of distributions; these rates imply in particular that the expectation of the θ-posterior under a different parameter is eventually strictly decreasing. Finally, we show that in a number of interesting cases this expectation is a log-concave function of the sample size, and thus unimodal. In the Bernoulli case we obtain this result by developing an inequality that is related to Turán’s inequality for Legendre polynomials.
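The coin-tossing case can be computed exactly by enumeration. The sketch below is illustrative (a uniform two-point prior on {0.5, 0.6} is an arbitrary choice): it evaluates the expected posterior of θ = 0.5 under data generated from either parameter, exhibiting the submartingale behaviour under θ itself and the eventual decrease under the other parameter.

```python
from math import comb

def expected_posterior(theta, theta_alt, theta_true, n):
    """Expected value, under theta_true, of the posterior probability of
    theta, for a uniform prior on {theta, theta_alt} and n Bernoulli
    observations (binomial coefficients cancel in the posterior ratio)."""
    total = 0.0
    for k in range(n + 1):
        lik = theta**k * (1.0 - theta) ** (n - k)
        lik_alt = theta_alt**k * (1.0 - theta_alt) ** (n - k)
        post = lik / (lik + lik_alt)
        total += comb(n, k) * theta_true**k * (1.0 - theta_true) ** (n - k) * post
    return total

exp_same = expected_posterior(0.5, 0.6, 0.5, 40)   # data from theta
exp_diff = expected_posterior(0.5, 0.6, 0.6, 40)   # data from the other one
```

Plotting these expectations over n is an easy way to observe the non-monotone, eventually decreasing behaviour described above.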
Consider a random walker on the nonnegative lattice, moving in continuous time, whose positive transition intensity is proportional to the time the walker spends at the origin. In this way, the walker is a jump process with a stochastic and adapted jump intensity. We show that, upon Brownian scaling, the sequence of such processes converges to Brownian motion with inert drift (BMID). BMID was introduced by Frank Knight in 2001 and generalized by White in 2007. This confirms a conjecture of Burdzy and White in 2008 in the one-dimensional setting.