Volume 49 Issue 4 | The Annals of Statistics

The Annals of Statistics

Memorial issue for Charles Stein

VOL. 49 · NO. 4 | August 2021

RECEIVE NEW CONTENT ALERTS FOR THIS ISSUE

< Previous Issue | Next Issue >

VIEW ALL ABSTRACTS +

Frontmatter

Table of Contents

Ann. Statist. 49 (4), (August 2021) Open Access

No abstract available

Editorial Board

Ann. Statist. 49 (4), (August 2021) Open Access

No abstract available

Articles

Editorial: Memorial issue for Charles Stein

Richard J. Samworth, Ming Yuan

Ann. Statist. 49 (4), 1811-1814, (August 2021) DOI: 10.1214/21-AOS2110 Open Access

No abstract available

Charles Stein and invariance: Beginning with the Hunt–Stein theorem

Morris L. Eaton, Edward I. George

Ann. Statist. 49 (4), 1815-1822, (August 2021) DOI: 10.1214/21-AOS2075 Open Access

KEYWORDS: amenability, decision theory, Fiducial inference, Haar measure, minimaxity, 6203, 62C05, 62C20, 62F03, 62F10

Read Abstract +

On Charles Stein’s contributions to (in)admissibility

William E. Strawderman

Ann. Statist. 49 (4), 1823-1835, (August 2021) DOI: 10.1214/21-AOS2108 Open Access

KEYWORDS: Admissibility, inadmissibility, minimaxity, 62C15, 62C20, 62F10

Read Abstract +

Stein 1956: Efficient nonparametric testing and estimation

A. W. van der Vaart, J. A. Wellner

Ann. Statist. 49 (4), 1836-1849, (August 2021) DOI: 10.1214/21-AOS2056 Open Access

KEYWORDS: Semiparametric efficiency, efficient score function, 62F12, 62G05, 62G20

Read Abstract +

Stein’s method of normal approximation: Some recollections and reflections

Louis H. Y. Chen

Ann. Statist. 49 (4), 1850-1863, (August 2021) DOI: 10.1214/21-AOS2083 Open Access

KEYWORDS: Stein’s method, Normal approximation, Stein equation, Stein identity, concentration inequality, exchangeable pair, 60F05, 62E17, 60B10

Read Abstract +

Second-order Stein: SURE for SURE and other applications in high-dimensional inference

Pierre C. Bellec, Cun-Hui Zhang

Ann. Statist. 49 (4), 1864-1903, (August 2021) DOI: 10.1214/20-AOS2005 Open Access

KEYWORDS: Stein’s formula, Variance estimate, risk estimate, SURE, SURE for SURE, Lasso, Elastic net, Model selection, variance of model size, debiased estimation, regression, 62H10, 62H12, 62J07, 62G15, 62F35

Read Abstract +

Stein’s formula states that a random variable of the form is mean-zero for all functions f with integrable gradient. Here, is the divergence of the function f and is a standard normal vector. This paper aims to propose a second-order Stein formula to characterize the variance of such random variables for all functions with square integrable gradient, and to demonstrate the usefulness of this second-order Stein formula in various applications.

In the Gaussian sequence model, a remarkable consequence of Stein’s formula is Stein’s Unbiased Risk Estimate (SURE), an unbiased estimate of the mean squared risk for almost any given estimator of the unknown mean vector. A first application of the second-order Stein formula is an Unbiased Risk Estimate for SURE itself (SURE for SURE): an unbiased estimate providing information about the squared distance between SURE and the squared estimation error of . SURE for SURE has a simple form as a function of the data and is applicable to all with square integrable gradient, for example, the Lasso and the Elastic Net.

In addition to SURE for SURE, the following statistical applications are developed: (1) upper bounds on the risk of SURE when the estimation target is the mean squared error; (2) confidence regions based on SURE and using the second-order Stein formula; (3) oracle inequalities satisfied by SURE-tuned estimates under a mild Lipschtiz assumption; (4) an upper bound on the variance of the size of the model selected by the Lasso, and more generally an upper bound on the variance of the empirical degrees-of-freedom of convex penalized estimators; (5) explicit expressions of SURE for SURE for the Lasso and the Elastic Net; (6) in the linear model, a general semiparametric scheme to de-bias a differentiable initial estimator for the statistical inference of a low-dimensional projection of the unknown regression coefficient vector, with a characterization of the variance after debiasing; and (7) an accuracy analysis of a Gaussian Monte Carlo scheme to approximate the divergence of functions .

Consistent nonparametric estimation for heavy-tailed sparse graphs

Christian Borgs, Jennifer T. Chayes, Henry Cohn, Shirshendu Ganguly

Ann. Statist. 49 (4), 1904-1930, (August 2021) DOI: 10.1214/20-AOS1985 Open Access

KEYWORDS: Sparse networks, estimation, graphons, 62G20, 62H30, 05C80

Read Abstract +

Nonexchangeable random partition models for microclustering

Giuseppe Di Benedetto, François Caron, Yee Whye Teh

Ann. Statist. 49 (4), 1931-1957, (August 2021) DOI: 10.1214/20-AOS2003 Open Access

KEYWORDS: power-law, Random partitions, completely random measure, stochastic process, sparse random graph, 62F15, 60G55, 60G57

Read Abstract +

Peskun–Tierney ordering for Markovian Monte Carlo: Beyond the reversible scenario

Christophe Andrieu, Samuel Livingstone

Ann. Statist. 49 (4), 1958-1981, (August 2021) DOI: 10.1214/20-AOS2008 Open Access

KEYWORDS: Markov chain Monte Carlo, Peskun ordering, Piecewise deterministic Markov processes, 65C40, 65C05, 62J10

Read Abstract +

Volatility coupling

Jean Jacod, Jia Li, Zhipeng Liao

Ann. Statist. 49 (4), 1982-1998, (August 2021) DOI: 10.1214/20-AOS2023 Open Access

KEYWORDS: coupling, high-frequency data, occupation measure, quantiles, Semimartingale, uniform inference, 60F15, 60G44, 62G20

Read Abstract +

Asymptotic distributions of high-dimensional distance correlation inference

Lan Gao, Yingying Fan, Jinchi Lv, Qi-Man Shao

Ann. Statist. 49 (4), 1999-2020, (August 2021) DOI: 10.1214/20-AOS2024 Open Access

KEYWORDS: nonparametric inference, high dimensionality, Distance correlation, test of independence, nonlinear dependence detection, central limit theorem, rate of convergence, power, blockchain, 62E20, 62H20, 62G10, 62G20

Read Abstract +

Confidence intervals for multiple isotonic regression and other monotone models

Hang Deng, Qiyang Han, Cun-Hui Zhang

Ann. Statist. 49 (4), 2021-2052, (August 2021) DOI: 10.1214/20-AOS2025 Open Access

KEYWORDS: Limit distribution theory, Confidence interval, multiple isotonic regression, Gaussian process, shape constraints, 60E15, 62G05

Read Abstract +

We consider the problem of constructing pointwise confidence intervals in the multiple isotonic regression model. Recently, Han and Zhang (2020) obtained a pointwise limit distribution theory for the so-called block max–min and min–max estimators (Fokianos, Leucht and Neumann (2020); Deng and Zhang (2020)) in this model, but inference remains a difficult problem due to the nuisance parameter in the limit distribution that involves multiple unknown partial derivatives of the true regression function.

In this paper, we show that this difficult nuisance parameter can be effectively eliminated by taking advantage of information beyond point estimates in the block max–min and min–max estimators. Formally, let (resp. ) be the maximizing lower-left (resp. minimizing upper-right) vertex in the block max–min (resp. min–max) estimator, and be the average of the block max–min and min–max estimators. If all (first-order) partial derivatives of are nonvanishing at , then the following pivotal limit distribution theory holds:

Here is the number of design points in the block , σ is the standard deviation of the errors, and is a universal limit distribution free of nuisance parameters. This immediately yields confidence intervals for with asymptotically exact confidence level and oracle length. Notably, the construction of the confidence intervals, even new in the univariate setting, requires no more efforts than performing an isotonic regression once using the block max–min and min–max estimators, and can be easily adapted to other common monotone models including, for example, (i) monotone density estimation, (ii) interval censoring model with current status data, (iii) counting process model with panel count data, and (iv) generalized linear models. Extensive simulations are carried out to support our theory.

On extended admissible procedures and their nonstandard Bayes risk

Haosui Duanmu, Daniel M. Roy

Ann. Statist. 49 (4), 2053-2078, (August 2021) DOI: 10.1214/20-AOS2026 Open Access

KEYWORDS: decision theory, Complete class theorems, nonstandard analysis, 62C07, 62C10, 62A01, 28E05

Read Abstract +

Debiased inverse-variance weighted estimator in two-sample summary-data Mendelian randomization

Ting Ye, Jun Shao, Hyunseung Kang

Ann. Statist. 49 (4), 2079-2100, (August 2021) DOI: 10.1214/20-AOS2027 Open Access

KEYWORDS: Causal inference, inverse variance weighted estimator, many weak instruments, Mendelian randomization, summary data, 62E30, 60K35, 46N60, 62P10

Read Abstract +

Boosted nonparametric hazards with time-dependent covariates

Donald K. K. Lee, Ningyuan Chen, Hemant Ishwaran

Ann. Statist. 49 (4), 2101-2128, (August 2021) DOI: 10.1214/20-AOS2028 Open Access

KEYWORDS: Survival analysis, gradient boosting, functional data, stepsize shrinkage, regression trees, likelihood functional, 62N02, 62G05, 90B22

Read Abstract +

Universal Bayes consistency in metric spaces

Steve Hanneke, Aryeh Kontorovich, Sivan Sabato, Roi Weiss

Ann. Statist. 49 (4), 2129-2150, (August 2021) DOI: 10.1214/20-AOS2029 Open Access

KEYWORDS: metric space, nearest neighbor, ‎classification‎, Bayes consistency, 54E70, 97K80, 62C12, 03E17, 03E55

Read Abstract +

Minimax optimal conditional independence testing

Matey Neykov, Sivaraman Balakrishnan, Larry Wasserman

Ann. Statist. 49 (4), 2151-2177, (August 2021) DOI: 10.1214/20-AOS2030 Open Access

KEYWORDS: Conditional independence, Minimax optimality, Hypothesis testing, 62G10

Read Abstract +

We consider the problem of conditional independence testing of X and Y given Z where and Z are three real random variables and Z is continuous. We focus on two main cases—when X and Y are both discrete, and when X and Y are both continuous. In view of recent results on conditional independence testing [Ann. Statist. 48 (2020) 1514–1538], one cannot hope to design nontrivial tests, which control the type I error for all absolutely continuous conditionally independent distributions, while still ensuring power against interesting alternatives. Consequently, we identify various, natural smoothness assumptions on the conditional distributions of as z varies in the support of Z, and study the hardness of conditional independence testing under these smoothness assumptions. We derive matching lower and upper bounds on the critical radius of separation between the null and alternative hypotheses in the total variation metric. The tests we consider are easily implementable and rely on binning the support of the continuous variable Z. To complement these results, we provide a new proof of the hardness result of Shah and Peters [Ann. Statist. 48 (2020) 1514–1538].

Estimation of the number of components of nonparametric multivariate finite mixture models

Caleb Kwon, Eric Mbakop

Ann. Statist. 49 (4), 2178-2205, (August 2021) DOI: 10.1214/20-AOS2032 Open Access

KEYWORDS: Finite mixture model, latent model, nonparametric mixture, Conditional independence, multivariate data, 62G05, 62G15, 62H30, 47A55, 47G10, 47N30

Read Abstract +

We propose a novel estimator for the number of mixture components (denoted by M) in a nonparametric finite mixture model. The setting that we consider is one where the analyst has repeated observations of variables that are conditionally independent given a finitely supported latent variable with M support points. Under a mild assumption on the joint distribution of the observed and latent variables, we show that an integral operator T that is identified from the data has rank equal to M. We use this observation, in conjunction with the fact that singular values of operators are stable under perturbations, to propose an estimator of M, which essentially consists of a thresholding rule that counts the number of singular values of a consistent estimator of T that are greater than a data-driven threshold. We prove that our estimator of M is consistent, and establish nonasymptotic results, which provide finite sample performance guarantees for our estimator. We present a Monte Carlo study, which shows that our estimator performs well for samples of moderate size.

Robust k-means clustering for distributions with two moments

Yegor Klochkov, Alexey Kroshnin, Nikita Zhivotovskiy

Ann. Statist. 49 (4), 2206-2230, (August 2021) DOI: 10.1214/20-AOS2033 Open Access

KEYWORDS: clustering, K-means, robust estimation, excess distortion bounds, 62F35, 62F12

Read Abstract +

We consider the robust algorithms for the k-means clustering problem where a quantizer is constructed based on N independent observations. Our main results are median of means based nonasymptotic excess distortion bounds that hold under the two bounded moments assumption in a general separable Hilbert space. In particular, our results extend the renowned asymptotic result of (Ann. Statist. 9 (1981) 135–140) who showed that the existence of two moments is sufficient for strong consistency of an empirically optimal quantizer in . In a special case of clustering in , under two bounded moments, we prove matching (up to constant factors) nonasymptotic upper and lower bounds on the excess distortion, which depend on the probability mass of the lightest cluster of an optimal quantizer. Our bounds have the sub-Gaussian form, and the proofs are based on the versions of uniform bounds for robust mean estimators.

On the rate of convergence of fully connected deep neural network regression estimates

Michael Kohler, Sophie Langer

Ann. Statist. 49 (4), 2231-2249, (August 2021) DOI: 10.1214/20-AOS2034 Open Access

KEYWORDS: curse of dimensionality, deep learning, neural networks, Nonparametric regression, rate of convergence, 62G08, 41A25, 82C32

Read Abstract +

Infinite-dimensional gradient-based descent for alpha-divergence minimisation

Kamélia Daudel, Randal Douc, François Portier

Ann. Statist. 49 (4), 2250-2270, (August 2021) DOI: 10.1214/20-AOS2035 Open Access

KEYWORDS: variational inference, alpha-divergence, Kullback–Leibler divergence, Mirror Descent, 62F15, 62F30, 62F35, 62G07, 62L99

Read Abstract +

This paper introduces the -descent, an iterative algorithm which operates on measures and performs α-divergence minimisation in a Bayesian framework. This gradient-based procedure extends the commonly-used variational approximation by adding a prior on the variational parameters in the form of a measure. We prove that for a rich family of functions Γ, this algorithm leads at each step to a systematic decrease in the α-divergence and derive convergence results. Our framework recovers the Entropic Mirror Descent algorithm and provides an alternative algorithm that we call the Power Descent. Moreover, in its stochastic formulation, the -descent allows to optimise the mixture weights of any given mixture model without any information on the underlying distribution of the variational parameters. This renders our method compatible with many choices of parameters updates and applicable to a wide range of Machine Learning tasks. We demonstrate empirically on both toy and real-world examples the benefit of using the Power Descent and going beyond the Entropic Mirror Descent framework, which fails as the dimension grows.

Monitoring for a change point in a sequence of distributions

Lajos Horváth, Piotr Kokoszka, Shixuan Wang

Ann. Statist. 49 (4), 2271-2291, (August 2021) DOI: 10.1214/20-AOS2036 Open Access

KEYWORDS: change point detection, Empirical quantile function, sequential monitoring, Wasserstein distance, 62G30, 62L10, 62G10, 62G20

Read Abstract +

We propose a method for the detection of a change point in a sequence of distributions, which are available through a large number of observations at each . Under the null hypothesis, the distributions are equal. Under the alternative hypothesis, there is a change point , such that for and some unknown distribution G, which is not equal to . The change point, if it exists, is unknown, and the distributions before and after the potential change point are unknown. The decision about the existence of a change point is made sequentially, as new data arrive. At each time i, the count of observations, N, can increase to infinity. The detection procedure is based on a weighted version of the Wasserstein distance. Its asymptotic and finite sample validity is established. Its performance is illustrated by an application to returns on stocks in the S&P 500 index.

What is resolution? A statistical minimax testing perspective on superresolution microscopy

Gytis Kulaitis, Axel Munk, Frank Werner

Ann. Statist. 49 (4), 2292-2312, (August 2021) DOI: 10.1214/20-AOS2037 Open Access

KEYWORDS: Microscopy, (super)resolution, nanoscopy, minimax, Detection boundary, equivalence of experiments, 91B06, 94A12, 60F05

Read Abstract +

The distribution of the Lasso: Uniform control over sparse balls and adaptive parameter tuning

Léo Miolane, Andrea Montanari

Ann. Statist. 49 (4), 2313-2335, (August 2021) DOI: 10.1214/20-AOS2038 Open Access

KEYWORDS: Linear regression, Sparsity, Lasso, cross-validation, 62J05, 62J07

Read Abstract +

The Lasso is a popular regression method for high-dimensional problems in which the number of parameters , is larger than the number n of samples: . A useful heuristics relates the statistical properties of the Lasso estimator to that of a simple soft-thresholding denoiser, in a denoising problem in which the parameters are observed in Gaussian noise, with a carefully tuned variance. Earlier work confirmed this picture in the limit , pointwise in the parameters θ and in the value of the regularization parameter.

Here, we consider a standard random design model and prove exponential concentration of its empirical distribution around the prediction provided by the Gaussian denoising model. Crucially, our results are uniform with respect to θ belonging to balls, , and with respect to the regularization parameter. This allows us to derive sharp results for the performances of various data-driven procedures to tune the regularization.

Our proofs make use of Gaussian comparison inequalities, and in particular of a version of Gordon’s minimax theorem developed by Thrampoulidis, Oymak and Hassibi, which controls the optimum value of the Lasso optimization problem. Crucially, we prove a stability property of the minimizer in Wasserstein distance that allows one to characterize properties of the minimizer itself.

Optimal change-point estimation in time series

Ngai Hang Chan, Wai Leong Ng, Chun Yip Yau, Haihan Yu

Ann. Statist. 49 (4), 2336-2355, (August 2021) DOI: 10.1214/20-AOS2039 Open Access

KEYWORDS: Bayes-type estimator, Confidence interval, double-sided random process, piecewise stationary time series, structural break, 62M10, 60F25

Read Abstract +

Propriety of the reference posterior distribution in Gaussian process modeling

Joseph Muré

Ann. Statist. 49 (4), 2356-2377, (August 2021) DOI: 10.1214/20-AOS2040 Open Access

KEYWORDS: Gaussian process, kriging, correlation function, Jeffreys prior, reference prior, integrated likelihood, posterior propriety, 62F15, 62M30, 60G15

Read Abstract +

In a seminal article, Berger, De Oliveira and Sansó [J. Amer. Statist. Assoc. 96 (2001) 1361–1374] compare several objective prior distributions for the parameters of Gaussian process models with isotropic correlation kernel. The reference prior distribution stands out among them insofar as it always leads to a proper posterior. They prove this result for rough correlation kernels: Spherical, Exponential with power , Matérn with smoothness . This paper provides a proof for smooth correlation kernels: Exponential with power , Matérn with smoothness , Rational Quadratic, along with tail rates of the reference prior for these kernels.

Community detection with dependent connectivity

Yubai Yuan, Annie Qu

Ann. Statist. 49 (4), 2378-2428, (August 2021) DOI: 10.1214/20-AOS2042 Open Access

KEYWORDS: Bahadur representation, high-order approximation, multiple networks, Stochastic block model, variational EM, 62R07, 62E17

Read Abstract +

Correction note

Correction note: “Statistical inference for the mean outcome under a possibly nonunique optimal treatment rule”

Alex Luedtke, Aurélien Bibaut, Mark van der Laan

Ann. Statist. 49 (4), 2429-2430, (August 2021) DOI: 10.1214/20-AOS2031 Open Access

KEYWORDS: Efficient estimator, nonregular inference, online estimation, optimal treatment, Pathwise differentiability, Semiparametric model, optimal value, 62G05, 62N99

No abstract available

KEYWORDS/PHRASES

PUBLICATION TITLE:

PUBLICATION YEARS