Open Access
On Nearly Assumption-Free Tests of Nominal Confidence Interval Coverage for Causal Parameters Estimated by Machine Learning
Lin Liu, Rajarshi Mukherjee, James M. Robins
Statist. Sci. 35(3): 518-539 (August 2020). DOI: 10.1214/20-STS786


For many causal effect parameters of interest, doubly robust machine learning (DRML) estimators $\hat{\psi}_{1}$ are the state of the art, incorporating the good prediction performance of machine learning; the decreased bias of doubly robust estimators; and the analytic tractability and bias reduction of sample splitting with cross-fitting. Nonetheless, even in the absence of confounding by unmeasured factors, the nominal $(1-\alpha)$ Wald confidence interval $\hat{\psi}_{1}\pm z_{\alpha/2}\widehat{\mathsf{s.e.}}[\hat{\psi}_{1}]$ may still undercover in large samples, because the bias of $\hat{\psi}_{1}$ may be of the same or even larger order than its standard error of order $n^{-1/2}$.
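To make the undercoverage concrete, the following is a minimal sketch and is not taken from the paper. It assumes that $(\hat{\psi}_{1}-\psi)/\mathsf{s.e.}[\hat{\psi}_{1}]$ is approximately normal with mean $\delta$ (the ratio of bias to standard error) and unit variance, and that the standard error is estimated consistently. Under those assumptions the actual coverage of the nominal Wald interval is $\Phi(z_{\alpha/2}-\delta)-\Phi(-z_{\alpha/2}-\delta)$. The function name `wald_coverage` and the symbol `delta` are illustrative choices.

```python
from statistics import NormalDist


def wald_coverage(delta: float, alpha: float = 0.05) -> float:
    """Approximate coverage of the nominal (1 - alpha) Wald interval.

    Assumption (not from the paper): (psi_hat - psi) / se ~ N(delta, 1),
    where delta is the bias divided by the standard error.
    """
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    z = std_normal.inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile
    # The interval covers psi exactly when -z <= delta + Z <= z.
    return std_normal.cdf(z - delta) - std_normal.cdf(-z - delta)


if __name__ == "__main__":
    # Coverage falls below nominal as the bias-to-s.e. ratio grows.
    for delta in (0.0, 0.5, 1.0, 2.0):
        print(f"delta={delta:.1f}  coverage={wald_coverage(delta):.3f}")
```

With `delta = 0` the function returns the nominal level. Any fixed nonzero `delta` gives coverage strictly below nominal regardless of sample size, which is why a bias of the same order as the standard error matters.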

In this paper, we introduce essentially assumption-free tests that (i) can falsify the null hypothesis that the bias of $\hat{\psi}_{1}$ is of smaller order than its standard error, (ii) can provide an upper confidence bound on the true coverage of the Wald interval, and (iii) are valid under the null under no smoothness/sparsity assumptions on the nuisance parameters. The tests, which we refer to as Assumption Free Empirical Coverage Tests (AFECTs), are based on a U-statistic that estimates part of the bias of $\hat{\psi}_{1}$.

Our claims need to be tempered in several important ways. First, no test, including ours, of the null hypothesis that the ratio of the bias to its standard error is smaller than some threshold $\delta$ can be consistent without additional assumptions (e.g., smoothness or sparsity) that may be incorrect. Second, the above claims only apply to certain parameters in a particular class. For most of the others, our results are unavoidably less sharp. In particular, for these parameters, we cannot directly test whether the nominal Wald interval $\hat{\psi}_{1}\pm z_{\alpha/2}\widehat{\mathsf{s.e.}}[\hat{\psi}_{1}]$ undercovers. However, we can often test the validity of the smoothness and/or sparsity assumptions used by an analyst to justify a claim that the reported Wald interval’s actual coverage is no less than nominal. Third, in the main text, with the exception of the simulation study in Section 1, we assume we are in the semisupervised data setting (wherein there is a much larger dataset with information only on the covariates), allowing us to regard the covariance matrix of the covariates as known. In the simulation in Section 1, we consider the setting in which estimation of the covariance matrix is required. There we used a data-adaptive estimator that performs very well in our simulations, but the estimator’s theoretical sampling behavior remains unknown.



Lin Liu, Rajarshi Mukherjee, James M. Robins. "On Nearly Assumption-Free Tests of Nominal Confidence Interval Coverage for Causal Parameters Estimated by Machine Learning." Statist. Sci. 35 (3) 518-539, August 2020.


Published: August 2020
First available in Project Euclid: 11 September 2020

MathSciNet: MR4148230
Digital Object Identifier: 10.1214/20-STS786

Keywords: assumption-free, causal inference, higher-order influence functions, U-statistics, valid inference

Rights: Copyright © 2020 Institute of Mathematical Statistics

Vol. 35 • No. 3 • August 2020