The projected covariance measure for assumption-lean variable significance testing

Anton Rask Lundborg; Ilmun Kim; Rajen D. Shah; Richard J. Samworth

doi:10.1214/24-AOS2447

Abstract

Testing the significance of a variable or group of variables X for predicting a response Y, given additional covariates Z, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model, and then test whether the regression coefficient for X is nonzero. However, when the model is misspecified, the test may have poor power, for example, when X is involved in complex interactions, or lead to many false rejections. In this work, we study the problem of testing the model-free null of conditional mean independence, that is, that the conditional mean of Y given X and Z does not depend on X. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure involves using these methods to perform regressions, first to estimate a form of projection of Y on X and Z using one-half of the data, and then to estimate the expected conditional covariance between this projection and Y on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves what we show is the minimax optimal rate in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach both in terms of maintaining Type I error control, and power, compared to several existing approaches.
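The two-stage procedure outlined in the abstract can be illustrated with a minimal sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: polynomial least squares stands in for the flexible regression method, the variance-weighting refinements of the full procedure are omitted, and the helper names (`poly_features`, `fit_regression`, `split_sample_test`) are hypothetical. The first half of the data is used to estimate a projection of Y on X and Z, and the second half to estimate the expected conditional covariance between that projection and Y, which is then studentised and compared with a standard normal quantile.

```python
import math
import numpy as np


def poly_features(W, degree=3):
    """Intercept, per-column powers up to `degree`, and pairwise products."""
    W = np.asarray(W, dtype=float)
    if W.ndim == 1:
        W = W[:, None]
    cols = [np.ones(len(W))]
    for j in range(W.shape[1]):
        for d in range(1, degree + 1):
            cols.append(W[:, j] ** d)
    for j in range(W.shape[1]):
        for k in range(j + 1, W.shape[1]):
            cols.append(W[:, j] * W[:, k])
    return np.column_stack(cols)


def fit_regression(W, target, degree=3):
    """Stand-in regression method (assumption): polynomial least squares."""
    coef, *_ = np.linalg.lstsq(poly_features(W, degree), target, rcond=None)
    return lambda W_new: poly_features(W_new, degree) @ coef


def split_sample_test(X, Y, Z, seed=0):
    """Return (T, p) for a one-sided test of conditional mean independence."""
    X, Y, Z = (np.asarray(a, dtype=float) for a in (X, Y, Z))
    n = len(Y)
    perm = np.random.default_rng(seed).permutation(n)
    a, b = perm[: n // 2], perm[n // 2:]
    XZ = np.column_stack([X, Z])

    # Half A: estimate the projection f(x, z) = g(x, z) - E[g(X, Z) | Z = z],
    # where g is a regression of Y on (X, Z).
    g_hat = fit_regression(XZ[a], Y[a])
    m_tilde = fit_regression(Z[a], g_hat(XZ[a]))

    def f_hat(xz, z):
        return g_hat(xz) - m_tilde(z)

    # Half B: residualise both Y and the projection on Z, then average the
    # products to estimate the expected conditional covariance.
    m_hat = fit_regression(Z[b], Y[b])
    f_b = f_hat(XZ[b], Z[b])
    m_f = fit_regression(Z[b], f_b)
    L = (Y[b] - m_hat(Z[b])) * (f_b - m_f(Z[b]))

    # Studentised statistic; large positive values are evidence against the null.
    T = math.sqrt(len(L)) * L.mean() / L.std(ddof=1)
    p = 0.5 * math.erfc(T / math.sqrt(2))  # upper-tail standard normal p-value
    return T, p
```

For instance, with simulated data in which Y depends on X only through X squared, a linear-model coefficient test would have little power, whereas the sketch above is designed to detect such dependence provided the regression method can capture it.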

Funding Statement

ARL was supported by research grant 0069071 from the Novo Nordisk Fonden.
IK, RDS and RJS were supported by EPSRC programme grant EP/N031938/1; RJS was also supported by European Research Council Advanced grant 101019498.
IK was partly supported by the National Research Foundation of Korea (2022R1A4A1033384), and the Korea government (MSIT) (RS-2023-00211073).

Acknowledgements

The authors are grateful to the anonymous reviewers for their constructive feedback, which helped to improve the paper.

Citation

Anton Rask Lundborg, Ilmun Kim, Rajen D. Shah, Richard J. Samworth. "The projected covariance measure for assumption-lean variable significance testing." Ann. Statist. 52 (6) 2851-2878, December 2024. https://doi.org/10.1214/24-AOS2447

Information

Received: 1 November 2022; Revised: 1 May 2024; Published: December 2024

First available in Project Euclid: 18 December 2024

Digital Object Identifier: 10.1214/24-AOS2447

Subjects:

Primary: 62G10, 62G20, 62H20

Keywords: Conditional mean independence, Hypothesis testing, minimax power, sample splitting, spline regression
