We study the problem of estimating the smallest achievable mean-squared error in regression function estimation. The problem is equivalent to estimating the second moment of the regression function of $Y$ on $X \in \mathbb{R}^d$. We introduce a nearest-neighbor-based estimate and obtain a normal limit law for the estimate when $X$ has an absolutely continuous distribution, without any condition on the density. We also compute the asymptotic variance explicitly and derive a non-asymptotic bound on the variance that does not depend on the dimension $d$. The asymptotic variance does not depend on the smoothness of the density of $X$ or of the regression function. A non-asymptotic exponential concentration inequality is also proved. We illustrate the use of the new estimate through testing whether a component of the vector $X$ carries information for predicting $Y$.
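The equivalence stated in the abstract follows from an identity. Let $m(x) = \mathbb{E}[Y \mid X = x]$ and $L^* = \mathbb{E}[(Y - m(X))^2]$. Because $\mathbb{E}[Y\,m(X)] = \mathbb{E}[m(X)^2]$, we get $L^* = \mathbb{E}[Y^2] - \mathbb{E}[m(X)^2]$. The term $\mathbb{E}[Y^2]$ is easy to estimate, so the real task is $\mathbb{E}[m(X)^2]$.

The Python sketch below shows one first-nearest-neighbor estimate of this kind. It averages $Y_i Y_{N(i)}$, where $N(i)$ is the nearest neighbor of $X_i$ among the other sample points. Given the $X$'s, $Y_i$ and $Y_{N(i)}$ are conditionally independent, so each product has conditional mean $m(X_i)\,m(X_{N(i)}) \approx m(X_i)^2$.

This is an illustration under stated assumptions, not necessarily the exact estimate defined in the paper. The function names, the toy model and the sample size are hypothetical, and the neighbor search is brute force.

```python
import math
import random


def nearest_neighbor_index(points, i):
    """Index j != i minimizing the squared Euclidean distance to points[i].

    Brute force (O(n) per query); ties go to the smallest index.
    """
    xi = points[i]
    best_j, best_d = -1, math.inf
    for j, xj in enumerate(points):
        if j == i:
            continue
        d = sum((a - b) ** 2 for a, b in zip(xi, xj))
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def residual_variance_nn(xs, ys):
    """Hypothetical first-nearest-neighbor estimate of L* = E[(Y - m(X))^2].

    Uses L* = E[Y^2] - E[m(X)^2]: E[Y^2] is estimated by the mean of Y_i^2,
    and E[m(X)^2] by the mean of Y_i * Y_{N(i)}, where N(i) is the nearest
    neighbor of X_i among the other sample points.
    """
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two paired samples")
    mean_y2 = sum(y * y for y in ys) / n
    mean_cross = sum(ys[i] * ys[nearest_neighbor_index(xs, i)] for i in range(n)) / n
    return mean_y2 - mean_cross


# Toy model (an assumption for illustration only): X uniform on [0, 1]^2,
# Y = sin(2*pi*X1) + N(0, 0.5^2) noise, so the true L* is 0.25.
rng = random.Random(0)
n = 800
xs = [(rng.random(), rng.random()) for _ in range(n)]
ys = [math.sin(2 * math.pi * x1) + rng.gauss(0.0, 0.5) for x1, _ in xs]

est_full = residual_variance_nn(xs, ys)
# Dropping X1 leaves only X2, which is independent of Y; then m = 0 and
# L* = E[Y^2] = 0.5 + 0.25 = 0.75, so the estimate should rise markedly.
est_drop = residual_variance_nn([(x2,) for _, x2 in xs], ys)
print(f"all coordinates: {est_full:.3f}, without X1: {est_drop:.3f}")
```

Comparing the two printed values is a crude version of the dimension-reduction use case. A clear increase when a coordinate is removed suggests that the coordinate carries information about $Y$. A formal test would calibrate that difference using the asymptotic normality and variance results described above, which this sketch does not implement. The quadratic-time neighbor search is chosen only for brevity. A spatial index would be the usual choice for large $n$.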

Electron. J. Statist., Volume 12, Number 1 (2018), 1752-1778.

Received: June 2017
First available in Project Euclid: 6 June 2018

Primary: 62G08: Nonparametric regression
Secondary: 62G20: Asymptotic properties

Keywords: regression functional; nearest-neighbor-based estimate; asymptotic normality; concentration inequalities; dimension reduction

Devroye, Luc; Györfi, László; Lugosi, Gábor; Walk, Harro. A nearest neighbor estimate of the residual variance. Electron. J. Statist. 12 (2018), no. 1, 1752--1778. doi:10.1214/18-EJS1438. https://projecteuclid.org/euclid.ejs/1528250442

