Abstract
Estimation of the density of regression errors is a fundamental issue in regression analysis, and it is typically explored via a parametric approach. This article uses a nonparametric approach with the mean integrated squared error (MISE) criterion. It solves a long-standing problem, formulated two decades ago by Mark Pinsker, about estimation of a nonparametric error density in a nonparametric regression setting with the accuracy of an oracle that knows the underlying regression errors. The solution implies that, under a mild assumption on the differentiability of the design density and regression function, the MISE of a data-driven error density estimator attains the minimax rates and sharp constants known for the case of directly observed regression errors. The result holds for error densities with finite and infinite supports. Some extensions of this result to more general heteroscedastic models with possibly dependent errors and predictors are also obtained; in the latter case the marginal error density is estimated. In all cases considered, a blockwise-shrinking Efromovich–Pinsker density estimate, based on plugged-in residuals, is used. The obtained results provide a theoretical justification for the customary practice in applied regression analysis of treating residuals as proxies for the underlying regression errors. Numerical and real-data examples are presented and discussed, and the S-PLUS software is available.
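The plug-in idea described in the abstract can be illustrated with a minimal Python sketch. It is not the article's S-PLUS software, and it does not implement the blockwise-shrinking Efromovich–Pinsker estimator: as stand-ins, it assumes a Gaussian-kernel Nadaraya-Watson smoother for the regression function and a cosine-series density estimate with simple coefficient-by-coefficient shrinkage. The function names, the simulated model, the bandwidth, the support, and the number of series terms are all hypothetical choices made for illustration. The sketch compares the integrated squared error (ISE) of the density estimate built from residuals with that of an "oracle" estimate built from the true, normally unobservable, errors.

```python
import numpy as np


def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) estimate of the regression function.

    Assumption: a stand-in smoother, not the regression estimator of the article.
    """
    scaled = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled**2)
    return (weights @ y_train) / weights.sum(axis=1)


def cosine_series_density(sample, grid, support, max_terms=30):
    """Cosine-series density estimate on [a, b] with per-coefficient shrinkage.

    Assumption: a simplified stand-in for the blockwise-shrinking
    Efromovich-Pinsker estimate; each coefficient is shrunk on its own.
    """
    a, b = support
    n = len(sample)
    # Rescale the data and the grid to the unit interval.
    u = (np.clip(sample, a, b) - a) / (b - a)
    t = (grid - a) / (b - a)
    # The constant basis function always has coefficient 1 for a density.
    estimate = np.ones_like(t)
    for j in range(1, max_terms + 1):
        theta = np.mean(np.sqrt(2.0) * np.cos(np.pi * j * u))
        # Keep a term only if its squared coefficient exceeds the noise level 1/n.
        excess = theta**2 - 1.0 / n
        if excess > 0.0:
            shrink = excess / theta**2
            estimate += shrink * theta * np.sqrt(2.0) * np.cos(np.pi * j * t)
    # Truncate negative values and map back to the original scale.
    return np.maximum(estimate, 0.0) / (b - a)


# Simulated homoscedastic regression (hypothetical example): Y = m(X) + error.
rng = np.random.default_rng(0)
n = 500
sigma = 0.5
x = rng.uniform(0.0, 1.0, n)
errors = rng.normal(0.0, sigma, n)
y = np.sin(2.0 * np.pi * x) + errors

# Residuals serve as proxies for the unobserved errors.
residuals = y - nadaraya_watson(x, y, x, bandwidth=0.05)

support = (-2.5, 2.5)
grid = np.linspace(support[0], support[1], 501)
step = grid[1] - grid[0]
true_density = np.exp(-0.5 * (grid / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

f_residual = cosine_series_density(residuals, grid, support)
f_oracle = cosine_series_density(errors, grid, support)

# Integrated squared errors, approximated by a Riemann sum on the grid.
ise_residual = np.sum((f_residual - true_density) ** 2) * step
ise_oracle = np.sum((f_oracle - true_density) ** 2) * step
print("ISE from residuals:", ise_residual)
print("ISE from true errors (oracle):", ise_oracle)
```

In a single simulated sample the two ISE values are only a rough indication; the article's claim concerns the MISE, that is, the average of such errors over repeated samples, and its estimator rather than the simplified one assumed here.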
Citation
Sam Efromovich. "Estimation of the density of regression errors." Ann. Statist. 33 (5) 2194–2227, October 2005. https://doi.org/10.1214/009053605000000435
Information