Open Access
Prediction error after model search
Xiaoying Tian
Ann. Statist. 48(2): 763-784 (April 2020). DOI: 10.1214/19-AOS1818

Abstract

Estimating the prediction error of a linear estimation rule is difficult when the data analyst also uses the data to select a set of variables and constructs the estimation rule using only the selected variables. In this work, we propose an asymptotically unbiased estimator for the prediction error after model search. Under some additional mild assumptions, we show that our estimator converges to the true prediction error in $L^{2}$ at the rate of $O(n^{-1/2})$, with $n$ being the number of data points. Our estimator applies to general selection procedures and does not require an analytical form for the selection. The number of variables to select from can grow exponentially in $n$, allowing applications to high-dimensional data. The method also allows model misspecification: the underlying model need not be linear. One application of our method is an estimator of the degrees of freedom for many discontinuous estimation rules such as best subset selection or the relaxed Lasso. The connection to Stein's Unbiased Risk Estimator (SURE) is discussed. We consider in-sample prediction errors in this work, with some extension to out-of-sample errors in low-dimensional, linear models. Examples such as best subset selection and the relaxed Lasso are considered in simulations, where our estimator outperforms both $C_{p}$ and cross-validation in various settings.
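
As background for the degrees-of-freedom and SURE terminology above (a standard sketch under the usual Gaussian homoscedastic model, not a statement of the paper's estimator): for observations $y \sim N(\mu, \sigma^{2} I_{n})$ and a fitting rule $\hat{y} = g(y)$, the degrees of freedom and the in-sample prediction error are classically related by
$$\mathrm{df}(g) = \frac{1}{\sigma^{2}} \sum_{i=1}^{n} \mathrm{Cov}(\hat{y}_{i}, y_{i}), \qquad \mathbb{E}\,\|y^{*} - \hat{y}\|_{2}^{2} = \mathbb{E}\,\|y - \hat{y}\|_{2}^{2} + 2\sigma^{2}\,\mathrm{df}(g),$$
where $y^{*}$ denotes an independent copy of $y$. Plugging an estimate of $\mathrm{df}(g)$ into the right-hand side gives a $C_{p}$-type estimator of prediction error, and when $g$ is almost differentiable Stein's lemma yields the SURE form $\mathrm{df}(g) = \mathbb{E}\bigl[\sum_{i=1}^{n} \partial g_{i}(y)/\partial y_{i}\bigr]$. Discontinuous rules such as best subset selection or the relaxed Lasso fall outside this differentiability requirement, which is the setting the degrees-of-freedom estimator described in the abstract is aimed at.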

Citation

Xiaoying Tian. "Prediction error after model search." Ann. Statist. 48 (2) 763-784, April 2020. https://doi.org/10.1214/19-AOS1818

Information

Received: 1 November 2016; Revised: 1 January 2019; Published: April 2020
First available in Project Euclid: 26 May 2020

zbMATH: 07241568
MathSciNet: MR4102675
Digital Object Identifier: 10.1214/19-AOS1818

Subjects:
Primary: 62F12, 62H12
Secondary: 62F07, 62J07

Keywords: Degrees of freedom, model search, prediction error, SURE

Rights: Copyright © 2020 Institute of Mathematical Statistics
