Abstract
For random design nonparametric regression, cross-validation is a popular bandwidth selector. It is constructed from the criterion of "weighted" integrated square error. In practice, however, weighting by the design density in this criterion causes the associated cross-validation function to put more emphasis on regions with more data, to give little attention to regions with few data, and to take no account of regions without data. As a result, the value of the cross-validated bandwidth depends on the distribution of the design points but is independent of the location of the interval on which the regression function is estimated. Hence, if the realization of the design contains sparse regions, the resulting cross-validated bandwidth is usually too small, and the corresponding kernel regression function estimate has a rough appearance in these sparse regions. To avoid this drawback of cross-validation, we suggest using the criterion of "unweighted" integrated square error to construct the bandwidth selector. Under this criterion, a bandwidth selector called integrated cross-validation is proposed, and the resulting bandwidth is shown to be asymptotically optimal. Empirical studies demonstrate that the kernel regression function estimate obtained with the proposed bandwidth is better than the one obtained with the ordinary cross-validated bandwidth, in the sense of both having a smoother appearance and yielding a smaller sample unweighted integrated square error.
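To make the contrast concrete, the following is a minimal sketch of ordinary leave-one-out cross-validation for a kernel regression estimate. It is not the integrated cross-validation selector proposed in the paper, whose definition the abstract does not give. The Gaussian kernel, the Nadaraya-Watson form of the estimate, the simulated Beta(2, 5) design, the sine regression function, and all function names are assumptions chosen for illustration. The score averages squared prediction errors over the design points only, which is why regions with few or no observations contribute little or nothing to the choice of bandwidth.

```python
import numpy as np

def nw_estimate(x_eval, x, y, h):
    """Nadaraya-Watson estimate with a Gaussian kernel at the points x_eval."""
    u = (np.asarray(x_eval)[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    return (w @ y) / w.sum(axis=1)

def loo_cv_score(x, y, h):
    """Ordinary leave-one-out CV score: mean squared error at the design points."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    np.fill_diagonal(w, 0.0)  # drop observation i when predicting at x[i]
    pred = (w @ y) / w.sum(axis=1)
    return np.mean((y - pred) ** 2)

def cv_bandwidth(x, y, grid):
    """Return the bandwidth in grid that minimizes the CV score, plus all scores."""
    scores = np.array([loo_cv_score(x, y, h) for h in grid])
    return grid[int(np.argmin(scores))], scores

# Simulated example (assumed setup): design dense near 0 and sparse near 1.
rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=200)
m = lambda t: np.sin(2 * np.pi * t)  # hypothetical true regression function
y = m(x) + 0.2 * rng.standard_normal(200)

grid = np.linspace(0.02, 0.3, 30)
h_cv, scores = cv_bandwidth(x, y, grid)

# Sample unweighted integrated square error on [0, 1], approximated by a
# Riemann sum over an equally spaced grid (interval length is 1).
t = np.linspace(0.0, 1.0, 401)
fit = nw_estimate(t, x, y, h_cv)
unweighted_ise = np.mean((fit - m(t)) ** 2)
```

In a simulation like this one, where the true regression function is known, `unweighted_ise` can be compared across bandwidths to see how a selector driven by the design points behaves in the sparse part of the interval.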
Citation
Tzu-Kuei Chang, Wen-Shuenn Deng, Jung-Huei Lin, C. K. Chu. "INTEGRATED CROSS-VALIDATION FOR THE RANDOM DESIGN NONPARAMETRIC REGRESSION." Taiwanese J. Math. 9 (1) 123-141, 2005. https://doi.org/10.11650/twjm/1500407750