In kernel density estimation, those data values that make a nondegenerate contribution to the estimator (computed at a given point) tend to be spaced well apart. This property has the effect of suppressing many of the conventional consequences of long-range dependence, for example, slower rates of convergence, which might otherwise be revealed by a traditional loss-or risk-based assessment of performance. From that viewpoint, dependence has to be very long-range indeed before a density estimator experiences any first-order effects. However, an analysis in terms of the convergence rate for a particular realization, rather than the rate averaged over all realizations, reveals a very different picture. We show that from that viewpoint, and in the context of functions of Gaussian processes, effects on rates of convergence can become apparent as soon as the boundary between short- and long-range dependence is crossed. For example, the distance between ISE- and MISE-optimal bandwidths is generally of larger order for long-range dependent data. We shed new light on cross-validation, too. In particular we show that the variance of the cross-validation bandwidth is generally larger for long-range dependent data, and that the first-order properties of this bandwidth do not depend on how many data are left out when constructing the cross-validation criterion. Moreover, for long-range dependent data the cross-validation bandwidth is usually perfectly negatively correlated, in the limit, with the optimal stochastic bandwidth.
"Effect of dependence on stochastic measures of accuracy of density estimations." Ann. Statist. 30 (2) 431 - 454, April 2002. https://doi.org/10.1214/aos/1021379860