Asymptotic optimality of a multivariate version of the generalized cross validation in adaptive smoothing splines

We consider an adaptive smoothing spline with a piecewise constant penalty function λ(x), in which the univariate smoothing parameter λ of the classic smoothing spline is converted into an adaptive multivariate parameter λ. Choosing the optimal value of λ is critical for obtaining desirable estimates. We propose to choose λ by minimizing a multivariate version of the generalized cross validation function; the resulting estimator is shown to be consistent and asymptotically optimal under some general conditions, i.e., the counterparts of the nice asymptotic properties of the generalized cross validation in the ordinary smoothing spline are still provable. This provides theoretical justification for adopting the multivariate version of the generalized cross validation principle in adaptive smoothing splines.

MSC 2010 subject classifications: Primary 62G20; secondary 62G08.


Introduction
We consider the problem of estimating an unknown function f(·) given observations where the design points x_i follow a strictly positive continuous density function and the ǫ_i are independent random noise with mean 0 and unknown variance σ^2. Without loss of generality, we assume that the domain of f is [0, 1]. The smoothing spline is one of the most popular methods for estimating f. Let f^(m) denote the mth derivative of f. The smoothing spline estimator is the unique solution of the following problem, where W_2^m[0, 1] is the mth order Sobolev space defined as {f : f^(j) is absolutely continuous for j = 0, . . ., m − 1, and f^(m) ∈ L_2[0, 1]}, L_2[0, 1] being the space of square integrable functions, and λ is a smoothing parameter that controls the trade-off between the smoothness of the estimated function (the second term) and the goodness of fit (the first term). Large values of λ produce smoother functions, while smaller values produce more wiggly functions. For the automatic choice of λ, many procedures have been proposed, including cross validation (CV) (Stone, 1974), generalized cross validation (GCV) (Craven and Wahba, 1979), and Mallows' C_p (Mallows, 1973).
Even though smoothing splines have been shown to perform well in many examples, if the underlying function is spatially nonhomogeneous, traditional smoothing splines will fail to capture the varying degrees of smoothness properly. In practice, there are various types of functions with varying smoothness, and four popular scenarios from Donoho and Johnstone (1995) are illustrated in Fig. 1. The functions in Fig. 1 change rapidly in some regions while being smooth in others. If we choose the global smoothing parameter to be relatively small, the resulting spline estimate will describe the function well in the regions of large variations; however, it will under-smooth in other regions. On the other hand, if the global smoothing parameter is chosen to be relatively large, then the estimated function will be over-smoothed in the regions of large variations. This indicates that in fitting functions with varying roughness, using a global smoothing parameter is not sufficient.
To resolve such a problem and attain more flexible estimation of the function, there have been attempts to allow the smoothing parameter to vary adaptively with x (Abramovich and Steinberg, 1996; Pintore, Speckman and Holmes, 2006; Storlie, Bondell and Reich, 2010; Liu and Guo, 2010; Kim and Huo, 2012; Wang, Du and Shen, 2013). Instead of (2), the following minimization problem has been considered in the framework of smoothing splines, where λ(x) is a variable smoothing parameter, a function of x.
In the framework in (3), one popular approach in deriving the solution is to discretize λ(x). That is, λ(x) is approximated by a step function, i.e., a piecewise constant function. Pintore, Speckman and Holmes (2006) assumed an equal-size piecewise structure for λ(x). The number of jumps and the jump locations need to be prespecified. Then the step function is estimated by minimizing the multivariate version of the generalized cross validation. Liu and Guo (2010) extended the work of Pintore, Speckman and Holmes (2006). They also assume a step function for λ(x), but the segmentation is data-driven: the number of jumps and the jump locations are chosen based on the structure of the data. Then the step function is estimated by maximizing the generalized likelihood. In these methods, the optimal choice of the multivariate smoothing parameter and the associated asymptotic properties have not been studied. Wang, Du and Shen (2013) developed a general framework for asymptotic analysis of adaptive smoothing splines with the use of the Green's function. However, their analysis requires highly technical mathematical machinery, and the assumptions made in their paper to establish theoretical results seem to be very strong; e.g., one assumption is that the underlying true function is 2m-times continuously differentiable.
With the assumption that λ(x) is approximated by a step function, we study an optimal choice for λ(x). We consider a discretized version of λ(x) with knots s_1, . . ., s_k; then we need to estimate η = (η_1, η_2, . . ., η_k), whose dimension is k. For the optimal choice of η, we propose to use the multivariate version of the generalized cross validation (mGCV). We show that under some moderate conditions, if we choose η by minimizing the mGCV, then the resulting estimate is consistent in the sense that the true loss tends to zero as the sample size goes to infinity, and is asymptotically optimal in the sense that it achieves the smallest possible loss in probability when the sample size goes to infinity. Our theoretical analysis depends only on simple linear algebra and calculus. In approximating λ(x) by a step function, we allow the discretization of λ(x) to be as flexible as possible: our theoretical results cover all possible cases of step functions assumed for λ(x), with flexible step size and step number, regardless of the locations of s_1, . . ., s_k and the value of k.
For the ordinary smoothing splines (i.e., with univariate λ), the optimal choice of λ has been extensively studied. In particular, it has been shown that if one chooses λ via GCV, the resulting estimate has nice asymptotic properties (Craven and Wahba, 1979; Li, 1985, 1986). However, for the adaptive smoothing spline, a similar study on the optimal choice of λ(x) (or its discretized version, the multivariate smoothing parameter) has not appeared. The main contribution of this paper is to show that the adaptive smoothing spline estimator with the mGCV choice of the multivariate smoothing parameter has the same asymptotic properties as the ones established for the ordinary smoothing splines: consistency and asymptotic optimality. This paper focuses on the theoretical study of the mGCV. Nevertheless, we present some numerical study of the mGCV in Section 4 to show its practical advantage.
It is worth mentioning another popular approach to solving (3). Instead of considering a step function, there have been attempts to assume a particular continuously varying penalty function for λ(x). Abramovich and Steinberg (1996) assumed that λ(x) is proportional to (f^(2)(x))^(-2), and Storlie, Bondell and Reich (2010) assumed that λ(x) is proportional to (|f^(2)(x)| + δ)^(-2γ), which allows more flexibility due to the two additional tuning parameters δ and γ. Kim and Huo (2012) derived an asymptotically optimal local penalty function for λ(x).
The rest of the paper is organized as follows. In Section 2, we review ordinary smoothing splines and the justification of GCV. In Section 3, we study asymptotic properties of the mGCV in choosing the multivariate smoothing parameter in adaptive smoothing splines. In Section 4, we show the practical effectiveness of the mGCV via simulations. We conclude in Section 5.

A review of ordinary smoothing splines and GCV
We review ordinary smoothing splines and the justification of GCV in Section 2.1 and Section 2.2, respectively.

A review of ordinary smoothing splines
We briefly review ordinary smoothing splines. For more details, we refer to Green and Silverman (1994) and Eubank (1999). Throughout this paper, we focus on cubic smoothing splines (i.e., with m = 2 in (2)), which are the most commonly used splines in practice. Using the cubic smoothing spline, f is estimated by minimizing the following objective function. (The last equation indicates that we restrict ourselves to the case of equally spaced samples; the more general case is approachable, but not described here.) Using these notations, the objective function in (4) can be restated, where δ = (δ_2, . . ., δ_{n−1})^T and M is defined to be the (n − 2) × (n − 2) matrix with elements m_ij given by m_ii = 2h/3 for i = 1, . . ., n − 2; m_{i,i+1} = m_{i+1,i} = h/6 for i = 1, . . ., n − 3; and zeros elsewhere. Using the fact that Mδ = Qf, one can find the minimizer of J(λ; f), where Q is defined to be the (n − 2) × n matrix with elements q_ij given by q_ii = q_{i,i+2} = 1/h and q_{i,i+1} = −2/h for i = 1, . . ., n − 2, and zeros elsewhere.
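For the equally spaced case, the band matrices M and Q and the resulting fitted values can be sketched numerically as follows. This is an illustrative sketch, not the authors' code: it assumes the standard closed form f̂ = (I + λ QᵀM⁻¹Q)⁻¹y implied by the identity Mδ = Qf, and the function names are ours.

```python
import numpy as np

def spline_matrices(n, h):
    """Band matrices M ((n-2)x(n-2)) and Q ((n-2)xn) for n equally
    spaced design points with spacing h, as defined in the text."""
    M = np.zeros((n - 2, n - 2))
    i = np.arange(n - 2)
    M[i, i] = 2 * h / 3                      # m_ii = 2h/3
    j = np.arange(n - 3)
    M[j, j + 1] = M[j + 1, j] = h / 6        # off-diagonals h/6
    Q = np.zeros((n - 2, n))
    Q[i, i] = Q[i, i + 2] = 1 / h            # q_ii = q_{i,i+2} = 1/h
    Q[i, i + 1] = -2 / h                     # q_{i,i+1} = -2/h
    return M, Q

def smoothing_spline_fit(y, lam, h):
    """Fitted values f_hat = (I + lam * K)^{-1} y with K = Q^T M^{-1} Q."""
    n = len(y)
    M, Q = spline_matrices(n, h)
    K = Q.T @ np.linalg.solve(M, Q)          # roughness penalty matrix
    return np.linalg.solve(np.eye(n) + lam * K, y)
```

As a sanity check, a linear function has zero second differences, so Qy = 0 and the smoother reproduces y exactly for any λ.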

A review of generalized cross validation
In (6), the choice of λ is an important issue, and many procedures have been proposed for its optimal choice. One of the most popular is generalized cross validation (GCV) (Craven and Wahba, 1979). GCV selects λ by minimizing the GCV function, where A(λ) is the smoothing matrix that satisfies f̂ = A(λ)y, ‖·‖ indicates the Euclidean norm that will be used throughout the paper, and 'tr' denotes the trace. In the remainder of this subsection, we justify the adoption of GCV: if we choose λ by minimizing GCV, the resulting estimate minimizes the true loss L_n(λ) in (8) for estimating f with f̂(λ). In the following, we use A_n(λ) (instead of A(λ)) to make the sample size n explicit. Similarly, f_n denotes f when the sample size is n.
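The GCV selection can be sketched as follows, assuming the standard Craven-Wahba form GCV(λ) = n⁻¹‖(I − A(λ))y‖² / (n⁻¹ tr(I − A(λ)))² with the spline-type smoother A(λ) = (I + λK)⁻¹; the penalty matrix K passed in, and the function names, are our own illustrative choices.

```python
import numpy as np

def gcv(y, lam, K):
    """Craven-Wahba GCV score for the linear smoother A(lam) = (I + lam*K)^{-1}:
    GCV(lam) = n^{-1} ||(I - A)y||^2 / (n^{-1} tr(I - A))^2."""
    n = len(y)
    A = np.linalg.inv(np.eye(n) + lam * K)
    r = y - A @ y                             # residual (I - A) y
    return (r @ r / n) / (np.trace(np.eye(n) - A) / n) ** 2

def choose_lambda(y, K, grid):
    """Pick lambda from a candidate grid by minimizing GCV."""
    return grid[int(np.argmin([gcv(y, l, K) for l in grid]))]
```

For data in the null space of K (e.g., a linear trend under a second-difference penalty), the residual vanishes and the GCV score is zero for every λ.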
Theorem 1 (Li (1985)). Consider the Stein estimate f̃_n(λ), the associated Stein unbiased risk estimate SURE_n(λ), and the loss L̃_n(λ) while estimating f_n by f̃_n(λ), under the following conditions: (C.1) the 4th moments of the ǫ_i's are upper bounded by a constant, where the ǫ_i are the random noise in (1); and (C.2) there exists a constant K such that, for all a > 0 and any δ > 0, the uniform bound stated in Li (1985) holds. Theorem 1 demonstrates that SURE_n(λ) is a uniformly consistent estimator of L̃_n(λ). Also note that minimizing the GCV function in (7) is equivalent to minimizing SURE_n(λ). In Theorem 1, the conditions (C.1) and (C.2) can be replaced by the following condition. The asymptotic equivalence between f̃_n(λ) and f̂_n(λ) is known, as in the following theorem due to Li (1986).
Theorem 2 (Li (1986)). For any sequence λ_n satisfying (9) and (10), and under the condition (11), f̃_n(λ_n) and f̂_n(λ_n) are asymptotically indistinguishable. Let λ̂_G denote the value of λ chosen by minimizing GCV in (7). It is known (Li, 1986) that if f is not a polynomial of degree 2 or less, (11) holds, and that if the x_i are equispaced and the ǫ_i are i.i.d. N(0, σ^2), (9) and (10) hold with λ_n = λ̂_G. Under these general conditions, therefore, f̃_n(λ̂_G) and f̂_n(λ̂_G) are asymptotically equivalent due to Theorem 2. Together with Theorem 1, this demonstrates that if we choose λ by minimizing GCV, the resulting smoothing spline estimate f̂_n(λ̂_G) asymptotically minimizes the true loss L_n(λ) in (8).

Adaptive smoothing splines and mGCV
We describe an optimization approach to achieve desirable spatial adaptation in the framework of (3). This section is organized as follows. We introduce a penalization method for spatial adaptation in Section 3.1. The key idea is to turn the univariate λ of Section 2.1 into a function of location, λ(x), which is subsequently discretized to a multivariate smoothing parameter. For the choice of the multivariate smoothing parameter, a multivariate version of the GCV (mGCV) is suggested in Section 3.2. The consistency and asymptotic optimality of the mGCV are shown in Section 3.3 and Section 3.4, respectively.

A strategy to achieve spatial adaptivity
The essence of achieving spatial adaptivity is to utilize λ(x), a function of the location x, instead of a constant λ. We assume that λ(x) is absolutely continuous everywhere except on a set of points of measure zero. As described in Section 1, we consider a discretized version of λ(x): a step function with knots s_1, . . ., s_k, where s_i ∈ {x_1, . . ., x_n}. Then we need to estimate η = (η_1, η_2, . . ., η_k), whose dimension is k. The objective of this paper is to provide theoretical justification of the mGCV for choosing η by proving that the mGCV choice of η is asymptotically optimal.
For our theoretical results to cover all possible cases of the step function assumed for λ(x), with flexible step size and step number, we allow the discretization of λ(x) to be as flexible as possible. By adopting a new sequence λ = (λ_1, . . ., λ_{n−1}), we consider the most general case of the discretization: we assume that λ(x) ≡ λ_i for x ∈ [x_i, x_{i+1}), 1 ≤ i ≤ n − 1. Then estimating η is equivalent to estimating λ_η, which is a special case of λ and is defined as in (12), where n_i is the number of design points included in [s_i, s_{i+1}), i = 1, . . ., k. To establish the asymptotic optimality of the mGCV for all possible cases of the step function, it suffices to prove the asymptotic optimality for the most general case, i.e., λ. For this reason, λ, rather than η, will be considered in our theoretical analysis throughout this paper. Our established theoretical results then cover all possible cases of the step function for λ(x), including an equal-size piecewise constant penalty function with a few steps, which is considered in Pintore, Speckman and Holmes (2006). We have the following theorem.
Theorem 3 gives an approximate closed form for the adaptive estimator f̂(λ), where h, Q, and M were previously defined, and the weighted matrix M(λ) has nonzero elements (involving the λ_i's, 1 ≤ i ≤ n − 3) specified in Appendix A, and zeros elsewhere. See Appendix A for the proof of Theorem 3. We propose to use the mGCV for the optimal choice of λ in Section 3.2.
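The correspondence between η and λ_η described above is a simple replication of the step heights over the design points in each segment. A minimal sketch (the function name is ours, and we assume the segment counts n_i sum to n − 1, matching the dimension of λ):

```python
import numpy as np

def expand_eta(eta, counts):
    """Expand the k-dimensional step heights eta = (eta_1, ..., eta_k) into
    the (n-1)-dimensional sequence lambda_eta of (12): eta_i is repeated
    over the n_i design points falling in [s_i, s_{i+1})."""
    return np.repeat(eta, counts)
```

For example, eta = (1, 2) with counts (2, 3) yields lambda_eta = (1, 1, 2, 2, 2).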

Multivariate version of GCV (mGCV)
For the estimator f̂(λ), the choice of the multivariate smoothing parameter λ is important for appropriate spatial adaptation. We suggest using the multivariate version of generalized cross validation (mGCV), defined in (13), where S(λ) denotes the smoothing matrix that satisfies f̂(λ) = S(λ)y.
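Selection by mGCV can be sketched as follows, assuming only that some routine `smoother(lam)` returns the smoothing matrix S(λ) for a given multivariate λ; the criterion is taken to be the same ratio form as GCV with S(λ) in place of A(λ), and the diagonal toy smoother used in the test is purely illustrative, not the paper's S(λ).

```python
import numpy as np

def mgcv_score(y, S):
    """mGCV(lambda) = n^{-1}||(I - S)y||^2 / (n^{-1} tr(I - S))^2,
    the multivariate analogue of GCV, with S = S(lambda)."""
    n = len(y)
    r = y - S @ y
    return (r @ r / n) / (np.trace(np.eye(n) - S) / n) ** 2

def select_lambda(y, candidates, smoother):
    """Return the candidate multivariate lambda minimizing the mGCV.
    `smoother(lam)` is assumed to return the smoothing matrix S(lam)."""
    scores = [mgcv_score(y, smoother(lam)) for lam in candidates]
    return candidates[int(np.argmin(scores))]
```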
Parallel to Theorem 1, we establish the uniform consistency of SURE_n(λ) in the following theorem. Together with the fact that minimizing the mGCV is equivalent to minimizing SURE_n(λ), the following theorem gives a justification of the mGCV. In the following, we use S_n(λ) (instead of S(λ)) to take into account the sample size n.
Theorem 4. Consider the Stein estimate f̃_n(λ), the associated unbiased risk estimate SURE_n(λ), and the true loss L̃_n(λ) while estimating f_n by f̃_n(λ). Then, for any δ > 0, SURE_n(λ) is a uniformly consistent estimate of L̃_n(λ) over f_n and λ. The proof of the above theorem is in Appendix B. Theorem 4 establishes the uniform consistency between SURE_n(λ) and L̃_n(λ), which involves f̃_n(λ). To consider the original loss L_n(λ) for estimating f_n with f̂_n(λ), we establish the asymptotic equivalence between f̃_n(λ) and f̂_n(λ) in the following theorem.
Theorem 5. For any λ, under the following condition (A.1), f̃_n(λ) and f̂_n(λ) are asymptotically equivalent. The proof of the above theorem is in Appendix C.
(A.1) states that the optimal rate of convergence of EL_n(λ) to zero must be slower than n^(-1). Regarding (A.1), in typical polynomial spline smoothing problems, inf_{λ>0} EL_n(λ) tends to zero at the rate n^(-1+δ) for some small constant δ > 0, unless the underlying function consists of sampled values of a low order polynomial (Wahba, 1985). In our framework, we need to study when (A.1) holds; this is an open problem.
Let λ̂_mG denote the value of λ chosen by minimizing the mGCV function in (13). Using Theorem 5, we can show that under certain conditions, f̃_n(λ̂_mG) and f̂_n(λ̂_mG) are asymptotically equivalent; this will be established in Theorem 9 in Section 3.3.

Consistency of mGCV
We say that f̂_n(λ) is consistent if L_n(λ) → 0 as n → ∞, where L_n(λ) = n^(-1)‖f_n − f̂_n(λ)‖^2 is the loss in estimating f_n by f̂_n(λ). In this section, we show that if we choose λ via the mGCV, then the resulting f̂ is consistent. To establish the consistency of the mGCV, we need the following two conditions, (A.2) and (A.3), where (A.2) requires that for any m such that m/n → 0, the stated eigenvalue bound holds. The following theorem establishes the consistency of the mGCV.
The proof of the above theorem is in Appendix D. Condition (A.2) involves eigenvalues and seems hard to verify; the following theorem provides a simple sufficient condition for it.
Theorem 7. For the estimator f̂(λ), the sufficient condition holds if the stated ratio bound is satisfied, where max(λ_n) and min(λ_n) denote the maximal and minimal values among the components of λ_n. The proof of the above theorem is in Appendix E. For (A.3), it is known (Li, 1985, Theorem 5.5) that such a λ_n exists in the nonadaptive case if the x_i's are equispaced and the ǫ_i's are i.i.d. N(0, σ^2). Since the nonadaptive smoothing spline is a special case of the spatially adaptive smoothing spline, (A.3) holds under the same conditions for spatially adaptive smoothing splines too. Together with Theorem 7, we then have the following corollary. Corollary 8. If the x_i's are equispaced and the ǫ_i's are i.i.d. N(0, σ^2), f̂_n(λ̂_mG) is consistent, provided that max(λ̂_mG)/min(λ̂_mG) < Constant as n → ∞.
In the next theorem, using Theorem 5 and Theorem 6, we show that under certain conditions, f̃_n(λ̂_mG) and f̂_n(λ̂_mG) are asymptotically indistinguishable.
The proof of the above theorem is in Appendix F. Together with Theorem 4, Theorem 9 demonstrates that if we choose λ by minimizing the mGCV, the resulting estimate f̂_n(λ̂_mG) asymptotically minimizes the true loss L_n(λ), which is a direct measure of the quality of estimating f_n by f̂_n(λ).

Asymptotic optimality of mGCV
In a series of seminal papers (Li, 1985, 1986; Girard, 1991), asymptotic optimality has been established for the GCV. In this section, we establish the asymptotic optimality of the mGCV in the same sense as in Li (1986). The asymptotic optimality in adaptive smoothing splines is defined as the loss L_n(λ̂_mG) approaching the minimal achievable loss in probability, where λ̂_mG is the minimizer of the mGCV function in (13). Under certain conditions, the mGCV method of selecting λ is asymptotically optimal, as indicated by the following theorem.
The proof of the above theorem is in Appendix G. Theorem 10 states that under certain conditions, the mGCV choice λ̂_mG and the optimal value of λ behave the same for sufficiently large n in terms of the corresponding values of the loss. It also says that L_n(λ̂_mG) will tend toward the minimal loss as n → ∞.
Recall that λ has dimension equal to the sample size minus one. Although this λ allows the most flexible adaptation to varying roughness, it may not be computationally efficient. For more efficient computation, it is generally recommended to reduce the dimension of λ by assuming a step function for λ(x) with the number of jumps much smaller than the sample size, such as λ_η in (12). As a special case of Theorem 10, it can easily be shown that if we choose λ_η by minimizing the mGCV, the asymptotic optimality still holds, under the following conditions (A.1′), (A.2′), and (A.3′) (with, e.g., the requirement that for any m such that m/n → 0, the corresponding bound holds), where λ̂_η,mG is the value of λ_η chosen by minimizing the mGCV function in (13) with λ replaced by λ_η.

Numerical study
The main objective of this paper is to provide theoretical justification for employing the mGCV in adaptive smoothing splines. Nevertheless, we present some numerical study of the mGCV to show its practical effectiveness. We suggest a simple segment-based search algorithm to estimate a piecewise constant penalty function using the mGCV, which has proved to work well in practice:
1. Assume the initial λ = (λ_1, . . ., λ_{n−1}) = (λ, . . ., λ), where λ is chosen via the GCV.
2. Given λ from step 1, define a sequence λ^(1)(i_0, i_1, α) = (λ^(1)_1, . . ., λ^(1)_{n−1}) where we impose the following: λ^(1)_i = λ_i + α if i_0 ≤ i ≤ i_1, and λ^(1)_i = λ_i otherwise. We find i_0, i_1, and α such that mGCV(λ^(1)) is minimized. This step can be done via an extensive search.
3. Given λ^(1) from step 2, define a sequence λ^(2)(β) by rescaling: λ^(2)_i = β λ^(1)_i for 1 ≤ i ≤ n − 1. We find β such that mGCV(λ^(2)) is minimized. This step can be done via an extensive search. We declare convergence and terminate if β is close enough to 1 or the newly obtained minimum value of the mGCV is larger than the one at the previous iteration. Otherwise, return λ^(2) to step 2 with λ replaced by λ^(2).
The final λ^(2) is our estimate of the multivariate smoothing parameter.
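The three-step search above can be sketched as follows. This is an illustrative implementation under our own assumptions: `smoother(lam)` stands in for the adaptive spline smoothing matrix S(λ), and the paper's "extensive search" over (i_0, i_1, α) and β is replaced by small finite grids `alphas` and `betas`; the clipping at a tiny positive floor is also our addition to keep λ positive.

```python
import numpy as np

def mgcv(y, S):
    """mGCV score n^{-1}||(I - S)y||^2 / (n^{-1} tr(I - S))^2."""
    n = len(y)
    r = y - S @ y
    return (r @ r / n) / (np.trace(np.eye(n) - S) / n) ** 2

def segment_search(y, lam0, smoother, alphas, betas, max_iter=10):
    """Segment-based search of Section 4 (sketch).
    Step 2: add alpha on a segment [i0, i1]; step 3: rescale by beta;
    iterate until beta is near 1 or the mGCV stops decreasing."""
    lam = np.array(lam0, dtype=float)
    best = mgcv(y, smoother(lam))
    for _ in range(max_iter):
        # step 2: best single-segment additive update over (i0, i1, alpha)
        cand_best, cand_lam = best, lam
        for i0 in range(len(lam)):
            for i1 in range(i0, len(lam)):
                for a in alphas:
                    trial = lam.copy()
                    trial[i0:i1 + 1] = np.maximum(trial[i0:i1 + 1] + a, 1e-12)
                    s = mgcv(y, smoother(trial))
                    if s < cand_best:
                        cand_best, cand_lam = s, trial
        lam = cand_lam
        # step 3: best global rescale over beta
        scores = [mgcv(y, smoother(b * lam)) for b in betas]
        b = betas[int(np.argmin(scores))]
        new = min(scores)
        if new >= best or abs(b - 1) < 1e-3:
            break                      # converged: beta near 1 or no decrease
        lam, best = b * lam, new
    return lam
```

Because only mGCV-decreasing moves are accepted, the returned λ never scores worse than the initial constant λ.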
Even though the asymptotic optimality of the mGCV holds for the most general case of the piecewise constant penalty function, assuming a small number of pieces is recommended in practice for more efficient computation. In our numerical study, we force the number of pieces to be small by specifying a large enough step size in the above search algorithm. The suggested simple algorithm only guarantees a local minimum of the mGCV function. Nevertheless, promising numerical results have been obtained in our simulation study: we found that our method outperforms or performs comparably with other competitive methods. As an alternative, well-known nonlinear optimization algorithms (e.g., Nelder-Mead, quasi-Newton, and conjugate gradient methods) can be used to minimize the mGCV function. However, research on the best numerical strategy is beyond the scope of this paper.
In Fig. 2, we illustrate our numerical results using the two popular examples of the Doppler and Bumps functions introduced in Fig. 1. For both examples, we consider 128 data points sampled regularly on [0, 1] and a signal-to-noise ratio of 7. Each row shows the estimated function in (a) and the estimated smoothing parameter in log scale in (b). Our results are shown with a solid line, together with the results from the standard smoothing spline with a dashed line. For the choice of the smoothing parameter in the standard smoothing spline, we use the GCV. We observe that our estimated functions in (a) show much better agreement with the true functions than the standard smoothing splines do. We also observe that our estimated multivariate smoothing parameter in (b) is spatially adaptive: it has relatively small values in the regions of large local variations, and large values in the regions of small local variations.
Based on repeated simulations, we further verify the performance of our method on the Doppler and Bumps examples in the following four ways:
• Comparison with traditional smoothing splines (denoted SS): this comparison shows the advantage of adopting the adaptive smoothing parameter over the global smoothing parameter. The global smoothing parameter for SS is chosen via the GCV.
• Comparison with the spatially adaptive smoothing splines (denoted SASS) of Pintore, Speckman and Holmes (2006): SASS assumes an equal-size piecewise constant penalty function; 5 and 10 pieces are assumed for the Doppler and Bumps functions, respectively. This comparison shows the advantage of adopting a more flexible structure on the piecewise constant penalty function rather than equal-size pieces.
• Comparison with the locally optimal adaptive smoothing splines (denoted LOASS) of Kim and Huo (2012): LOASS assumes an asymptotically optimal local penalty function that varies continuously. This comparison shows the advantage of adopting a piecewise constant penalty function rather than a continuously varying one.
• Comparison with the wavelet shrinkage method of Donoho and Johnstone (1995): wavelet shrinkage has emerged as a powerful nonparametric smoothing method. We choose the Symmlet wavelets with 8 vanishing moments, and the coarsest level is set to 4. This comparison shows the effectiveness of our spline-based method for fitting functions with varying roughness.
We compare the mean squared error (MSE) in fitting the two functions. For each function, we consider 128, 256, and 512 data points sampled regularly on [0, 1] and a signal-to-noise ratio of 7. For each example, we run 100 experiments and take the averaged MSE as the performance measure. The results are summarized in Table 1: the averaged MSE (in boldface if it is the smallest) is reported with the standard error (in parentheses). The results for the competing methods are reproduced from Kim and Huo (2012). For computational efficiency, we intentionally kept the number of pieces in the estimated piecewise constant penalty function small by setting the step size of the search algorithm large enough. As a result, the average number of pieces in our estimated penalty function was 5 for the Doppler function and 8 for the Bumps function. In Table 1, we observe that our method outperforms or performs competitively with the other methods.
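The MSE over the design points, averaged over replications, can be computed as follows (the exact displayed formula was lost in extraction; we assume the usual form n⁻¹ Σ_i (f(x_i) − f̂(x_i))², and the function names are ours):

```python
import numpy as np

def mse(f_true, f_hat):
    """Mean squared error n^{-1} sum_i (f(x_i) - f_hat(x_i))^2 (assumed form)."""
    f_true, f_hat = np.asarray(f_true), np.asarray(f_hat)
    return np.mean((f_true - f_hat) ** 2)

def averaged_mse(f_true, fits):
    """Average the MSE over repeated experiments (one fit per replication)."""
    return np.mean([mse(f_true, f) for f in fits])
```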

Conclusion
The asymptotic optimality of the generalized cross validation (GCV) is well known in smoothing splines. The multivariate version of the GCV (mGCV) is more flexible in practice, and has been used to achieve spatial adaptivity in smoothing splines. However, little is known about its theoretical properties.
We show that the mGCV also enjoys asymptotic optimality, under general conditions comparable to those in the case of the GCV. Our analysis provides theoretical justification for employing the mGCV to choose the multivariate smoothing parameter in spatially adaptive smoothing splines.
Appendix A: Proof of Theorem 3

We have (20), where δ⁺_i is the second derivative of f from the right at x_i, and δ⁻_{i+1} is the second derivative of f from the left at x_{i+1}; that is, δ⁺_i = f^(2)(x⁺_i) and δ⁻_i = f^(2)(x⁻_i), where x⁺_i = lim_{a→0, a>0}(x_i + a) and x⁻_i = lim_{a→0, a<0}(x_i + a) (Pintore, Speckman and Holmes, 2006). Recall that λ(x) ≡ λ_i for x ∈ [x_i, x_{i+1}], 1 ≤ i ≤ n − 1. Then as n → ∞, we have λ(x⁻_i) ≈ λ(x⁺_i), and consequently δ⁺_i ≈ δ⁻_i. Then as n → ∞, (20) can be approximated as (21), where δ_i = f^(2)(x_i), i = 2, . . ., n − 1, and δ_1 = δ_n = 0. Using (21), as n → ∞, we have (22), where δ = (δ_2, . . ., δ_{n−1})^T and M(λ) is the weighted analogue of M, with nonzero elements involving the λ_i's for 1 ≤ i ≤ n − 3, and zeros elsewhere. Pintore, Speckman and Holmes (2006) showed that under the step-function assumption on λ(x), the solution f̂(λ) satisfies all the conditions to be a natural cubic spline except the continuity of the second derivative of f. However, since δ⁺_i ≈ δ⁻_i as n → ∞, we can take advantage of the fact that Mδ ≈ Qf as n → ∞. (Note that the identity Mδ = Qf holds if and only if f(x) is a natural cubic spline.) Then, as n → ∞, using the fact that Mδ ≈ Qf and considering the first order condition, from (22), f̂(λ) can be approximated as in the statement of Theorem 3.
We also need the following Lemmas 14, 15, and 16.
Proof. It can be proved similarly as in Li (1985): see the proof of Lemma 5.2 in Li (1985) [pp. 1374-1376]. The key is to upper bound each of the five terms in (7.3.8) of Li (1985) by a small quantity ǫ/5. Note that as n → ∞, our smoothing matrix S(λ) has the same canonical form as (4.9) in Li (1985), with λ_i replaced by τ_i.
A careful check of the proof in Li (1985) reveals that the same argument applies for arbitrary λ_i; hence the above lemma can be established accordingly.
Lemma 15. For any sequence λ satisfying (43) and (44), the conclusion below holds. Recall that as n → ∞, we have S_n(λ) in the canonical form above, and it is clear that the eigenvalues of I − S_n(λ) are τ_i(1 + τ_i)^(-1). Similarly as in Li (1985), let τ be the random variable taking value τ_i with probability n^(-1) for each i. Then (43) can be expressed in terms of Eτ(1 + τ)^(-1). It is known (Girard, 1991) that (A.2) can be replaced with the following weaker condition (A.2′): there exist constants p and q, 0 < p < q < 1, such that lim sup τ_[pn]/τ_[qn] < 1, where [x] denotes the greatest integer less than or equal to x. Note that (44) implies the corresponding bounds on both τ_[pn] and τ_[qn]. Lemma 16. For any sequence λ such that mGCV_n(λ) → σ^2, f̂_n(λ) is consistent if and only if n^(-1) tr S_n(λ) → 0.
Applying the operator sup_{Ω_i} inf_{x∈Ω_i} to all three terms above, we notice that the first term above is γ_1(A)d_i^2 and the last term is γ_n(A)d_i^2.

Fig 1. Illustration of four widely-studied functions with varying degrees of smoothness: (a) Blocks function, (b) Bumps function, (c) Heavisine function, (d) Doppler function. The solid curve is the true function and the dots are noisy observations.

Fig 2. Illustration of our numerical results using the Doppler and Bumps functions (top and bottom rows, respectively). In (a), the estimated function using our method (solid) is shown, together with the result from the classic smoothing spline (dashed). In (b), our multivariate smoothing parameter (solid) is shown, together with the global smoothing parameter from the standard GCV (dashed).

Table 1
The five competing methods are compared using the Doppler and Bumps functions. The averaged MSEs based on 100 simulations are shown. The values in parentheses are standard errors. In each case, the smallest MSE is indicated in boldface.