Bayesian estimation in a high dimensional parameter framework



Introduction
In the last two decades, there have been numerous contributions to statistical inference in function spaces, motivated by the analysis of high-dimensional data (see, for example, [3,6,7,11,14,15,16], among others). We particularly refer to the general overview on statistical estimation in high-dimensional spaces provided in Bosq and Blanke [7], and the references therein. In the cited references, special attention has been paid to the Hilbert valued random variable context. Specifically, Bosq [6] addresses the problem of infinite-dimensional parameter estimation and prediction for autoregressive Hilbert valued processes, deriving their asymptotic properties. In this framework, alternative projection methodologies have been considered in [1], for wavelet bases, in [4], for spline bases, and in [17], for the autocorrelation operator's left and right eigenvector systems, among others. Recent developments in the context of directed acyclic graphs (also known as Bayesian networks) allow the efficient estimation of the adjacency matrix, closely related to the influence matrix (see [18]). In particular, penalized likelihood estimation is considered in [19]. These results can be applied to the estimation of the autocorrelation operator eigenvalues (defining the influence matrix) of an autoregressive Hilbertian process. We finally refer to the Bayesian framework adopted by [10] in the statistical analysis of high-dimensional longitudinal, spatial and history data, in terms of suitable priors for the coefficients obtained after projection onto general bases, e.g., spline bases. Special attention is paid there to the semiparametric regression and generalized linear mixed effect model frameworks (see also [5], in relation to Bayesian prediction for stochastic processes).
The present paper provides new developments in relation to the comparison of componentwise Bayesian and classical parameter estimators, in terms of their asymptotic efficiency, in the l²-valued Poisson process and Hilbert valued Gaussian random variable frameworks. Specifically, the main goal of the paper is to provide sufficient conditions that ensure the same asymptotic efficiency of componentwise Bayesian and classical estimators of the infinite-dimensional parameters characterizing such models. For the l²-valued Poisson process, conjugate Gamma priors on the intensity components are considered for their Bayesian estimation. Theorem 1 provides the asymptotic equivalence of the Bayesian and classical estimators of the infinite-dimensional intensity vector, as well as of the corresponding plug-in predictors of the l²-valued Poisson process.
In the Hilbert valued Gaussian random variable context, suitable ranges of values for the projections of the mean function, with respect to the autocovariance eigenvector system, are obtained, ensuring a more efficient performance of the functional mean Bayes estimator with respect to the classical one, when Bayesian estimation is performed under the Gaussian conjugate family framework, in the finite sample size case. The asymptotic efficiency of the Bayesian and classical estimators of the mean in the one-dimensional and infinite-dimensional cases is obtained in Theorems 2 and 3, respectively, where the asymptotic equivalence of both estimators is also proved. Gamma priors are considered in the Bayesian estimation of the eigenvalues of the auto-covariance operator of a Hilbert valued Gaussian random variable. Sufficient conditions for the asymptotic equivalence of the Bayesian and classical estimators of the covariance operator are established in Theorem 4, together with the explicit derivation of the common limit of their functional mean-square errors.
A simulation study is undertaken to illustrate the results on the asymptotic equivalence of the Bayes and classical estimators of the infinite-dimensional intensity vector in the l²-valued Poisson process case. Specifically, the empirical approximations of the functional mean-square errors of the Bayes and classical estimators of the infinite-dimensional vector of intensities are computed. Different rates of convergence of the intensity components are analyzed. The relative empirical efficiency is calculated as well, showing that its value is one for the increasing sequence of sample sizes tested, in all the cases studied. The same statistics are computed to compare the classical and Bayesian estimators of the mean function of a Gaussian Hilbert valued random variable, when different rates of convergence of the eigenvalues of the covariance operator are considered, obtaining similar results in relation to the rate of convergence to zero of the empirical mean-square errors as the sample size increases, as well as in relation to the empirical relative efficiency, which again equals one.
Special attention is devoted to the numerical results obtained for the Bayesian and classical estimation of the covariance operator of a Hilbert valued zero-mean Gaussian random variable, since in that case the empirical results seem to depend on the rate of convergence of the auto-covariance operator eigenvalues. Specifically, for the finite sample sizes and truncation orders tested, the equivalence between the Bayesian and classical estimators holds when the eigenvalues of the auto-covariance operator display an integer-order polynomial rate of convergence. However, for fractional-order polynomial and negative exponential rates, the relative efficiency of the Bayes and classical estimators depends on the truncation order and the finite sample size tested. Namely, the classical estimator outperforms the Bayesian estimator in the high-dimensional case corresponding to large truncation orders.
The remainder of the paper is organized as follows. Section 2 provides the fundamental concepts needed in the development of the paper, in relation to Hilbert valued parameter estimation and asymptotic efficiency. Section 3 shows the asymptotic equivalence of the Bayesian and classical parameter estimators of the infinite-dimensional intensity vector of the l²-valued Poisson process. The problem of estimating the functional mean from the observation of a sample of independent and identically distributed (i.i.d.) Hilbert valued Gaussian random variables is addressed in Section 4.2, in both the Bayesian and classical frameworks. Sufficient conditions for the asymptotic equivalence of the Bayes and classical estimators of the functional mean are derived as well. The Bayesian and classical estimation of the auto-covariance operator is studied in Section 5, where the asymptotic equivalence of both estimators is also established. Finally, a simulation study is undertaken in Section 6, providing additional information on the relative efficiency of the Bayes and classical parameter estimators when truncation is performed and an increasing sequence of finite functional sample sizes is tested, for different rates of convergence to zero of the components of the infinite-dimensional parameters approximated.

Estimation and prediction in Hilbert spaces
In the following, all functions are supposed to be measurable and defined on a basic probability space (Ω, A, P). Also, given a real separable Hilbert space (RSH), say G, its scalar product will be denoted by ⟨·, ·⟩_G and its norm by ‖·‖_G. Now, let X be an observed random variable taking values in some real separable Hilbert space H. The distribution P_θ of X depends on an unknown parameter θ ∈ Θ ⊆ Θ₀, with Θ₀ also being an RSH. In this paper, we address the problem of approximating g(X, θ) from s(X). If g(X, θ) = g(θ), we will refer to it as an estimation problem. It becomes a prediction problem if g(X, θ) = E_θ(Y|X), where Y is also an H valued random variable such that E‖Y‖²_H < ∞, and where E_θ(·|X) denotes conditional expectation with respect to X. It can be seen that the prediction problem, consisting of the estimation of the functional response Y, coincides with the problem of approximating E_θ(Y|X) (see [7]). Here, we will refer to the prediction problem only in the case of the l²-valued Poisson process, since the asymptotic properties of the predictor directly follow from those derived for the corresponding infinite-dimensional intensity estimator.
In this paper, the quality of the approximation is measured in the sense of the following preference relation:

s₁(X) ≺ s₂(X) ⟺ E‖s₁(X) − g(X, θ)‖² ≤ E‖s₂(X) − g(X, θ)‖²,   (1)

where the norm ‖·‖ is taken in the RSH Θ₀ when the estimation problem is addressed, i.e., when g(X, θ) = g(θ), and in the space H when the prediction problem is considered, that is, when g(X, θ) = E_θ(Y|X). However, a somewhat different point of view is to ask: for which values of θ is s₁(X) better than s₂(X)? This question will be addressed in Section 4.2. Specifically, sufficient and necessary conditions are provided that ensure that the Bayesian estimator of the infinite-dimensional mean outperforms the classical estimator in the sense of equation (1).
Throughout this paper, we will say that two functional estimators, s₁ₙ(X₁, …, X_n) and s₂ₙ(X₁, …, X_n), are asymptotically equivalent if

lim_{n→∞} E‖s₁ₙ(X₁, …, X_n) − θ‖² / E‖s₂ₙ(X₁, …, X_n) − θ‖² = 1.   (2)

In this paper, we will refer to the case where s₁ₙ and s₂ₙ are componentwise classical and Bayesian estimators, respectively. In the next section, using criterion (2), we will compare the classical and Bayesian estimators of the intensity of the l²-valued Poisson process. As commented, the asymptotic comparison of the corresponding plug-in predictors then follows.

Estimation and prediction of a Poisson process in a Hilbert space
Let {N_{t,j}, t ∈ R₊, j ≥ 1} be a sequence of independent homogeneous Poisson processes with respective intensities λ_j, j ≥ 1, satisfying Σ_{j≥1} λ_j < ∞. It then follows that Σ_j N²_{t,j} < ∞ almost surely. Then, M_t = {N_{t,j}, j ≥ 1} defines a random variable with values in l² satisfying E‖M_t‖² < ∞. Thus, {M_t, t ≥ 0} is a continuous-time l²-valued process.

Classical prediction
Assume that M_T is observed and we want to predict M_{T+h} (h > 0). For this purpose, we first define an MLE for {λ_j, j ≥ 1}. Let N ⊂ l² be the family of sequences {x_j, j ≥ 1} = (x_j) such that x_j is an integer for each j and x_j = 0 for sufficiently large j. We may write N = ∪_{k≥1} N_k, where N_k collects the sequences vanishing beyond the kth coordinate. Clearly, N_k is countable for every k. It then follows that N is countable, since it is a countable union of countable sets. Thus, one may define the counting measure µ on N and extend it by setting µ(l² − N) = 0. The measure obtained is then σ-finite. Now M_t is considered as an N-valued random variable. Actually, since N²_{t,j} is an integer, Σ_j N²_{t,j} < ∞ (a.s.) implies that N_{t,j} = 0 a.s. for j large enough. More precisely: there exists Ω₀, with P(Ω₀) = 1, such that, for all ω ∈ Ω₀, there exists j₀(ω, λ, T) with N_{T,j}(ω) = 0 for j > j₀. Therefore, one may define the likelihood of M_T with respect to µ, and hence the MLE of (λ) is given componentwise by

λ̂_{j,T} = N_{T,j}/T,  j ≥ 1.   (3)

Clearly, Σ_{j=1}^∞ λ̂_{j,T} < ∞ (a.s.), and (λ̂)_T is unbiased ((λ) being considered as a parameter with values in l²). It follows that the plug-in predictor f(M_T) = M_T + h(λ̂)_T is an unbiased predictor of M_{T+h} and of

E_λ(M_{T+h}|M_T) = M_T + h(λ).   (5)

Concerning efficiency, f(M_T) is an unbiased efficient predictor under the following definition: f(M_T) is said to be efficient if and only if each component of f(M_T) is an efficient predictor of the associated component of M_{T+h}.
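The componentwise MLE and the associated plug-in predictor described above can be sketched as follows. Function names and the toy values are illustrative; the predictor's form N_{T,j} + h·λ̂_{j,T} is assumed from the Poisson conditional-mean identity.

```python
# Componentwise maximum-likelihood estimation for the l2-valued Poisson
# process observed on [0, T]: lambda_hat_j = N_{T,j} / T.
# Names and toy values are illustrative, not taken from the paper.

def mle_intensity(counts, T):
    """MLE of the intensity vector from the observed counts N_{T,j}."""
    return [n_j / T for n_j in counts]

def plug_in_predictor(counts, T, h):
    """Plug-in predictor of M_{T+h}: N_{T,j} + h * lambda_hat_j, based on
    E(N_{T+h,j} | N_{T,j}) = N_{T,j} + h * lambda_j."""
    return [n_j + h * lam for n_j, lam in zip(counts, mle_intensity(counts, T))]
```

For instance, with counts (10, 5, 0) observed over T = 100, the estimated intensities are (0.1, 0.05, 0), and the h = 2 prediction adds 2λ̂_j to each count.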

Bayesian prediction
In order to define a Bayesian parameter estimator of (λ), and a Bayesian predictor of M_{T+h}, we set the a-priori distribution Γ(a_j, b_j) on each λ_j, with a_j, b_j > 0, for each j ≥ 1. The posterior distribution is then the Gamma distribution Γ(a_j + N_{T,j}, b_j + T), for each j ≥ 1 (see, for example, [13]). Therefore, the Bayesian estimator is given componentwise by

λ̃_{j,T} = (a_j + N_{T,j})/(b_j + T),  j ≥ 1,   (6)

and the corresponding predictor is f₀(M_T) = M_T + h(λ̃)_T. This predictor is well defined if Σ_j λ̃_{j,T} < ∞ almost surely, a natural condition (even if Σ_j λ̃²_{j,T} < ∞ a.s. should be enough). A sufficient condition for this is Σ_{j≥1} a_j < ∞, since Σ_j N_{T,j} < ∞ a.s. and b_j + T ≥ T.
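The Gamma-Poisson conjugate update just described (posterior Γ(a_j + N_{T,j}, b_j + T), rate parametrization) can be sketched as follows; the function name and numbers are illustrative.

```python
# Gamma-Poisson conjugate update for each intensity component:
# prior Gamma(a_j, b_j) (rate parametrization) gives the posterior
# Gamma(a_j + N_{T,j}, b_j + T), whose mean is the Bayes estimator
# (a_j + N_{T,j}) / (b_j + T).

def bayes_intensity(counts, T, a, b):
    """Componentwise posterior-mean (Bayes) estimator of the intensities."""
    return [(a_j + n_j) / (b_j + T) for n_j, a_j, b_j in zip(counts, a, b)]
```

As T grows, the prior washes out and the Bayes estimate approaches N_{T,j}/T, which is the content of the asymptotic-equivalence result of the next subsection.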

Asymptotic equivalence of classical and Bayesian parameter estimators and predictors
This section establishes the asymptotic equivalence of the classical and Bayes parameter estimators and predictors in the sense of equation (2). In particular, the asymptotic efficiency of the Bayes estimator, and of the resulting plug-in predictor, is also proved. In the derivation of the next theorem, the following assumption is made.

Assumption A0. Σ_{j≥1} a_j < ∞ and sup_{j≥1} b_j < ∞.
Theorem 1. Let (λ̂)_T be the classical parameter estimator introduced in (3), and (λ̃)_T be the Bayes estimator of (λ) constructed from (6). As before, we denote by f₀(M_T) and f(M_T) the respective Bayesian and classical predictors. Under Assumption A0, the following identities hold:

lim_{T→∞} T E‖(λ̂)_T − (λ)‖²_{l²} = Σ_{j≥1} λ_j,   (8)
lim_{T→∞} T E‖(λ̃)_T − (λ)‖²_{l²} = Σ_{j≥1} λ_j,   (9)
lim_{T→∞} T E‖f(M_T) − E_λ(M_{T+h}|M_T)‖²_{l²} = lim_{T→∞} T E‖f₀(M_T) − E_λ(M_{T+h}|M_T)‖²_{l²} = h² Σ_{j≥1} λ_j.   (10)

Proof. From equation (3), since Var(N_{T,j}) = λ_j T, equation (8) follows in a direct way, with

T E‖(λ̂)_T − (λ)‖²_{l²} = T Σ_{j≥1} λ_j T / T² = Σ_{j≥1} λ_j.

Hence, the classical estimator (λ̂)_T of (λ) is asymptotically efficient. Equation (9) is now proved. Specifically, for each j ≥ 1, from equation (6),

E[(λ̃_{j,T} − λ_j)²] = A_j(T) + B_j(T),  A_j(T) = λ_j T/(b_j + T)²,  B_j(T) = (a_j − b_j λ_j)²/(b_j + T)²,   (11)

with A_j(T) the variance term and B_j(T) the squared-bias term. We then study the limit as T → ∞ of Σ_j T A_j(T) and Σ_j T B_j(T). Since T A_j(T) ≤ λ_j, and Σ_j λ_j < ∞, we can apply the Dominated Convergence Theorem, with dominant summable sequence {λ_j, j ≥ 1}, to conclude that, for every j ≥ 1, lim_{T→∞} T A_j(T) = λ_j, and hence

lim_{T→∞} Σ_{j≥1} T A_j(T) = Σ_{j≥1} λ_j.   (12)

Secondly, we prove that lim_{T→∞} Σ_{j≥1} T B_j(T) = 0. Indeed, under Assumption A0 there exist positive constants c₁ and c₂ such that (a_j − b_j λ_j)² ≤ c₁ a_j² + c₂ λ_j², so an upper bound for Σ_{j≥1} B_j(T) in (13) is given by C/T², with C being a finite constant, which leads to

lim_{T→∞} Σ_{j≥1} T B_j(T) = 0.   (13)

From equation (11), and from (12) and (13), equation (9) holds. Therefore, the Bayesian estimator (λ̃)_T of (λ) is asymptotically efficient. Finally, equation (10) is obtained from the identities f(M_T) − E_λ(M_{T+h}|M_T) = h((λ̂)_T − (λ)) and f₀(M_T) − E_λ(M_{T+h}|M_T) = h((λ̃)_T − (λ)), keeping in mind equation (5).

Bayesian and classical estimation of the mean in the H valued Gaussian random variable framework

Sufficient conditions for a better performance of the Bayes estimator of the mean of a Hilbert valued Gaussian random variable, with respect to the classical one, are derived. Conjugate Gaussian priors are considered in the Bayesian estimation of the mean projections with respect to the eigenvectors of the autocovariance operator. The asymptotic equivalence, in the sense of (2), of the Bayes and classical estimators of the functional mean is also obtained.

Bayesian and classical mean estimators in the real valued case
Let us first begin with the one-dimensional Gaussian case, where X̄_n and θ̂_{1n} denote the classical and Bayesian estimators of the mean, respectively. Let X₁, …, X_n be a sample of independent and identically distributed real valued Gaussian random variables with mean θ₁ and variance λ₁ > 0; that is, X_i ∼ N(θ₁, λ₁), i = 1, …, n. In the Bayesian estimation, we consider as prior for θ₁ a zero-mean Gaussian distribution, i.e., θ₁ ∼ N(0, γ₁), γ₁ > 0. Then, the Bayesian estimator of θ₁, the mean of the a-posteriori distribution, is given by

θ̂_{1n} = (nγ₁/(nγ₁ + λ₁)) X̄_n   (16)

(see [13], p. 234). The values of θ₁ for which the Bayesian estimator outperforms the classical one are given in the following lemma.

Lemma 1. The Bayesian estimator θ̂_{1n} outperforms X̄_n, in the sense of mean-square error, if and only if θ₁² < λ₁/n + 2γ₁.
Proof. By direct computation, denoting x = λ₁/n,

E[(θ̂_{1n} − θ₁)²] = (γ₁/(γ₁ + x))² x + (x/(γ₁ + x))² θ₁²,   (17)

while E[(X̄_n − θ₁)²] = x. Replacing in equation (17), the inequality E[(θ̂_{1n} − θ₁)²] < x holds if and only if

γ₁² x + x² θ₁² < x (γ₁ + x)²,   (18)

which, dividing by x and simplifying, is equivalent to θ₁² < x + 2γ₁, yielding the desired condition in terms of the upper bound λ₁/n + 2γ₁.

The asymptotic equivalence, in the sense of (2), of X̄_n and θ̂_{1n} is now established in Theorem 2.

Theorem 2. The following identities hold: lim_{n→∞} n E[(X̄_n − θ₁)²] = lim_{n→∞} n E[(θ̂_{1n} − θ₁)²] = λ₁.
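Lemma 1's comparison can be checked numerically. The following sketch encodes the two mean-square errors, with the Bayes MSE written as its variance-plus-squared-bias decomposition of the shrinkage estimator cX̄_n, c = nγ₁/(nγ₁ + λ₁); function names are illustrative.

```python
# Mean-square errors of the classical and Bayes estimators of a real
# Gaussian mean (Lemma 1's comparison).

def mse_classical(lam1, n):
    """MSE of the sample mean: lambda1 / n."""
    return lam1 / n

def mse_bayes(theta1, lam1, gamma1, n):
    """MSE of the posterior-mean estimator: c^2 * lambda1/n + (1-c)^2 * theta1^2."""
    c = n * gamma1 / (n * gamma1 + lam1)
    return c ** 2 * lam1 / n + (1 - c) ** 2 * theta1 ** 2
```

At the boundary θ₁² = λ₁/n + 2γ₁ the two errors coincide, matching the lemma's bound; below it the Bayes estimator wins.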

Bayesian and classical estimators of the functional mean
Denote, as before, by H a real separable Hilbert space equipped with its scalar product ⟨·, ·⟩_H and its norm ‖·‖_H. Let us now consider {X_n, n ≥ 1} to be a sequence of i.i.d. H valued Gaussian random variables with mean E[X_n] = θ ∈ H, for every n ≥ 1, and auto-covariance operator C = E[(X₁ − θ) ⊗ (X₁ − θ)], where h ⊗ g denotes the tensor product of functions h and g in H, defining a Hilbert-Schmidt operator on H as follows: (h ⊗ g)(f) = h⟨g, f⟩_H, for every f ∈ H. We suppose that C does not depend on θ. From equations (21) and (22), C defines a Hilbert-Schmidt operator on H admitting a spectral decomposition in terms of a pure point spectrum {λ_j, j ≥ 1}, with λ_j ≥ 0, j ≥ 1, and an associated system of eigenvectors {v_j, j ≥ 1} in H satisfying Cv_j = λ_j v_j, j ≥ 1. Hence, from the Hilbert-Schmidt Theorem, C can be represented as

C = Σ_{j≥1} λ_j v_j ⊗ v_j,   (23)

in the sense that Cf = Σ_{j≥1} λ_j ⟨v_j, f⟩_H v_j, for every f ∈ H (see [9], pp. 119 and 126). In the following, we also assume that C is in the trace class, i.e., Σ_{j=1}^∞ λ_j < ∞. Furthermore, the eigenvectors {v_j} of C constitute a complete orthonormal system in H. Consequently, θ admits the representation θ = Σ_{j≥1} θ_j v_j in H, with θ_j = ⟨θ, v_j⟩_H. Then, the projections X̄_{nj} = (1/n) Σ_{i=1}^n ⟨X_i, v_j⟩_H satisfy X̄_{nj}|θ_j ∼ N(θ_j, λ_j/n). As a-priori distribution of the unknown parameter θ, we consider θ ∼ N(0, Γ), where Γ denotes, as usual, the covariance operator of the H valued random variable θ, i.e., Γ = E[θ ⊗ θ]. We also assume that Γ admits the following spectral representation in terms of the eigenvectors of the covariance operator C of X:

Γ = Σ_{j≥1} γ_j v_j ⊗ v_j,

where γ_j > 0, j ≥ 1, are the respective eigenvalues of Γ associated with the eigenvectors v_j, j ≥ 1, satisfying Γv_j = γ_j v_j, j ≥ 1. To apply the Kolmogorov extension theorem in the definition of the prior θ ∼ N(0, Γ), we assume that Σ_{j=1}^∞ γ_j < ∞ (see, for example, [8] and [12]). Hence, θ_j ∼ N(0, γ_j). In order to compute E(θ|X₁, …, X_n), we suppose that θ_j ⊥ (X_{i,j′}, i ≥ 1, j′ ≠ j). We can then perform an independent computation of the respective posterior distributions of the projections θ_j, j ≥ 1, of the parameter θ with respect to the eigenvectors {v_j, j ≥ 1}. Thus, for j ≥ 1, since the prior θ_j ∼ N(0, γ_j) is conjugate with the likelihood X̄_{nj}|θ_j ∼ N(θ_j, λ_j/n), the Bayesian estimator θ̂_{jn} of θ_j is given by

θ̂_{jn} = (nγ_j/(nγ_j + λ_j)) X̄_{nj},   (24)

the mean of the posterior distribution (see [13], p. 234). Hence, θ̂_n = Σ_{j≥1} θ̂_{jn} v_j. Let us consider the problem of looking for conditions providing the values of θ for which the Bayes estimator outperforms the classical estimator, in the sense of mean-square error, as given in Lemma 1 in the real valued case. The following proposition addresses this problem, under the additional conditions now formulated.

Assumption A1. There exist constants 0 < m ≤ M < ∞ such that m γ_j ≤ λ_j ≤ M γ_j, for every j ≥ 1.

Assumption A2. In Assumption A1, m = M, i.e., the ratio λ_j/γ_j is constant over j ≥ 1.
Proposition 1. The following assertions hold for a finite sample size n:
(i) θ̂_n ≺ X̄_n if and only if inequality (27) below holds;
(ii) under Assumption A1, a sufficient condition for θ̂_n ≺ X̄_n, depending on n, holds;
(iii) under Assumption A2, a sufficient condition for θ̂_n ≺ X̄_n, not depending on n, holds.
Proof.
(i) First, note that

E‖θ̂_n − θ‖²_H − E‖X̄_n − θ‖²_H = Σ_{j≥1} (λ_j/(nγ_j + λ_j))² (θ_j² − λ_j/n − 2γ_j),   (27)

where E‖X̄_n − θ‖²_H = Σ_{j≥1} λ_j/n and

E‖θ̂_n − θ‖²_H = Σ_{j≥1} (nγ_j/(nγ_j + λ_j))² (λ_j/n) + Σ_{j≥1} (λ_j/(nγ_j + λ_j))² θ_j².

For each j ≥ 1, recall that, similarly to the one-dimensional case (see Lemma 1), the jth summand in (27) is negative if and only if θ_j² < λ_j/n + 2γ_j. In assertion (i) we have used the fact that Θ², C and Γ are trace-class operators. Specifically, trace(Θ²) = Σ_{j=1}^∞ θ_j² < ∞ by the Parseval identity, keeping in mind that θ ∈ H, with {v_j, j ≥ 1} being an orthonormal basis of H. As considered before, trace(C) = Σ_{j≥1} λ_j < ∞ and trace(Γ) = Σ_{j≥1} γ_j < ∞, as assumed in the definition of the Gaussian prior distribution of the H valued random variable θ. Therefore, A = Θ² − (1/n)C − 2Γ is also in the trace class, and the series in (27), whose jth term is the jth diagonal coefficient of A weighted by (λ_j/(nγ_j + λ_j))² ≤ 1, is absolutely convergent. The negativity of the right-hand side of (27) is thus equivalent (see Lemma 1) to θ̂_n ≺ X̄_n.

(ii) Under Assumption A1, the inequalities m γ_j ≤ λ_j ≤ M γ_j hold for every j ≥ 1, allowing the weights in (27) to be bounded uniformly in j. A sufficient condition for θ̂_n ≺ X̄_n can then be derived from (29), leading to the sufficient condition formulated in (ii). Note that this condition still depends on n, since the upper bound (31) decreases with n.

(iii) To find an upper bound independent of n, we consider Assumption A2. Under A1-A2, the upper bound (31) can be reformulated independently of n. Specifically, in the case where M = m, the resulting sufficient condition does not depend on n.

The asymptotic equivalence of θ̂_n and X̄_n, in the sense of (2), is now derived under Assumption A1 in the following theorem.

Theorem 3. Under Assumption A1, the following identities hold:

lim_{n→∞} n E‖X̄_n − θ‖²_H = lim_{n→∞} n E‖θ̂_n − θ‖²_H = Σ_{j≥1} λ_j.

Proof. First, we recall that n E‖X̄_n − θ‖²_H = Σ_{j≥1} λ_j, for every n, and that

n E‖θ̂_n − θ‖²_H = A_n + B_n,  A_n = Σ_{j≥1} (nγ_j/(nγ_j + λ_j))² λ_j,  B_n = n Σ_{j≥1} (λ_j/(nγ_j + λ_j))² θ_j².

Uniformly in n, (nγ_j/(nγ_j + λ_j))² λ_j ≤ λ_j = g(j), with g independent of n, and Σ_{j=1}^∞ g(j) < ∞, due to the trace property of the operator C. Thus, from the Dominated Convergence Theorem, we can interchange the limit as n → ∞ with the sum in A_n, obtaining lim_{n→∞} A_n = Σ_{j≥1} λ_j. Additionally, under A1, the bound λ_j ≤ M γ_j implies B_n ≤ (M²/n) Σ_{j≥1} θ_j² → 0, as n → ∞. In particular, lim_{n→∞} n E‖θ̂_n − θ‖²_H = Σ_{j≥1} λ_j, i.e., θ̂_n and X̄_n are asymptotically equivalent.
Remark 1.Since C and Γ are assumed to be in the trace class, their respective sequences of eigenvalues {λ j } j≥1 and {γ j } j≥1 converge to zero when j → ∞.
Hence, 1/γ_j → ∞ as j → ∞. Thus, A1 holds if and only if {λ_j}_{j≥1} and {γ_j}_{j≥1} go to zero at the same rate, i.e., λ_j = O(γ_j) and γ_j = O(λ_j), as j → ∞. This condition is equivalent to a comparison between the counting functions, for some positive constant K, where N_C(λ) and N_Γ(γ) denote the respective counting functions associated with the eigenvalues of the operators C and Γ, given by N_C(λ) = Card{j ≥ 1 : λ_j ≥ λ} and N_Γ(γ) = Card{j ≥ 1 : γ_j ≥ γ}, with Card denoting the cardinality of a set. In particular, N_C(λ_k) = k and N_Γ(γ_k) = k. In practice, when C is known, we can propose as prior distributions for the projections θ_j, j ≥ 1, of θ with respect to the eigenvectors v_j, j ≥ 1, a sequence of zero-mean Gaussian distributions whose variances γ_j, j ≥ 1, define a positive summable sequence such that γ_j = Lλ_j, for some positive constant L. In the case where λ_j, j ≥ 1, are unknown, the problem becomes more difficult, requiring, for example, the consistent estimation of the covariance operator eigenvalues (see, for example, [6]). Furthermore, if v_j, j ≥ 1, are also unknown, suitable estimators should be computed (see, for example, [2]).
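With the proportional prior variances γ_j = Lλ_j suggested above, the componentwise Bayes shrinkage factor nγ_j/(nγ_j + λ_j) collapses to nL/(nL + 1), the same for every component j. A minimal sketch (names and values are illustrative):

```python
# With gamma_j = L * lambda_j, the componentwise shrinkage factor
# n*gamma_j / (n*gamma_j + lambda_j) = n*L / (n*L + 1) for every j.

def shrinkage_factors(lambdas, L, n):
    """Bayes shrinkage factor for each eigenvalue lambda_j."""
    return [n * L * lam / (n * L * lam + lam) for lam in lambdas]
```

So the prior choice γ_j = Lλ_j shrinks every projection X̄_{nj} by the same factor, regardless of the decay rate of the eigenvalues.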
Bayesian and classical estimation of the auto-covariance operator

Let us begin with the analysis of the classical estimator of the covariance operator of a zero-mean H valued Gaussian random variable X, based on a sample X₁, …, X_n of i.i.d. copies of X,

Ĉ_n = Σ_{j≥1} λ̂_{jn} v_j ⊗ v_j,  λ̂_{jn} = (1/n) Σ_{i=1}^n ⟨X_i, v_j⟩²_H,   (37)

where λ̂_{jn} is an unbiased estimator of λ_j. Clearly, E[λ̂_{jn}] = λ_j. Thus, since Ĉ_n − C, as an element of the Hilbert space S(H) of Hilbert-Schmidt operators on H, has coordinates λ̂_{jn} − λ_j, j ≥ 1, with respect to the orthonormal basis v_j ⊗ v_j, j ≥ 1, of S(H), by the Parseval identity,

E‖Ĉ_n − C‖²_{S(H)} = Σ_{j≥1} E(λ̂_{jn} − λ_j)² = Σ_{j≥1} Var(λ̂_{jn}) = (2/n) Σ_{j≥1} λ_j².

Hence, n E‖Ĉ_n − C‖²_{S(H)} = 2 Σ_{j≥1} λ_j² < ∞, since Σ_{j≥1} λ_j < ∞ by assumption. As before, ‖·‖_{S(H)} denotes the norm in S(H).
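Assuming zero-mean data, as in this covariance-estimation setting, the componentwise unbiased eigenvalue estimate takes the form λ̂_{jn} = (1/n)Σᵢ⟨Xᵢ, v_j⟩². A sketch operating on pre-computed projections (the function name and toy matrix are illustrative):

```python
# Classical componentwise covariance-eigenvalue estimate for zero-mean
# Gaussian data: lambda_hat_j = (1/n) * sum_i x_ij^2, where
# x_ij = <X_i, v_j> is the projection of the ith observation onto v_j.

def lambda_hat(projections):
    """projections[i][j] = <X_i, v_j>; returns the estimated eigenvalues."""
    n = len(projections)
    m = len(projections[0])
    return [sum(x[j] ** 2 for x in projections) / n for j in range(m)]
```

Since each projection is N(0, λ_j), the estimate is unbiased, with variance 2λ_j²/n, which is where the factor 2Σ_j λ_j² in the scaled mean-square error comes from.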
For each j ≥ 1, let us now consider the Bayesian estimator λ̃_{jn} of λ_j, obtained from a Gamma prior on τ_j = 1/(2λ_j). From equations (38) and (39), since g_j > 2, there exists a positive constant c providing an upper bound for the corresponding mean-square error, where the sum involved is interpreted as an integral with respect to the counting measure µ(dx) = Σ_{j=1}^∞ δ_{A_n(j)}(x) dx, with δ_a(x) denoting the Dirac delta measure at point a, and A_n(j), j ≥ 1, given in equation (40).
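The paper's displays (38)-(40) are not recoverable here, so the following is a hedged sketch using the standard Gamma conjugate computation for τ_j = 1/(2λ_j) with n zero-mean Gaussian projections x_i ∼ N(0, λ_j): the posterior is Gamma(g_j + n/2, β_j + Σᵢ xᵢ²), hence the posterior mean of λ_j = 1/(2τ_j) is (β_j + Σᵢ xᵢ²)/(2(g_j + n/2 − 1)). Names and values are illustrative.

```python
# Hedged sketch of a conjugate Bayes estimate of one covariance eigenvalue,
# assuming a Gamma(g, beta) prior on tau = 1/(2*lambda) and projections
# x_i ~ N(0, lambda); posterior: Gamma(g + n/2, beta + sum_i x_i^2).

def bayes_lambda(x, g, beta):
    """Posterior-mean estimate of lambda (requires g + n/2 > 1)."""
    n = len(x)
    s = sum(xi * xi for xi in x)
    return (beta + s) / (2.0 * (g + n / 2.0 - 1.0))
```

For large n this behaves like s/n, the classical estimate, consistent with the asymptotic equivalence established below.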
Additionally, again keeping in mind that g_j > 2, the following assumption, Assumption A3, gives a sufficient condition for the asymptotic equivalence of the Bayes and classical estimators of the covariance operator.

Remark 2. Note that, for g_j = 3, j ≥ 1, we have β_j ≤ 5λ_j, j ≥ 1. Thus, Assumption A3 leads to some upper bounds on the parameters of the a-priori Gamma distributions, in terms of the eigenvalues λ_j, j ≥ 1, of the covariance operator C. Some prior knowledge about λ_j is then required. To ensure that condition A3 holds, the parameters g_j and β_j, j ≥ 1, involved in the respective Gamma priors proposed for the parameters τ_j = 1/(2λ_j), j ≥ 1, could be chosen as in equation (43). Here, ε_i(j), j ≥ 1, i = 1, 2, are arbitrary positive constants that, in particular, need not depend on j.
For the implementation of equation (43) in practice, since λ_j, j ≥ 1, are unknown, a possible choice could be to replace λ_j, j ≥ 1, in (43) by a sequence of positive real numbers whose sum is smaller than the empirical variance computed from the data.
The following result is then formulated.

Theorem 4. Under Assumption A3, lim_{n→∞} n E‖C̃_n − C‖²_{S(H)} = lim_{n→∞} n E‖Ĉ_n − C‖²_{S(H)} = 2 Σ_{j≥1} λ_j².

That is, they are asymptotically equivalent in the sense of equation (2).

Simulation study
The comparison of the efficiency of the finite-dimensional approximations of the studied Bayesian and classical estimators, in the l²-valued Poisson process and the Hilbert valued Gaussian random variable frameworks, is now performed through a simulation study, for an increasing sequence of functional sample sizes.

Equivalence of Bayesian and classical estimators and predictors of the l²-valued Poisson process
We study the following models of l²-valued Poisson process M_t = {N_{t,j}, j ≥ 1}, with infinite-dimensional vectors of intensities {λ_ji, j ≥ 1}, i = 1, 2, 3. Note that, in equation (45), λ_ki, k ≥ 1, are selected in order to ensure that Σ_{k=1}^∞ λ_ki < ∞, testing different rates of convergence for each model i, i = 1, 2, 3. The parameter values tested for the Gamma distributions defining the priors on the intensity components (45) are given in (46). Note that {a_ki}_{k≥1} and {b_ki}_{k≥1} are selected in order to ensure that Assumption A0 holds, since Σ_{k=1}^∞ a_ki < ∞ and sup_{k≥1} b_ki < ∞, for each i = 1, 2, 3. In addition, for each model i = 1, 2, 3, different rates of convergence to zero are tested in the selection of the parameter sequence {a_ki}_{k≥1} in equation (46). Tables 1-2 show the statistics MSEICi and MSEIBi, respectively denoting the integrated (truncated) empirical mean-square errors of the classical and Bayes estimators of the intensity vector of the Poisson process, obtained after truncation at term M, in Model i, for i = 1, 2, 3. Integration here is understood with respect to a counting measure, where, as before, δ_a(x) denotes the Dirac delta measure at point a. The number of observed times is denoted by S. In these tables, the values of S displayed lie in the interval [1000, 10000], i.e., S ∈ [1000, 10000], with discretization step size DSS = 500. The truncation order considered is M = 75. The displayed integrated empirical mean-square errors are based on R = 500 repetitions. Specifically, Tables 1-2 (see also Figure 1) display these empirical quantities, where λ̃^l_{kS,i} represents the Bayesian estimator of the kth component of the vector of intensities associated with the l²-valued Poisson process {M_{t,i}, t ≥ 0}, based on the lth observed realization {M^l_{t,i}, t = 1, …, S} of the process generated from Model i, for i = 1, 2, 3, with M^l_{t,i} = {N^l_{t,k,i}, k = 1, …, M}, for t ∈ {1, …, S} and l = 1, …, R; similarly, λ̂^l_{kS,i} denotes the corresponding classical estimator of the kth intensity component, based on the same lth realization.
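A minimal sketch of how such an integrated (truncated) empirical mean-square error over R replications can be computed; the function name and toy values are illustrative.

```python
# Monte Carlo approximation of the integrated (truncated) empirical
# mean-square error: average, over the R replications, of the squared
# l2 distance between estimated and true (truncated) parameter vectors.

def integrated_empirical_mse(estimates, truth):
    """estimates[l][k] = estimate of truth[k] in replication l."""
    R = len(estimates)
    return sum(
        sum((e - t) ** 2 for e, t in zip(est, truth)) for est in estimates
    ) / R
```

The relative empirical efficiency reported in the tables is then the ratio of two such quantities, one for the Bayes and one for the classical estimator.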
The classical and Bayes estimators of the intensity vector display the same behavior in relation to the rate of convergence to zero of their integrated empirical mean-square errors as the sample size increases. A faster decrease can be seen in Model 2, attaining the theoretical order of magnitude 1/n = 10⁻⁴, while the order of magnitude of the integrated empirical mean-square error in Models 1 and 3 is 10⁻³, presenting a slower rate of decrease for the sample sizes tested. These observed rates of decrease of the integrated empirical mean-square errors in the three models analyzed reflect that the empirical approximations of A_k(T) and B_k(T), k = 1, …, M = 75 (see equation (11)), providing upper bounds on the computed integrated empirical mean-square errors, decrease when the parameters a_ki, k ≥ 1, and λ_ki, k ≥ 1, decrease, for i = 1, 2, 3. Therefore, the smallest values of these empirical quantities correspond to Model 2. Note that the differences between the parameters involved in Models 1 and 3 are not significant in relation to the sample sizes tested. Summarizing, the rate of decrease of the parameters a_ki, k ≥ 1, as well as of the components of the intensity vector λ_ki, k ≥ 1, for i = 1, 2, 3, affects the rate of convergence to zero of the integrated empirical mean-square errors of the Bayes and classical estimators of the intensity vector of the l²-valued Poisson process.
To illustrate that the relative efficiency of the Bayes and classical predictors is one, the absolute distances of their integrated empirical mean-square errors, scaled by the sample size, to the theoretical limit derived in Theorem 1 (see also equation (15)) are displayed in Figure 2. Specifically, for model i = 1, 2, 3, AEBPi denotes the absolute distance of the integrated empirical mean-square error, scaled by the sample size S, of the Bayes predictor to the theoretical value h² Σ_{k=1}^M λ_ki given in Theorem 1, and AECPi denotes the corresponding absolute distance for the classical predictor. Equivalently, for i = 1, 2, 3, these empirical quantities are displayed in Figure 2, where the value h = 2 has been considered, with the same notation as before.
The sample sizes studied coincide with the ones tested in Tables 1-2, S ∈ [1000, 10000], with discretization step size DSS = 500, and truncation order M = 75. The approximation of the integrated empirical mean-square error is computed from R = 500 realizations of the Poisson process over the interval (0, S].
The results displayed show almost the same values of the absolute distances to the fixed theoretical limit h² Σ_{k=1}^M λ_ki, given in Theorem 1, for the integrated empirical mean-square errors of the Bayes and classical estimators, for the 20 sample sizes S tested between 1000 and 10000. It can also be observed that a larger number of realizations of the Poisson process over the interval (0, S] should be generated for each S studied, in order to smooth the absolute distances observed. However, memory and running-time restrictions have limited our computations to the case of 500 realizations of the Poisson process, which suffices to illustrate our primary objective in relation to the value one of the relative efficiency of the Bayes and classical estimators.

Equivalence of Bayesian and classical mean estimators for H valued Gaussian random variables
To generate examples of trace-class (and, in particular, Hilbert-Schmidt) covariance operators admitting the spectral decomposition given in equation (23), in terms of a system of eigenvalues and eigenvectors, we consider the resolution of the identity Σ_{k=1}^∞ φ_k ⊗ φ_k, defined in terms of the system of eigenfunctions of the Dirichlet negative Laplacian operator on L²([0, T]),

φ_k(x) = √(2/T) sin(πkx/T),  x ∈ [0, T],  k ≥ 1,   (49)

where L²([0, T]) denotes the space of square-integrable functions on [0, T] (see, for example, [20]). As usual, by the Dirichlet negative Laplacian operator on L²([0, T]), denoted as (−∆)_{[0,T]}, we understand the negative Laplacian on [0, T] with Dirichlet (zero) boundary conditions. The following examples of covariance operators are then considered:

C_i = Σ_{k≥1} λ_ki φ_k ⊗ φ_k,   (50)

with eigenvectors φ_k, k ≥ 1, given in equation (49), and with eigenvalues λ_ki, k ≥ 1, i = 1, 2, 3, respectively defined by l¹ sequences displaying different rates of convergence to zero, in order to investigate whether they affect the relative efficiency of the Bayes and classical estimators for finite truncation order and for the finite sample sizes tested below. As commented, we have considered H = L²([0, T]), with T = 200, and E[X_i] = θ_i ∈ L²([0, T]) having Fourier coefficients {θ_ki, k ≥ 1}, i = 1, 2, 3, with respect to the eigenfunction basis (49), where again different rates of convergence to zero are tested. To ensure that Assumption A1 holds, for i = 1, 2, 3, the Gaussian prior on θ_i has covariance operator Γ_i with eigenvalues γ_ki = L λ_ki, k ≥ 1, with respect to the eigenvectors (49). Here, any positive value of the constant L ensures that Assumption A1 holds with m = M = 1/L. In particular, we have considered L = 10. For i = 1, 2, 3, Γ_i is then defined as Γ_i = Σ_{k≥1} γ_ki φ_k ⊗ φ_k. For i = 1, 2, 3, the Bayesian estimator of θ_i is computed from equation (24), and compared with the classical estimator X̄_{ni}, in terms of their integrated empirical mean-square errors truncated at term M, respectively denoted as MSEMBi and MSEMCi, for each model i = 1, 2, 3. That is, they are compared in terms of empirical quantities approximating the corresponding theoretical mean-square errors. Here, for i = 1, 2, 3, θ̂^l_{kn,i} and X̄^l_{kn,i} denote the respective Bayes and classical estimates of θ_ki, based on the lth generation of a finite-dimensional approximation of a sample X^l_{1i}, …, X^l_{ni} of size n of the Hilbert valued Gaussian random variable X_i of Model i, for l = 1, …, R.
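The Dirichlet eigenfunctions used to build these covariance models take the standard form φ_k(x) = √(2/T) sin(πkx/T). A quick numerical check of their L²([0, T]) orthonormality; the quadrature rule and grid size are arbitrary choices for illustration.

```python
import math

# Standard Dirichlet-Laplacian eigenfunctions on [0, T]:
# phi_k(x) = sqrt(2/T) * sin(pi*k*x/T).

def phi(k, x, T):
    return math.sqrt(2.0 / T) * math.sin(math.pi * k * x / T)

def l2_inner(k1, k2, T, N=20000):
    """Midpoint-rule approximation of the L2([0, T]) inner product."""
    h = T / N
    return h * sum(
        phi(k1, (i + 0.5) * h, T) * phi(k2, (i + 0.5) * h, T) for i in range(N)
    )
```

With T = 200 as in the simulations, l2_inner(k, k, 200.0) is numerically 1 and l2_inner(k, k′, 200.0) is numerically 0 for k ≠ k′, so truncated Karhunen-Loève expansions built on this basis have independent coefficients with variances λ_ki.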
The Bayesian and classical estimators of the functional mean display the same efficiency in the three models studied for the sample sizes tested; that is, their relative efficiency is almost one, as can be seen in Tables 3-4. The values MSEMB₁ and MSEMC₁ are below the values of MSEMB₂ and MSEMC₂, and of MSEMB₃ and MSEMC₃, for all the sample sizes tested (see Tables 3-4), reflecting the relative magnitude of the positive model parameters involved. In addition, we also consider, for i = 1, 2, 3, the absolute distances AEMB_i and AEMC_i of the respective integrated (truncated) empirical mean-square errors, scaled by the sample size n, of the Bayes and classical estimators to the limit quantity Σ_{k=1}^M λ_ki derived in Theorem 3, for a given truncation order M, where the values M = 75, R = 500, and n ∈ [1000, 10000], with discretization step size DSS = 500, have been tested. The order of magnitude of AEMB_i and AEMC_i is substantially smaller for i = 1, that is, for Model 1, which also displays a faster decrease to zero of the integrated (truncated) empirical mean-square error. Regarding Models 2 and 3, again we have a faster decrease to zero of the absolute distances of the integrated empirical mean-square error to the limit Σ_{k=1}^M λ_k2, derived in Theorem 3, for Model 2 than for Model 3. As commented, this behavior is due to the relative magnitude of the covariance operator eigenvalues and mean function projections in the three models analyzed, as reflected in equation (55). A larger number of repetitions than R = 500 should be considered in order to smooth the results displayed in Figure 4. We have displayed the case of 500 repetitions due to memory and running-time restrictions, which is nonetheless sufficient to show the equivalent behavior of the Bayes and classical estimators in relation to their efficiency for the large finite sample sizes tested.

Asymptotic equivalence of Bayesian and classical covariance operator estimators for H-valued Gaussian random variables
To analyze, for finite sample sizes and a given truncation order, the influence of the rate of convergence to zero of the eigenvalues of the covariance operator on the equivalent behavior of the Bayes and classical estimators, we consider six models C_i, i = 1, …, 6, of trace covariance operators, with eigenvalue sequences specified in equation (57). Here, H = L²([0, T]), and φ_k, k ≥ 1, are defined as in equation (49). In order to ensure that the restrictions required in Section 5 for the asymptotic equivalence of the Bayes and classical estimators are satisfied by the parameters of the Gamma priors on the eigenvalues of the covariance operator (see also Assumption A3 and Remark 2), for i = 1, 2, 3, suitable values of the parameters g_{ki} and β_{ki}, characterizing the Gamma prior distributions on τ_{ki} = 1/(2λ_{ki}), k ≥ 1, are selected (see also equation (43)).

For i = 1, …, 6, the Bayes estimator C̃_i and the classical estimator Ĉ_i of the covariance operator are compared in terms of their integrated (truncated at term M) empirical mean-square errors, respectively denoted MSEB_i and MSEC_i. That is, for the finite sample sizes n ∈ [1000, 10000] tested, the following empirical quantities are computed,

\mathrm{MSEB}_{i}=\frac{1}{R}\sum_{l=1}^{R}\sum_{k=1}^{M}\big(\widetilde{\lambda}^{\,l}_{kn,i}-\lambda_{ki}\big)^{2},\qquad \mathrm{MSEC}_{i}=\frac{1}{R}\sum_{l=1}^{R}\sum_{k=1}^{M}\big(\widehat{\lambda}^{\,l}_{kn,i}-\lambda_{ki}\big)^{2},

approximating the corresponding theoretical mean-square errors. Here, λ̃^l_{kn,i} and λ̂^l_{kn,i} respectively denote the Bayes (36) and classical (37) estimates of λ_{ki}, based on the lth generation of a sample Y^l_{1i}, …, Y^l_{ni} of size n of the Gaussian Hilbert-valued random variable Y_i of Model i, with l = 1, …, R. Specifically, we have considered H = L²([0, T]), with T = 200, and R = 500 in the computation of (59). In addition, for i = 1, …, 6, the absolute distances AEB_i and AEC_i of the integrated (truncated at term M) empirical mean-square errors of the Bayes and classical estimators, scaled by the sample size n, to the limit 2 Σ_{k=1}^{M} λ²_{ki}, derived in Theorem 4, are respectively computed as

\mathrm{AEB}_{i}=\Big|\,n\,\mathrm{MSEB}_{i}-2\sum_{k=1}^{M}\lambda_{ki}^{2}\Big|,\qquad \mathrm{AEC}_{i}=\Big|\,n\,\mathrm{MSEC}_{i}-2\sum_{k=1}^{M}\lambda_{ki}^{2}\Big|,

which provide an empirical finite-dimensional approximation of the corresponding theoretical distances.

The behavior of the classical and Bayes estimators for the six covariance models introduced in (57) is investigated for truncation levels M = 5, 10, 25, 50, 75 and sample sizes n ∈ [1000, 10000], with discretization step size DSS = 500. The evaluated empirical approximations MSEB_i and MSEC_i, i = 1, …, 6, of the functional mean-square errors of the Bayes and classical estimators display a hyperbolic rate of convergence to zero, with respect to the increasing sequence of sample sizes analyzed, in the six covariance models tested and for all the truncation orders studied (see Figure 5). The rate of convergence to zero of the eigenvalues of the covariance operator affects the rate of decrease of the integrated (truncated) empirical mean-square errors for the finite sample sizes tested. Specifically, a slower decay to zero of the eigenvalues leads to a slower decrease to zero of MSEB_i and MSEC_i, as observed for Models 1 and 6 (see the top-left and bottom-right plots of Figure 5). The fastest eigenvalue decay corresponds to Models 3 and 4, where the fastest decrease to zero of MSEB_i and MSEC_i, i = 3, 4, is also observed (see the center-left and center-right plots of Figure 5). The intermediate case of eigenvalue decay corresponds to Models 2 and 5, whose MSEB_i and MSEC_i, i = 2, 5, display the same intermediate behavior (see the top-right and bottom-left plots of Figure 5). In all the models studied, the Bayes and classical estimators of the covariance operator display an equivalent behavior, in the sense of the rate of convergence to zero of their integrated empirical mean-square errors, for the truncation orders and finite sample sizes tested.

In Figure 6, for M = 5, 10, 25, 50, 75, the corresponding values of AEB_i and AEC_i, i = 1, 2, 3, 4, are displayed: AEB_i with solid lines and AEC_i with dotted lines. Models 1 and 2 display a similar behavior. For i = 1, 2 (top panels of Figure 6), AEB_i and AEC_i are very close for all the truncation orders tested; thus, the Bayesian and classical estimators have relative efficiency close to one. For i = 3, 4 (bottom panels of Figure 6), AEB_i and AEC_i are close, with the distance between their values increasing for the truncation orders M = 50, 75. Note that, for these truncation orders, the classical estimator outperforms the Bayesian estimator, especially for the smallest sample sizes studied. The relative efficiency of the Bayes and classical estimators is farther from one in Models 5 and 6 than in the previously referred models (see Figure 7).

The theoretical results derived in Sections 3-5 provide sufficient conditions for the asymptotic equivalence of the Bayes and classical estimators of the infinite-dimensional parameters characterizing the l²-valued Poisson process and the Hilbert-valued Gaussian models studied. In practice, for large finite sample sizes, the numerical results obtained in Section 6, in terms of the integrated (truncated) empirical mean-square error, support the theoretical asymptotic results previously derived. In addition, for finite sample sizes, the rate of convergence to zero of the components of the infinite-dimensional parameters affects the rate of decrease of the integrated empirical mean-square errors of their Bayes and classical estimators. The absolute distance of the integrated empirical mean-square errors of the Bayes and classical estimators, scaled by the sample size, to the theoretical limits derived in Theorems 1, 3 and 4 constitutes a more unstable statistic, which requires a large number of repetitions to be smoothed. It can also be substantially affected by the truncation order in the case of the estimation of the covariance operator eigenvalues.
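To make the role of the Gamma priors concrete, the following sketch illustrates, for a single eigenvalue, a conjugate Bayes estimator of λ under a Gamma(g, β) prior on τ = 1/(2λ), next to the classical empirical second-moment estimate. The prior values and the use of the posterior mean are illustrative assumptions; the sketch does not reproduce the exact estimators (36)-(37) of the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setting: projections Y_k ~ N(0, lambda_k); a Gamma(g, beta)
# prior (shape g, rate beta) is placed on tau = 1 / (2 * lambda), as in the text.
n = 5000
lam_true = 0.5                     # illustrative eigenvalue
g, beta = 3.0, 1.0                 # hypothetical Gamma prior parameters

y = rng.normal(0.0, np.sqrt(lam_true), size=n)

# Conjugate update: the Gaussian likelihood is proportional to
#   tau^(n/2) * exp(-tau * sum(y^2)),  so  tau | y ~ Gamma(g + n/2, beta + sum(y^2))
g_post = g + n / 2.0
beta_post = beta + np.sum(y ** 2)

# Since lambda = 1/(2*tau), lambda | y is inverse-Gamma; its posterior mean is
lam_bayes = beta_post / (2.0 * (g_post - 1.0))
lam_classical = np.mean(y ** 2)    # classical (zero-mean) variance estimate

print(lam_bayes, lam_classical)
```

Because the prior contributes only O(1) terms against an O(n) likelihood, the two estimates differ by O(1/n), which is the componentwise mechanism behind the asymptotic equivalence discussed above.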

Final comments
The componentwise Bayesian and classical parameter estimators proposed for the infinite-dimensional vector of intensities, in the l²-valued Poisson process case, as well as for the functional mean, in the Hilbert-valued Gaussian random variable context, are asymptotically equivalent under the conditions assumed. Under the same set of conditions, for a given truncation order, they present an equivalent behavior in relation to their finite-sample-size efficiency. Note that the components of these estimators are linear functionals of the empirical projections of the data. However, the situation is different for the non-linear componentwise Bayesian and classical covariance operator estimators. Although the asymptotic equivalence of both estimators is theoretically proved under suitable conditions, their relative efficiency for finite sample sizes, and for a given truncation order, also depends on the rate of convergence to zero of the eigenvalues of the covariance operator.
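The dependence of the finite-sample behavior on the eigenvalue decay rate can be illustrated numerically. The sketch below uses illustrative eigenvalue sequences and a simple empirical second-moment estimator (not the estimators of the text) to check that the n-scaled truncated empirical mean-square error of the eigenvalue estimates stabilizes near the limit 2 Σ_k λ_k², for both a slowly and a rapidly decaying spectrum:

```python
import numpy as np

rng = np.random.default_rng(2)

def n_scaled_mse(lam, n, R):
    """n times the truncated empirical MSE of the classical eigenvalue
    estimates (empirical second moments of zero-mean Gaussian projections)."""
    mse = 0.0
    for _ in range(R):
        Y = rng.normal(0.0, np.sqrt(lam), size=(n, lam.size))
        lam_hat = np.mean(Y ** 2, axis=0)      # classical estimate of each lambda_k
        mse += np.sum((lam_hat - lam) ** 2) / R
    return n * mse

# Two illustrative decay rates for the covariance eigenvalues
spectra = {
    "slow": 1.0 / np.arange(1, 11) ** 1.5,
    "fast": 1.0 / np.arange(1, 11) ** 3,
}

distances = {}
for name, lam in spectra.items():
    limit = 2.0 * np.sum(lam ** 2)             # limit of n * MSE (cf. Theorem 4)
    for n in (1000, 4000):
        distances[(name, n)] = abs(n_scaled_mse(lam, n, 200) - limit)
print(distances)
```

The unscaled mean-square error itself decreases hyperbolically in n (it equals roughly 2 Σ_k λ_k² / n here), so a slower eigenvalue decay inflates the constant 2 Σ_k λ_k² and hence slows the decrease, consistent with the behavior described for Models 1-6.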
In the case of correlated functional data, for example, when the functional values of an autoregressive Hilbertian process of order one (ARH(1) process) are observed, the problem becomes more complex. In Bayesian autocorrelation operator estimation, two priors can be considered, related to the Gaussian and Beta families. In the latter case, the conjugate Beta prior for the Gaussian likelihood leads to the definition of two possible Bayes estimators of the autocorrelation operator. Sufficient conditions can be derived to ensure a better asymptotic performance of the componentwise Bayesian estimator against the classical one in both cases. The rate of convergence to zero of the eigenvalues of the autocorrelation operator also affects the finite-sample-size comparison of these estimators for a given truncation order in the two referred cases. That is, it affects the rate of decrease of the corresponding integrated (truncated) empirical mean-square errors of the Bayes and classical estimators of the autocorrelation operator; this subject will be addressed in a subsequent paper.

Theorem 4.
Under Assumption A3, the Bayes and classical estimators C̃_n and Ĉ_n satisfy

\lim_{n\to\infty} n\,\mathrm{E}\big\|\widetilde{C}_{n}-C\big\|^{2}_{\mathcal{S}(H)}=\lim_{n\to\infty} n\,\mathrm{E}\big\|\widehat{C}_{n}-C\big\|^{2}_{\mathcal{S}(H)}=2\sum_{k\ge 1}\lambda_{k}^{2}.
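A heuristic check of this limit, under the assumption (consistent with the simulation section, though not necessarily the exact estimator (37) of the text) that the classical componentwise estimator is the empirical second moment of the Gaussian projections:

```latex
% Heuristic check of the limit in Theorem 4, assuming the classical
% componentwise estimator is the empirical second moment of the projections.
\[
  \widehat{\lambda}_{kn}=\frac{1}{n}\sum_{j=1}^{n}\langle Y_j,\phi_k\rangle^{2},
  \qquad \langle Y_j,\phi_k\rangle \sim \mathcal{N}(0,\lambda_k)\ \text{i.i.d.},
\]
\[
  \mathrm{E}\,\widehat{\lambda}_{kn}=\lambda_k,\qquad
  \operatorname{Var}\big(\widehat{\lambda}_{kn}\big)
  =\frac{\operatorname{Var}\!\big(\langle Y_1,\phi_k\rangle^{2}\big)}{n}
  =\frac{2\lambda_k^{2}}{n},
\]
\[
  \Longrightarrow\quad
  n\sum_{k\ge 1}\mathrm{E}\big(\widehat{\lambda}_{kn}-\lambda_k\big)^{2}
  =2\sum_{k\ge 1}\lambda_k^{2}.
\]
```

The second equality uses that the fourth moment of a centered Gaussian is 3λ², so Var(⟨Y,φ_k⟩²) = 3λ_k² − λ_k² = 2λ_k².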

Fig 1. Results reflected in Tables 1-2 are represented here for Model 1 at the top-left, for Model 2 at the top-right, and for Model 3 at the bottom. Note that the red and blue lines coincide in all the cases.

Fig 2. AEBP_i and AECP_i values defined in equation (48), for i = 1 at the top-left, i = 2 at the top-right, and i = 3 at the bottom. They are represented with respect to the sample sizes S ∈ [1000, 10000], reflected in the horizontal axis, where a discretization step size DSS = 500 is considered.

Fig 4. AEMB_i and AEMC_i values (vertical axis), for i = 1, 2 at the top, and for i = 3 at the bottom, considering the sample sizes n ∈ [1000, 10000], with discretization step size DSS = 500 (horizontal axis).

Fig 5. For M = 5, 10, 25, 50, 75, the corresponding values of MSEB_1 and MSEC_1 are represented at the top-left, of MSEB_2 and MSEC_2 at the top-right, of MSEB_3 and MSEC_3 at the center-left, of MSEB_4 and MSEC_4 at the center-right, of MSEB_5 and MSEC_5 at the bottom-left, and of MSEB_6 and MSEC_6 at the bottom-right.

Fig 6. For M = 5, 10, 25, 50, 75, the corresponding values of AEB_1 and AEC_1 are represented at the top-left, of AEB_2 and AEC_2 at the top-right, of AEB_3 and AEC_3 at the bottom-left, and of AEB_4 and AEC_4 at the bottom-right.

Fig 7. For M = 5, 10, the corresponding values of AEB_5 and AEC_5 are represented at the top-left, and of AEB_6 and AEC_6 at the bottom-left. For M = 25, 50, 75, the corresponding values of AEB_5 and AEC_5 are represented at the top-right, and of AEB_6 and AEC_6 at the bottom-right.

Table 4
MSEMC_3 and MSEMB_3 values in equation (54) are displayed for n ∈ [1000, 10000], with DSS = 500, and for M = 75.
Moreover, the largest values of the integrated empirical mean-square errors correspond to Model 3 (see Table 4). The results obtained reflect the fact that the functional mean-square errors of the Bayes and classical estimators, computed in the proof of Theorem 3, are monotone increasing functions with respect to the parameters λ_{ki} and θ_{ki}, k ≥ 1, for each i = 1, 2, 3, when these parameters are positive.