Estimation of the variance matrix in bivariate classical measurement error models

Abstract: The presence of measurement errors is a ubiquitous problem, and much work has been done to overcome it when a single covariate is mismeasured under a variety of conditions. In practice, however, more than one covariate may be measured with error. When measurements are taken by the same device, the errors of these measurements are likely correlated. In this paper, we present a novel approach to estimate the covariance matrix of classical additive errors in the absence of validation data or auxiliary variables when two covariates are subject to measurement error. Our method assumes that these errors follow a bivariate normal distribution. We show that the variance matrix is identifiable under certain conditions on the support of the error-free variables and propose an estimation method based on an expansion of Bernstein polynomials. To investigate the performance of the proposed estimation method, the asymptotic properties of the estimator are examined and a diverse set of simulation studies is conducted. The estimated matrix is then used by the simulation-extrapolation (SIMEX) algorithm to reduce the bias caused by measurement error in logistic regression models. Finally, the method is demonstrated using data from the Framingham Heart Study.


1. Introduction
The term measurement error is commonly used to refer to situations in which variables can only be measured with consequential error or must be substituted by surrogate values because they are not accessible in principle. This might be caused by the nature of the measured quantity itself or by the measuring process. The latter could be exemplified by inaccuracies due to a measuring device, a biased attitude during data collection, miscategorization, high expenses of a measuring process, or incomplete information because of missing observations. If the presence of measurement error is not taken into account during the analysis, exceedingly biased estimates might be obtained. It has therefore been of considerable interest to propose methods that correct for measurement error in parametric, semiparametric or nonparametric models.
In a multiple regression model, ignoring measurement error and applying naïve methods can introduce bias, and the consistency of maximum likelihood estimators fails [33]. The bias does not always act in the same direction. Even the coefficient estimates of precisely measured covariates can be biased because of their correlation with the error-prone covariates; this correlation determines the direction of the bias [11]. Moreover, applying standard statistical techniques to data with errors-in-variables can conceal meaningful characteristics of the data and reduce the power to detect relations between covariates [13]. Whatever the nature of the regression model, neglecting measurement error can result in biased estimates. Independent of the type of model (longitudinal, survival, logistic, linear, etc.), measurement errors should therefore be taken into account when making statistical inferences [8].
If measurement error is not treated appropriately, it may cause serious problems. To avoid this, a proper specification of both the measurement error model and the distribution of the measurement error is essential. If a distributional assumption is made regarding the unobserved covariate(s), structural methods are implemented; otherwise, functional methods are used. Classical measurement error models are the most frequently considered models in the literature. They assume that

W = X + U,    (1.1)

where W and X refer to the surrogate and error-prone variables, respectively, while U represents the measurement error, which is presumed to be independent of X and to have zero mean. This model applies also to situations in which there are multiple covariates measured with error, in which case all components of (1.1) are multi-dimensional.

While researchers have shown considerable interest in measurement error in a single covariate, measurement error in multiple error-contaminated predictors has been studied much less. [36] offered a method to construct confidence intervals for the parameters of a logistic regression model with multiple covariates; it relies on a validation study and regression calibration. [52] constructed a bivariate measurement error model and built a logistic regression model by using replications and biomarkers. [41] noted that the regression calibration method cannot correct the bias induced by measurement errors if the individual measurement errors of the covariates of interest are associated, and suggested a modified estimator, a generalized inverse-variance weighted average. In order to overcome the attenuation and the obstacles to obtaining identifiable models, [45] introduced augmented validation study designs and presented semiparametric estimators based on instrumental variables.
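To make the attenuating effect of model (1.1) concrete, the following sketch (our own illustration, with parameter values that are hypothetical and not taken from this paper) simulates a classical additive error on a single covariate and compares the naïve least-squares slope with the true one:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True covariate X and outcome Y = 1 + 2*X + noise (illustrative values).
x = rng.uniform(0.0, 2.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

# Classical additive error, as in (1.1): W = X + U with U ~ N(0, 0.5^2).
w = x + rng.normal(0.0, 0.5, n)

def ols_slope(z, y):
    # Slope of a simple least-squares fit of y on z.
    return np.cov(z, y)[0, 1] / np.var(z)

slope_true = ols_slope(x, y)   # close to 2
slope_naive = ols_slope(w, y)  # attenuated toward 0

# Classical attenuation factor: var(X) / (var(X) + var(U)).
lam = np.var(x) / (np.var(x) + 0.25)
```

With these values the attenuation factor is about 4/7, so the naïve slope settles near 1.14 instead of 2, illustrating the bias discussed above.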
In a study by [19], measurement errors in three covariates are handled by multi-level multivariate regression calibration modeling. In order to estimate the measurement error variances, they combine the results of various research studies and perform a meta-analysis to model the correlated measurement errors as well as the time effect. In addition, [1] obtain nonparametric maximum likelihood estimators for generalized linear models by the expectation-maximization (EM) algorithm under the assumption that the measurement errors of two mismeasured explanatory variables are independent. Although [31] recommend Bayesian hierarchical models via integrated nested Laplace approximations to jointly model measurement errors in two covariates, they assume no association among the covariates of interest or among their corresponding measurement errors. [27] develop a procedure that allows regression calibration to be applied to joint measurement errors in two covariates. In [6], unobserved covariates are treated as missing and analyzed by multiple overimputation based on the assumption that the observed data follow a multivariate normal distribution. Finally, [17] consider the presence of contaminated covariates in small area estimation problems. They assume structural measurement error models with uncorrelated errors and propose empirical best estimates for small area means.
Despite the appealing properties of the above-mentioned methods, which constitute a vast literature for coping with a multivariate measurement error structure, the requirements for validation data, replications or auxiliary variables, the dependence on stringent assumptions, and the difficulty of generalizing to an interrelated multivariate error structure point to the need for a flexible method to estimate the covariance matrix of measurement errors. To the best of our knowledge, no practical method is currently available that allows the covariance matrix characterizing the measurement error to be assessed when only one record per subject is available.
In this paper, we address this need by developing a method for bivariate correlated measurement errors in continuous covariates in the absence of auxiliary data, and we propose a methodology that is easy to implement and can be applied to error-contaminated data from various fields. Our method extends the work of [5] and assumes the bivariate classical measurement error model

W = X + U,    (1.2)

where X = (X_1, X_2)' and U = (U_1, U_2)' are independent, the density of X is unknown, and the vector U has a bivariate Gaussian distribution, i.e.

U ~ N_2(0, Σ),  with  Σ = [σ_1^2, σ_12; σ_12, σ_2^2],    (1.3)
with Σ unknown. Note that not only do we allow the errors U_1 and U_2 to be correlated, but we also allow the unobserved variables X_1 and X_2 to be correlated. The extension to multivariate measurement error models of dimension larger than two will be briefly discussed in Section 7. For the identifiability of measurement error models, we refer to the work of [39] as a pioneering paper in the univariate case. We will generalize their proof of identifiability to the bivariate case, i.e. we will show that the density of X and the matrix Σ are identified even when both the errors and the unobserved variables of interest are correlated. As in the univariate case, we will only assume that X has a density on a compact (but unknown) support and that Σ is a symmetric, positive semi-definite matrix. To approximate the density of X = (X_1, X_2)', we use Bernstein polynomials [4]. The proposed methodology allows the estimated error covariance matrix to be used with any bias adjustment method when fitting any type of regression model.

This paper is organized as follows. In the next section, we show the identifiability of the proposed model. In Section 3, we explain how to estimate the model, and we develop asymptotic properties of the model estimators. The finite-sample characteristics of the proposed estimators are reported in Section 4. In Section 5, the proposed estimators of the variance matrix are used to correct for measurement error in a logistic regression model by means of the SIMEX method. Section 6 contains an application of the proposed methodology to data from the Framingham Heart Study. Finally, conclusions and ideas for further research are given in Section 7. The online supplement [26] contains additional simulation results.

2. Identifiability
Identifiability of a model is a key property in statistics to ensure that accurate inferences can be made. In a likelihood-identifiable model, the observed data contain the essential information needed for the estimation of the model parameters ([28], Chapter 5, p. 124). In this section, we show that model (1.2)-(1.3) is likelihood-identifiable under certain additional specifications.
We first suppose that model (1.2)-(1.3) is satisfied, and that in addition the support of (X_1, X_2)' is compact, but unknown. For simplicity, we assume that it has a rectangular shape. Building on the work of [5], we write X_1 and X_2 as follows:

X_1 = a_1 S_1 + b_1,  X_2 = a_2 S_2 + b_2,    (2.1)

where S = (S_1, S_2)' is a bivariate continuous random vector defined on [0, 1] × [0, 1] and a_1, a_2, b_1 and b_2 are unknown parameters, with a_1 and a_2 positive. Therefore, the density of W = (W_1, W_2)' can be written as:

f_{W_1,W_2}(w_1, w_2) = ∫∫ f_{X_1,X_2}(x_1, x_2) f_{U_1,U_2}(w_1 − x_1, w_2 − x_2; Σ) dx_1 dx_2,    (2.2)

where f_{S_1,S_2}(·,·) and f_{X_1,X_2}(·,·) are the densities of S and X respectively, related through f_{X_1,X_2}(x_1, x_2) = (a_1 a_2)^{−1} f_{S_1,S_2}((x_1 − b_1)/a_1, (x_2 − b_2)/a_2), and f_{U_1,U_2}(·,·; Σ) is the bivariate Gaussian density with mean zero and variance Σ. This shows that

P_{W_1,W_2} = P_{X_1,X_2} * N_2(0, Σ),

where * represents the convolution, and P_{W_1,W_2} and P_{X_1,X_2} are the laws of (W_1, W_2)' and (X_1, X_2)', respectively.
Let us now show that model (1.2)-(1.3)-(2.1) is identifiable. We will show this identifiability in a more general model which, instead of (2.1), assumes that the law P_{X_1,X_2} belongs to the set P_{2,0} defined as follows:

P_{2,0} = { P ∈ P_2 : q_j := inf{ x_j : (x_1, x_2) ∈ supp(P) } > −∞ for j = 1, 2 },

where P_2 is the set of all bivariate probability laws, and supp(P) is the support of the law P. It is worth noting here that finite values of q_1 and q_2 are needed for the identifiability of both the diagonal and off-diagonal elements of the error covariance matrix. Many distributions fall into the class P_{2,0}, like distributions corresponding to positive random variables (which is the case for many commonly studied variables in practice) or variables defined on a compact interval (like proportions, percentages, etc.). On the other hand, it excludes the bivariate normal distribution, but this is not surprising, since it is impossible to identify a normal distribution from the convolution of two normals. So this is, in a sense, the price to pay for proving the identifiability of the error covariance matrix. Also note that in Section 4 we truncate the covariates to meet this condition. In addition, the matrix Σ needs to belong to

Θ = { Σ ∈ R^{2×2} : Σ symmetric and positive semi-definite },

i.e., Θ constitutes the space of 2 × 2 positive semi-definite covariance matrices. Note that if (X_1, X_2)' satisfies (2.1), then P_{X_1,X_2} obviously belongs to the set P_{2,0}, since the support of (X_1, X_2)' is compact.

Theorem 2.1. Suppose that P_{X_1,X_2} * N_2(0, Σ) = P̃_{X_1,X_2} * N_2(0, Σ̃), with P_{X_1,X_2}, P̃_{X_1,X_2} ∈ P_{2,0} and Σ, Σ̃ ∈ Θ. Then, P_{X_1,X_2} = P̃_{X_1,X_2} and Σ = Σ̃.
The proof relies on the following lemma.
implying that P_{X_1} ∈ P_{1,0}. A similar argument shows that P_{X_2} ∈ P_{1,0}.

3. Estimation
We are now ready to develop an estimation procedure for our model. This will be done by approximating the joint density of S_1 and S_2 by Bernstein polynomials with positive coefficients; see also [46] and [2]. Hence, if we assume that S = (S_1, S_2)' has a continuous density f_{S_1,S_2}(·,·), it can be approximated by a bivariate Bernstein polynomial of degree (m_1, m_2) defined on the unit square:

f̃^m_{S_1,S_2}(s_1, s_2; θ_m) = Σ_{k_1=0}^{m_1} Σ_{k_2=0}^{m_2} θ^m_{k_1,k_2} Beta_{k_1+1, m_1−k_1+1}(s_1) Beta_{k_2+1, m_2−k_2+1}(s_2),

where Beta_{k+1, m−k+1}(·) is the density of a Beta variable with parameters k + 1 and m − k + 1, and θ_m = (θ^m_{0,0}, θ^m_{0,1}, ..., θ^m_{0,m_2}, θ^m_{1,0}, ..., θ^m_{m_1,m_2})' is a vector of size (m_1 + 1) × (m_2 + 1). This shows that we approximate f_{S_1,S_2} by a weighted sum of products of two beta densities.
In order for f̃^m_{S_1,S_2}(s_1, s_2; θ_m) to be a valid probability density function, some constraints on the coefficients are needed: the area under f̃^m_{S_1,S_2}(s_1, s_2; θ_m) needs to equal 1, and the function needs to be non-negative on its support. These constraints are imposed by requiring that Σ_{k_1=0}^{m_1} Σ_{k_2=0}^{m_2} θ^m_{k_1,k_2} = 1 and θ^m_{k_1,k_2} ≥ 0 for all k_1, k_2.
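As an illustration, the sketch below (our own, using an arbitrary non-negative coefficient array summing to one) evaluates such a bivariate Bernstein density and verifies numerically that the two constraints indeed yield a valid density:

```python
import numpy as np
from scipy import integrate
from scipy.stats import beta

# Hypothetical coefficients for degree (m1, m2) = (1, 2): non-negative, summing to 1.
theta = np.array([[0.1, 0.2, 0.1],
                  [0.3, 0.2, 0.1]])          # shape (m1+1, m2+1)
assert theta.min() >= 0 and np.isclose(theta.sum(), 1.0)

def bernstein_density(s1, s2, theta):
    # Weighted sum of products of Beta(k+1, m-k+1) densities.
    m1, m2 = theta.shape[0] - 1, theta.shape[1] - 1
    val = 0.0
    for k1 in range(m1 + 1):
        for k2 in range(m2 + 1):
            val += (theta[k1, k2]
                    * beta.pdf(s1, k1 + 1, m1 - k1 + 1)
                    * beta.pdf(s2, k2 + 1, m2 - k2 + 1))
    return val

# Non-negative coefficients summing to one give a density integrating to one.
total, _ = integrate.dblquad(lambda s2, s1: bernstein_density(s1, s2, theta),
                             0.0, 1.0, 0.0, 1.0)
```

Here `total` comes out as 1 (up to quadrature error), confirming that the two constraints are sufficient.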
Since we approximate f_{S_1,S_2}(s_1, s_2) by f̃^m_{S_1,S_2}(s_1, s_2; θ_m), the joint density f_{X_1,X_2}(x_1, x_2) of X_1 and X_2 can, due to (2.1), be approximated by

f̃^m_{X_1,X_2}(x_1, x_2; a, b, θ_m) = (a_1 a_2)^{−1} f̃^m_{S_1,S_2}((x_1 − b_1)/a_1, (x_2 − b_2)/a_2; θ_m).

Note that in (2.1) we need to define X_1 and X_2 in this way since bivariate Bernstein polynomials are defined only on the unit square. As a result, the density of (W_1, W_2)', given in (2.2), can be approximated by f̃^m_{W_1,W_2}(w_1, w_2; a, b, Σ, θ_m), obtained by replacing f_{X_1,X_2} by f̃^m_{X_1,X_2} in (2.2).
If f_{S_1,S_2}(·,·) is continuous, then, for suitably chosen coefficients θ_m,

sup_{(s_1,s_2) ∈ [0,1]^2} | f̃^m_{S_1,S_2}(s_1, s_2; θ_m) − f_{S_1,S_2}(s_1, s_2) | → 0  as  m_1, m_2 → ∞.

This shows that the density of the observed variables can be accurately approximated by means of bivariate Bernstein polynomials, provided the degree of these polynomials is sufficiently large.
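A quick univariate check of this approximation property (our own illustration, using the classical choice θ_k = f(k/m)/(m + 1), which turns the expansion into the standard Bernstein polynomial B_m f):

```python
import numpy as np
from math import comb

def bernstein_approx(f, m, s_grid):
    # Degree-m Bernstein approximation: B_m f(s) = sum_k f(k/m) C(m,k) s^k (1-s)^(m-k),
    # i.e. the mixture of Beta(k+1, m-k+1) densities with weights f(k/m)/(m+1).
    vals = np.zeros_like(s_grid)
    for k in range(m + 1):
        vals += f(k / m) * comb(m, k) * s_grid**k * (1 - s_grid)**(m - k)
    return vals

f = lambda s: 3 * s**2                    # Beta(3, 1) density as a test case
s_grid = np.linspace(0.0, 1.0, 201)
errors = [np.max(np.abs(bernstein_approx(f, m, s_grid) - f(s_grid)))
          for m in (2, 8, 32)]
# The sup-norm error shrinks as the degree grows (for this f it equals 3/(4m)).
```

The decreasing `errors` sequence mirrors the uniform convergence stated above.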
This motivates the following estimation procedure, based on an i.i.d. sample W_i = (W_{i1}, W_{i2})', i = 1, ..., n, having the same distribution as W. The log-likelihood of the parameters a, b, Σ and θ_m is given by

ℓ(a, b, Σ, θ_m) = Σ_{i=1}^n log f̃^m_{W_1,W_2}(W_{i1}, W_{i2}; a, b, Σ, θ_m).    (3.4)

Then, the maximum likelihood estimates of (a, b, Σ, θ_m), denoted by (â, b̂, Σ̂, θ̂_m), can be computed for fixed m_1 and m_2, where the maximization is done over the parameter space under the constraint that Σ_{k_1=0}^{m_1} Σ_{k_2=0}^{m_2} θ^m_{k_1,k_2} = 1.

Let us now consider degree selection for the Bernstein polynomials. Polynomials of larger degree lead to better approximations, but require estimating a larger number of parameters, which increases the variance. Hence, we suggest choosing the degree (m_1, m_2) by means of the Bayesian Information Criterion (BIC):

BIC(m_1, m_2) = −2 ℓ(â, b̂, Σ̂, θ̂_m) + {(m_1 + 1)(m_2 + 1) + 6} log n,

for m_1, m_2 ≥ 0, since the number of (free) parameters is 2 + 2 + 3 + (m_1 + 1)(m_2 + 1) − 1 = (m_1 + 1)(m_2 + 1) + 6. In practice, we fit models for several values of m_1 and m_2 and select the one for which BIC(m_1, m_2) is minimal.
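The degree-selection rule can be sketched as follows (with hypothetical maximized log-likelihood values; the parameter count (m_1 + 1)(m_2 + 1) + 6 is the one derived above):

```python
import numpy as np

def bic(loglik, m1, m2, n):
    # Free parameters: a (2) + b (2) + Sigma (3) + theta ((m1+1)(m2+1) - 1).
    k = (m1 + 1) * (m2 + 1) + 6
    return -2.0 * loglik + k * np.log(n)

# Hypothetical maximized log-likelihoods for a few candidate degrees (n = 500).
n = 500
candidates = {(0, 0): -1450.0, (1, 1): -1443.0, (2, 2): -1441.5}
best = min(candidates, key=lambda m: bic(candidates[m], m[0], m[1], n))
```

With these illustrative values, the fit improvement of the larger degrees does not compensate for the extra parameters, so the smallest degree wins.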
Asymptotic properties of the proposed estimators can now be developed for fixed m_1, m_2 ≥ 0. Note that the vector (â, b̂, Σ̂, θ̂_m) maximizes the likelihood of a potentially misspecified model. Therefore, we rely on White (1982), who developed sufficient conditions for consistency and asymptotic normality of maximum likelihood (ML) estimators under potential misspecification. In that case, the target vector of parameters is given by (a*, b*, Σ*, θ*_m), the minimizer (over (a, b, Σ, θ_m)) of the Kullback-Leibler information criterion, defined by

KL(a, b, Σ, θ_m) = E[ log{ f_{W_1,W_2}(W_1, W_2) / f̃^m_{W_1,W_2}(W_1, W_2; a, b, Σ, θ_m) } ].

Note here that (a*, b*, Σ*, θ*_m) equals the true parameter vector (a, b, Σ, θ_m) if the model is correctly specified.
We have the following result: under White's (1982) conditions, the estimator (â, b̂, Σ̂, θ̂_m) converges in probability to (a*, b*, Σ*, θ*_m); if in addition White's assumptions for asymptotic normality hold, the estimator is also asymptotically normal.

4. Simulation study
In this section, we conduct simulation studies for various settings in order to examine the numerical performance of the proposed method. Under model (1.2)-(1.3)-(2.1), we need to specify the bivariate distribution of (X_1, X_2)' and the variance matrix of (U_1, U_2)'. For the latter, we specify the values of σ_1, σ_2 and ρ = Corr(U_1, U_2) = σ_12/(σ_1 σ_2). The values of σ_1 and σ_2 depend on the setting and can be found in Table 2, whereas for ρ we work with 0.7, 0.1 and −0.5 in each setting. For the bivariate distribution of (X_1, X_2)' we consider four settings. In the first setting, S_1 and S_2 are simulated independently from a Beta(1, 1) distribution, and X_1 and X_2 equal X_1 = a_1 S_1 + b_1 and X_2 = a_2 S_2 + b_2 with a_1 = 2, a_2 = 3, b_1 = −1 and b_2 = −2. In the second setting, X_1 and X_2 are independent and distributed according to a N(0, 1; −1, 1) and a N(0, 1; 0, 2) distribution respectively, where the notation N(0, 1; t_L, t_U) stands for a standard normal distribution truncated to the interval [t_L, t_U]. The final two settings deal with the case where X_1 and X_2 are correlated. In the third setting, the vector (X_1, X_2)' follows a zero-mean bivariate normal distribution with unit variances and correlation δ = 0.1, which we truncate to the rectangle determined by t_L = (−1, −2)' and t_U = (0, 1)', whereas in the fourth setting δ = 0.7, t_L = (−1, −2)' and t_U = (2, 0)'. Note that not only the measurement errors but also the unobserved covariates are correlated in the last two settings.

To compute the bivariate integral in (3.4), we use Monte Carlo (MC) integration, which allows one to integrate any bivariate function defined on the unit square. Under this method, the integral is computed by evaluating the function of interest in a sufficiently large number of pseudo-random uniform variables [25]. If the bounds of the integral are different from [0, 1] × [0, 1], they should first be transformed.
This is achieved as follows: for a function g on the unit square,

∫_0^1 ∫_0^1 g(s_1, s_2) ds_1 ds_2 ≈ D^{−1} Σ_{d=1}^D g(R_{d1}, R_{d2}),

where R_{11}, ..., R_{D1} and R_{12}, ..., R_{D2} are two random samples of size D, obtained independently from a U[0, 1] distribution. MC integral estimators are known to be unbiased, and their accuracy increases with increasing values of D. For our case, we generated MC samples of size D = 10,000. The log-likelihood function in (3.4) can therefore be approximated by

ℓ_D(a, b, Σ, θ_m) = Σ_{i=1}^n log[ D^{−1} Σ_{d=1}^D f̃^m_{S_1,S_2}(R_{d1}, R_{d2}; θ_m) f_{U_1,U_2}(W_{i1} − a_1 R_{d1} − b_1, W_{i2} − a_2 R_{d2} − b_2; Σ) ].    (4.1)

The use of MC integration significantly reduces the computation time needed to calculate the likelihood function. Next, we use constrained optimization algorithms for the numerical maximization of the log-likelihood in (4.1).
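To illustrate the MC approximation of the density inside the likelihood, the sketch below (our own, not the paper's code) uses the simplest case, degree (0, 0), where f_S is the uniform density on the unit square, a = (1, 1)', b = (0, 0)' and Σ = I_2, so that f_W is available in closed form as a check:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(1)
D = 100_000
R = rng.uniform(size=(D, 2))               # MC draws on the unit square

a = np.array([1.0, 1.0])
b = np.array([0.0, 0.0])
Sigma = np.eye(2)                          # illustrative error covariance

def f_w_mc(w):
    # f_W(w) ~= D^{-1} sum_d f_S(R_d) * f_U(w - a*R_d - b; Sigma),
    # with f_S = 1 for the degree-(0, 0) Bernstein density.
    u = w - a * R - b
    return np.mean(multivariate_normal(mean=[0.0, 0.0], cov=Sigma).pdf(u))

w0 = np.array([0.5, 0.5])
approx = f_w_mc(w0)

# With uniform f_S and independent unit-variance errors, f_W factorizes:
exact = float(np.prod(norm.cdf(w0) - norm.cdf(w0 - 1.0)))
```

The MC value agrees with the closed-form one to MC accuracy; summing such log-densities over the sample gives the approximated log-likelihood (4.1).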
For each setting, N = 200 samples of size n ∈ {300, 500, 4000} are drawn. To determine the necessary degree of the Bernstein polynomials, seventeen different degrees (m_1, m_2) are fitted, namely (m_1, m_2) for m_1, m_2 = 0, 1, 2, and (m_1, 0) and (0, m_2) for m_1, m_2 = 3, 4, 5, 6. These degrees are then compared via BIC. Following degree selection and parameter estimation, we evaluate the performance of the proposed method for approximating the joint density of (X_1, X_2)' via the mean integrated absolute error (MIAE):

MIAE = N^{−1} Σ_{r=1}^N ∫∫ | f̂^{m,r}_{X_1,X_2}(x_1, x_2; â, b̂, θ̂_m) − f_{X_1,X_2}(x_1, x_2) | dx_1 dx_2,    (4.2)

where f̂^{m,r}_{X_1,X_2}(x_1, x_2; â, b̂, θ̂_m) represents the estimated density using the data of replication r, while f_{X_1,X_2}(x_1, x_2) is the unknown density of (X_1, X_2)'.

The results of the simulations are presented in Tables 1-3 (Table 1: simulation results for the estimation of f_{X_1,X_2}(x_1, x_2) when n = 500; for each distribution, the three lines correspond to the three values of ρ, namely ρ = 0.7, 0.1 and −0.5). Table 1 and the first two columns of Table 2 summarize the simulation outcomes for the marginal error distributions. Tables 1 and 2 show that the estimation of the unknown parameters is in general accurate. Compared with the results for n ∈ {300, 4000} (not shown here, but provided in the online supplement), we see that the performance of the proposed method improves as the sample size increases. The tables also show that our method has some difficulties in differentiating the contributions of the measurement error and the covariates when there is curvature in the true covariate density. If the generated covariates have densities with a relatively flat shape, the method performs more satisfactorily. The relatively poor performance in the bivariate normal settings can be explained in this way. When the truncation interval is shorter, the problem vanishes.
The last column of Table 2 shows the performance of the estimator of the error correlation. Overall, regardless of the sign and the magnitude of the error correlation, good estimation performance is observed in most settings. Note however that the accuracy is lower than for the estimation of the marginal parameters a_1, a_2, b_1, b_2, σ_1 and σ_2, which can be explained by the fact that the correlation between two completely unobserved variables is a hard quantity to identify and estimate. The table nevertheless shows that the method works. Table 3 (simulation results for the selection of (m_1, m_2) when n = 500; for each distribution, the three lines correspond to ρ = 0.7, 0.1 and −0.5) shows the distribution of the selected pairs (m_1, m_2), in proportions. In the first setting, with two Beta densities, the most frequently chosen pair is (0, 0), which is as expected, since a Beta(1, 1) density is exactly a Bernstein polynomial of degree zero.

Table 1 shows the performance of the estimated density of the unobserved covariates (X_1, X_2)'. Note that the last setting leads to the highest values of the MIAE, defined in (4.2) above. This can be due to the poor performance of the estimator of the error correlation in this case. Finally, we ran a simulation experiment for the scenario where no measurement error is present. The results can be found in the online supplement, and show that our method also performs well when the data are not contaminated.
Although the proposed method offers overall a promising estimation performance, it might sometimes suffer from practical identification problems. Making inference about the density of the unobserved covariates and differentiating the contributions of the measurement error and the covariates in the absence of replication data is a complicated problem. [13] state that technical and theoretical identifiability do not necessarily imply practical identifiability when data are observed one at a time and no validation data are available. If the covariates, as well as the measurement errors, are highly correlated, the problem becomes even more complex. This is a common issue in measurement error problems without additional data.

5. Logistic regression based on SIMEX and the estimated variance matrix
The methodology developed in Section 3 can be used not only for estimating the error variance-covariance matrix and the bivariate density of (X_1, X_2)', but also for estimating a regression model with error-prone covariates whose errors are correlated. We will demonstrate this for the special case of a logistic regression model, in which we use the SIMEX method to correct for the measurement errors, but the estimated error variance matrix can also be used in any other regression model and with any other method that corrects for measurement errors.
In a logistic regression model, the conditional mean of a binary response Y is given by

E(Y | X) = exp(X'β) / {1 + exp(X'β)},    (5.1)

where X is a vector of covariates and β is a vector of corresponding regression coefficients. If the covariates are subject to measurement errors, naïve methods cannot adequately estimate the conditional mean in (5.1), and are thus expected to lead to biased estimates [47]. To address this issue, a large variety of measurement error correction methods for logistic regression have been proposed in the literature. For instance, [47] proposed three correction methods that all reduce the bias of the naïve estimator, one of them being a corrected score approach. [48] generalized the latter approach to more general models.
[37] introduced a method for constructing confidence intervals for coefficients in logistic regression when one of the covariates is error-prone, while [36] extended this method to the case where multiple covariates are subject to measurement error. In addition, [35] presented a nonparametric methodology to take covariate measurement error into account when constructing a logistic regression model. We also refer to [15] for the use of multiple imputation in the same context. These methods rely on distributional assumptions on the covariate(s) or on the presence of validation, replication or cohort data to specify the error distribution. In contrast to the aforementioned methods, the simulation-extrapolation (SIMEX) algorithm is applicable to a large class of regression models (logistic, linear, survival, etc.) as a remedy for the consequences of measurement errors. For these reasons, we prefer to use SIMEX to obtain error-corrected coefficient estimates of a logistic model. SIMEX, introduced by [16], attenuates the bias due to measurement errors when an additive measurement error model is present. The principle of this method is to mimic the influence of measurement errors on the coefficient estimates with Monte Carlo simulations. SIMEX is based on two steps. In the simulation step, the data are contaminated with additional noise and parameter estimates are obtained; this step is repeated for varying noise levels. Then, in the extrapolation step, a (parametric) curve is fitted through the parameter estimates corresponding to these different noise levels, and the curve is extrapolated to the case without measurement error.
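The two SIMEX steps can be sketched as follows for a logistic regression with a single error-prone covariate (our own illustration: the error variance is taken as known, scikit-learn's logistic fit with a very weak penalty stands in for unpenalized maximum likelihood, and all parameter values are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, beta_true, sigma_u = 5000, 1.5, 0.8

# Latent covariate, binary response and an error-prone surrogate W = X + U.
x = rng.normal(0.0, 1.0, n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-beta_true * x)))
w = x + rng.normal(0.0, sigma_u, n)

def logit_slope(z, y):
    # Near-unpenalized logistic fit (large C keeps the ridge penalty negligible).
    return LogisticRegression(C=1e6).fit(z.reshape(-1, 1), y).coef_[0, 0]

# Simulation step: the total error variance becomes (1 + lambda) * sigma_u^2.
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
B = 20
est = [np.mean([logit_slope(w + rng.normal(0.0, np.sqrt(lam) * sigma_u, n), y)
                for _ in range(B)])
       for lam in lambdas]

# Extrapolation step: quadratic in lambda, evaluated at lambda = -1 (no error).
beta_naive = est[0]
beta_simex = np.polyval(np.polyfit(lambdas, est, deg=2), -1.0)
```

With correlated errors in two covariates, the only change in the simulation step is to draw the added noise from N_2(0, λΣ̂) using the estimated covariance matrix.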
It is still common practice to use the naïve method that neglects the measurement errors in the covariates. In order to fully appreciate the merits of SIMEX, we compare the performance of both methods. Applying SIMEX without taking the measurement error correlation into account is also common practice in such situations; therefore, we also compare our method with this simplified method that ignores the correlation. Finally, we compare these three estimators with the SIMEX estimators based on the true Σ and based on the true variances but neglecting the correlation. To obtain the SIMEX estimates, we use a quadratic extrapolation function.
In order to carry out this comparison, we study the five estimation methods in four different settings. In each setting there are five covariates in the logistic model, so X = (X_1, ..., X_5)'. The covariates X_1 and X_2 are generated in exactly the same way as in the four settings studied in Section 4. We even work with the same sample sizes (n ∈ {300, 500, 4000}) and the same samples, so that the error covariance matrix does not need to be re-estimated, since it has already been computed in Section 4. On the other hand, the variables X_3, X_4 and X_5 are new, and are generated in the same way in each of the four settings. They are generated independently of each other and of (X_1, X_2)'. The variable X_3 is contaminated, so we observe W_3 = X_3 + U_3. Note that U_1, U_2 and U_3 are independent of all other variables. The variable X_3 is drawn from a Beta(1, 1) − 1 distribution and the variance of U_3 equals σ_3^2 = 0.07. The variables X_4 and X_5 are not error-prone and are generated from a Bernoulli(0.7) and a N(0, 1) distribution, respectively. For an i.i.d. sample of size n, the i-th data point is composed of (Y_i, W_{i1}, W_{i2}, W_{i3}, X_{i4}, X_{i5})'.

The comparative simulation results for these five methods when ρ = 0.7 are presented in Table 4 (simulation results for the logistic regression model when n = 500 and ρ = 0.7). For almost all cases, the naïve method leads to highly biased estimates. When the measurement errors in the covariates are taken into account, less biased estimates are obtained. Overall, SIMEX based on Σ̂ outperforms the naïve estimator and the SIMEX estimator based on (σ̂_1, σ̂_2, σ̂_3) in terms of bias reduction. On the other hand, the SIMEX estimates have a higher variance than the naïve ones. This phenomenon is referred to as the 'variance versus bias tradeoff' in the measurement error literature. The complexity of correction procedures often leads to higher variability than the simple naïve procedure that ignores the measurement error. See e.g. Stefanski and Carroll (1985), who observed that the naïve method is not always worse than their proposed correction methods when the sample size is small, i.e. when variance dominates bias. Since small bias is often considered more important than small variance, we believe that the proposed method should be preferred in practice, despite its larger variance.
Finally, we note that when the true Σ and/or (σ_1, σ_2, σ_3) are used in the SIMEX procedure, the results are better than when their corresponding estimators are used, but the differences are small. This suggests that the estimation of the matrix Σ does not have an important impact on the performance of the SIMEX estimator.
It is also interesting to note that the SIMEX method that ignores the correlation between the measurement errors does not perform very well. This shows that it is essential to take the correlation between measurement errors into account in order to make correct inferences. The simulation results for ρ equal to 0.1 and −0.5 are available in the online supplement; the same conclusions hold. However, the advantage of SIMEX(Σ̂) over SIMEX(σ̂_1, σ̂_2, σ̂_3) is more pronounced when the magnitude of the error correlation is larger (i.e. when ρ = 0.7).

6. Application
In this section, we apply the proposed method to data from the Framingham Heart Study. Our main interest is to model the 10-year risk of coronary heart disease (CHD) using various covariates on the demographic and medical characteristics of the patients. There are 3,827 individuals with complete records, 592 of whom have the disease.
The covariates in this data set include gender (male coded as 1), systolic blood pressure (SBP), diastolic blood pressure (DBP), glucose level and the number of cigarettes smoked per day (CPD). Note that in this application, SBP, DBP and glucose are the error-prone variables. Although self-reported variables are known to be potentially mismeasured in epidemiological studies, CPD is not considered a contaminated covariate in this application. [7] conducted an analysis comparing self-reported CPD counts with returned cigarette butts and blood sample measurements of the nicotine level, and concluded that self-reported counts are reliable sources of information on smoking behavior. Therefore, we assume that CPD is an error-free covariate.
Neglecting the presence of measurement errors and applying a naïve logistic regression analysis is expected to lead to incorrect inferences. Since both blood pressures, SBP and DBP, are measured by the same device, these covariates are likely to have correlated measurement errors. However, the glucose level in the blood is measured independently, using a separate device. For this reason, we assume that the measurement error of the glucose level is independent of the measurement errors in the two blood pressures. We take these issues into account and assess the covariate effects after correcting for the measurement errors. We use transformed variables of the form SBP* = log(SBP − 50); this transformation has been used for SBP by [13] (page 118) in order to homogenize the error variance of SBP. We followed a similar approach for DBP and Glucose, replacing the value 50 used for SBP by a suitable constant for each of these variables.

Tables 5 and 6 show the results of the estimation. Table 5 indicates that the density of Glucose* can be estimated using Bernstein polynomials of degree 0. Table 6 suggests that the bivariate density of (SBP*, DBP*)' can be estimated by bivariate Bernstein polynomials of degree (0, 0). From the estimated error variance-covariance matrix Σ̂ of (Glucose*, SBP*, DBP*)', the correlation between the measurement errors in SBP* and DBP* is estimated to be around 0.887, which points to a strong positive linear relationship.
Let us now use this estimated variance matrix to estimate the coefficients of the logistic regression model by means of the SIMEX method. SIMEX plots for each covariate are provided in Figure 1. A comparison between the coefficient estimates obtained from the naïve method and from the SIMEX method is given in Table 7. Standard errors for the SIMEX estimates, computed as in Carroll et al. (1996) (page 245), are also provided. After adjusting for the measurement errors, the estimates for Glucose*, SBP* and DBP* have changed considerably, which suggests that correcting for correlated measurement errors is essential.

Discussion and future research
When dealing with measurement error models, it is common to assume that the measurement errors in multiple covariates are independent of each other. When this assumption is violated, the correlation between the errors must be taken into account in order to make correct inferences. In the literature on correlated measurement errors, however, many existing methods suppose that external information sources, such as validation data sets, replications or auxiliary variables, are available to estimate the covariance matrix of the measurement errors. These sources may not always be accessible in practice, which limits the usability of such methods.
In this paper, we proposed a method to take the correlated measurement error structure into account. We worked on a bivariate classical measurement error model with Gaussian errors to estimate the error variance matrix. Our method is flexible, since it does not rely on external information sources and does not make distributional assumptions on the unobserved covariates, apart from some minor assumptions on the support of these variables. Instead, these covariate distributions are approximated by using bivariate Bernstein polynomials. Thus, our method can be applied to a wide range of contexts. Once the covariance matrix is estimated, any correction method can be used to estimate the regression model correctly.
Simulation studies indicate the good finite sample performance of the proposed method in various settings. Moreover, our method performs well in a logistic regression model in which the SIMEX method is used to correct for the measurement errors. Analysis of the data from the Framingham Heart Study allowed us to assess the effects of both the contaminated and the error-free covariates on the risk of coronary heart disease.
In our simulations, polynomials up to degree 6 are sufficient, and degree 6 is actually rarely selected. However, if higher degrees were required for a given data set, the computation time could become problematic. In that case, a two-step estimation scheme can be used to overcome this issue. In the first step, one could estimate the parameters (a_j, b_j, σ_j), j = 1, 2, related to the marginal error distributions by using the method of [5]. Note that the error U_j (j = 1, 2) is independent of X_j and has a univariate normal distribution with mean zero and variance σ_j². In addition, X_j has compact support [b_j, a_j + b_j]. Hence, we are exactly in the setting needed for applying the method of [5]. Then, in the second step, the estimates (â_j, b̂_j, σ̂_j), j = 1, 2, are plugged into the log-likelihood function (4.1), and the error covariance σ_12 and the parameters θ_m related to the bivariate distribution f_{X_1,X_2} can be estimated. Although estimation of all parameters in one step is accurate, the two-step estimation scheme could be a practical solution when the computation time for higher degrees becomes excessive. Such schemes are not uncommon as a way to reduce the computation time in optimization; see for example [20] and [30].
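The flavor of such a two-step scheme can be illustrated on a toy Gaussian example, where the marginal standard deviations are estimated first and the cross term is estimated afterwards with the marginal estimates held fixed. This is only an analogy for the scheme described above, not an implementation of the likelihood (4.1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-step scheme on a bivariate normal sample. In the paper's setting,
# step 1 would fit the marginal parameters (a_j, b_j, sigma_j) by the method
# of [5], and step 2 would maximize the log-likelihood (4.1) over sigma_12
# and the Bernstein coefficients with the marginal estimates plugged in.
true_corr = 0.6
cov = np.array([[1.0, true_corr], [true_corr, 1.0]])
u = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# Step 1: marginal parameters, one coordinate at a time.
sigma1, sigma2 = u[:, 0].std(), u[:, 1].std()

# Step 2: with (sigma1, sigma2) held fixed, estimate only the cross term.
sigma12 = np.mean(u[:, 0] * u[:, 1])
rho_hat = sigma12 / (sigma1 * sigma2)
print(round(rho_hat, 2))  # close to 0.6
```

Splitting the optimization this way replaces one high-dimensional search by several lower-dimensional ones, which is the source of the computational savings, at the cost of some statistical efficiency relative to joint maximization.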
The method in this paper can be considered as an important step in the literature on correlated measurement errors. A number of extensions can be studied in the future. First of all, we assumed that the measurement errors of two covariates are correlated. A natural extension would be to consider the case where, instead of having two covariates X_1 and X_2 subject to measurement error, we have p ≥ 2 covariates whose errors are correlated. When the corresponding errors follow a multivariate normal distribution, the p × p variance-covariance matrix of these errors needs to be identified and estimated. In the bivariate case the critical parameter in the identification of the model is the correlation ρ between U_1 and U_2; in the multivariate case, there will be several (partial) correlations. Although we expect that the identification and estimation of this matrix will be feasible, it will be technically more challenging.
A second possible extension concerns the identifiability of model (1.2)-(1.3)-(2.1) when the assumed bivariate normal distribution of the errors is replaced by another parametric family of distributions. [39] mention in their Remark 2.3 a few examples of univariate parametric families under which the univariate measurement error model is identified. However, they do not give necessary and sufficient conditions, i.e. they do not provide a full characterization of the class of parametric families that make the model identifiable. It would be useful to develop such an identification result, first in the univariate case and, if successful, also in the more challenging bivariate or multivariate case.

Software
Software in the form of R code, together with complete documentation, is available on request from the corresponding author.