Tuning parameter calibration for personalized prediction in medicine

Abstract: Personalized prediction is of high interest in medicine; potential applications include the prediction of individual drug responses or risks of complications. But typical statistical pipelines, such as ridge estimation combined with cross-validation, ignore the heterogeneity among the patients and, therefore, are not suited for personalized prediction. We, therefore, introduce an alternative ridge-type pipeline that can minimize the prediction error of each patient individually. We show that our pipeline is optimal in terms of oracle inequalities, fast, and highly effective both in simulations and on real data.


Introduction
In the last decade, improvements in genomic, transcriptomic, and proteomic technologies have enabled personalized medicine, or precision medicine, to become an essential part of contemporary medicine. Personalized medicine takes into account individual variability in genes, proteins, environment, and lifestyle to decide on disease prevention and treatment (Hamburg and Collins 2010). The use of a patient's genetic and epigenetic information has already proven to be highly effective in tailoring preventive care or drug therapies in a number of applications, such as breast cancer (Cho, Jeon and Kim 2012), prostate cancer (Nam et al. 2007), ovarian cancer (Hippisley-Cox and Coupland 2015), pancreatic cancer (Ogino et al. 2011), cardiovascular disease (Ehret et al. 2011), cystic fibrosis (Waters et al. 2018), and psychiatry (Demkow and Wolańczyk 2017). The subfield of pharmacogenomics studies specifically how genes affect a person's response to particular drugs in order to develop more efficient and safer medications (Ziegler et al. 2012). Following Kosorok and Laber (2019), precision medicine may be formalized as a sequence of decision rules, a treatment regime, mapping patient information to a recommended action among several different treatments or preventive-care options.
Genomic, epigenomic, and transcriptomic data used in personalized medicine, such as gene expression, copy number variants, or methylation levels, are often high-dimensional, with a number of variables that rivals or exceeds the number of observations. Using such data to estimate and predict treatment response or risk of complications therefore requires regularization, typically via the ℓ₁ norm (lasso), the ℓ₂ norm (ridge), or other terms. While there exists tuning-free regularization, such as the methods proposed in (Lederer and Müller 2015; Huang, Xie and Lederer 2021), such methods are not tailored for minimizing the personalized prediction error. On the other hand, regularization often introduces one or more tuning parameters, and these tuning parameters are usually calibrated based on averaged prediction risks. Most commonly used, K-fold cross-validation (CV) divides the data into K folds (typically K ∈ {5, 10}), predicts each fold out-of-sample, averages the errors over all folds for a range of tuning parameters, and selects the value with the lowest averaged error (Stone 1974; Golub, Heath and Wahba 1979). But the averaging removes the inherent individual heterogeneity of the patients and, therefore, results in sub-optimal prediction performance for the individual patients. This may ultimately lead to unsuitable treatment, administration of improper medication with adverse side effects, or lack of preventive care (Hamburg and Collins 2010).
Hence, rather than minimizing an averaged prediction error, our goal is to minimize each patient's individual ("personalized") prediction error. The idea was first introduced by Hellton and Hjort (2018) as a personalized procedure for ridge regression, but they lacked an appropriate calibration scheme for the tuning parameters, relying only on a naïve plug-in approach, and could therefore not demonstrate superior predictive performance. In this paper, we introduce an alternative ridge estimator, referred to as the Euclidean distance ridge (edr), and calibrate its tuning parameter for each patient individually based on the ideas of adaptive validation (Chichignoud, Lederer and Wainwright 2016). We show that this approach offers compelling theory, fast computations, and accurate prediction on data.
One goal within personalized medicine is to select among multiple treatments (Jeng, Lu and Peng 2018). Our goal, however, is different: we want to predict the risk for a complication or effect of a single treatment as precisely as possible. The specific motivation for our method is to unravel the relationship between gene expression and weight gain in kidney transplant recipients (Cashion et al. 2013). Kidney transplant recipients are known to often gain substantial weight during the first year after transplantation, which can result in adverse health effects (Patel 1998). Individual predictions of this weight gain based on the genetic data can help in identifying high-risk patients.
The standard regularizers are currently sparsity-inducing regularizers such as the ℓ₁ norm (Hastie, Tibshirani and Wainwright 2015). Sparsity is invoked to facilitate interpretation and because applications might be inherently sparse in the first place. However, since our focus is squarely on prediction, interpretability is not the key issue, and there is also little evidence for inherent sparsity: quite in contrast, Cashion et al. (2013) already indicate that there might be many genes associated with weight gain. We, therefore, opt for the more classical ridge regression. Ridge regression (Hoerl and Kennard 1970) yields good predictive performance for dense or non-sparse effects, that is, for outcomes related to systemic conditions, as the method does not perform variable selection. Ridge regression is a classical tool for prediction based on genomic data, and it has been shown that ridge regression can outmatch competing methods for predicting survival based on gene expression (Bøvelstad et al. 2007).
From a practical perspective, a short summary of our pipeline is as follows:
1. Compute the ridge estimator for a range of tuning parameters;
2. Translate these estimators into what we call Euclidean distance ridge estimators;
3. Find an optimal tuning parameter for these estimators through a testing scheme.
The three key contributions of this paper are:
• We introduce a prediction pipeline that takes the heterogeneity among patients into account;
• We develop theoretical guarantees for this pipeline;
• We establish a fast and ready-to-implement algorithm with publicly available code.
The remainder of this paper is organized as follows: We introduce the linear regression framework and the problem statement in Section 2. We then introduce the main methodology of our approach, and present theoretical guarantees in Section 3. In addition, we discuss the algorithm and analyze its performance through simulation studies using synthetic and real data in Section 4. We further apply our pipeline to kidney transplant data in Section 5. Finally, we discuss the results in Section 6 and we defer all proofs to the Appendix A. All data are publicly available and our code is available at https://github.com/LedererLab/personalized_medicine.

Problem Setup
We consider data (y, X) that follows a linear regression model

y = Xβ* + u.    (2.1)

Let p denote the number of parameters, e.g., genes or genetic probes, and n the number of samples or patients. Then y ∈ R^n is the vector of outcomes y_i, for example, a person's response to treatment. We let X ∈ R^{n×p} denote the design matrix, where each row x_i ∈ R^p, i ∈ {1, . . . , n}, contains the genome information or other covariates of the corresponding person. Each element β*_j, j ∈ {1, . . . , p}, of the regression vector β* ∈ R^p models the corresponding gene's influence on the person's response. We ensure the uniqueness of β* by assuming that it is a projection onto the linear space generated by the n rows of X (Shao and Deng 2012; Bühlmann 2013). For the random error vector u ∈ R^n, we make no assumptions on the probability distribution.
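To make the setup concrete, the following minimal sketch simulates data from the model y = Xβ* + u with numpy; the dimensions, the seed, and the projection step for β* are our own illustrative choices, not values from the paper.

```python
import numpy as np

# Toy data from the linear model y = X beta* + u (Section 2).
rng = np.random.default_rng(0)
n, p = 26, 100                       # p >> n: high-dimensional regime
X = rng.standard_normal((n, p))
beta_raw = rng.standard_normal(p)

# Project beta_raw onto the row space of X; this mirrors the uniqueness
# assumption on beta* stated in the text.
beta_star = X.T @ np.linalg.pinv(X @ X.T) @ (X @ beta_raw)

u = 0.5 * rng.standard_normal(n)     # noise; no distributional assumption made
y = X @ beta_star + u
```

The projection leaves the fitted values unchanged (Xβ* equals Xβ_raw), which is exactly why it pins down a unique representative of the regression vector.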
Our goal is to estimate the regression vector β* from data (y, X) or, in terms of our application, to predict a person's treatment response based on that person's specific information. Mathematically, this amounts to estimating z^⊤β* in terms of the personalized prediction error

|z^⊤(β* − β̂)|,    (2.2)

where z ∈ R^p is what we call the person's "covariate information," which could include genome information as one example.
Since the data in personalized medicine is often high-dimensional, that is, the number of parameters (genes) p rivals or exceeds the number of samples (patients) n, we consider regularized least-squares estimators of the form

β̂[r] ∈ argmin_β { ||y − Xβ||₂² + r·f[β] }.    (2.3)

Here, f denotes a function that takes into account prior information, such as sparsity or small regression coefficients, and the tuning parameter r ≥ 0 balances the least-squares term and the prior term. Regularization can also improve prediction accuracy in low-dimensional cases. Given an estimator (2.3), the main challenge is to find a good tuning parameter in line with our statistical goal. This means that we want to mimic the tuning parameter

r* := argmin_{r ∈ R} |z^⊤(β* − β̂[r])|,    (2.4)

which is the optimal tuning parameter in terms of prediction in a given set of candidate parameters R := {r₁, r₂, . . . , r_m}. The optimal tuning parameter r* depends on the family of estimators (2.3), the unknown noise u, and the patient's genome information z. The dependence on z is integral to personalized medicine: different patients can respond very differently to the same treatment. But standard tuning-parameter calibration schemes such as CV do not take this personalization into account; instead, they attempt to minimize the averaged prediction error ||Xβ* − Xβ̂[r]||₂²/n rather than the personalized prediction error |z^⊤(β* − β̂[r])|. We, therefore, develop a new prediction pipeline that is tailored to the personalized prediction error and equip our methods with fast algorithms and sharp guarantees.
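The distinction between averaged and personalized errors can be illustrated numerically. The sketch below traces a ridge path on toy data and compares the two criteria for two hypothetical patients; all names and values are ours.

```python
import numpy as np

# Sketch: the optimal tuning parameter is patient-specific. We trace a ridge
# path and compare the averaged error ||X(beta* - beta_hat[r])||^2 / n with
# the personalized error |z^T (beta* - beta_hat[r])| for two "patients".
rng = np.random.default_rng(1)
n, p = 30, 80
X = rng.standard_normal((n, p))
beta_star = X.T @ np.linalg.pinv(X @ X.T) @ (X @ rng.standard_normal(p))
y = X @ beta_star + 0.3 * rng.standard_normal(n)

grid = np.logspace(-2, 2, 25)        # candidate tuning parameters
betas = [np.linalg.solve(X.T @ X + t * np.eye(p), X.T @ y) for t in grid]

z1, z2 = rng.standard_normal(p), rng.standard_normal(p)
avg_err = [np.linalg.norm(X @ (beta_star - b)) ** 2 / n for b in betas]
per_err1 = [abs(z1 @ (beta_star - b)) for b in betas]
per_err2 = [abs(z2 @ (beta_star - b)) for b in betas]

best_avg = grid[np.argmin(avg_err)]
best_z1, best_z2 = grid[np.argmin(per_err1)], grid[np.argmin(per_err2)]
# best_z1 and best_z2 need not coincide with best_avg or with each other:
# the best tuning parameter depends on the covariate vector z.
```

This is precisely the gap the proposed pipeline targets: a single CV-style choice minimizes avg_err, while different patients z₁, z₂ can have very different personalized optima.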

Methodology
In this section, we introduce an alternative version of the ridge estimator (Hoerl and Kennard 1970) along with a calibration scheme tailored to personalized medicine. Two distinct features of the pipeline are its finite-sample bounds and its computational efficiency. Our estimator is called the Euclidean distance ridge (edr) and is defined as

β̂_edr[r] ∈ argmin_β { ||y − Xβ||₂² + r·||β||₂ }.    (3.1)

The edr replaces the ridge estimator's squared ℓ₂ prior term f_ridge[β] ≡ ||β||₂² by its square root f_edr[β] ≡ √(f_ridge[β]) ≡ ||β||₂. This modification allows us to derive finite-sample oracle inequalities that can be leveraged for tuning-parameter calibration. At the same time, the edr preserves two of the ridge estimator's most attractive features: it can model the influences of many parameters, and it can be computed without the need for elaborate descent algorithms. Finally, we will exploit the theory and method developed for the edr to calibrate personalized tuning parameters for ridge regression; see Section 4.
Our first step is to establish finite-sample guarantees for the edr. The key idea is that if the tuning parameter is large enough, the personalized prediction error (2.2) is bounded by a multiple of the tuning parameter. In the main text, we assume an orthonormal design X^⊤X = I_{p×p} for ease of presentation, but we show in the Appendix that this assumption is required neither in theory (see Appendix B) nor in practice (see Appendix D). We establish the following guarantee for the edr:

Lemma 3.1 (Personalized prediction bound for the edr). For every sufficiently large tuning parameter r, it holds that |z^⊤(β* − β̂_edr[r])| ≤ c[z, r]·r, where c[z, r] ≤ ||z||₂ is a factor that depends on z and β̂_edr[r].

Such guarantees are usually called oracle inequalities (Lederer et al. 2019). The given oracle inequality is an ideal starting point for our pipeline because it gives us a mathematical handle on the quality of tuning parameters: a good tuning parameter should be large enough to meet the stated condition and yet small enough to give a sharp bound. The original ridge estimator, however, lacks such inequalities for personalized prediction. Our proof techniques, which are based on the optimality conditions of the estimator, also yield a similar bound for the original ridge estimator, and the following pipeline can then be applied in the same way as for the edr. But the crucial advantage of the edr's bound is that its right-hand side is bounded by ||z||₂·r, which ensures that the results do not scale with β*.
The factor c [z, r] can be interpreted as the absolute value of the correlation between the person's covariate information z and the estimatorβ edr [r]. This factor, and therefore z, are included in our calibration scheme below, and our pipeline, hence, optimizes the prediction for particular study subjects.
Lemma 3.1 bounds the personalized prediction error of the edr as a function of the tuning parameter r. Given z, the best tuning parameter in terms of the bound minimizes c[z, r]·r over all tuning parameters that satisfy the lemma's lower bound. The value at the lower bound, which we call the oracle tuning parameter, can be interpreted as the closest theoretical mimic of the optimal tuning parameter r* from (2.4):

Definition 3.1 (Oracle tuning parameter for personalized prediction). Given a new person's covariate information z, the oracle tuning parameter for personalized prediction in a candidate set R is r^o := min{r ∈ R : r satisfies the lower bound of Lemma 3.1}.

The oracle tuning parameter r^o is the best approximation of the optimal tuning parameter r* in view of the mathematical theory expressed by Lemma 3.1. In practice, however, one knows neither the target β* nor the noise u (typically not even its distribution), such that neither r* nor r^o is accessible. Such lower bounds on the tuning parameters and corresponding notions of oracle tuning parameters are standard in high-dimensional regression; see, for example, Bühlmann and van de Geer (2011, Chapter 6), Hastie, Tibshirani and Wainwright (2015), and Zhuang and Lederer (2018, Section 2). Broadly speaking, the lower bounds indicate that the tuning parameters need to be chosen sufficiently large to "overrule the noise." The lower bounds typically include X^⊤u rather than u directly as a consequence of using Hölder's inequality in the proofs (see the Appendix for our proofs); in other words, the estimators are affected by X^⊤u rather than by u directly. The values of the lower bounds are, therefore, called the effective noise (Lederer and Vogt 2020, Section 1). Our lower bounds additionally include the personalized quantities z and c[z, r] simply because we consider personalized prediction. Overall, our lower bounds and oracle tuning parameters are only slight variations of standard notions in high-dimensional statistics.
Our goal is now to estimate r^o in order to match its prediction accuracy (and, therefore, to attempt to reach the accuracy of r*) with a completely data-driven scheme. Our proposal is based on pairwise tests along the tuning parameter path:

Definition 3.2 (PAV_edr: Personalized adaptive validation for edr). We select the tuning parameter r̂ := min R̂, where the set of admissible tuning parameters is R̂ := {r ∈ R : |z^⊤(β̂_edr[r'] − β̂_edr[r''])| ≤ c[z, r']·r' + c[z, r'']·r'' for all r', r'' ∈ R with r', r'' ≥ r}.

The idea of using pairwise tests for tuning-parameter calibration in high-dimensional statistics has been introduced by Chichignoud, Lederer and Wainwright (2016) under the name adaptive validation. A difference here is that the factors c[z, r]·r are not constant but depend both on r and z. The dependence on z in particular reflects our focus on personalized prediction.
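One plausible implementation of such pairwise tests is sketched below. The concrete test |z⊤(b_i − b_j)| ≤ bound_i + bound_j is our reading of the adaptive-validation idea, not the paper's verbatim definition, and all names are ours.

```python
import numpy as np

def pav_select(z, estimators, bounds):
    """Hedged sketch of personalized adaptive validation.

    `estimators[i]` plays the role of beta_hat_edr[r_i] and `bounds[i]` the
    corresponding value c[z, r_i] * r_i, assumed sorted in increasing order of
    the bounds. Returns the index of the smallest bound whose estimator passes
    all pairwise tests |z^T (b_i - b_j)| <= bounds[i] + bounds[j] against
    estimators with larger bounds.
    """
    m = len(estimators)
    for i in range(m):
        ok = all(
            abs(z @ (estimators[i] - estimators[j])) <= bounds[i] + bounds[j]
            for j in range(i + 1, m)
        )
        if ok:
            return i
    return m - 1  # fall back to the largest bound
```

If all estimators predict alike, the smallest bound is selected; a candidate that disagrees strongly with the rest is rejected by the tests, which is the mechanism that rules out under-regularized, noise-dominated estimators.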
The following result guarantees that the data-driven choice r̂ indeed provides, up to a constant factor 3, the same performance as the oracle tuning parameter r^o:

Theorem 3.1 (Optimality of PAV_edr). Under the conditions of Lemma 3.1, it holds that |z^⊤(β* − β̂_edr[r̂])| ≤ 3·c[z, r^o]·r^o.
This result guarantees that our calibration pipeline selects an essentially optimal tuning parameter from any grid R. Our pipeline is the only method for tuning-parameter selection in personalized medicine that is equipped with such finite-sample guarantees. Moreover, it requires no knowledge of the regression vector β* or the noise u.
Our calibration method is fully adaptive to the noise distribution; nevertheless, it is instructive to exemplify our main result for Gaussian noise (see Appendix A.3 for the detailed derivations):

Example 3.1 (Gaussian noise). Suppose orthonormal design and Gaussian random noise u ∼ N_n[0_n, σ²I_{n×n}/n]. For any δ ∈ (0, 1), the personalized prediction error of r̂ is then, with probability at least 1 − δ, of the order ||z||₂·σ/√n up to a factor that is logarithmic in 1/δ.

The bound provides the usual parametric rate σ/√n in the number of samples n; the factor ||z||₂ entails the dependence on the number of parameters p.

Algorithm and Numerical Analysis
One of the main features of our pipeline is its efficient implementation. This implementation exploits a fundamental property of our estimator: there is a one-to-one correspondence between the edr and the ridge estimator via the tuning parameters.

Connections to the ridge estimator
The ridge estimator is the ℓ₂²-regularized least-squares estimator (Hoerl and Kennard 1970)

β̂_ridge[t] ∈ argmin_β { ||y − Xβ||₂² + t·||β||₂² },    (4.1)

where t > 0 is a tuning parameter. Its computational efficiency, which is due to its closed-form expression, provides a basis for the computation of our edr estimator. The closed form of the ridge estimator can be derived from the Karush-Kuhn-Tucker (KKT) conditions as

β̂_ridge[t] = (X^⊤X + tI_{p×p})⁻¹X^⊤y,    (4.2)

noting that the matrix X^⊤X + tI_{p×p} is always invertible for t > 0. However, the inversion of the matrix X^⊤X + tI_{p×p} still deserves some thought: first, the matrix might be ill-conditioned, and second, it needs to be computed for a range of tuning parameters rather than only for a single one. A standard approach to these two challenges is a singular value decomposition (svd) of the design matrix X.
Lemma 4.1 (Computation of the ridge estimator through singular value decomposition). Let a singular value decomposition of X be given by X = UDV^⊤, where U ∈ R^{n×n} and V ∈ R^{p×p} are orthonormal matrices and D ∈ R^{n×p} is a rectangular diagonal matrix with the singular values d₁, d₂, . . . on its diagonal. Then, the ridge estimator can be computed as

β̂_ridge[t] = VD†U^⊤y,    (4.3)

where D† ∈ R^{p×n} is the rectangular diagonal matrix with entries D†_jj = d_j/(d_j² + t).

The singular value decomposition of the design matrix does not depend on the tuning parameter; therefore, the ridge estimators β̂_ridge[t] can be readily computed for multiple tuning parameters just by substituting the value of t in D†. The resulting set of ridge (edr) estimators for a set of tuning parameters T is called the ridge (edr) path for T. Now, the crucial result is that the ridge estimator and the edr are computational siblings: every ridge estimator β̂_ridge[t] equals the edr estimator β̂_edr[r] for a suitable tuning parameter r (see Theorem 4.1, proved in Appendix A.5). This mapping transforms, in particular, the optimal tuning parameter of the ridge estimator to a corresponding optimal tuning parameter of the edr estimator. It can be viewed as a consequence of the edr penalty being a continuous transformation of the ridge penalty. More generally, the mapping allows us to compute the edr estimator via the ridge estimator; see below.
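Lemma 4.1 can be checked numerically. The sketch below uses a thin SVD (our own convenience; the entries of D† are unchanged) and verifies the SVD formula against the direct closed form (4.2).

```python
import numpy as np

# Verify: (X^T X + t I)^{-1} X^T y  ==  V diag(d_j / (d_j^2 + t)) U^T y,
# where X = U diag(d) V^T is a thin SVD of X.
rng = np.random.default_rng(2)
n, p = 15, 40
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

U, d, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD: d has length n

def ridge_svd(t):
    # The whole ridge path reuses this single SVD; only t changes.
    return Vt.T @ ((d / (d ** 2 + t)) * (U.T @ y))

def ridge_direct(t):
    return np.linalg.solve(X.T @ X + t * np.eye(p), X.T @ y)

for t in (0.1, 1.0, 10.0):
    assert np.allclose(ridge_svd(t), ridge_direct(t))
```

The SVD variant avoids forming and inverting X^⊤X + tI for every t, which is exactly the computational advantage the text describes.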

Algorithm
The core idea of our proposed algorithm is to exploit the above one-to-one mapping between the edr estimator and the ridge estimator. This correspondence allows us to efficiently compute edr solution paths via the ridge estimator's explicit formulation and svd. First, consider a set of ridge tuning parameters T and its corresponding set of edr tuning parameters

R_φ := {φ[t] : t ∈ T},    (4.4)

where φ maps each ridge tuning parameter t to the edr tuning parameter r of the same estimator (see the proof of Theorem 4.1) and m := |R_φ| denotes the cardinality. This set contains, in particular, the tuning parameter r̂, whose optimality is guaranteed by Theorem 3.1. To compute the tuning parameter r̂ given data z, we first order the elements r₁, r₂, . . . , r_m such that

c[z, r₁]·r₁ ≤ c[z, r₂]·r₂ ≤ · · · ≤ c[z, r_m]·r_m.    (4.5)

The PAV_edr method can then be formulated in terms of binary variables ŝ_{r_i} ∈ {0, 1}, i ∈ {1, . . . , m}, that indicate whether r_i passes all pairwise tests of Definition 3.2; Algorithm 1 returns the first tuning parameter in the above ordering that does. The full pipeline can be summarized by the following four steps:

Step 1: Generate a set T of tuning parameters for ridge regression.
Step 2: Compute the ridge solution path with respect to T by using (4.3).
Step 3: Transform the ridge tuning parameters to their edr counterparts R φ using (4.4) and sort the tuning parameters according to (4.5).
Step 4: Use the PAV_edr method (Algorithm 1) to compute the tuning parameter r̂ and map it back to its ridge counterpart t̂.
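The four steps above can be sketched end to end as follows. Several details are our reconstructions rather than the paper's verbatim formulas: the mapping φ[t] = 2t·||β̂_ridge[t]||₂, the factor c[z, r] = |z^⊤β̂|/||β̂||₂, and the form of the pairwise test; treat them as assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 50
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)      # columns normalized to unit norm (Sec. 5)
y = rng.standard_normal(n)
z = rng.standard_normal(p)          # new patient's covariate information

# Steps 1 + 2: ridge path via a single SVD.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
T = np.logspace(-3, 2, 50)
path = [Vt.T @ ((d / (d ** 2 + t)) * (U.T @ y)) for t in T]

# Step 3: map ridge tuning parameters to edr counterparts (assumed map) and
# sort by the personalized bounds c[z, r] * r (assumed form of c).
r = np.array([2 * t * np.linalg.norm(b) for t, b in zip(T, path)])
c = np.array([abs(z @ b) / np.linalg.norm(b) for b in path])
order = np.argsort(c * r)
path = [path[i] for i in order]
bound = (c * r)[order]

# Step 4: pairwise tests, smallest bound first.
chosen = len(path) - 1
for i in range(len(path)):
    if all(abs(z @ (path[i] - path[j])) <= bound[i] + bound[j]
           for j in range(i + 1, len(path))):
        chosen = i
        break
t_hat = T[order[chosen]]            # map back to the ridge counterpart
```

Note that only the sorting and testing depend on z; the SVD and the ridge path are computed once and reused for every new patient.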
The algorithm can be readily implemented and is fast: it essentially only requires the computation of one ridge solution path (a single svd). In strong contrast, K-fold CV requires the computation of K ridge solution paths. Consequently, the ridge estimator with PAV edr can be computed approximately K times faster than with K-fold CV, which we will confirm in the simulations. Moreover, CV still requires a tuning parameter, namely, the number of folds K, while PAV edr is completely parameter-free. We defer a detailed discussion on the complexity and run time of the algorithm to Appendix C.

Simulation Study
We evaluate the prediction performance of the PAV_edr method using (1) fully simulated data with random design and (2) semi-synthetic data that combines real genomic covariates with simulated outcomes. The mean personalized prediction error |z^⊤(β* − β̂[r])| over 100 testing vectors z is compared among (i) the Fridge method proposed by Hellton and Hjort (2018), (ii) the ridge-type estimators with r ∈ T calibrated by PAV_edr, 5-fold CV, and 10-fold CV, (iii) the oracle tuning parameter of Definition 3.1, and (iv) the optimal tuning parameter r* defined in (2.4). The computation is repeated 100 times, and the results are averaged. To compare the distributions of the mean personalized prediction errors obtained by PAV_edr, 5-fold CV, and 10-fold CV, we further compute the corresponding standard error (SE), defined as the standard deviation of all N := 100 × 100 computed personalized prediction errors divided by √N. We observe that in all considered cases, PAV_edr improves on CV and Fridge both in accuracy and in standard error (Table 1). In particular, PAV_edr mimics the prediction performance of the oracle tuning parameter r^o of Definition 3.1 well. While there is still a reasonable discrepancy between the personalized prediction performance of r^o and r*, we achieve a strong improvement over all other tested tuning-parameter calibration methods.
In the second setting, we base our simulation on real data for the covariates but simulate the outcome. We use the genomic data from the application in Section 5, where the sample size is n = 26. The number of covariates in the design matrix is restricted to the p = 1936 gene probe targets identified as potentially influential by Cashion et al. (2013). The regression vector and the noise are generated as in the first simulation setting above. The results are averaged over 100 runs and summarized in Table 2. We observe again that PAV_edr improves on CV and Fridge both in accuracy and in standard error.
Finally, we investigate a simulation setting where the design matrix X has mutually correlated coordinates with varying degrees of correlation. Again, we observe a large improvement in accuracy as well as in standard error of PAV_edr compared to CV and Fridge. The details and results are deferred to Appendix D. In summary, the results of our simulation studies demonstrate that PAV_edr is a strong contender in practice, which confirms and complements our theoretical findings from before.

Application
Kidney transplant recipients are known to gain significant weight during the first year after transplantation, with a reported average increase of 12 kg (Patel 1998). Such substantial weight gain over a relatively short time period carries an increased risk of several adverse health effects, such as cardiovascular disease, leads to less favorable graft outcomes, and may be detrimental to the overall outcome for the patient. The weight gain has been attributed to the use of prescribed steroids, which increase appetite, but steroid-free protocols alone have not reduced the risk of obesity, suggesting alternative causes. Even though weight gain is fundamentally caused by a caloric intake that is too high relative to the energy expenditure, the heterogeneity in the individual response is substantial. Genetic variation has, therefore, been considered as a contributing factor, and several genes have been linked to obesity and weight gain (Bauer et al. 2009; Cheung et al. 2010). Cashion et al. (2013) investigated whether genomic data can be used to predict weight gain in kidney transplant recipients. This was done by measuring gene expression in subcutaneous adipose tissue, which has an important role in appetite regulation and can easily be obtained from the patients during surgery. The patients' weight was recorded at the time of transplantation and at a 6-month follow-up visit, resulting in a relative weight difference. The adipose tissue samples were collected from 26 transplant patients at the time of surgery, and mRNA levels were measured to obtain the gene expression profiles for 28 869 gene probe targets using Affymetrix Human Gene 1.0 ST arrays. All data are publicly available in the EMBL-EBI ArrayExpress database (www.ebi.ac.uk/arrayexpress) under accession number E-GEOD-33070. As excessive weight gain can have severe consequences for the patients, the goal is to predict the future weight increase based on the available gene expression profiles.
When a large weight increase is predicted, additional efforts assisting with diet restriction and physiotherapy can be set into effect to better tailor the care of each individual patient.
We compare the performance of our method in predicting weight gain for the kidney transplant patients to the prediction of standard ridge regression calibrated by CV. In detail, we make predictions for each patient both in-sample and out-of-sample, leaving out the observation and using the remaining data to fit the penalized regression model and select the optimal tuning parameter. Since we do not know the true parameter β*, we can only examine the performance of our method and CV by comparing their estimation errors, defined as the deviation between the observed and the predicted weight gain (5.1). As described in the previous section, the columns of the design matrix are normalized to have Euclidean norm one. Unlike in Section 4.3, we here take all 28 869 gene probes into consideration.
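The leave-one-out evaluation can be sketched as follows; the fixed tuning parameter and the absolute-error criterion are our stand-ins for the calibrated choice and the paper's criterion (5.1), and the data are synthetic.

```python
import numpy as np

# Sketch of leave-one-out evaluation: hold out each patient, fit on the rest,
# predict the held-out outcome, and record |y_i - x_i^T beta_hat|.
rng = np.random.default_rng(4)
n, p = 26, 60
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)       # columns normalized to unit Euclidean norm
y = rng.standard_normal(n)

def ridge(Xtr, ytr, t):
    ptr = Xtr.shape[1]
    return np.linalg.solve(Xtr.T @ Xtr + t * np.eye(ptr), Xtr.T @ ytr)

t = 1.0                              # fixed tuning parameter for this sketch
errors = []
for i in range(n):
    mask = np.arange(n) != i
    b = ridge(X[mask], y[mask], t)
    errors.append(abs(y[i] - X[i] @ b))
mean_err = np.mean(errors)
```

In the actual comparison, the tuning parameter inside the loop would itself be selected per held-out patient, by PAV_edr or by CV, before predicting.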
The averaged results are summarized in Table 3a and Table 3b. We observe that PAV edr clearly outperforms 5-fold and 10-fold CV for both in-sample and out-of-sample prediction of the kidney transplant data. For out-of-sample prediction, we observe an improvement of about 9.3% in the estimation error compared to 10-fold CV. These improvements, especially in standard deviation, reinforce the advantages of a personalized approach to tuning-parameter calibration.
By predicting the individual weight gain more precisely, our method contributes to ensuring that each patient receives the best possible care. Even though kidney transplant patients are typically encouraged to adhere to a healthy lifestyle (diet and physical activity), obesity prevention can be difficult to achieve (Cashion et al. 2013). Thus, by identifying high-risk patients as precisely as possible, one can better tailor the necessary lifestyle changes through additional dietary counselling and physical activity. This may prevent the adverse weight gain documented in the first year after transplantation.

Conclusion
We have introduced a pipeline that calibrates ridge regression for personalized prediction. Its distinctive features are the finite-sample guarantees (see Theorem 3.1) and the statistical and computational efficiency (see Tables 1, 2, and 4). These features are echoed when predicting the weight gain of kidney transplant patients (see Table 3). Hence, our pipeline can improve personalized prediction and, thereby, further the cause of personalized medicine.
One possible limitation of our procedure, relevant for all personalized prediction methods, is that the training data needs to be available to compute each new prediction. This may impose certain constraints in terms of data storage and privacy when implementing the procedure in practice or as a commercial product. Nowadays, memory tends not to be a problem, in particular for medical datasets, where the number of patients is typically small, ranging from tens to hundreds; the memory needed is thus modest in view of current online and offline storage capacities. Further, to ensure proper data privacy and software safety, any included data would have to be properly anonymized.
Despite our focus on personalized medicine, we also envision applications in other areas where individual heterogeneity is crucial for predictions. Two examples are item recommendation, that is, predicting the rating a specific user assigns to an item or product (Guy et al. 2010; Rafailidis et al. 2014), and personalized marketing, that is, delivering individualized product prices or messages to specific customers (Tang, Liao and Sun 2013). Future work includes extending the methodology to logistic and generalized linear regression and exploring possibilities regarding ℓ₁ or lasso regularization. Further, the improvements in prediction error are also beneficial when selecting between different treatments, and our proposed methodology may be extended to the setting of Jeng, Lu and Peng (2018).

A.4. Proof of Lemma 4.1
Proof. Let X = UDV^⊤ be a singular value decomposition of X as given in Lemma 4.1. Then, by algebraic manipulation of Equation (4.2), the ridge estimator can be written as

β̂_ridge[t] = (X^⊤X + tI_{p×p})⁻¹X^⊤y = (VD^⊤DV^⊤ + tI_{p×p})⁻¹VD^⊤U^⊤y = V(D^⊤D + tI_{p×p})⁻¹D^⊤U^⊤y = VD†U^⊤y,

where the matrix D† := (D^⊤D + tI_{p×p})⁻¹D^⊤ ∈ R^{p×n} is rectangular diagonal with entries D†_jj = d_j/(d_j² + t).

A.5. Proof of Theorem 4.1
Proof. We consider the KKT conditions of (3.1) and replace the edr estimator with the ridge estimator to obtain

2X^⊤(Xβ̂_ridge[t] − y) + r·β̂_ridge[t]/||β̂_ridge[t]||₂ = 0_p.

The KKT conditions of the ridge estimator (4.1) give 2X^⊤(Xβ̂_ridge[t] − y) + 2t·β̂_ridge[t] = 0_p, so that r·β̂_ridge[t]/||β̂_ridge[t]||₂ = 2t·β̂_ridge[t]. By taking the ℓ₂-norm of both sides and using r > 0, we obtain

r = 2t·||β̂_ridge[t]||₂.

Thus, we can transform the ridge tuning parameter t to the edr tuning parameter r with respect to the same estimator. Moreover, there is a one-to-one relationship between the edr and the ridge tuning parameters: the ridge estimator in (4.2) implies that each coordinate of β̂_ridge[t] in the svd basis scales with d_j/(d_j² + t), so that t·||β̂_ridge[t]||₂ is strictly increasing in t. Since the map t ↦ 2t·||β̂_ridge[t]||₂ is strictly increasing, it is one-to-one.
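The tuning-parameter mapping can be verified numerically: with the objectives ||y − Xβ||₂² + t||β||₂² (ridge) and ||y − Xβ||₂² + r||β||₂ (edr), the ridge solution at t should be a stationary point of the edr objective at r = 2t||β̂_ridge[t]||₂. The factor 2 is tied to these exact objective normalizations and is our reading of the proof.

```python
import numpy as np

# Check: the edr gradient vanishes at the ridge solution when r = 2 t ||b||_2.
rng = np.random.default_rng(5)
n, p = 12, 8
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
t = 0.7

b = np.linalg.solve(X.T @ X + t * np.eye(p), X.T @ y)   # ridge solution
r = 2 * t * np.linalg.norm(b)                            # mapped edr parameter

# Gradient of ||y - X b||_2^2 + r ||b||_2 at b (b != 0):
grad_edr = -2 * X.T @ (y - X @ b) + r * b / np.linalg.norm(b)
assert np.allclose(grad_edr, 0, atol=1e-8)
```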

Appendix B: Beyond Orthogonality
To avoid digression, we have restricted the theory in the main body of the paper to orthonormal design matrices. However, there are straightforward extensions along established lines in high-dimensional theory. In general, the influence of correlation on regularized estimation has been studied extensively; see, for example, Dalalyan, Hebiri and Lederer (2017) and Hebiri and Lederer (2013) for the lasso case. The most straightforward extension of our theory goes via the ℓ∞-restricted eigenvalue introduced in Chichignoud, Lederer and Wainwright (2016). This condition allows for design matrices that satisfy ||X^⊤Xδ||∞ ≳ ||δ||∞ for certain vectors δ. We omit the details; importantly, our simulations demonstrate that our method provides accurate prediction far beyond orthonormal design.

Appendix C: Run Time Measurement
In this section, we compare the complexity and run time of our PAV edr algorithm in theory and practice. The PAV edr algorithm can be split into two parts: the computation of the ridge solution path and the patient-wise tuning parameter calibration. The ridge solution path is computed using the results of Lemma 4.1 and essentially requires only the computation of a single svd. This computation is independent of the genome information z of new patients and needs to be performed only once.
Algorithm 1 describes the second part of the PAV_edr method and is computed for each new patient's genome information. The method mainly requires ordering the bounds c[z, r_i]·r_i and computing the binary variables ŝ_{r_i} for all tuning parameters r_i, i ∈ {1, . . . , m}. Hence, its complexity scales with the number of tuning parameters m, which is fixed; we use m = 300 in all of our simulation studies and applications. Ordering the bounds c[z, r_i]·r_i can be achieved using standard sorting algorithms with an average complexity of O(m log m). Moreover, for a pair of tuning parameters r_i, r_j with i < j, the variation of the factors c[z, r_i], c[z, r_j] is very limited in practice, while r_i < r_j; hence, the bounds can be expected to be largely presorted. Indeed, we observed that all bounds c[z, r_i]·r_i were fully presorted in all of our simulation studies.
We recorded the computational run time of the tuning-parameter calibration using PAV_edr, Fridge (Hellton and Hjort 2018), 5-fold CV, and 10-fold CV for both simulation settings (see Section 4.3). All computations were performed in R version 4.0.2, and for CV, we used the implementation provided by the glmnet package (Friedman, Hastie and Tibshirani 2010). The results (Table 4) demonstrate that PAV_edr offers feasible and competitive performance in both simulation settings. While PAV_edr is much slower than Fridge in the patient-independent computation, its patient-wise run time is faster for large n and p. Note that since cross-validation computes a single tuning parameter that is used for all new patients z_i, it does not have a patient-wise run time. Hence, for a large number of new patients, CV may be faster in total, although still much less accurate (compare Section 4.3).

Appendix D: Simulation Study
In this section, we perform a simulation study with correlated covariates and, hence, apply our method to non-orthogonal design matrices. We sample each row of the design matrix X from a p-dimensional normal distribution N[μ_p, Σ_p].
Here, the mean vector μ_p ∈ R^p is defined as μ_p := (μ, . . . , μ)^⊤ such that μ is sampled from N[0, 100²], and the covariance matrix Σ_p ∈ R^{p×p} is given by Σ_p := (1 − k)I + k·1, where 1 := (1, . . . , 1)^⊤(1, . . . , 1) is the matrix of ones and k ∈ {0, 0.2, 0.4} is the magnitude of the mutual correlations. All other settings, including the dimensions of X, the regression vector β*, the noise vector u, the signal-to-noise ratio, the testing vectors, and the tuning parameters, are the same as in Section 4.3. The personalized prediction error |z^⊤(β* − β̂[r])| is averaged over 100 data vectors z, where the tuning parameter r ∈ T is calibrated using PAV_edr, 5-fold CV, and 10-fold CV. We run 100 experiments for each set of parameters and report the averaged results in Tables 5, 6, and 7, where the run time is scaled relative to PAV_edr. We observe that PAV_edr clearly outperforms 5-fold CV, 10-fold CV, and Fridge both in accuracy and in standard error in all considered cases. This demonstrates that PAV_edr can effectively account for the individual heterogeneity of the data vectors even in cases of correlated covariates.
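Sampling from this equicorrelated design is straightforward; the sketch below fixes k = 0.4 and uses our own dimensions.

```python
import numpy as np

# Rows drawn from N(mu, Sigma) with Sigma = (1 - k) I + k 11^T (Appendix D).
rng = np.random.default_rng(6)
n, p, k = 30, 10, 0.4
mu = rng.normal(0, 100) * np.ones(p)            # common mean, mu ~ N(0, 100^2)
Sigma = (1 - k) * np.eye(p) + k * np.ones((p, p))
X = rng.multivariate_normal(mu, Sigma, size=n)

# Diagonal entries of Sigma equal 1; off-diagonal entries equal k.
assert np.isclose(Sigma[0, 0], 1.0) and np.isclose(Sigma[0, 1], k)
```

For k = 0, the rows have independent coordinates; increasing k toward 1 makes the columns of X progressively more collinear, which is the stress test for the non-orthogonal case.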