On asymptotic equivalence of the NPMLE of a monotone density and a Grenander-type estimator in multi-sample biased sampling models

Abstract: In this article, we show that the nonparametric maximum likelihood estimator (NPMLE) of the decreasing density function in s-sample biased sampling models is asymptotically equivalent to a Grenander-type estimator, namely the left-continuous slope of the least concave majorant of the NPMLE of the distribution function in the larger model that does not impose the monotonicity assumption. Since the two estimators favor different proof strategies for establishing weak convergence, we develop additional results for both estimators so that they can be treated jointly in a unified approach. For instance, we employ an analytic argument to show the tightness of an inverse process associated with the NPMLE, since the conventional geometric approach used in the literature cannot be applied in the presence of multiple biased samples. We also illustrate our results through numerical simulations and a real data example.


Introduction
Biased sampling problems have long been an important issue in a wide array of scientific studies. One of the most popular types of biased sampling is length-biased sampling, which has been recognized in statistics for almost half a century in studies of ecology [20,21], fiber length [5] and economic duration data [15,13]. This kind of biased sampling arises when a positive-valued outcome variable is sampled with selection probability proportional to its size/survival time, as occurs frequently in cross-sectional studies. Another example of biased sampling is discussed in [22], which considered a generalized version of length bias in a melanoma study. [23] and [4] considered the observed blood alcohol concentrations of drivers in traffic accidents as biased samples of the blood alcohol concentration of all drivers, with different biasing functions for various age groups. [7] considered the distribution of amino acid strain distance in vaccine studies as a multi-sample biased sampling problem.
A general formulation of the s-sample (s ≥ 2) biased sampling problem is as follows. Let G_0 be an unknown distribution function on R and let w_i (i = 1, . . . , s) be s known positive weight functions. Suppose s independent samples X_{i1}, . . . , X_{in_i} (i = 1, . . . , s) are observed, where each X_{ij} independently follows the biased distribution F_i (i = 1, . . . , s, j = 1, . . . , n_i) given by

F_i(x) = W_i^{-1} ∫_{-∞}^{x} w_i(y) dG_0(y), where 0 < W_i ≡ ∫_{-∞}^{∞} w_i(y) dG_0(y) < ∞ for i = 1, . . . , s.

In the absence of any assumption on the shape of the distribution function or its underlying density, [28] established the unique existence of the nonparametric maximum likelihood estimator (NPMLE) G_n of the unbiased distribution function G_0 in s-sample biased sampling models. Large sample theory for this NPMLE was investigated in [8]. Recently, [3] established the unique existence of the decreasing NPMLE ĝ_n in s-sample biased sampling models and also gave its asymptotic distribution at a fixed interior point where the underlying density has a strictly negative derivative; this problem had been open in the literature due to certain non-standard features of the likelihood, such as non-separability and the lack of a strictly positive second-order derivative of the negative log-likelihood. Formally, denote by G the set of all decreasing densities. For any g ∈ G, the likelihood of the s samples evaluated at g is proportional to

L_n(g) = ∏_{i=1}^{s} ∏_{j=1}^{n_i} [ g(X_{ij}) / ∫ w_i(u) g(u) du ].

The decreasing NPMLE ĝ_n ∈ G is defined such that L_n(ĝ_n) ≥ L_n(g) for all g ∈ G.
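To fix ideas about how the weight functions act, note that in the one-sample case (s = 1) the NPMLE of G_0 has a simple closed inverse-weighting form: it places mass proportional to 1/w_1(X_j) at each observation. A minimal sketch of this form (the function name is ours; we assume the weight function is strictly positive at every data point):

```python
def one_sample_npmle_cdf(x, w):
    """NPMLE of the unbiased CDF G_0 from one biased sample (s = 1):
    it puts mass proportional to 1/w(X_j) at each observation X_j.
    Returns the sorted support points and the CDF values there."""
    xs = sorted(float(v) for v in x)
    inv = [1.0 / w(v) for v in xs]   # inverse weights undo the size bias
    total = sum(inv)
    cdf, acc = [], 0.0
    for v in inv:
        acc += v / total             # normalized mass at each point
        cdf.append(acc)
    return xs, cdf
```

For instance, with a length-biased weight w(t) = t, larger observations receive proportionally less mass, correcting the over-representation of large values in the sample.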
From an alternative perspective, another natural estimator of such a decreasing density is a Grenander-type estimator, namely the left-continuous slope, denoted by g̃_n, of the least concave majorant of G_n, the NPMLE of G_0 without the monotonicity assumption. In general, suppose that r is the underlying function of interest, for example a density, a hazard rate or a regression function, and that R̂(t) is an estimator of ∫_{-∞}^{t} r(s) ds. When r is monotone decreasing (resp. increasing), the Grenander-type estimator of r is the left-continuous (resp. right-continuous) slope of the least concave majorant (resp. greatest convex minorant) of R̂. There has been an ongoing stream of work on Grenander-type estimators; see [11] and the references therein. In the case of a single unbiased sample under the decreasing density assumption, [9] showed that the NPMLE is exactly the left-continuous slope of the least concave majorant of the empirical distribution function, i.e., the usual unconstrained NPMLE of the distribution function. In general one cannot expect this correspondence to hold exactly, but perhaps only asymptotically; for instance, in the random right censorship model, [14] showed that the NPMLE of a decreasing density is asymptotically equivalent to the left-continuous slope of the least concave majorant of the Kaplan-Meier estimator, the NPMLE of the distribution function in the absence of the monotonicity assumption. A similar result was also shown in [14] for the NPMLE of a decreasing hazard rate, which is asymptotically equivalent to the left-continuous slope of the least concave majorant of the Nelson-Aalen estimator, i.e., the NPMLE of the cumulative hazard function without the monotonicity assumption.
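The construction of a Grenander-type estimator from an empirical distribution function can be sketched as follows, in the one-sample unbiased case of [9] (a minimal illustration; the function name and upper-hull implementation are ours):

```python
def grenander(x):
    """Grenander estimator of a decreasing density: left-continuous
    slopes of the least concave majorant (LCM) of the empirical CDF,
    computed by an upper-convex-hull scan over the CDF vertices.
    Returns (knots, slopes); the estimator equals slopes[k] on the
    interval (knots[k], knots[k+1]]."""
    xs = sorted(x)
    n = len(xs)
    # empirical-CDF vertices, with the origin prepended
    px = [0.0] + [float(v) for v in xs]
    py = [j / n for j in range(n + 1)]
    hull = [0]  # indices of the LCM vertices
    for i in range(1, n + 1):
        # pop the last vertex while it lies on or below the chord to i,
        # which enforces concavity (decreasing chord slopes)
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (px[b] - px[a]) * (py[i] - py[a]) >= (py[b] - py[a]) * (px[i] - px[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    knots = [px[k] for k in hull]
    slopes = [(py[hull[k + 1]] - py[hull[k]]) / (knots[k + 1] - knots[k])
              for k in range(len(hull) - 1)]
    return knots, slopes
```

The returned step function is decreasing by construction and integrates to one, since the LCM rises from 0 to 1 over the range of the data.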
More recently, [18] showed that similar results are true in the Cox model; the NPMLE of an increasing baseline hazard and the left-hand slope of the greatest convex minorant of the Breslow estimator are asymptotically equivalent.
It is of both theoretical and practical interest to see whether this asymptotic equivalence also holds in other models, such as that in [3]. Practically, in the s-sample biased sampling models, the NPMLE is computed iteratively via a self-characterization given in [3], where an initial consistent estimator is required because the corresponding optimization problem is non-convex. In [3], we suggested using the Grenander-type estimator as the initial estimator, since a numerically efficient method for finding G_n was already discussed in [28]. The asymptotic equivalence of the two estimators implies that this initial estimator is already as good as the NPMLE asymptotically. The main goal of the present article is to show that g̃_n and ĝ_n in s-sample biased sampling models are asymptotically equivalent in the sense that n^{1/3}[g̃_n(t_0) − ĝ_n(t_0)] converges to 0 in probability as n → ∞. The rate n^{1/3} is the natural one here because n^{1/3}[ĝ_n(t_0) − g_0(t_0)] converges in distribution to a nondegenerate limit (Theorem 1.1 in [3]), where g_0 is the true underlying density. To study the asymptotic equivalence, one needs to consider the two estimators jointly. The Grenander-type estimator favors a geometric method based on a switching relation and an inverse process, similar to [14]. On the other hand, the approach we took in [3] for the NPMLE made use of the continuous mapping argument for slopes of least concave majorants as illustrated in [1]. To establish the asymptotic equivalence using a unified approach via the switching relation and inverse process, we develop additional results for both estimators. In particular, we show the tightness of the inverse process corresponding to the NPMLE by a purely analytic argument instead of the commonly employed geometric argument.
The organization of our work is as follows. In Section 2, we describe the setting and notation. Consistency of g̃_n is established in Section 3. The main results on the asymptotic equivalence of g̃_n and ĝ_n are proven in Section 4. These results depend on the tightness of two inverse processes, which is shown in Section 5. In Section 6, numerical studies, including simulations and the analysis of a real data set, compare the performance of the two estimators. Some concluding remarks are given in Section 7.

Setting and notation
Assume that the distribution function G_0 has a density function g_0 with respect to the Lebesgue measure which is known to be decreasing. Denote by f_i the density of the biased distribution F_i with respect to the Lebesgue measure. The total sample size is n ≡ n_1 + · · · + n_s. To study the asymptotic behavior of the estimators g̃_n and ĝ_n in a nondegenerate model, we shall also assume that λ_{ni} ≡ n_i/n → λ_i > 0 as n → ∞, corresponding to Assumption 2.1 (A) in [3].
Equality in distribution and convergence in distribution will be denoted by =_d and →_d respectively; convergence in probability will be denoted by →_P, and almost sure convergence by →_a.s.. For any function K on [a, b] with K(a) = 0, the least concave majorant K̄ of K is defined to be the smallest concave function that dominates K over [a, b]. Without imposing the decreasing assumption on the unbiased density, [28] showed that the NPMLE G_n can be written in a closed form involving estimated constants u_1, . . . , u_s > 0. Let Ḡ_n be the least concave majorant of G_n. Then the Grenander-type estimator g̃_n is the left-continuous slope of Ḡ_n. An important condition for the asymptotic properties of G_n is the connectedness of the graph G on the s vertices i = 1, . . . , s formed by placing an edge between i and j if and only if ∫ 1(w_i > 0) 1(w_j > 0) dG_0 > 0.

Consistency of the Grenander-type estimator g̃_n
The consistency of the NPMLE ĝ_n has been shown in [3]. Before proceeding to the main results on asymptotic distributions in the next section, we establish the consistency of the Grenander-type estimator g̃_n. Define ‖·‖_∞ to be the supremum norm. Theorem 2.1 of [8] implies that ‖G_n − G_0‖_∞ →_a.s. 0 when G is connected. The same result holds for Ḡ_n by Marshall's Lemma ([19]), as stated in Lemma 3.1.
To connect the consistency of G_n with that of g̃_n, recall the following elementary convex analysis result (see, e.g., p. 330 in [25]): if a sequence of concave functions H_n converges to a concave function H uniformly in x ∈ I, then for all x ∈ I, H^+(x) ≤ lim inf_n H_n^+(x) ≤ lim sup_n H_n^-(x) ≤ H^-(x), where H^- and H^+ denote the left and right derivatives of H respectively.
The following proposition is an immediate consequence of Lemma 3.1 and Lemma 3.2. Proof. By considering an open interval slightly larger than [σ, τ] that is contained in the interior of the support of G_0, the pointwise result follows from Lemma 3.1 and Lemma 3.2. The uniform result holds because {g̃_n} is a sequence of decreasing functions converging pointwise to the continuous decreasing density g_0; see, for example, the auxiliary results in [2] or [24].

Asymptotic equivalence of g̃_n and ĝ_n
In the rest of this article, we assume that the support of g_0 is [a, b], where −∞ < a < b < ∞, and t_0 is a fixed interior point of [a, b]. The following regularity conditions are assumed for establishing the asymptotic equivalence of g̃_n and ĝ_n.

Assumption 1.
(i) The unbiased decreasing density g_0 is differentiable on the interior of its support. (ii) The weight functions w_1, . . . , w_s are bounded away from zero and infinity and are Lipschitz continuous.

These conditions correspond to those used in [3] for establishing the asymptotic distribution of ĝ_n. A direct consequence of Assumption 1 (ii) is that G is a complete graph and hence connected. As a result, the s-sample biased sampling model without the monotonicity assumption is identifiable (Proposition 1.1 in [8]) and √n(V_n − V_0) is asymptotically multivariate normal (Proposition 2.3 in [8]). In this section we show the following main result of this paper.

Theorem 4.1. Under Assumption 1, we have n^{1/3}[g̃_n(t_0) − ĝ_n(t_0)] →_P 0 as n → ∞. (4.1)
We first provide a roadmap for the proof of Theorem 4.1. One of the main difficulties is that both ĝ_n and g̃_n are defined implicitly, without explicit analytic forms that would allow the difference between them to be analysed directly. For example, it is not clear whether terms cancel when we consider n^{1/3}(ĝ_n(t_0) − g̃_n(t_0)) directly. Our strategy is instead to show that the pair (n^{1/3}{ĝ_n(t_0) − g_0(t_0)}, n^{1/3}{g̃_n(t_0) − g_0(t_0)}) converges jointly to a limit whose two components coincide, so that the difference of the two estimators converges to 0 in probability.
Since g̃_n is defined as the slope of the least concave majorant of G_n, to study its asymptotic distribution we define, for t ∈ R, the local processes Z̃_n(t, a). The corresponding local processes for ĝ_n were defined in [3] in terms of T_1, . . . , T_n, the order statistics of the pooled samples X_{ij}, and the weights c_{ik}, where G̃_n, Ũ_n, Û_{n,ĝ_n} and Ĝ_{n,ĝ_n} are defined in (4.2) and (4.3). The proof of Theorem 4.1 will further depend on the following facts, which will be proven below, involving, for a > 0, the inverse processes S̃_n and Ŝ_n. By the definition of g̃_n and Proposition 3.2 in [3], with probability one we have the switch relations (see [10]) stated in (4.7). By (4.6) and (4.7), the convergence of (g̃_n, ĝ_n) can be related to that of (S̃_n, Ŝ_n) via the following argmax continuous mapping theorem. Let B_loc(R) denote the space of all locally bounded real functions on R, endowed with the topology of uniform convergence on compacta, and consider functions x ∈ B_loc(R) that achieve their maximum at a unique point in R.
Proposition 4.2 (Theorem 6.1 in [14]). Let (J_{1n}, J_{2n}) be a sequence of pairs of random mappings valued in B_loc(R) × B_loc(R) and let (T_{1n}, T_{2n}) be another sequence of random mappings into R × R such that:
The remaining gap is to show the weak convergence of (Z̃_n(t, a), Ẑ_n(t, b)), which is closely related to the weak convergence of Ũ_n and Û_{n,ĝ_n} given in the following proposition. The proof of Proposition 4.3 will require an additional lemma:

Lemma 4.4. Under Assumption 1 (ii), for any measurable set A, we have
and the proof of Lemma 4.4 will require an additional result (Lemma 4.5). An interpretation of these lemmas is as follows. From Proposition 2.3 in [8], V_n − V_0 = O_p(n^{-1/2}). Using this result, in Lemma 4.5 we show that the denominator of G_n is essentially W_s^{-1}. As a result, we can write G_n as a sum of two terms, where the first involves only the weighted empirical distribution F_n and the true quantities W_k instead of the estimator V_n, and the second is an error term (see Lemma 4.4). This more convenient expression of G_n facilitates the proof of Proposition 4.3. In addition to Proposition 4.3, the proof of Theorem 4.1 requires the tightness of the inverse processes S̃_n and Ŝ_n, which will be established in Section 5, and further results on the process S_{α,β}(x) ≡ argmax_t {αW(t) + βt² − xt} for any α ∈ R and β < 0. The results in Lemmas 4.6 and 4.7 are known, but proofs are given in the Appendix for completeness.
Since the results are nested, we present the proofs in the following order: Lemma 4.5, Lemma 4.4 and then Proposition 4.3.

Proof of Lemma 4.5. Simple algebra gives a decomposition whose terms include C_2 and C_3. Note that the class of functions involved in C_2 is, for large n, a subset of a fixed class H. Note that H^{-1} is a Donsker class, as each function in H^{-1} is a finite sum of products of bounded functions from Donsker classes; that H is also a Donsker class follows since the functions in H^{-1} are bounded away from zero. Therefore, C_2 = O_p(n^{-1/2}). For C_3, we use the bounds on the w_i and a similar argument.

Proof of Lemma 4.4. By telescoping the terms, we can write the difference as A_1 + A_2. For A_1, the bound follows from Lemma 4.5. For A_2, using the same argument and a calculation similar to that bounding C_3 in Lemma 4.5, we also have A_2 = O_p(n^{-1/2}) ∫_A dF_n(y).
Proof of Proposition 4.3. Fix K > 0. It suffices to show that sup_{t∈[−K,K]} |Ũ_n(t) − Û_{n,ĝ_n}(t)| →_P 0. The second statement then follows from the first and the fact that Û_{n,ĝ_n} converges weakly in B_loc(R) to U, as shown in Lemma 6.5 in [3]. Consider t ∈ [−K, K]. Using Lemma 4.4, we obtain a decomposition in which the O_p term is independent of t, since the relevant collection of functions is a subset of a Donsker class. For the second term, an application of the mean value theorem yields a point t_n between t_0 and t_0 + tn^{-1/3}. Hence, by (4.8), the difference can be reduced to the terms B_1(t), (B_2 − A_2)(t) and A_3(t), where, from the proof of Lemma 6.5 in [3], A_3(t) = o_p(1) with a bound independent of t.
Writing B_1(t) as a sum over the s samples of terms of the form ∫ n^{1/2}(p_{n,t}(y) − q_{n,t}(y)) d(F_{i,n_i} − F_i)(y), we shall show that, for each i = 1, . . . , s, ∫ n^{1/2}(p_{n,t} − q_{n,t})(y) d(F_{i,n_i} − F_i)(y) converges on ℓ^∞([−K, K]) to a Gaussian process with covariance function K_i(u, t) = 0, where ℓ^∞(T) denotes the space of all real-valued bounded functions on T equipped with the uniform norm. To this end, it suffices to verify the three items of Condition (2.11.21) and the entropy integral condition of Theorem 2.11.22 in [26]. These can be checked in the same way as in the proof of Lemma 6.4 in [3] and are therefore omitted. We then see that ∫ n^{1/2}{p_{n,t}(y) − q_{n,t}(y)} d(F_{i,n_i} − F_i)(y), for i = 1, . . . , s, is asymptotically tight in ℓ^∞([−K, K]) and converges in distribution to a Gaussian process with covariance function K_i(u, t). Similarly, it is also straightforward to see that lim_{n→∞} E_{F_i}(p_{n,u} − q_{n,u}) E_{F_i}(p_{n,t} − q_{n,t}) = 0. Similarly, when u < 0 and t < 0, K_i(u, t) = 0, and when u and t are of opposite signs, K_i(u, t) is also 0. In summary, K_i(u, t) = 0 for all u and t, so each of these processes converges uniformly in probability to 0. By the independence of the different samples, B_1, as a sum of such independent terms, also converges uniformly to 0. From the proof of Lemma 6.5 in [3], we know that A_2 also converges uniformly to (1/2)g_0(t_0)t², and hence B_2 − A_2 converges uniformly to 0 on [−K, K].

Tightness of the inverse processes S̃_n and Ŝ_n
To close the final gap in the proof of Theorem 4.1, we show in this section the tightness of the inverse processes S̃_n and Ŝ_n in Lemmas 5.1 and 5.2.

Lemma 5.1. Under Assumption 1 (i)-(ii), for all ε > 0 and M_1 > 0, there is an M_2 > 0 such that the corresponding probability bound holds for all large n.

The proof of Lemma 5.1 is similar to that of Lemma 5.3 in [12] (see also Lemma 7.1 in [14]), so we defer it to the Appendix. The proof of Lemma 5.2, however, employs a different argument from those in the literature: it makes direct use of the switch relation and the analytic properties of ĝ_n through its accompanying Karush-Kuhn-Tucker condition, without the need to study the geometric relationship between the points {(Ĝ_{n,ĝ_n}(T_i), Û_{n,ĝ_n}(T_i))} and their least concave majorant. In particular, we will first show the following result.

Lemma 5.3. Under Assumption 1, for any ε > 0, there exists C_0 > 0 such that the stated probability bound holds for all C ≥ C_0 and all large n.

To prove Lemma 5.3, we make use of Lemma 5.4, which ensures that a certain event related to the Karush-Kuhn-Tucker condition occurs with small probability.

Lemma 5.4.
Under Assumption 1, for any ε > 0, there exist C_0 > 0 and R_0 > 0 such that, for any C ≥ C_0 and 0 < R ≤ R_0 and all sufficiently large n, the stated probability bound holds. Lemma 5.4 is similar to Lemma 5.11 and Lemma 5.12 in [3], but now on a different shifted interval. The proofs of Lemmas 5.3 and 5.4 are given in the Appendix.
Proof of Lemma 5.2. Fix ε > 0 and M_1 > 0. We shall show only one direction, namely that there exists M_2 > 0 such that the required bound holds, as the other case can be established similarly. Observing that Ŝ_n(a) is decreasing in a, simple algebra and the switch relation (4.7) reduce the problem to the claim of Lemma 5.3: there exists C_0 > 0 such that the corresponding bound holds for all C ≥ C_0 and all large n. By the definition of the derivative g_0'(t_0), there exists δ > 0 such that for all 0 < u < δ,

This implies that
Note that for all large enough n we have Cn^{-1/3} < δ, and so, by the choice of C, P(n^{1/3}(g_0(t_0) − g_0(t_0 + Cn^{-1/3})) ≤ M_1) = 0 for all large enough n; the result of the lemma follows.

Simulations
In this section, we illustrate the finite sample performance of the proposed monotone MLE and compare it with the Grenander-type estimator. For the Grenander-type estimator, we first obtained the NPMLE G_n of the unbiased distribution function G_0 [27,28]. The Grenander-type estimator g̃_n is then obtained as the left-continuous slope of the least concave majorant of G_n. To obtain the monotone MLE via the self-characterization introduced in [3], a set of initial guesses for the pointwise density values is required. Given the initial guesses, say ẑ^(0), an updated set of estimates, denoted ẑ^(1), is defined as the solution of the right-hand side of (3.5) of [3]. These updated values ẑ^(1) then serve as the initial values for the next iteration, and the procedure continues iteratively until convergence.
Our simulation studies analyzed the performance of (i) g̃_n, (ii) ĝ_{n,V}, the monotone MLE with g̃_n as the initial guess, and (iii) ĝ_{n,R}, the monotone MLE initialized with a density estimated from randomly drawn samples, denoted by ĝ^(0)_{n,R}. For the two-sample setting considered, we simulated n_1 exponentially distributed samples with rate 0.5, representing the true unbiased distribution, i.e. w_1(t) = 1. In addition, we generated n_2 = n_1 samples from the length-biased version of the unbiased distribution, i.e. with weight function w_2(t) = t. Defining n ≡ n_1 + n_2, we generated samples with n = 100, 200, 500 and 1000 under a balanced design. 500 replications were carried out for each simulation exercise.
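The two-sample design above can be generated as follows (a minimal sketch; the seed is arbitrary, and we use the fact that the length-biased version of an Exp(0.5) density is proportional to t·0.5·e^{-0.5t}, i.e. a Gamma distribution with shape 2 and scale 2, which equals the sum of two independent Exp(0.5) draws):

```python
import random

random.seed(1)   # arbitrary seed for reproducibility
n1 = n2 = 250    # balanced design, total n = 500

# Unbiased sample: Exp(rate 0.5), mean 1/0.5 = 2 (weight w1(t) = 1).
x1 = [random.expovariate(0.5) for _ in range(n1)]

# Length-biased sample (weight w2(t) = t): Gamma(shape 2, scale 2),
# generated as a sum of two independent Exp(0.5) variables; mean 4.
x2 = [random.expovariate(0.5) + random.expovariate(0.5) for _ in range(n2)]
```

The sample means of x1 and x2 should be close to 2 and 4 respectively, reflecting the size bias in the second sample.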
As we can see from Table 1, the MLE procedure, regardless of the initial values chosen, produces virtually unbiased estimates whose standard errors are smaller than those of the Grenander-type estimator. This is sensible, as the maximum likelihood procedure should give the most efficient estimates. Our numerical experience also suggests that convergence of the self-characterization is achieved after three to five iterations. Figure 1 shows the initial values, the maximum likelihood estimate obtained after four iterations and the true density in one simulation scenario.
We also conducted simulations to examine the validity of Theorem 4.1. We simulated samples from each of the following two distributions, namely (a) an unbiased exponential distribution (f_1 = g_0) with mean η = 2 and (b) a length-biased exponential distribution f_2 whose density can be written as f_2(t) = η^{-2} t e^{-t/η} for t > 0. We set n_1 = n_2 (i.e. λ_1 = λ_2 = 0.5). Different sample sizes were generated to examine the corresponding finite sample performance. To verify the convergence property in Theorem 4.1, namely (4.1), we also calculated the Kolmogorov-Smirnov (KS) test statistics for two-sided tests between the distribution of Δ_n evaluated at t_0 = 0.5 and 2Y based on 500 repetitions with different sample sizes, where Δ_n denotes the left-hand side of (4.1); see Table 2. None of the KS tests was rejected at level α = 0.05.

Data analysis
We apply our methodology to a real-life application. As discussed in [23] and [4], the blood alcohol concentration (BAC) of drivers involved in fatal car accidents in the US demonstrates size bias between the younger (< 30 years old) and older (≥ 30 years old) populations. Drivers with higher BACs usually incur higher chances of being involved in fatal traffic accidents. We illustrate using a dataset obtained from the National Highway Traffic Safety Administration (NHTSA) of the US Department of Transportation. In particular, the Fatality Analysis Reporting System (FARS) provides a collection of raw statistics recording all qualifying fatal car crashes that occurred within the 50 states and the District of Columbia. The FARS database (https://www.nhtsa.gov/crash-data-systems/fatality-analysis-reporting-system) stores three sections of incident records, namely the accident, vehicle and person files, in which our variable of interest, the BAC, measured in grams per decilitre (g/dL), and the ages of the associated drivers can be found. We focus our analysis on whole blood test results valued at or above 0.08 g/dL, the legal limit defining driving under the influence after 2004, using drivers involved in accidents in all 50 states during 2009. The total number of samples considered in this analysis is 5,385, of which 3,086 (57.3%) were from drivers aged 30 or above.
Since drivers with a higher BAC are more likely to be involved in an accident, the observed BAC values in the FARS data are biased towards larger values. To study the distribution of BAC among all drivers, [4] and [23] showed that it is reasonable to use the biasing functions w_1(x) = √x for younger drivers (age < 30) and w_2(x) = x for older drivers (age ≥ 30). It is also reasonable to assume that the BAC density is decreasing past the legal limit (BAC ≥ 0.08), since a high BAC is potentially lethal. The estimates of the population density of BAC conditional on BAC ≥ 0.08 are given in Figure 2. The Grenander-type estimator and the MLE gave almost identical results.

Discussion
This paper establishes the asymptotic equivalence, under the s-sample biased sampling models considered in [3], between the nonparametric maximum likelihood estimator (NPMLE) of the decreasing density function and the Grenander-type estimator built from the NPMLE of the distribution function obtained without any shape constraint. In particular, we show that g̃_n and ĝ_n in s-sample biased sampling models are asymptotically equivalent in the sense that n^{1/3}[g̃_n(t_0) − ĝ_n(t_0)] →_P 0 as n → ∞. Instead of adopting the traditional geometric approach to showing the tightness of the corresponding inverse processes, we develop a new technique making direct use of the switch relation in the NPMLE case. One possible extension is to investigate the corresponding inference procedure when the biasing functions w_i(·), i = 1, . . . , s, contain unknown parameters. We shall explore this matter in the future.
A reviewer suggested considering the absolute difference between ĝ_n and g̃_n. While the absolute difference between the Grenander estimator and the true density is studied in [6], such results are not readily extendable to the difference between ĝ_n and g̃_n. Moreover, as the Grenander estimator is known to be inconsistent at the boundaries of the support, it is impossible to have sup_{t∈[a,b]} |g̃_n(t) − g_0(t)| = o_p(1). However, it is possible that sup_{t∈[a,b]} |ĝ_n(t) − g̃_n(t)| = o_p(1). To obtain a rate of convergence to 0 of |ĝ_n(t) − g̃_n(t)| for t in a neighborhood of the boundary points a and b, additional analysis of ĝ_n and g̃_n near the boundaries is required. This is related to [17], which established results for the Grenander estimator near the boundaries of the support. We expect that the proof techniques developed in [17] and [6], together with the results in this paper and [3], would be required to tackle this problem.

Appendix A: Proofs of auxiliary lemmas
Proof of Lemma 4.6. The claim follows from the fact that adding a constant does not affect the location of the maximum of a process, together with the stationarity of the increments of Brownian motion.

Proof of Lemma 4.7. Let δ and γ be real constants. Using the scaling property of Brownian motion, we find that if 1 − δ/2 = −2δ and −γ/2 = 1 − 2γ, then S_{α,β}(0) =_d α^{−δ}|β|^{−γ} Y, as multiplying by a constant does not affect the location of the maximum of a process. Finally, the two equations imply that δ = −2/3 and γ = 2/3.
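For completeness, the two exponent equations at the end of the proof of Lemma 4.7 solve as follows:

```latex
1-\frac{\delta}{2}=-2\delta \;\Longrightarrow\; 1=-\frac{3\delta}{2} \;\Longrightarrow\; \delta=-\frac{2}{3},
\qquad
-\frac{\gamma}{2}=1-2\gamma \;\Longrightarrow\; \frac{3\gamma}{2}=1 \;\Longrightarrow\; \gamma=\frac{2}{3}.
```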
→_a.s. 0, and Lemma 3.3 gives |g̃_n(t_0 + u_0/2) − g_0(t_0 + u_0/2)| →_a.s. 0. Hence, for each sample point ω, by the definition of the least concave majorant and the concavity of Ḡ_n,

G_n(t_0 + n^{-1/3}t) − G_n(t_0 + u_0/2) ≤ Ḡ_n(t_0 + n^{-1/3}t) − Ḡ_n(t_0 + u_0/2) + o_ω(1).

For all sufficiently large n, using the definition of Z̃_n(t, a),

Z̃_n(t, −M_1) − Z̃_n(n^{1/3}u_0, −M_1) = n^{2/3}[G_n(t_0 + n^{-1/3}t) − G_n(t_0 + u_0/2) − g_0(t_0)tn^{-1/3} + g_0(t_0)u_0/2] + M_1 t − M_1 n^{1/3}u_0.

Acknowledgements. Part of the present article was written at the authors' home. Although Phillip Yam lost his father, with the deepest sadness, at the final stage of the review of this work, his father will never leave his heart; he dedicates this work to the memory of his father's brave battle against liver cancer.