Stereological determination of particle size distributions for similar convex bodies

Consider an opaque medium which contains 3D particles. All particles are convex bodies of the same shape, but they vary in size. The particles are randomly positioned and oriented within the medium and cannot be observed directly. Taking a planar section of the medium, we obtain a sample of observed 2D section profile areas of the intersected particles. The distribution of interest is the underlying 3D particle size distribution, for which an identifiability result is obtained. Moreover, a nonparametric estimator is proposed for this size distribution. The estimator is proven to be consistent and its performance is assessed in a simulation study.


Introduction
In the classical Wicksell corpuscle problem [1] spheres are randomly positioned in an opaque body. The problem is to estimate the size distribution of the spheres using the circular profiles observed in a planar section. The motivation for the problem originated from anatomy as well as astronomy. In the anatomical setting it is of interest because so-called follicles may be observed in slices of organs during post-mortem studies. Such follicles are approximately spherical, resulting in approximately circular section profiles of the intersected follicles. We may then wonder what the distribution of the radii of the follicles is. Questions of a similar nature appear in the field of materials science. An important feature of the so-called microstructure of a steel is its grains. Knowing the size distribution of the 3D grains allows for studying the relationship between the grain size distribution and the mechanical properties of the metal. It is much simpler to obtain 2D information by observing a planar cross section of the metal than to obtain 3D information about the steel's microstructure. Hence, it is of interest to use the 2D observations for estimating 3D information. These problems belong to the field of stereology, which deals with the estimation of higher dimensional information from lower dimensional samples.
We study a generalization of the Wicksell problem. Consider 3D particles, convex bodies to be precise, which are randomly positioned and oriented in an opaque body. A convex body is a compact and convex set with non-empty interior. These particles all have the same known shape, but they do not have the same size. The particles cannot be observed directly; we only observe 2D section profiles of these particles in a planar section. We address the statistical problem of estimating the size distribution of the particles, using a sample of observed areas of the section profiles. A visualization of the problem setting is given in Figure 1. In this particular example each particle is a convex dodecahedron.

Figure 1: Left: Random spatial system of convex dodecahedra intersected with a plane. Right: Observed section profiles.
An overview of estimators for the size distribution in the spherical setting is presented in [2]. The problem has been studied for shapes other than spheres as well. In [3] the case of cubic particles is considered. In [4] a variation of the problem is studied: the particles are random polyhedra, and therefore not all particles have the same shape in that setting. A system of oriented cylinders is considered in [5].
The main contributions of this paper are as follows. A key insight is that we can separate the shape of the particles from their sizes, in the sense that an observed area may be interpreted as the product of two independent random variables, one related to the particle size and the other related to the known particle shape. The density function of the shape-related random variable is explicitly known only in exceptional cases; therefore we rely on the simulation procedure proposed in [6], which can be used to approximate it arbitrarily well.
Using that shape-related distribution as an ingredient, we design a maximum likelihood procedure to estimate the size distribution of the particles, a procedure that can be used for a large class of possible shapes. Furthermore, we show consistency of the resulting estimator and provide algorithms that can be used to compute it. Additionally, we assess the proposed estimator in a small simulation study in which we focus on convex polyhedra for the shape of the particles.
The paper is organized as follows. In section 2 we introduce necessary notation and definitions. In section 3 an integral equation is derived which describes the problem. Via this equation we obtain an identifiability result in section 4, stating that the profile area distribution uniquely determines the 3D size distribution. We define an estimator for the so-called biased size distribution in section 5. In section 6 we prove consistency of this estimator. Algorithms for computing the proposed estimator are discussed in section 7. In section 8 we describe how to estimate the particle size distribution via the biased size distribution. In section 9 some simulations are performed, and at the end of the paper we provide some conclusions in section 10.

Preliminaries
In this section we introduce necessary notation and collect some preliminary results which are needed in the rest of this paper. We consider a system of randomly positioned particles, and each particle is a convex body. Formally, a convex body is a convex and compact set with non-empty interior. Given a λ > 0 and a set A ⊂ R^3, the scalar multiplication of λ with A is defined as λA = {λx : x ∈ A}. Let SO(3) denote the rotation group of degree 3. It contains all 3 × 3 rotation matrices, which are orthogonal matrices with determinant 1. Some definitions are necessary to precisely describe what is meant by a random plane section of a particle. The unit sphere in R^3 is given by S^2 = {(x_1, x_2, x_3) ∈ R^3 : x_1^2 + x_2^2 + x_3^2 = 1}. The upper hemisphere in R^3 is given by S^2_+ = {(x_1, x_2, x_3) ∈ S^2 : x_3 ≥ 0}. Let σ_2 denote the spherical measure on S^2, also known as the spherical Lebesgue measure on S^2. In integrals over (a subset of) S^2 the notation dθ should be interpreted as dσ_2(θ). A plane in R^3 may be parameterized via a unit normal vector θ ∈ S^2_+ and its signed distance s ∈ R to the origin:

    T_{θ,s} = {x ∈ R^3 : ⟨x, θ⟩ = s},   (1)

with ⟨·, ·⟩ being the usual inner product in R^3.
We define what is meant by an Isotropic Uniformly Random (IUR) plane hitting a given convex body in R^3. The notion of IUR planes was introduced in [7]; we follow the definition as in section 5.6 of [8].
Definition 1 (IUR plane). An IUR plane T hitting a given convex body K ⊂ R^3 is defined as T = T_{Θ,S}, where (Θ, S) has joint probability density given by

    f_{Θ,S}(θ, s) = 1{T_{θ,s} ∩ K ≠ ∅} / (2π b(K)),

with T_{θ,s} as in (1) and b(K) the mean caliper diameter, or mean width, of K:

    b(K) = (1/(2π)) ∫_{S^2_+} L(p_θ(K)) dθ.

Here, p_θ(K) represents the orthogonal projection of K on the line through the origin with direction θ. L(p_θ(K)) is the length of this orthogonal projection, and may also be called the width of K in direction θ. Convexity of K ensures that p_θ(K) is an interval, so that L(p_θ(K)) is the length of an interval.
Loosely speaking this means that for an IUR plane through K, every plane which has a non-empty intersection with K is equally likely to occur. We need the following lemma, which appears as proposition 1 in [7].

Lemma 1. Let K ⊂ Q be convex bodies in R^3 and let T be an IUR plane hitting Q.
1. Hitting probability: P(T ∩ K ≠ ∅) = b(K)/b(Q).
2. Conditional property: Given that T hits K, i.e. T ∩ K ≠ ∅, T is an IUR plane hitting K.

Derivation of the stereological integral equation
In this section we give a formal description of the model and derive a stereological integral equation.
As mentioned in the introduction, stereology deals with estimating higher dimensional information from lower dimensional samples. The stereological equation in this section directly relates the distribution of the 3D particle sizes to the distribution of the observed 2D section profile areas. We derive an expression for the density f_A of the observed section areas. Another derivation of this density appears in chapter 16 of [9]. The derivation has two purposes: it provides an intuitive understanding of the problem, and the equation is used for defining an estimator. Let Q ⊂ R^3 be the opaque convex body containing the randomly positioned particles. The intersection of Q with a random plane yields a sample of observed section profile areas. For now, assume that Q contains just one particle, a convex body K_1. Assume that K_1 is similar to a known convex body K ⊂ R^3, which we refer to as the reference particle. This means that there exists a rotation M ∈ SO(3), a point x ∈ R^3 and a scalar Λ > 0 such that K_1 = ΛMK + x := {ΛMk + x : k ∈ K}. We refer to the scalar Λ as the size of K_1, which is distributed according to an unknown size distribution with CDF H and PDF h. As such, the size is the scaling with respect to the reference particle, which has size 1. The mean size is denoted by E(Λ) = ∫_0^∞ λ dH(λ), and we assume 0 < E(Λ) < ∞ throughout. Let T be an IUR plane hitting Q. Let B := {T ∩ K_1 ≠ ∅} be the event that K_1 is hit by T. By Lemma 1, the probability that K_1 is hit by T given that it has size λ is given by

    P(B | Λ = λ) = b(λK)/b(Q) = λ b(K)/b(Q).   (2)

Here, we use the fact that b(λK) = λ b(K) and that b(K) is invariant under rotations and translations of K. While Λ is drawn from H, the size of a particle which appears in the plane section follows a different distribution than H. By this we mean that Λ|B is not distributed according to H.
Note that the probability in (2) is proportional to λ; via Bayes' rule the density of Λ|B, denoted by h_b, is computed as

    h_b(λ) = λ h(λ) / E(Λ).   (3)

Throughout this paper we refer to h_b as the density of the length-biased size distribution associated with h. Let H_b be the CDF corresponding to h_b and note that H and H_b are related via

    H_b(λ) = (1/E(Λ)) ∫_0^λ x dH(x)

and

    H(λ) = ( ∫_0^λ x^{-1} dH_b(x) ) / ( ∫_0^∞ x^{-1} dH_b(x) ).

We refer to H_b as the length-biased size distribution function, or the length-biased version of H. For an elaborate overview of length-biased and, more generally, size-biased distributions we refer to [10].
In [10] the following general property of length-biased distributions is also proven: if Λ_b ∼ H_b and Λ ∼ H, then P(Λ_b ≥ λ) ≥ P(Λ ≥ λ) for all λ. Hence, as H_b is the size distribution of the particles which appear in the plane section, larger particles are more likely to appear in the cross section.
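To make the effect of length-biasing concrete, the following sketch draws sizes from a hypothetical size distribution H (uniform on (0.5, 1.5), chosen purely for illustration) and resamples them with weights proportional to size, mimicking (3). The biased mean should then be E(Λ²)/E(Λ) = 13/12 rather than E(Λ) = 1.

```python
import random

def length_biased_resample(sizes, rng):
    """Resample sizes with probability proportional to size, mimicking the
    length-biasing in (3): a particle of size lambda is hit by an IUR plane
    with probability proportional to lambda."""
    return rng.choices(sizes, weights=sizes, k=len(sizes))

rng = random.Random(0)
# Hypothetical size distribution H: uniform on (0.5, 1.5), so E(Lambda) = 1.
sizes = [rng.uniform(0.5, 1.5) for _ in range(200_000)]
biased = length_biased_resample(sizes, rng)

mean = sum(sizes) / len(sizes)      # approximates E(Lambda) = 1
mean_b = sum(biased) / len(biased)  # approximates E(Lambda^2)/E(Lambda) = 13/12
```

The biased sample is stochastically larger than the original one, in line with the property from [10] quoted above.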
We can now derive the distribution of an observed section area, resulting from K_1 being hit by the section plane. For a convex body K', let G_{K'} denote the distribution function of the area of the intersection of K' with an IUR plane hitting it. Conditional on K_1 being hit, let A := area(K_1 ∩ T). By the conditional property of IUR planes in Lemma 1, given that T hits K_1 it is an IUR plane hitting K_1. Therefore, if K_1 with size λ appears in the section plane, its section area is distributed according to G_{λK}. Using the rules of conditional probability we find

    F_A(a) = P(A ≤ a) = ∫_0^∞ G_{λK}(a) dH_b(λ).   (4)

Using point 2 of Theorem 1, which states that Z ∼ G_K implies λ^2 Z ∼ G_{λK}, F_A may be written as

    F_A(a) = ∫_0^∞ G_K(a/λ^2) dH_b(λ).

Suppose now that we randomly position and orient non-overlapping particles K_1, K_2, . . . in Q, each similar to K. More specifically, the centers of the particles are distributed according to a homogeneous Poisson point process. As for the orientations, all orientations of the particles are equally likely and independent. The sizes of the particles Λ_1, Λ_2, . . . are independent and identically distributed (iid) according to H. Intersecting Q with an IUR plane yields an iid sample A_1, . . ., A_n from F_A of observed section areas, for some random n. Let K be a convex body such that G_K has a density g_K, recall Theorem 1. Let a_max be the largest possible section area of K, such that g_K has support (0, a_max). Then, F_A has a density given by

    f_A(a) = ∫_0^∞ λ^{-2} g_K(a/λ^2) dH_b(λ).   (5)

The stereological equation (5) directly relates the sizes of the 3D particles to the areas of the observed 2D section profiles.
Example 1 (Wicksell's corpuscle problem). Choose for the reference particle K = B(0, 1) = {x ∈ R^3 : ∥x∥ ≤ 1}, the ball with radius 1. Then:

    g_K(z) = 1/(2π √(1 − z/π)), 0 < z < π.

We may interpret H as the distribution function of the radii of the 3D balls. Note that any plane section of a ball yields a circular disc. Given A ∼ f_A, set A = πR^2; the density of the observed circle radii satisfies f_R(r) = f_A(πr^2) 2πr, r > 0. Combining this with (5) yields

    f_R(r) = (r/E(Λ)) ∫_r^∞ h(λ)/√(λ^2 − r^2) dλ, r > 0,

which corresponds to the well-known Wicksell integral equation [1].
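The formula for g_K in Example 1 can be checked by direct simulation: by symmetry, the distance of an IUR plane hitting the unit ball to its centre is uniform on [0, 1] in absolute value, so a section area can be sampled as A = π(1 − U²). A minimal sketch (all names are illustrative):

```python
import math, random

def ball_section_cdf(z):
    """CDF of the section area of the unit ball: G_K(z) = 1 - sqrt(1 - z/pi),
    obtained by integrating g_K(z) = 1/(2 pi sqrt(1 - z/pi)) on (0, pi)."""
    return 1.0 - math.sqrt(1.0 - z / math.pi)

rng = random.Random(1)
# For an IUR plane hitting the unit ball, |distance to centre| ~ Uniform[0, 1].
areas = [math.pi * (1.0 - rng.random() ** 2) for _ in range(100_000)]

z = 2.0
empirical = sum(a <= z for a in areas) / len(areas)  # empirical CDF at z
mean_area = sum(areas) / len(areas)                  # should approach 2 pi / 3
```

The mean section area 2π/3 agrees with the general relation E(A) = V(K)/b(K) for IUR sections, since the unit ball has volume 4π/3 and mean width 2.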
Remark 1. By taking an appropriate choice for the reference particle, the size distribution may be directly related to a more convenient distribution. For example, if the reference particle has diameter 1, then the size distribution corresponds to the distribution of the diameters of the particles. When choosing a reference particle with volume 1, a particle with size λ has volume λ^3. The volume distribution function is then given by V(v) = P(Λ^3 ≤ v) = H(v^{1/3}).
The derived stereological equation also holds under different assumptions. The random system of particles may be defined by choosing an isotropic typical particle, and then positioning the particles using a stationary point process on R^3. This model is also known as a germ-grain model. Relevant references are sections 6.5 and 10.5 in [2], as well as [11] and [3]. Hence, there is no need to restrict the particles to an opaque body or to position the particles via a Poisson point process.
In this setting, let N_V denote the expected number of 3D particles per unit volume, which corresponds to the intensity parameter of the point process. Intersecting the system of particles with a plane, let N_A denote the expected number of observed 2D section profiles per unit area. Combining the well-known stereological equation (Theorem 10.1 in [2])

    N_A = N_V E(b(ΛK)) = N_V b(K) E(Λ)

with (4) yields

    N_A F_A(a) = N_V b(K) ∫_0^∞ λ G_{λK}(a) dH(λ).   (6)

A derivation of a slightly more general version of (6) may be found in chapter 6 of [12]. We specifically mention (6) since it appears more frequently in the literature than (5).
In order to obtain a better understanding of the problem it helps to apply a transformation. We apply a square root transformation to (4). For A ∼ F_A, set S = √A, such that S ∼ F_S and

    F_S(s) = ∫_0^∞ G_K^S(s/λ) dH_b(λ),   (7)

where G_K^S denotes the distribution function of the square root of the section area of the reference particle K. This expression may be recognized as the distribution function corresponding to a product of two independent random variables. This is a key insight, which is made precise in the following lemma.

Lemma 2. Let S = √A with A ∼ F_A as in (4). Then S has the same distribution as Λ_b Z_S, where Λ_b ∼ H_b and Z_S ∼ G_K^S are independent.

Proof. Let X, Y, Z be non-negative random variables, with CDFs F_X, F_Y and F_Z respectively. If X = Y Z with Y and Z independent, then their distribution functions are related via

    F_X(x) = ∫_0^∞ F_Z(x/y) dF_Y(y).

Comparing this with (7), the result is immediate.
Let us provide some further intuition for Lemma 2. Note that point 2 of Theorem 1 means that for a given size λ > 0, if Z ∼ G_K then Zλ^2 ∼ G_{λK}. As the sizes of the particles in the section plane are distributed according to H_b, this hints towards the relationship given in Lemma 2.
Therefore, there are two main considerations in this problem. First, the size distribution of the particles appearing in the cross section is a length-biased version of the actual size distribution. Second, we can separate the common shape of the particles and their sizes in some sense. Taking a random size from H_b and independently taking an IUR section of the reference particle yields a sample from F_A via the relationship given in Lemma 2.
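Both considerations can be illustrated in one simulation. The sketch below (unit ball as reference particle, hypothetical uniform size distribution) samples section areas in two ways: directly, by biasing the hitting probability proportionally to size and sectioning the hit particle; and via the product representation of Lemma 2. The two routes should produce the same distribution, here checked through the mean E(Λ_b²)E(Z) = (5/4)(2π/3).

```python
import math, random

rng = random.Random(2)
n = 200_000

# Hypothetical size distribution H: uniform on (0.5, 1.5); reference particle: unit ball.
def sample_size():
    return rng.uniform(0.5, 1.5)

# Direct route: a particle of size lam is hit with probability proportional to lam
# (rejection step), after which its section area is pi * lam^2 * (1 - U^2).
direct = []
while len(direct) < n:
    lam = sample_size()
    if rng.random() < lam / 1.5:
        u = rng.random()
        direct.append(math.pi * lam ** 2 * (1.0 - u ** 2))

# Product route (Lemma 2): A = Lambda_b^2 * Z with Lambda_b ~ H_b and Z ~ G_K independent.
raw = [sample_size() for _ in range(n)]
lam_b = rng.choices(raw, weights=raw, k=n)
prod_sample = [lb ** 2 * math.pi * (1.0 - rng.random() ** 2) for lb in lam_b]

mean_direct = sum(direct) / n        # both approximate (5/4) * (2 pi / 3)
mean_product = sum(prod_sample) / n
```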

Identifiability of the particle size distribution
In this section we present a general identifiability result for our model. This means that under appropriate conditions, given a known reference particle, there are no two size distributions which yield the same distribution of observed section areas. For this result we need the Mellin-Stieltjes transform, which we will also refer to as the Mellin transform. While characteristic functions appear naturally when studying sums of independent random variables, the Mellin transform is appropriate when studying products of independent random variables. We collect some properties of the Mellin transform; for details we refer to section 7.8 in [13] and to [14]. We note that the use of the Mellin transform for this problem was already considered in [15]. The authors obtain a slightly different expression, due to the fact that an inversion formula for the density h was derived and because the density f_A in (5) was studied up to a normalization constant. The identifiability result in this section is new; a sufficient condition for identifiability in this context has not been derived before.
Definition 2 (Mellin-Stieltjes transform). Given a non-negative random variable X with CDF F, the Mellin-Stieltjes transform of X is defined as

    M_X(s) = E(X^{s−1}) = ∫_0^∞ x^{s−1} dF(x),

for s ∈ C, whenever the integral is absolutely convergent.
Note in particular that whenever ∫_0^∞ x^{c−1} dF(x) < ∞ for some c ∈ R, the Mellin transform exists for all s = c + it, t ∈ R. Hence, existence of the Mellin transform corresponds to the existence of moments of a distribution. Let St(α, β) := {s ∈ C : α < Re(s) < β} denote the open strip parallel to the imaginary axis. Analogously, St[α, β] := {s ∈ C : α ≤ Re(s) ≤ β} denotes the closed strip. If we find α < β such that the Mellin transform of X converges absolutely on St[α, β], then M_X is analytic on St(α, β). Taking α as small as possible and β as large as possible, this open strip is referred to as the strip of analyticity of M_X. A Mellin transform uniquely determines a distribution in the following sense.

Lemma 3. Let X and Y be non-negative random variables whose Mellin transforms exist and coincide on a line {c + it : t ∈ R} contained in both strips of analyticity. Then X and Y have the same distribution.

The proof is given in Appendix A. A similar statement is proven in Theorem 8 in [16] for the case that the CDF has a Lebesgue density. Finally, we recall the Mellin convolution theorem. Let X, Y, Z be non-negative random variables, such that X = Y Z with Y and Z independent. For any s ∈ C such that M_Y(s) and M_Z(s) are finite:

    M_X(s) = M_Y(s) M_Z(s).

Having collected these properties we now state the identifiability result.
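As a brief aside, the Mellin convolution theorem is easy to check numerically for real s, since M_X(s) = E(X^{s−1}) can be estimated by a sample average. The distributions below are hypothetical, chosen only for illustration:

```python
import random

def mellin(sample, s):
    """Monte Carlo estimate of the Mellin-Stieltjes transform M_X(s) = E(X^(s-1))
    for real s, based on an iid sample from X."""
    return sum(x ** (s - 1.0) for x in sample) / len(sample)

rng = random.Random(3)
n = 100_000
# Y and Z independent (both uniform on (0.5, 1.5), hypothetical), X = Y * Z.
y = [rng.uniform(0.5, 1.5) for _ in range(n)]
z = [rng.uniform(0.5, 1.5) for _ in range(n)]
x = [a * b for a, b in zip(y, z)]

s = 2.0
lhs = mellin(x, s)                  # M_X(2) = E(X)
rhs = mellin(y, s) * mellin(z, s)   # M_Y(2) M_Z(2) = E(Y) E(Z)
```

At s = 2 the identity reduces to E(YZ) = E(Y)E(Z) for independent factors; other real s inside the strip of analyticity can be checked the same way.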
Theorem 2 (Identifiability). Suppose we are given densities f_A and g_K such that f_A can be expressed as in (5) for some CDF H, and let Z ∼ g_K. Then H is uniquely determined by f_A if one of the following conditions holds:
1. E(Z^{−α}) < ∞ for some α > 0;
2. ∫_0^∞ x^{1+δ} dH(x) < ∞ for some δ > 0.
Proof. We first consider statement 1 of the theorem. Let Λ_b ∼ H_b, with H_b as in (3), and let Λ ∼ H. Let Z ∼ g_K and A ∼ f_A. We first determine on which strips the Mellin transforms of the random variables of interest are analytic. Since

    M_{Λ_b^2}(s) = E(Λ_b^{2(s−1)}) = E(Λ^{2s−1})/E(Λ)

and 0 < 2Re(s) − 1 < 1 for all s ∈ St(1/2, 1), M_{Λ_b^2} is analytic on St(1/2, 1). Choose α > 0 such that E(Z^{−α}) < ∞. Because g_K has bounded support, all non-negative moments of Z exist and therefore M_Z is analytic on St(1 − α, ∞). By Lemma 2 and the Mellin convolution theorem we obtain

    M_A(s) = M_{Λ_b^2}(s) M_Z(s).

Moreover, this also means that M_A is analytic on St(max{1 − α, 1/2}, 1). Let c ∈ (max{1 − α, 1/2}, 1). Define

    L := {c + it : t ∈ R},  L_Z := {s ∈ L : M_Z(s) ≠ 0},  f(s) := M_A(s)/M_Z(s) for s ∈ L_Z.

For all s ∈ L_Z we find f(s) = M_{Λ_b^2}(s). As a result there is a unique analytic continuation of f to L. The uniqueness of this analytic continuation implies f(s) = M_{Λ_b^2}(s) for all s ∈ L. Suppose H̃ also satisfies (5), with H̃_b denoting its length-biased version and Λ̃_b ∼ H̃_b. Then, following the same steps as before, we obtain M_{Λ̃_b^2}(s) = M_{Λ_b^2}(s) for all s ∈ L. By Lemma 3, Λ̃_b^2 and Λ_b^2 have the same CDF. Therefore, for all x ∈ R: H̃_b(x) = H_b(x). By (3) this also implies H̃ = H.
The proof of the second statement of the theorem is analogous; we simply highlight the differences. Let δ > 0 be such that ∫_0^∞ x^{1+δ} dH(x) < ∞. Then M_{Λ_b^2} is analytic on St(1/2, 1 + δ/2). In this case we take c ∈ (1, 1 + δ/2) and the remainder of the proof is as before.
We obtain as a consequence:

Corollary 1. In the following cases the distribution function H is identifiable:
1. The Wicksell corpuscle problem.
2. G_K^S has a bounded density g_K^S.

Proof. Recall the expression for g_K in Example 1. Then:

    E(Z^{−1/2}) = ∫_0^π z^{−1/2} / (2π √(1 − z/π)) dz = (1/(2√π)) ∫_0^1 t^{−1/2}(1 − t)^{−1/2} dt = √π/2 < ∞.

This integral may be computed by substituting t = z/π and recognizing the resulting integral as a constant times the density of a Beta(1/2, 1/2) distribution. Hence, condition 1 of Theorem 2 is satisfied. If G_K^S has a density g_K^S with g_K^S ≤ B, then for any α ∈ (0, 1/2):

    E(Z^{−α}) = ∫_0^{√a_max} t^{−2α} g_K^S(t) dt ≤ B a_max^{(1−2α)/2} / (1 − 2α) < ∞.

Therefore, in this case condition 1 of Theorem 2 is also satisfied.
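The Beta-integral computation above can also be verified numerically. For the unit ball, Z = π(1 − U²) with U uniform on [0, 1], so E(Z^{−1/2}) should come out as √π/2 ≈ 0.886:

```python
import math, random

rng = random.Random(6)
n = 400_000
# Section area of the unit ball under an IUR plane: Z = pi * (1 - U^2), U ~ Uniform[0, 1].
z = [math.pi * (1.0 - rng.random() ** 2) for _ in range(n)]
moment = sum(zi ** -0.5 for zi in z) / n  # approximates E(Z^{-1/2}) = sqrt(pi)/2
```

Note that Z^{−1/2} has infinite variance here, so the Monte Carlo average converges more slowly than usual; the finite negative moment is exactly what condition 1 of Theorem 2 requires.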
Identifiability for the Wicksell problem is a classical result; in this case there is also a well-known explicit inversion formula.

Remark 2. The condition in Theorem 2, ∫_0^∞ x^{1+δ} dH(x) < ∞ for some δ > 0, is also implied by the assumption H(M) = 1 for some M > 0. Recall the derivation of (5) in section 3: a maximum size of the particles is clearly enforced by the fact that they are contained within the body Q. This is a typical assumption in stereological problems.
Note that the proof of Theorem 2 also provides a (rather implicit) inversion formula for H_b. Assume H_b is continuous and let c be as in the proof of Theorem 2. Since analytic functions only have isolated zeros, M_Z(c + it) ≠ 0 for almost all t ∈ R, so the ratio M_A(c + it)/M_Z(c + it) = M_{Λ_b^2}(c + it) is determined for almost all t ∈ R. By the Mellin inversion formula, as in the proof of Lemma 3, this determines H_b. H can then be retrieved via (3).
Estimator for the length-biased particle size distribution

In this section we propose an estimator for the length-biased size distribution H_b. The proposed estimator is inspired by the approach taken in [17] for Wicksell's corpuscle problem. Given the random fraction interpretation of Lemma 2, first estimating H_b seems a natural intermediate step.
We note that biased or weighted distributions frequently appear in stereology; see also section 7.5 in [11].
The reference particle K is considered to be known and we assume that it satisfies one of the conditions in Theorem 1, such that G_K^S has a density g_K^S. We stress that this also means that we consider g_K^S to be known. While there are very few shapes for which an explicit expression for g_K^S is known, in [6] a Monte Carlo simulation scheme is proposed which can be used to approximate such a density arbitrarily closely. To give some insight into how these densities look, see Figure 2 for approximations of these densities for the cube, dodecahedron and tetrahedron. These approximations are obtained by computing a kernel density estimator with boundary correction, based on a sample of size N = 10^7.
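For intuition, here is a minimal sketch of such a Monte Carlo scheme for the unit cube (an illustration under our own simplifications, not the actual implementation of [6]): IUR planes hitting the cube are sampled by rejection, and the exact polygonal section area is computed for each accepted plane. The sample mean should approach the known value V(K)/b(K) = 1/(3/2) = 2/3 for the unit cube.

```python
import math, random
from itertools import product

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def cube_section_area(theta, s):
    """Exact area of {x in [-1/2, 1/2]^3 : <theta, x> = s} (a convex polygon)."""
    pts = []
    for axis in range(3):
        others = [i for i in range(3) if i != axis]
        for a, b in product((-0.5, 0.5), repeat=2):
            if abs(theta[axis]) < 1e-12:
                continue
            # solve <theta, x> = s along the edge with the other coordinates fixed at (a, b)
            t = (s - theta[others[0]] * a - theta[others[1]] * b) / theta[axis]
            if -0.5 <= t <= 0.5:
                p = [a, b]
                p.insert(axis, t)
                pts.append(p)
    if len(pts) < 3:
        return 0.0
    # orthonormal basis (u, v) of the section plane
    ref = (1.0, 0.0, 0.0) if abs(theta[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = cross(theta, ref)
    nu = math.sqrt(sum(c * c for c in u))
    u = tuple(c / nu for c in u)
    v = cross(theta, u)
    xy = [(sum(ui * pi for ui, pi in zip(u, p)),
           sum(vi * pi for vi, pi in zip(v, p))) for p in pts]
    cx = sum(p[0] for p in xy) / len(xy)
    cy = sum(p[1] for p in xy) / len(xy)
    xy.sort(key=lambda p: math.atan2(p[1] - cy, p[0] - cx))  # convex polygon order
    area = 0.0
    for (x1, y1), (x2, y2) in zip(xy, xy[1:] + xy[:1]):      # shoelace formula
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def iur_cube_areas(n, rng):
    """Sample section areas of the unit cube under IUR planes via rejection."""
    out = []
    R = math.sqrt(3) / 2.0  # circumradius of the unit cube
    while len(out) < n:
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in g))
        theta = tuple(c / norm for c in g)            # uniform random direction
        s = rng.uniform(-R, R)
        if abs(s) <= sum(abs(c) for c in theta) / 2:  # plane actually hits the cube
            out.append(cube_section_area(theta, s))
    return out

rng = random.Random(7)
areas = iur_cube_areas(20_000, rng)
mean_area = sum(areas) / len(areas)  # should approach V / b = 2/3
```

Approximating g_K^S itself then amounts to taking square roots of the sampled areas and running a (boundary-corrected) kernel density estimator, as described above.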
Recall the square root transformation and the resulting expression (7). Because G_K^S has density g_K^S, F_S has a density f_S given by

    f_S(s) = ∫_0^∞ λ^{−1} g_K^S(s/λ) dH_b(λ).   (9)

Keep in mind that g_K^S is supported on (0, √a_max), such that the lower bound of the integration region is effectively s/√a_max.
Suppose we have a sample of observed section areas A_1, . . ., A_n and set S_i = √A_i, so that S_1, . . ., S_n is an iid sample from the density f_S in (9). Now, let s_1 < s_2 < · · · < s_n be a realization of the order statistics of S_1, . . ., S_n. We use (9) to implicitly define an estimator for H_b via nonparametric maximum likelihood. This is achieved by considering a large class of distribution functions for H_b.
Let F_+ be the class of all distribution functions on (0, ∞). Define

    F_n^+ := {H_b ∈ F_+ : H_b is piecewise constant, with jumps only at s_1, . . ., s_n, and H_b(s_0) = 0},

for some 0 < s_0 < s_1. This means that F_n^+ contains all piecewise constant distribution functions with jump locations restricted to the set of observations, the s_i's. Note that as n → ∞ the set of observed s_i's becomes dense in the support of f_S and the class F_n^+ grows to the class of all distribution functions with the same support as f_S.
When choosing the size of the reference particle K, it is important that a_max ≥ 1. This is due to the choice of the sieve F_n^+: if H_b has support contained in [0, M], then f_S has support contained in [0, M√a_max]. Then, as n tends to infinity, F_n^+ grows to a class of distribution functions which also contains the true CDF H_b, since M√a_max ≥ M. Taking a very large K means a_max is large, such that M√a_max is much larger than M. Then, the s_i's will be quite sparse in [0, M], which is also undesirable. For the sake of interpretability of H, recall Remark 1, we choose a K with volume 1.
For the shapes considered in simulations we observed a_max ≥ 1.
For H_b ∈ F_n^+ we define the (scaled by 1/n) log-likelihood

    L(H_b) = (1/n) Σ_{i=1}^n log( ∫_0^∞ λ^{−1} g_K^S(s_i/λ) dH_b(λ) ).

A maximum likelihood estimator (MLE) Ĥ_b^n for H_b is defined as a maximizer of the log-likelihood L, which may be written as

    Ĥ_b^n ∈ arg max_{H_b ∈ F_n^+} L(H_b).   (11)

The following theorem shows that this estimator is well-defined, and provides a sufficient condition for uniqueness.

Theorem 3 (Existence and uniqueness of Ĥ_b^n). A maximizer of the log-likelihood L in F_n^+ always exists. The maximizer is unique if the matrix A = (α_{i,j}), with α_{i,j} = g_K^S(s_i/s_j)/s_j, i, j ∈ {1, . . ., n}, is of full rank.
Proof. For H_b ∈ F_n^+ define β_j = H_b(s_j) and write β = (β_1, β_2, . . ., β_n)^T. Consider the closed convex set

    C := {β ∈ R^n : 0 ≤ β_1 ≤ β_2 ≤ · · · ≤ β_n ≤ 1}.

The maximization problem (11) is equivalent to maximizing l : C → R ∪ {−∞} with l given by

    l(β) = (1/n) Σ_{i=1}^n log( Σ_{j=1}^n α_{i,j}(β_j − β_{j−1}) ),   (13)

where α_{i,j} = g_K^S(s_i/s_j)/s_j and β_0 = 0. The set C is closed and bounded, and therefore compact. Because of the continuity of l on C, it attains a maximum. We now show that l is strictly concave if and only if A = (α_{i,j}) is of full rank. Strict concavity implies uniqueness of the maximum as well as of the maximizer. Fix β ∈ C such that l(β) > −∞. For j, k ∈ {1, . . ., n}, computing the partial derivatives and Hessian of l yields

    ∂l/∂β_k (β) = (1/n) Σ_{i=1}^n (α_{i,k} − α_{i,k+1}) / ( Σ_{j=1}^n α_{i,j}(β_j − β_{j−1}) ),   (14)

    (H(β))_{j,k} = −(1/n) Σ_{i=1}^n (α_{i,j} − α_{i,j+1})(α_{i,k} − α_{i,k+1}) / ( Σ_{j'=1}^n α_{i,j'}(β_{j'} − β_{j'−1}) )^2.   (15)

Here, we have set α_{i,n+1} = 0 for all i ∈ {1, . . ., n}. Since l(β) > −∞, there are no divisions by zero in (14) and (15). Note that the following summation by parts identity holds for any γ ∈ R^n with γ_0 := 0:

    Σ_{k=1}^n (α_{i,k} − α_{i,k+1}) γ_k = Σ_{j=1}^n α_{i,j}(γ_j − γ_{j−1}).

Using this fact we show that the Hessian of l is negative definite if and only if A is of full rank. Let γ ∈ R^n and set γ_0 = 0; then

    γ^T H(β) γ = −(1/n) Σ_{i=1}^n ( Σ_{j=1}^n α_{i,j}(γ_j − γ_{j−1}) )^2 / ( Σ_{j=1}^n α_{i,j}(β_j − β_{j−1}) )^2.

Clearly, H(β) is negative semidefinite. Moreover, γ^T H(β) γ = 0 if and only if Σ_{j=1}^n α_{i,j}(γ_j − γ_{j−1}) = 0 for all i ∈ {1, . . ., n}, that is, if and only if the vector of increments γ_j − γ_{j−1} lies in the kernel of A. If A is of full rank, this forces γ = 0, so that H(β) is negative definite and l is strictly concave.
Recall that g_K^S is supported on (0, √a_max). Suppose we choose the reference particle such that √a_max = 1 + ε for some small ε > 0. Then g_K^S(1) > 0, ensuring that the diagonal of A contains positive entries. Whenever s_i/s_j > 1 + ε for all i > j, A is an upper triangular matrix, because all entries below the diagonal are zero. It is well known that such matrices are of full rank. If ε > 0 is chosen sufficiently small, then with high probability s_{j+1}/s_j > 1 + ε for all j ∈ {1, . . ., n − 1}, such that the MLE is unique with high probability. For the sake of convenience we will refer to Ĥ_b^n as the MLE, even though we cannot always guarantee uniqueness. Note, especially for the consistency result in the next section, that consistency of the MLE should be interpreted as consistency of any sequence of MLEs.
From the proof of Theorem 3 it is clear that Ĥ_b^n may be computed by maximizing l. Because l is a concave function, computing Ĥ_b^n can be done efficiently, as we will discuss in section 7.

Consistency of the maximum likelihood estimator
In this section we show that the MLE Ĥ_b^n in (11) for H_b is uniformly strongly consistent. In order to prove this, we transform the problem into a deconvolution problem. Deconvolution problems have been studied quite extensively, see for example [18] and [19]. We use some results on estimators in deconvolution problems to show consistency of the MLE. In deconvolution problems it is typical that assumptions are made on the so-called noise kernel to ensure consistency. For this problem this translates into assumptions on the density g_K^S. Write S = Λ_b Z_S as in Lemma 2 and set Y = log S, X = log Λ_b and ε = log Z_S. Define densities f_Y and f_ε by

    f_Y(y) = f_S(e^y)e^y,  f_ε(z) = g_K^S(e^z)e^z.

The distribution function of X is given by F_X(x) = H_b(e^x). We then obtain

    Y = X + ε,

with X and ε independent. Note that f_Y is the convolution of f_ε and F_X:

    f_Y(y) = ∫_{−∞}^∞ f_ε(y − x) dF_X(x) =: (f_ε * dF_X)(y).   (17)

In this setting, F_X is the distribution function of interest. We do not have direct observations from F_X; there is additive noise from the known distribution of ε. Let F be the class of all distribution functions on R. Define

    F_n := {F ∈ F : F(x) = H_b(e^x) for some H_b ∈ F_n^+}.

The observed order statistics s_1, . . ., s_n are transformed as well: y_i = log(s_i), i ∈ {0, 1, . . ., n}. We proceed similarly as before; the log-likelihood may be written as

    (1/n) Σ_{i=1}^n log( (f_ε * dF)(y_i) ),

and a maximum likelihood estimator F̂_n for F_X is defined as

    F̂_n ∈ arg max_{F ∈ F_n} (1/n) Σ_{i=1}^n log( (f_ε * dF)(y_i) ).   (18)

We now show that we may assume Ĥ_b^n(x) = F̂_n(log(x)). The likelihoods of the two problems are related as follows. Let H_b ∈ F_n^+ and let F ∈ F_n be given by F(x) = H_b(e^x); then the two log-likelihoods differ only by a term which does not depend on H_b. If we find a distribution function which provides a better likelihood in one of the problems, we immediately obtain a distribution function which provides the same improvement in likelihood in the other problem. So indeed, there exist MLEs which are related via Ĥ_b^n(x) = F̂_n(log(x)). The estimator F̂_n was studied in [18] and shown to be strongly uniformly consistent under some conditions on f_ε. This result may be used to show strong uniform consistency of Ĥ_b^n, since

    sup_{λ > 0} | Ĥ_b^n(λ) − H_b(λ) | = sup_{x ∈ R} | F̂_n(x) − F_X(x) |.

Let us now specify the assumptions we require for f_ε. We assume it belongs to the class G of upper semicontinuous functions that are of bounded variation on a compact interval and monotone outside this interval. Let V_a^b(f) denote the total variation of the function f on the interval [a, b], a < b. The class G may be written as

    G := {f : f is upper semicontinuous and there exist a < b with V_a^b(f) < ∞, f nondecreasing on (−∞, a] and nonincreasing on [b, ∞)}.

This corresponds with the following assumptions on g_K^S.

Lemma 4. Assume that g_K^S is upper semicontinuous and of bounded variation on its support. Then the density f_ε : R → [0, ∞) given by f_ε(z) = g_K^S(e^z)e^z belongs to G.

The proof of this lemma can be found in Appendix A. We now collect some lemmas to obtain a consistency result for F̂_n. The following result can be found in [18], as Corollary 1.

Lemma 5. Let F̂_n be the MLE for F_X defined in (18). Assume f_ε ∈ G. Set f̂_n := f_ε * dF̂_n and f_Y := f_ε * dF_X; then almost surely:

    ∫_{−∞}^∞ | f̂_n(y) − f_Y(y) | dy → 0, as n → ∞.

The following lemma is a generalization of Lemma 3 in [18]; the proof is given in Appendix A.

Lemma 6. Let f_ε be a Lebesgue density on R. Let (F_n)_{n≥1} be a sequence of distribution functions on R, converging weakly to a distribution function F_X. Then, for f_n := f_ε * dF_n and f_Y := f_ε * dF_X:

    ∫_{−∞}^∞ | f_n(y) − f_Y(y) | dy → 0, as n → ∞.

We now state the following theorem, whose proof closely follows that of Theorem 3 in [18].
Theorem 4 (Consistency of F̂_n). Let f_ε ∈ G. Assume that the deconvolution problem with this f_ε is identifiable. Then, with probability one, lim_{n→∞} F̂_n(x) = F_X(x) for each x where F_X is continuous. If F_X is continuous, then with probability one:

    sup_{x ∈ R} | F̂_n(x) − F_X(x) | → 0, as n → ∞.

Proof. Let (Ω, A, P) be a probability space supporting a sequence Y_1, Y_2, . . . of iid random variables, distributed according to f_Y as in (17). Set f̂_n := f_ε * dF̂_n. By Lemma 5 we know there exists a set Ω_0 ∈ A with P(Ω_0) = 1 such that for all ω ∈ Ω_0 we have ∫ | f̂_n(y, ω) − f_Y(y) | dy → 0. Fix ω ∈ Ω_0 and choose an arbitrary subsequence (n_l)_{l≥1} ⊂ (n)_{n≥1}. By Helly's selection principle there exists a further subsequence (n_k)_{k≥1} ⊂ (n_l)_{l≥1} such that F̂_{n_k}(·, ω) converges weakly to a distribution function F. By Lemma 6 this implies that f̂_{n_k} converges to f_ε * dF in L_1. Because the whole sequence f̂_n converges to f_Y = f_ε * dF_X in L_1, this implies F = F_X by identifiability of the deconvolution problem. Therefore, every subsequence of MLEs contains a further subsequence converging weakly to F_X. This implies weak convergence of the whole sequence to F_X. Finally, the uniform result follows from the monotonicity of all distribution functions in the sequence and of F_X, together with continuity of F_X.
Turning to a consistency result for Ĥ_b^n, we need to make sure that g_K^S satisfies the conditions in Lemma 4. If g_K^S satisfies these conditions, its boundedness implies that the problem is identifiable by Corollary 1. Note that identifiability in the original problem implies identifiability in the corresponding deconvolution problem.
Corollary 2 (Consistency of Ĥ_b^n). Assume g_K^S is upper semicontinuous and of bounded variation on its support. Then, with probability one, lim_{n→∞} Ĥ_b^n(λ) = H_b(λ) for each λ where H_b is continuous. If H is continuous, then so is H_b, and with probability one:

    sup_{λ > 0} | Ĥ_b^n(λ) − H_b(λ) | → 0, as n → ∞.
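The log transform underlying this section is easy to check numerically. Under the hypothetical choices Λ_b uniform on (0.5, 1.5) and the unit ball as reference particle (so Z_S = √(π(1 − U²))), Y = X + ε with independent summands, and the empirical variances should therefore add:

```python
import math, random

rng = random.Random(4)
n = 200_000
# Hypothetical biased size distribution: Lambda_b ~ Uniform(0.5, 1.5);
# reference particle: unit ball, so Z_S = sqrt(pi * (1 - U^2)).
x = [math.log(rng.uniform(0.5, 1.5)) for _ in range(n)]
eps = [0.5 * math.log(math.pi * (1.0 - rng.random() ** 2)) for _ in range(n)]
y = [a + b for a, b in zip(x, eps)]  # Y = log S = X + eps

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

var_sum = var(x) + var(eps)  # should match var(y) by independence of X and eps
```

Estimating F_X from the sample y with the known noise density f_ε is then exactly the classical deconvolution problem of [18].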

Algorithms
In this section we describe some algorithms for computing the maximum likelihood estimator Ĥ_b^n in (11). Since a distribution function in F_n^+ is discrete, it may be described by a probability vector. Let P_n be the class of probability vectors in R^n:

    P_n := { p ∈ R^n : Σ_{i=1}^n p_i = 1 and p_i ≥ 0 for all i ∈ {1, . . ., n} }.

A distribution function H_b ∈ F_n^+ may be associated with the probability vector p ∈ P_n defined as p_j = H_b(s_j) − H_b(s_{j−1}) (recall: H_b(s_0) = 0). We can switch between probability vectors and distribution functions via

    H_b(s) = Σ_{j : s_j ≤ s} p_j.   (19)

Expectation Maximization (EM)
The EM algorithm was first thoroughly studied in [20]. While it is typically used in parametric settings, it may also be used for non-parametric estimation. It is especially appealing due to its ease of implementation and its interpretation for incomplete data models. For the application of EM to our problem, we follow the description of EM in [21]. The authors describe the EM algorithm for problems similar to the one we are facing. The class of problems they consider is the following.
Suppose we aim to estimate a distribution function F. We cannot directly observe a sample X from F. Instead, we observe Y = T(X, C), with X ∼ F and C some random variable independent of X. Given Lemma 2, the problem of estimating H_b clearly belongs to this class of problems, with T(x, c) = xc and C ∼ g_K^S. Suppose we have an initial estimate H_b^0 ∈ F_n^+ of the CDF H_b. Let p^(0) be the associated probability vector as in (19). Let X_1, . . ., X_n ∼ H_b. In [21] it is shown that in their general context the EM algorithm yields the following update rule:

    p_j^(k+1) = (1/n) Σ_{i=1}^n P_{p^(k)}( X_i = s_j | Y_i = s_i ).   (20)

We use the notation P_p to indicate the probability measure associated with the probability vector p. For our 'random fraction' setting, we use Bayes' rule to obtain

    P_p( X_i = s_j | Y_i = s_i ) = p_j α_{i,j} / Σ_{l=1}^n p_l α_{i,l},

with α_{i,j} = g_K^S(s_i/s_j)/s_j as before. Plugging this into (20) yields

    p_j^(k+1) = (1/n) Σ_{i=1}^n p_j^(k) α_{i,j} / ( Σ_{l=1}^n p_l^(k) α_{i,l} ).   (21)

When terminating the EM algorithm after an appropriate number of iterations, we obtain Ĥ_b^n from p^(k) via (19). The EM algorithm may, for example, be terminated when successive iterations no longer meaningfully change the log-likelihood. We do not provide a specific stopping criterion for EM, since we do not use it directly. We only use it in hybrid form with the Iterative Convex Minorant (ICM) algorithm, which is described in the next section. For the ICM algorithm and the hybrid ICM-EM algorithm we do provide explicit termination conditions.
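The update rule (21) is straightforward to implement. The sketch below uses the unit ball as reference particle, for which g_K^S(t) = t/(π√(1 − t²/π)) on (0, √π), and synthetic data from a hypothetical size distribution; it is an illustration, not the hybrid procedure used later in the paper. Each EM iteration is guaranteed not to decrease the log-likelihood (13), which makes a convenient self-check.

```python
import math, random

def g_S(t):
    """Density of the square root of the section area of the unit ball."""
    if 0.0 < t and t * t < math.pi:
        return t / (math.pi * math.sqrt(1.0 - t * t / math.pi))
    return 0.0

def em(s, iters=100):
    """EM iterations (21) for the biased size CDF, jumps restricted to the s_i."""
    n = len(s)
    alpha = [[g_S(si / sj) / sj for sj in s] for si in s]  # alpha[i][j]
    p = [1.0 / n] * n
    lls = []
    for _ in range(iters):
        mix = [sum(a * q for a, q in zip(row, p)) for row in alpha]
        lls.append(sum(math.log(m) for m in mix) / n)      # log-likelihood (13)
        p = [p[j] / n * sum(alpha[i][j] / mix[i] for i in range(n))
             for j in range(n)]
    return p, lls

rng = random.Random(5)
# Synthetic data: Lambda_b length-biased from a hypothetical uniform H, S = Lambda_b * Z_S.
raw = [rng.uniform(0.5, 1.5) for _ in range(2000)]
lam_b = rng.choices(raw, weights=raw, k=150)
s_obs = sorted(lb * math.sqrt(math.pi * (1.0 - rng.random() ** 2)) for lb in lam_b)
p_hat, lls = em(s_obs)
```

Note that the update preserves Σ_j p_j = 1 exactly, so no renormalization step is needed.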

Iterative Convex Minorant (ICM)
The ICM algorithm was first introduced in [19]. The version of ICM we discuss is described in [22] and is sometimes called the modified ICM algorithm. This modification of ICM ensures convergence under fairly general conditions. The algorithm is designed to minimize a convex function φ over the closed convex cone: Recall from equation (13) that computing the estimator Ĥb n is equivalent to solving the following optimization problem: with β j = H b (s j ) and β = ( Ĥb n (s 1 ), . . ., Ĥb n (s n )). From (22) it is clear that for any β ∈ C + with β n < 1, the likelihood can be increased by setting β n = 1, since α i,j ≥ 0. Hence, we may incorporate the constraint β n = 1 instead of β n ≤ 1. We achieve this via a Lagrange multiplier. Define the convex function φ : C + → R ∪ {∞} as: Hereby we have incorporated the constraint, with a Lagrange multiplier equal to one. Also, the problem is now written as a convex minimization problem, since β ∈ arg min β∈C + φ(β). Therefore, the ICM algorithm may be used to compute the MLE. Suppose we have some initial estimate β (0) . The idea of ICM is to locally approximate φ in iteration k with the following quadratic form: (23) The notation ∇φ is used for the gradient of φ, the vector of partial derivatives of φ. The matrix W is a diagonal matrix; its diagonal is often chosen equal to the diagonal of the Hessian matrix of φ: In the ICM algorithm, φ (k) is minimized over C + instead of φ to obtain a candidate β for β (k+1) . If this candidate β sufficiently decreases φ it is accepted, and we set β (k+1) = β. Otherwise, a line search is performed to obtain β (k+1) , which is then given by a convex combination of β and β (k) . A precise description of the algorithm is given in Appendix B.
We remark that minimizing (23) is equivalent to computing the weighted least-squares estimator of a monotone regression function. This can be done efficiently; for more details see [22]. The partial derivatives of φ are related to those of l as in (14) and (15), via: For ICM we use the following stopping criterion: stop whenever: holds for 10 successive iterations. In simulations we set ε = 10^-4. The interpretation of this criterion is that we stop whenever the largest change in probability mass is below ε for 10 successive iterations. We note that this criterion could be inappropriate if ICM approaches the optimum very slowly. In simulations (section 9) this was not an issue.
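The weighted least-squares monotone regression mentioned above can be computed with the pool-adjacent-violators algorithm (PAVA). The sketch below is illustrative only (the function names are ours): it minimizes the local quadratic (23) over the monotone cone by taking the weighted isotonic regression of β (k) − W^{-1} ∇φ(β (k) ), assuming W is the diagonal matrix described in the text.

```python
import numpy as np

def weighted_isotonic(z, w):
    """PAVA: minimize sum_j w_j (b_j - z_j)^2 over nondecreasing vectors b."""
    vals, wts, sizes = [], [], []          # current blocks: value, weight, length
    for zj, wj in zip(z, w):
        vals.append(zj); wts.append(wj); sizes.append(1)
        # Pool adjacent blocks while the monotonicity constraint is violated.
        while len(vals) > 1 and vals[-2] >= vals[-1]:
            v = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / (wts[-2] + wts[-1])
            w2, s2 = wts[-2] + wts[-1], sizes[-2] + sizes[-1]
            vals.pop(); wts.pop(); sizes.pop()
            vals[-1], wts[-1], sizes[-1] = v, w2, s2
    out = []
    for v, s in zip(vals, sizes):
        out.extend([v] * s)
    return np.array(out)

def icm_candidate(beta, grad, wdiag):
    """Candidate ICM iterate: minimizer of the local quadratic over the
    monotone cone, i.e. isotonic regression of beta - grad / wdiag."""
    b = weighted_isotonic(beta - grad / wdiag, wdiag)
    return np.clip(b, 0.0, None)           # keep the iterate nonnegative
```

As a sanity check, the isotonic regression of (3, 1, 2) with equal weights pools all three entries into their mean, (2, 2, 2), while an already nondecreasing input is returned unchanged.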

Hybrid ICM-EM
In [21] it was proposed to combine ICM and EM into a hybrid algorithm. The idea is that a single iteration of this hybrid algorithm consists of performing one iteration of the ICM algorithm followed by one iteration of the EM algorithm. The ICM algorithm tends to be somewhat slow initially, when started far from the MLE, whereas close to the optimum it converges quickly. Conversely, the EM algorithm is quicker at the start but has trouble converging when close to the optimum. Moreover, when performing an EM step after an ICM step, ICM ensures that many of the p j 's are zero. From (21) we see that EM will never set such a p j to a positive value; hence EM only needs to operate in a lower-dimensional space. In practice the hybrid ICM-EM algorithm appears to inherit the strengths of both algorithms and is quicker than both ICM and EM. This was, for example, observed in simulations in [17] and [21]. As with ICM, the same termination condition (24) is used.
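Schematically, the hybrid iteration and the stopping rule (24) might look as follows. The callables `icm_step` and `em_step` are placeholders for single iterations of the two algorithms; the default ε = 10^-4 with a patience of 10 iterations matches the criterion stated for ICM above.

```python
import numpy as np

def hybrid_icm_em(p0, icm_step, em_step, eps=1e-4, patience=10, max_iter=10_000):
    """Hybrid ICM-EM: each iteration performs one ICM step followed by one
    EM step; stop when the largest change in probability mass stays below
    eps for `patience` successive iterations."""
    p, quiet = np.asarray(p0, dtype=float), 0
    for _ in range(max_iter):
        q = em_step(icm_step(p))
        quiet = quiet + 1 if np.max(np.abs(q - p)) < eps else 0
        p = q
        if quiet >= patience:
            break
    return p
```

Note that, as discussed above, an EM step never revives a probability mass that an ICM step has set to zero, which is what allows the hybrid scheme to work in a lower-dimensional space.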

Regularization of the maximum likelihood estimator
In this section we describe how the MLE Ĥb n may be used to estimate H, the distribution function of interest. At first glance it seems reasonable to plug in Ĥb n for H b in equation (3). Unfortunately, simulations indicate that this yields a poor estimate of H. In section 9 we describe in detail how the simulations are performed. For now, Figure 3 shows the result of a single simulation run. This simulation corresponds to the case where each particle is a dodecahedron, n = 1000, and H corresponds to a standard exponential distribution. For this H, H b corresponds to a gamma distribution. Figure 3 (a) shows that Ĥb n closely resembles H b . Meanwhile, in Figure 3 (b) we observe that plugging in Ĥb n for H b in equation (3) yields a poor estimate of H. This is due to the behavior of Ĥb n near zero. We propose a regularization technique to resolve this issue. Let t n > 0; truncating Ĥb n at t n yields:

Plugging this truncated version of Ĥb n into (3) we obtain: Hereby we introduce a new parameter t n , which we refer to as the truncation parameter. In the following lemma we show that, for an appropriate choice of the truncation parameter t n , a sequence of approximating CDFs converging to H b may be de-biased to obtain a close approximation of H.
Lemma 7. Let H be a continuous CDF on (0, ∞), with finite first moment and length-biased version H b . Let (t n ) n≥1 , t n > 0, be a sequence such that lim n→∞ t n = 0. Let (H b n ) n≥1 be a sequence of CDFs. Assume H b n converges uniformly to H b with rate at least t n , that is: The proof is given in Appendix A. Lemma 7 shows that truncation is a viable approach for consistent estimation of H. Note that in Lemma 7 we may take In practice the result cannot directly be applied to Ĥb n , since the quantity ‖ Ĥb n − H b ‖ ∞ is unknown. We propose a rule of thumb for t n . Let s ∈ R and define: Hence, tn minimizes the L 1 -distance between the CDF of the observed square root section areas, induced by the estimated (biased) size distribution, and the empirical CDF of the observed square root section areas. We minimize over {s 1 , . . ., s n } for computational convenience.
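For a discrete Ĥb n placing mass p j at support point s j , the truncated plug-in estimate amounts to discarding support points at or below t n , reweighting the remaining mass by 1/s j (removing the length bias) and renormalizing. A minimal sketch of this de-biasing step, under those assumptions and with a function name of our choosing:

```python
import numpy as np

def debias_truncated(s, p, t):
    """Truncated de-biasing of a discrete length-biased estimate.

    s : support points (positive), shape (n,)
    p : probability masses of the estimate of H^b at those points, shape (n,)
    t : truncation parameter t_n >= 0
    Returns sorted support points and CDF values of the de-biased estimate.
    """
    keep = s > t
    w = np.where(keep, p / s, 0.0)      # 1/s-reweighting removes the length bias
    w = w / w.sum()                     # renormalize to a probability vector
    order = np.argsort(s)
    return s[order], np.cumsum(w[order])
```

As a check: if H places mass 1/2 at each of the sizes 1 and 2, then H b places masses proportional to (1·1/2, 2·1/2), i.e. (1/3, 2/3); de-biasing with t = 0 recovers the uniform masses (1/2, 1/2).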

Simulations
In the previous sections we introduced the MLE Ĥb n and showed that, under reasonable assumptions, it is a consistent estimator of H b . Also, a regularization technique was introduced to consistently estimate the size distribution function H using the MLE. In this section some simulation results are presented to assess the performance of these estimators for H b and H. The code used for the simulations may be found at https://github.com/thomasvdj/pysizeunfolder. Using this code, the simulation and estimation procedure can in principle be carried out for any choice of convex polyhedron for the reference particle K.
Let us start by describing how to generate an iid sample of observed section areas for a given H and a chosen reference particle K. Lemma 2 shows that it is sufficient to draw Z ∼ G K and independently draw Λ b ∼ H b , followed by setting A := ZΛ 2 b . A may be considered a random section area, and repeating these steps n times yields an iid sample A 1 , . . ., A n distributed according to f A . Taking the square root yields a sample of observed square root section areas. A sampling scheme for generating IUR planes through K is described in [7]; see [6] for drawing from G K . Finally, we consider some well-known parametric distributions for H; for these choices H b corresponds to some other well-known parametric distribution. Hence, drawing from H b is straightforward. The following choices for H are considered, with the corresponding H b : 1. Exponential distribution: For H we consider a standard exponential distribution, such that H b corresponds to a gamma distribution.
2. Lognormal distribution: For H we consider a lognormal distribution with parameters µ and σ. For this H, H b corresponds to a lognormal distribution with parameters µ + σ 2 and σ.
Here, Φ denotes the CDF of a standard normal distribution.
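The sampling scheme above can be sketched in a few lines. The samplers passed in are placeholders: drawing from the actual G K requires the scheme of [6, 7], so the uniform sampler below is merely a stand-in; the Gamma(2, 1) sampler is the length-biased version H b of a standard exponential H, as noted in choice 1.

```python
import numpy as np

def sample_section_areas(n, sample_z, sample_lambda_b, rng):
    """Generate an iid sample of section areas via Lemma 2: A = Z * Lambda_b^2,
    with Z ~ G_K and Lambda_b ~ H^b drawn independently.

    sample_z, sample_lambda_b : callables (n, rng) -> array of n draws;
    placeholders for the shape- and distribution-specific samplers."""
    z = sample_z(n, rng)
    lam_b = sample_lambda_b(n, rng)
    return z * lam_b ** 2

# Illustrative usage; the uniform draw is NOT the true G_K of any shape.
rng = np.random.default_rng(0)
areas = sample_section_areas(
    5,
    lambda n, r: r.uniform(0.0, 1.0, n),   # stand-in for G_K
    lambda n, r: r.gamma(2.0, 1.0, n),     # H^b = Gamma(2, 1) for standard exponential H
    rng,
)
```

Taking `np.sqrt(areas)` then yields the sample of observed square root section areas used by the estimation procedure.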
For the simulations we consider the following shapes for the particles: the dodecahedron, the cube and the tetrahedron. As for the specific choice of the reference particle K, each of the shapes is scaled such that it has volume 1. Note that for the dodecahedron each edge is parallel to exactly one other edge, and the tetrahedron does not have any parallel edges. Therefore these shapes are such that G K has a Lebesgue density by Theorem 1. This is not the case for the cube. For the cube we could consider a perturbed cube, obtained by slightly tilting each of its faces; the resulting shape does not have any parallel edges. Note that we rely on Monte Carlo approximations of g S K ; we refer to [6] for further discussion on why it is reasonable to apply this density approximation procedure to the cube, even though it is not covered by Theorem 1. Now that we have covered the simulation of iid samples, we discuss the computation of the estimators. For a given choice of n, H and shape of the particles we generate a sample of n observed (square root) section areas. The MLE Ĥb n is computed using the hybrid ICM-EM algorithm. The computation of the MLE requires that we can evaluate g S K at given points. As mentioned before, there is typically no explicit expression for g S K , and we use the Monte Carlo simulation scheme described in [6] for approximating g S K (recall Figure 2). For estimating H we compute Ĥn (•, tn ) as in (25), with tn as in (27). Throughout this section we refer to this estimator simply as Ĥn . Note that for the computation of tn we require G S K , which is also not explicitly known. Hence, similarly to g S K , we use a Monte Carlo approximation of G S K . In this case we use an empirical distribution function based on the same sample used for approximating g S K . We perform repeated simulations as follows. For various choices of n we generate a sample of n observed section areas. This is repeated 100 times for each choice of n, H and shape of the particles. Simulation results for the cube are
shown in Figure 4. These results correspond to n = 1000. Each of the blue lines corresponds to one of the 100 estimates, each estimate based on a different sample of size n = 1000. The black line is the point-wise average of all estimates. Further simulation results for the other shapes are summarized in Tables 1, 2 and 3. We quantify the error of an estimate as the supremum distance between the true H b and Ĥb n , and similarly for the error of the estimates of H. The mean error is then the mean taken over the 100 resulting errors of the estimates. For these 100 resulting errors the 2.5% and 97.5% quantiles are also shown.
Let us discuss the content of Tables 1, 2 and 3. As expected, as n increases the average error decreases, for all chosen shapes and size distributions, both for the estimates of H and of H b . Comparing the average supremum error for a fixed n and a fixed size distribution, it is clear that the errors are smallest for the dodecahedron, followed by the cube, and finally the average error is largest for the tetrahedron. This is the case for the average errors for estimating H as well as for H b . Finally, we briefly touch upon the computational efficiency of the algorithms for computing Ĥb n . We take for the shape of the particles the dodecahedron and for H the previously introduced lognormal distribution. In Table 4 the average run-times and iteration counts of the ICM and ICM-EM algorithms are shown, averaged over 10 simulation runs. The EM algorithm is not included in the table; in simulations it was several orders of magnitude slower than the other algorithms. Clearly, ICM-EM is considerably faster than ICM.

Concluding remarks
In this paper we have studied a generalization of the classical Wicksell corpuscle problem, considering an arbitrary convex shape for the particles instead of spheres. In particular, for the problem of estimating the CDF H of the particle size distribution, an identifiability result is derived. We also obtain an inversion formula via the Mellin transform. A nonparametric maximum likelihood estimator is proposed for the biased size distribution H b , and it is proven to be uniformly strongly consistent. Moreover, this estimator can be computed efficiently in practice. In a simulation study the proposed estimators for H b and H perform well for various choices of particle shapes and particle size distributions.
Proof of Lemma 4. f is upper semicontinuous, as it is given by a product and a composition of an upper semicontinuous function with a continuous function. By Theorem 1, g S K is non-decreasing on (0, τ K ) for some 0 < τ K ≤ √a max . Choose M > √a max large enough such that e −M < τ K . It now follows that, for an arbitrary partition of [−M, M ]: |g S (e zi ) − g S (e zi−1 )| (28) Note that the first sum in (28) telescopes. In the final step we use the fact that g S K is bounded and is of bounded variation on its support. Because the above computation holds for arbitrary partitions of [−M, M ], we find V M −M (f ) < ∞, which finishes the proof.

Proof of Lemma 6. Because f ≥ 0 is a Lebesgue density, for every m ∈ N there exists (by Lemma 8) a bounded continuous probability density function f m : Via the triangle inequality and Fubini, the first term in (29) is bounded by: The second term in (29) may be written as: with ϕ n,m and ϕ m defined as: ϕ n,m (z) = ∫ f m (z − x) dF n (x), and ϕ m (z) = ∫ f m (z − x) dF X (x).
Because f m is a probability density, so are ϕ m and ϕ n,m for all n ∈ N. By the continuity of f m and the weak convergence of F n to F X , we obtain that ϕ n,m converges pointwise to ϕ m as n → ∞. By Scheffé's theorem, pointwise convergence of probability densities to another probability density implies that these densities also converge in L 1 . Combining all results yields: Letting m → ∞ we obtain the desired result.
Lemma 8. Let f be a Lebesgue density on R. For every ε > 0 there exists a bounded continuous probability density function g such that ‖g − f‖ L 1 < ε.
Proof. Recall that the space of compactly supported continuous functions is dense in L 1 . For n ∈ N choose a continuous, compactly supported and non-negative function g n such that ‖g n − f‖ L 1 ≤ 1/(n + 1). By the reverse triangle inequality: Define gn = g n /‖g n ‖ L 1 . Note that by (30), ‖g n ‖ L 1 > 0. Hence, gn is a bounded and continuous probability density function. Combining all results: Because this holds for all n ∈ N, we obtain the desired result.
Proof of Lemma 7. We first note the following: The numerator of (32) may be written as: The integral in (34) may be computed via integration by parts: Note that the bound in (34) is greater than H(t n ); by (31) this means that the bound also holds when taking the supremum over λ ≥ 0 instead. Combining (34) and (35) we finally obtain: Letting n go to infinity, H(t n ) converges to zero by the continuity of H. Using this and the fact that (H b n ) n≥1 converges uniformly to H b with rate t n (by assumption), the RHS of (36) converges to zero.

Lemma 2. Consider a distribution function H with length-biased version H b . Suppose Z ∼ G K and Λ b ∼ H b , with Z and Λ 2 b independent. Set A = ZΛ 2 b . Then A ∼ F A , and F A , G K and H b are related via (4).

Figure 2: Left: Monte Carlo approximations of g S K , for various shapes K. Right: approximations of g K , obtained via g K (z) = g S K (√z)/(2√z).
K . We start by rewriting the problem of estimating H b into a deconvolution problem. Recall Lemma 2: for S ∼ f S , √Z ∼ g S K and Λ b ∼ H b we have S = d √Z Λ b , with √Z and Λ b independent. Let us now perform a log-transformation; define: Y = log(S), = log(√Z) and X = log(Λ b ). The densities of Y and are related to those of S and √

Figure 3: Left: MLE of H b . Right: direct plug-in estimate of H.

Figure 4: Simulation results for the cube, n = 1000. Top left: H b is a gamma distribution. Top right: H is an exponential distribution. Bottom left: H is a lognormal distribution. Bottom right: H is a lognormal distribution.

Table 4: Mean run-times and mean numbers of iterations of the algorithms.