Species Abundance Distribution and Species Accumulation Curve: A General Framework and Results

We build a general framework which establishes a one-to-one correspondence between species abundance distribution (SAD) and species accumulation curve (SAC). The appearance rates of the species and the appearance times of individuals of each species are modeled as Poisson processes. The number of species can be finite or infinite. Hill numbers are extended to the framework. We introduce a linear derivative ratio family of models, $\mathrm{LDR}_1$, of which the ratio of the first and the second derivatives of the expected SAC is a linear function. A D1/D2 plot is proposed to detect this linear pattern in the data. By extrapolation of the curve in the D1/D2 plot, a species richness estimator that extends Chao1 estimator is introduced. The SAD of $\mathrm{LDR}_1$ is the Engen's extended negative binomial distribution, and the SAC encompasses several popular parametric forms including the power law. Family $\mathrm{LDR}_1$ is extended in two ways: $\mathrm{LDR}_2$ which allows species with zero detection probability, and $\mathrm{RDR}_1$ where the derivative ratio is a rational function. Real data are analyzed to demonstrate the proposed methods. We also consider the scenario where we record only a few leading appearance times of each species. We show how maximum likelihood inference can be performed when only the empirical SAC is observed, and elucidate its advantages over the traditional curve-fitting method.


Introduction
Estimating the diversity of classes in a population is a problem encountered in many fields. We may be interested in the diversity of words a person know from his/her writings [18], the illegal immigrants from the apprehension records [3], the distinct attributes in a database [24,16], or the distinct responses to a crowdsourcing query [48]. Among different applications, species abundance is the one that receives most attention. For this reason, it is chosen as the theme of this paper with the understanding that the proposed framework and methods are applicable in other applications as well.
Understanding the species abundance in an ecological community has long been an important task for ecologists. Such knowledge is paramount in conservation planning and biodiversity management [35]. An exhaustive species inventory is too labor and resource intensive to be practical. Information about species abundance can thus be acquired mainly through a survey.
Let N = (N 0 , N 1 , ...) where N k is the number of species in a community that are represented exactly k times in a survey. We do not observe the whole N , butÑ = (N 1 , N 2 , ...) which is the zero-truncated N . In other words, we do not know how many species are not seen in the survey. We call the vectorÑ , the frequency of frequencies (FoF) [21].
A plethora of species abundance models have been proposed forÑ . Comprehensive review of the field can be found in Bunge and Fitzpatrick [6] and Matthews and Whittaker [34]. A typical assumption in purely statistical models isÑ | N + ∼ Multinomial(N + , p), (1.1) where N + = ∞ k=1 N k is the total number of recorded species, and p = (p 1 , p 2 , . . .) is a probability vector, such that p k is the probability for a randomly selected recorded species to be observed k times in the survey for k = 1, 2, . . .. We call this vector p, the species abundance distribution (SAD). The great significance of (1.1) relies on three assumptions: (A1) the species names are noninformative; (A2) the observed data for different recorded species are independent and identically distributed (iid), and (A3) for each recorded species, its observed frequency contains all useful information.
Species accumulation curve (SAC) is another popular tool in the analysis of species abundance data. The survey is viewed as a data-collection process in which more and more sampling effort is devoted. The individual-based SAC is the number of recorded species expressed as a function of the amount of sampling effort.
Despite the different emphases of SAD and SAC, the two approaches have an overlapped target: Estimating D, the total number of species in the community. For SAD, it means estimating N 0 , the number of unseen species as D = N 0 +N + . For SAC, D is the total number of seen species when the sampling effort is unlimited.
As SAD depends largely on the sampling effort, it is necessary to include sampling effort explicitly in the model in order to make comparison of different SADs possible. This addition establishes a link between SAD and SAC. Sampling effort can be of continuous type, such as the area of land or the volume of water sampled, or the duration of the survey. Discrete type sampling effort can be the sample size. To emphasize the sampling effort considered, the SAC is called species-time curve, species-area curve, or species-sample-size curve when the sampling effort is time, area, or sample size respectively. Species-area curve has been studied extensively in the literature. Review of it can be found in Tjørve [46,47], Dengler [15] and Williams et al. [50]. In this paper, we use time as the measure of sampling effort. We consider the vector N to be a function of the time t, denoted as N (t). Notations N k ,Ñ , N + , p and p k are likewise denoted as N k (t),Ñ (t), N + (t), p(t) and p k (t) respectively. The (empirical) SAC is N + (t). Throughout the paper, we assume without loss of generality that the survey starts at time 0 and ends at time t 0 > 0. Because of the great similarity of the species-time relationship and species-area relationship [39,32], time and area are treated as if they are interchangeable in this paper. When SAC is a species-area curve, we refer to the Type I species-area curve [43], where the areas sampled are nested (smaller areas are included in larger areas), resulting in a curve that is always nondecreasing.
The study of the relation between the (empirical) SAD and the (empirical) SAC started early since the introduction of rarefaction curve ( [1,42]) which describes how we interpolate SAC using the observed FoF. Let n k (t 0 ) be the observed N k (t 0 ) for k = 1, 2, . . ., n + (t 0 ) = ∞ k=1 n k (t 0 ), andñ(t 0 ) = {n k (t 0 )} k≥1 . The rarefaction curve for species-time relationship iŝ The rationale of (1.2) is that 1 − (1 − t/t 0 ) k is the probability for a species with frequency k in time interval [0, t 0 ] to be observed before time t if the appearance times of it in [0, t 0 ] are iid U [0, t 0 ] distributed. Equivalent formula for speciesarea curve appears earlier in Arrhenius [1]. Good and Toulmin [22] proposed an extrapolation formula for species-sample-size curve which when expressed as species-time curve iŝ N + (t) = n + (t 0 ) + (1.4) whereĥ t (s) is an estimator of h t (s), the probability generating function of the SAD at time t. Equation (1.4) delineates a simple and yet convincing formula to find SAC from SAD and vice versa. The aim of this paper is to propose and study a statistical framework under which model in (1.1) and the relation in (1.4) hold whenN + (t), n + (t 0 ) andĥ t0 (s) are replaced by E(N + (t)), E(N + (t 0 )) and h t0 (s) respectively. Our framework assumes (A1), (A2) and that the appearance times of each species follow a Poisson process which is sufficient for (A3) and the validity of (1.2).
Finite D is commonplace in SAD approach although no consensus has been reached. Empirically, log-series distribution, an SAD that assumes infinite D, is one of the most successful models. Many SACs do not have asymptote. It is common that the observed number of rare species is large, and shows no sign to decrease. In Bayesian nonparametric approach in genomic diversity study, Poisson-Dirichlet process which assumes infinite D is used (see for example Lijoi et al. [31]). In reality, the existence of transient species (species which are observed erratically and infrequently [38]), and the error in the species identification process (a well-known example is the missequencing in pyrosequencing of DNA (see for example Dickie [17])) are continual sources of rare species making the species number larger than expected. In our framework, D is random and can be finite or infinite with probability one. We use species richness to refer to E(D) when D is random, and D when D is deterministic.
A special feature of our framework is that species can have zero detection probability. We call such species, zero-rate species, and all other species, positive-rate species. Zero-rate species can either be seen only once or unseen in a survey. If there are finite number of zero-rate species, the probability of observing any of them is zero. Therefore, if zero-rate species is observed in a survey, almost surely there are infinite number of them. We interpret observed individuals of positive-rate species as outcomes of a discrete distribution, where individuals of the same species can appear any nonnegative number of times in a survey. On the other hand, individuals of zero-rate species are outcomes of a continuous distribution, and no two such individuals belong to the same species. In our framework, the distribution is allowed to be a mixture of the two. Suppose we want to estimate the population of a town through recording each person we meet on street. Then tourists from distant countries can be viewed as "zero-rate species". In the first example in Section 9, it is suspected that the sequencing error in pyrosequencing of DNA may be a source for zero-rate species.
Poisson distribution is a main component in our framework. Though Poisson distribution is common in existing SAD models, time t scarcely plays a role. In mixed Poisson model (see for example Fisher et al. [20], Bulmer [5]), the observed frequencies of the species are independent and each follows a Poisson distribution with its own rate, say λ i for species i. The value D is a fixed unknown finite value and {λ i } are iid sample from a mixing distribution. Another related model was proposed in Zhou et al. [51]. The paper focuses mainly on finite D. Neither time nor zero-rate species are included in the model.
The outline of this paper is as follows. We propose in Section 2 a new model for the sampling process, called the mixed Poisson partition process (MPPP). We emphasize on the parametric approach where additional assumptions are made on top of the framework so that the process depends only on a few parameters. Once the parameters are estimated, we can make inference on different characteristics of the population, say the SAD at any fixed time, the Hill numbers, or the expected future data. We studyÑ (t) in Section 3. In Section 4, we consider the expected species accumulation curve (ESAC). We prove the one-to-one correspondence among (i) an MPPP, (ii) an ESAC which is a Bernstein function that passes through the origin, and (iii) the expected number of recorded species and the SAD at time t 0 such that the first derivative of its probability generating function is absolutely monotone in (−∞, 1). In Section 5, we introduce LDR 1 , a parametric family of ESAC. The SAD of LDR 1 is the Engen's extended negative binomial distribution. This family has an attractive property that the ratio of the first and the second derivatives of the ESAC is a linear function of t. We extend LDR 1 to LDR 2 which allows zero-rate species. In Section 6, a D1/D2 plot for LDR 1 and some diagnostic plots for specified LDR 1 distributions are proposed. Extrapolation of the curve in D1/D2 plot is considered in Section 7. Estimator of species richness is suggested basing on a modified first-order extrapolation. In Section 8, LDR 1 is generalized so that the derivative ratio is a rational function instead of a linear function of t. Four real data are analyzed in Section 9 to demonstrate the applications of the proposed models and the suggested plots. In Section 10, we propose and study a design where only a few leading appearance times of each species are recorded. In Section 11, we consider the maximum likelihood approach on the empirical SAC. We give a discussion in Section 12.

Mixed Poisson Partition Process
Poisson process is the backbone of the framework. A Poisson process is a point process characterized by an intensity measure over the n-dimensional space R n (we usually have n = 1 in this paper). The intensity measure which we denote as ω delineates how many points are present on average in different parts of R n . More precisely, the number of points in a set A ⊆ R n follows the Poisson(ω(A)) distribution. Furthermore, for any finite collection of disjoint subsets A 1 , . . . , A k ⊆ R n , S 1 , . . . , S k are mutually independent, where S i is the number of points in A i . If ω(dx) = f (x)dx, we call f (x) the intensity function. A simple example is the homogeneous Poisson process where ω(dx) = λdx for a constant λ. The parameter λ is called the rate of the process.
If ω is finite (i.e., ω(dx) < ∞), simulation of a Poisson process can be performed in two steps: (i) simulate the total number of points W ∼ Poisson( ω(dx)), and (ii) simulate X 1 , . . . , X W iid from the probability measure ω/ ω(dx). Nevertheless, if ω is infinite, then the number of points is infinite, and it is impossible to simulate all the points in the process. In this case, we can only simulate a selected finite subset of points of the process using the thinning property of Poisson processes [29]. For each point x in the Poisson process, let α(x) ∈ [0, 1] be the probability for the point x to be selected, and 1 − α(x) be the probability for it to be discarded. Then the selected points form a Poisson process with intensity measure αω, where αω(A) = A α(x)ω(dx). We can simulate the selected points when the measure αω is finite using the aforementioned method. For example, if we are interested only in the points that lie in a bounded region A ⊂ R n of a homogeneous Poisson process with rate λ, then α(x) = 1{x ∈ A} where 1{.} is the indicator function. The number of selected points follows Poisson(λ A dx) distribution, and the selected points are iid uniformly distributed in A.
We model the observations in a species abundance survey by a stochastic process where rates of the species follows a Poisson process.
Definition: (Mixed Poisson partition process) A mixed Poisson partition process (MPPP), G, is characterized by a nonzero species intensity measure ν, which is a measure over R ≥0 , the set of all nonnegative real numbers, satisfying The definition of an MPPP consists of three steps: 1. (Generation of positive rates of species) Given ν, defineν to be a measure over R >0 (R >0 is the set of positive real numbers) byν(dλ) = ν(dλ)/λ, (i.e., dν/dν = 1/λ for λ > 0). Let λ 1 , λ 2 , . . . (a finite or countably infinite sequence) be a realization of a Poisson process with intensity measureν. 2. (Generation of individuals of positive-rate species) For each λ i in Step 1, we generate a realization η i (independently across i) of a Poisson process with rate λ i . The realization η i represents the arrival times of a species with rate λ i . 3. (Generation of individuals of zero-rate species) We generate a realization η 0 of a Poisson process with rate ν({0}), independent of η 1 , η 2 , . . .. This represents the times of appearance for all the zero-rate species.
Finally, we take G = {η 1 , η 2 , . . .} ∪ η 0 . For any i ≥ 1, all points in η i are arrival times of the same species, whereas each point in η 0 is from a different zero-rate species (we use a slight abuse of notation to treat each point in η 0 as a point process with only one point).
Measures ν andν may be finite or infinite. Let Λ = ν(dλ). As the expected number of individuals seen in time interval [0, t] is equal to tΛ (see (3.4) in Section 3), we can interpret Λ as the expected total rate. When ν is finite (i.e., Λ < ∞), conditional on the event that there is an individual observed exactly at time t, the distribution of the rate of the species of that individual is ν/Λ (see Proposition A in Appendix A for a proof). Therefore, we can regard ν as the intensity measure of λ of the observed individuals. Ifν is finite, the expected number of positive-rate species in the community is finite. From Step 3 in the definition, if ν({0}) > 0, there are infinite number of zero-rate species and they arrive at a constant rate. With probability one, D is finite if and only if ν({0}) = 0 andν(R >0 ) < ∞. In such case, D follows a Poisson(ν(R >0 )) distribution. Measureν specifies the distribution of the rates of positive-rate species. More precisely,ν([λ 0 , λ 1 ]) with λ 0 > 0 is the average number of species with rate in [λ 0 , λ 1 ]. Measure ν specifies the distribution of the rates of the species of the individuals including those of the zero-rate species. More precisely, ν([λ 0 , λ 1 ]) is the average number of individuals (per unit time) belonging to species whose rates lie in [λ 0 , λ 1 ]. Condition (2.1) is essential because it is equivalent to the finiteness of the ESAC (i.e., E(N + (t)) < ∞ for any finite nonnegative t). A proof of it is given in Appendix B. Figure 1 illustrates the generation of the MPPP.
Whenν is infinite, we cannot simulate all λ i 's in Step 1 in practice. We can use the following method to simulate a realization of the process in inter- An illustration of the MPPP. In step 1, we generate the rates λ i of the positiverate species according to a Poisson process with intensity measureν. Ifν(R >0 ) < ∞, then this is equivalent to first generating npos, the number of positive-rate species according to Poisson(ν(R >0 )), and then generating a random sample λ 1 , , . . . , λn pos of size npos from the probability measureν/ν(R >0 ). In step 2, we generate the individuals of species i according to a Poisson process with rate λ i for i = 1, . . . , npos. In step 3, we generate the individuals of zero-rate species (species 6 to 9 in the figure) according to a Poisson process with rate ν({0}).
val [0, t 0 ]. The probability for a species with rate λ to be recorded in [0, t 0 ] is 1 − exp(−λt 0 ). Applying the thinning property, the intensity of the recorded positive-rate species is (1 − exp(−λt 0 ))ν which is always finite. To include also the recorded zero-rate species, the intensity is (1 − exp(−λt 0 ))λ −1 ν (we take (1 − exp(−λt 0 ))λ −1 = t 0 when λ = 0). After generating λ 1 , λ 2 , . . . according to a Poisson process with this intensity measure, we simulate for each i, a realization η i (independently across i) of a Poisson process with rate λ i , conditional on the event that η i has at least one point in [0, t 0 ] (if λ i = 0, then η i contains one uniformly distributed point in [0, t 0 ]). An alternative equivalent definition that unifies the generation of individuals of zero-rate species and positive-rate species is given in Appendix C.
A special feature of the framework is that D is random and can be infinite with probability one. This change necessitates modification of biodiversity measures, among which Hill numbers are popular. When D is deterministic, Hill number of order q [25] for q ≥ 0 and q = 1 is q D = i (p sp (i)) q where p sp (i) is the relative abundance of species i in an assemblage (the number of individuals of species i divided by the total population). When q = 1, 1 D is defined as lim q→1 q D = exp(− i p sp (i) log(p sp (i))). Hill numbers are interpreted as the effective number of species. The order q determines the sensitivity of the measure to species relative abundances. When q = 0, all species regardless its abundance are treated equally. Larger q imposes more weight to abundant species. Hill numbers encompass three important diversity measures as its special cases. They are the species richness (q = 0), the exponential of Shannon entropy (q = 1), and the inverse of Simpson index (q = 2). A notable property of Hill numbers is that they obey the replication principle: The diversity of an assemblage formed by pooling m equally abundant and equally large assemblages with no species in common is m times the diversity of a single assemblage. Under our framework, each species corresponds to a Poisson process, and the relative abundance of a species is its rate divided by the expected total rate, Λ. Reasonable modifications to the definition of Hill numbers are to replace p sp (i) with λ/Λ, where λ is the rate of a species, and to replace summation over species with integration with respect to the measure λ −1 ν so that the integral of λ/Λ is one. It works well when Λ is finite, but fails when Λ is infinite. To fix the problem, we need a meaningful surrogate rate which fulfills two requirements: (i) the expected total rate is finite, and (ii) it approaches the true rate λ as a limit. The first appearance time of a species with rate λ follows the exponential distribution with density function λ exp(−λt), which can be regarded as the instantaneous rate of its first appearance at time t. This rate approaches λ when t decreases to zero. The expected total instantaneous rate of first appearances over all species at time t is Λ t = exp(−λt)ν(dλ) (= E(N 1 (t))/t from (3.1)) which is always finite for positive t. Replacing p sp (i) with λ exp(−λt)/Λ t and taking t → 0, we define, the Hill numbers, q D ν for our framework as When ν({0}) > 0 and 0 ≤ q ≤ 1, q D ν = ∞. When Λ is finite, Diversity q D ν is non-increasing with respect to q. Unlike the classical Hill numbers, q D ν can be less than one. For instance, from (3.3), 0 D ν = E(D) which can be any positive value. It can be proved that for any positive constant γ, q D ν = γ( q D ν/γ ), an analogue of the replication principle for Hill numbers. contributes one to one of the counts N k (t), where k follows a Poisson distribution, Poisson(λt). By the splitting property of Poisson processes [29], their total contribution to N k (t) follows Poisson( (λt) k (k!) −1 e −λtν (dλ)) distribution independent across k. The zero-rate species only increase N 1 (t) by a Poisson(ν({0})t) random variate. Therefore, for k ≥ 1, .

Frequency of Frequencies
By the thinning property of Poisson processes, is the intensity measure of the rate for the species represented k times in [0, t 0 ]. We can interpret (k + 1)E(N k+1 (t 0 ))/t 0 as the expected total rate for all species represented k times in [0, t 0 ]. As pointed out in Section 2 and from (3.4), Λ = E(S(t 0 ))/t 0 is the expected total rate for all species. Conditional on the event that we will observe an individual at a given future time t 1 > t 0 , the probability that this individual belongs to a species represented k times in [0, t 0 ] is equal to (formal proof is given in Appendix D) which corresponds to the renowned Good-Turing frequency estimator in Good [21]. It is worth pointing out that the above equation holds for individual observed at a given future time rather than the next observed individual. From the well-known relation between independent Poisson random variables and multinomial distribution, model (1.1) is valid under the framework with Formal proof is given in Appendix E. If E(D) is finite, lim t→∞ p k (t) = 0 for any fixed k. The joint probability mass function ofÑ (t 0 ) is In terms of the expected FoF, the log-likelihood function is (3.6) In terms of p(t 0 ) and E(N + (t 0 )), it is log(L(p(t 0 ), E(N + (t 0 )) |ñ(t 0 ))) = −E(N + (t 0 )) + n + (t 0 ) log(E(N + (t 0 ))) + ∞ k=1 n k (t 0 ) log(p k (t 0 )).
If the unknown vector p(t 0 ) and the quantity E(N + (t 0 )) are unrelated, the above log-likelihood function implies that the maximum likelihood estimator (MLE) of p(t 0 ) is the conditional maximum likelihood estimator (conditional on the observed n + (t 0 )) for the multinomial distribution in (1.1). The MLE of E(N + (t 0 )) is n + (t 0 ).

Expected Species Accumulation Curve
Denote the expected (empirical) SAC (ESAC) as ψ(t) = E(N + (t)). Condition (2.1) guarantees that ψ(t) is finite for any finite t. For a real-valued function g(t), let g (m) (t) stand for the m-order derivative of function g(t). Clearly ψ(0) = 0. From (3.2), (4.1) Note that ψ (1) (0) = ν(dλ) = Λ. From (3.1) and (4.1), Analogous expression for (4.2) appears in Béguinot [2] as an approximate formula under the multinomial model for fixed total number of observed individuals for the species-sample-size curve with the derivative operator replaced by the difference operator. Before studying the link between ESAC and SAD, two mathematical terms The relation between absolutely monotone function and probability generating function is well known (see for example Strook [45]).
c. An MPPP is uniquely determined by its ψ(t), or (h t0 (s), ψ(t 0 )) for the probability generating function h t0 (s) of p(t 0 ) for a fixed positive t 0 . d. For any MPPP, and t > 0, we have Proof of Proposition 1 is given in Appendix F. Equation (4.3) is the population version of (1.4).
From (3.6), the log-likelihood function can be re-expressed as a function of ψ(t).
It can be shown that the Taylor expansion of ψ(t) at t 0 converges to ψ(t) when 0 ≤ t < 2t 0 : It signifies that Good-Toulmin estimator in (1.3) performs satisfactorily in shortterm extrapolation when t 0 < t < 2t 0 . We deduce from (4.5) the following unbiased estimator for different order of derivative of ψ(t) We use (4.6) only for 0 ≤ t ≤ t 0 1 because outside this interval,ψ (j) (t) may not have the correct sign (−1) j+1 .
Estimator in (4.6) is useful. For example, a concave downward curve when we plot 1/ψ (1) From (4.2), a parallel result of Equation (4.6) is the following relation among expected FoFs

Linear First Derivative Ratio Family
MPPP is nonparametric in nature. It is defined by an intensity measure, an ESAC or an SAD. When parametric approach is preferred, we restrict our interest to a family of distributions in MPPP, say by putting constraints on the SAD. A way to portray an SAD is to delineate its probability ratio, , which we call the jth derivative ratio. From (4.1) and the Cauchy-Schwarz inequality, for j = 1, 2, . . . and It deduces that jth derivative ratio is always a nonnegative nondecreasing function of t. Among all derivative ratios, the first derivative ratio is most important be- )/t, the expected total rate for unseen species at time t. The following proposition gives a sufficient condition for it.
Proposition 2: A sufficient condition for a function ξ(t) on [0, ∞) to be the first derivative ratio for an MPPP is that (i) ξ(t) is a Bernstein function, and We prove Proposition 2 in Appendix G. Hereafter we use to denote the first derivative ratio. The simplest nontrivial Bernstein function is the positive linear function on [0, ∞). It suggests the following fundamental family of distributions in MPPP.
Definition: (Linear First Derivative Ratio Family) An MPPP is said to belong to the linear first derivative ratio family (denoted as LDR 1 ) if its first derivative ratio ξ(t) takes a linear form Family LDR 1 encompasses some common SADs and ESACs. We list the characteristics of all models in LDR 1 in Table 1. When b = 0, c must be larger than 1 (see condition (ii) in Proposition 2), otherwise from the second row of Table 1, lim b↓0 ψ(t) = ∞ for finite t. In Table 1, equality of transformed ψ (1) (t) for some models is presented. Such equalities can be used to produce diagnostic check for specified SAD in LDR 1 when ψ (1) (t) is replaced by its estimator in (4.6). LDR 1 has three parameters a, b and c. Parameters b and c determine the SAD, and a = ψ (1) (1) is a scale parameter. As pointed out at the end of Section 3, the MLE of b and c is equivalent to the conditional MLE (conditional on the observed n + (t 0 )) of the multinomial model in (1.1). The MLE of a is chosen to makeÊ(N + (t 0 )) equal to n + (t 0 ).
From Table 1, we can find E(N k (t)) and ψ (k) (t) using the relations E(N k (t)) = p k (t)ψ(t), and . It can be shown that for LDR 1 , ν({0}) = 0, and The intensity measureν takes the form as a gamma distribution with extended shape parameter 1/c − 1 for nonnegative c. Therefore, p(t) is the Engen's extended negative binomial distribution [19] with support {1, 2, . . .}. The parameter c determines the shape of the gamma distribution.
ESAC (Negative exponential law): The Hill number of order q for LDR 1 is An ESAC is called following a power law, if ψ(t) ∝ t τ for 0 < τ ≤ 1. From Table 1, the SAD at time t for a power law has the form where 0 ≤ β < 1 (β is 1/c in Table 1), and (a) i = a(a + 1) . . . (a + i − 1) for i ≥ 1 and (a) 0 = 1 is the rising factorial (this distribution appears in Zhou et al. [51]). This SAD distribution does not depend on t. We call this discrete distribution, the power species abundance distribution (PSAD). If X follows a PSAD with parameter β, then X − 1 follows the generalized hypergeometric distribution, 2 F 1 (β, 1; 2; 1) distribution [28,27].
Proposition 3: Under the MPPP, power law is the only ESAC which has SAD independent on t. Furthermore, if an SAD under MPPP has a proper limiting distribution when t approaches infinity, then the limiting distribution must be a PSAD.
The proof of Proposition 3 is given in Appendix H. Three distributions in LDR 1 are exceptional. They stand for three boundary situations. The first one is the zero-truncated Poisson distribution (c = 0). It models the extreme case when all species have equal abundance. In the approach that assumes finite D, it is a common reference distribution, say in interpreting q D ν and deriving nonparametric estimator of D.
The second one is the log-series distribution (c = 1). It is where E(D) jumps from finite value to infinite value in LDR 1 . Therefore, if we are interested only in finite D models, it is a boundary case. Because of this property, it is used in this paper as a reference distribution in graphical check for the finiteness of E(D) (refer to the end of Section 4, and the checking of slope 1 in the D1/D2 plot introduced later).
The last one is the power law (b = 0). It is the only possible limiting distribution of SAD in MPPP as t approaches ∞. All SADs in LDR 1 with c > 1 converge to it when t increases without bound. It is also the only distribution in LDR 1 that has E(S(t)) = ∞ for any t > 0.
We extend the linear first derivative ratio family to linear jth derivative ratio family, which we denote as is a linear function of t. We prove in Appendix I that LDR 2 = LDR 3 = . . ., and LDR 2 is simply a mixture of zero-rate species and LDR 1 (i.e., theν of LDR 2 satisfies (5.1), but ν({0}) can be positive).

Diagnostic Plots
An advantage of LDR 1 is that it has a simple diagnostic plot: (4.6). If the curve in the plot is almost linear, LDR 1 is an appropriate model. Approximate intercept and slope of the curve can be used as initial estimate of b and c in finding the MLE. We call the plot, D1/D2 plot, and the curve forξ(t) in the plot, the D1/D2 curve.
Similarly, to investigate how well LDR 2 fits a data, we can plot the function −ψ (2) (4.6). We call the plot, D2/D3 plot, and the curve in it, the D2/D3 curve.
As N 1 (t 0 ), N 2 (t 0 ), . . . are independent and Poisson distributed, by the delta method, we can approximate V ar(ξ(t)) by We useξ(t)±1.96 V ar(ξ(t)) as an approximate 95% pointwise confidence band forξ(t) in the D1/D2 plot. Similar confidence band can be constructed in D2/D3 plot (see Appendix J). The diagnostic checks in Table 1 suggest graphs to examine the fitness of zero-truncated Poisson distribution, geometric distribution, log-series distribution, and power law to the FoF. Graphical check for the leading three distributions exist in the statistical literature. See for example Hoaglin and Tukey [26]. Our plots are new additions from a totally new point of view. The plot focuses onψ (1) (t), the estimated expected total rate of unseen species at time t. The graphical check detects discrepancy between the estimated function and its expected pattern for t ∈ [0, t 0 ] when a specified distribution is assumed. Since the plotted y-value in the diagnostic plot has the form g(ψ (1) (t)), an approximate pointwise confidence band for the curve can be obtained using the approximation V ar(g(ψ (1) Currently a standard diagnostic plot for power law is the log-log plot which plots log(ψ(t)) against log(t) for 0 ≤ t ≤ t 0 . As d log(ψ(t))/d log(t) = p 1 (t), loglog plot detects whether p 1 (t) is a constant function. As suggested by the diagnostic check of power law in Table 1, we can plot log(ψ (1) (t)) against log(t) for 0 ≤ t ≤ t 0 . We call it log(D)-log plot. Log(D)-log plot checks whether . Log(D)log plot is more sensitive to discrepancies with the power law because p 1 (t) changes very slowly with respect to t for many SADs. It is well-known in speciesarea relationship studies that the curve in log-log plot is approximately linear for various dissimilar SADs [39,40,36,33].
In Figure 2, we consider four LDR 1 distributions for which the diagnostic graphs are designed. The parameter (b, c) are (1.5, 0) (zero-truncated Poisson distribution), (1, 0.5) (geometric distribution), (0.5, 1) (log-series distribution) and (0, 1.5) (power law). Parameter a is chosen so that ψ(5) = 200. We draw the ESAC in panel (a) and the SAD at t = 5 in panel (b). We can discriminate the power law (the black curve in the panel (a)) from other distributions as its ψ (1) (0) = ∞ (ψ (1) (0) = Λ is the expected total rate). Other than this, we learn little about the parameters b and c from visual inspection of the plots in panels (a) and (b). We need special plots to discriminate different parameter values. In D1/D2 plot in panel (c), the four distributions correspond to four straight lines with different slope c. The graphical checks in panels (d)-(g) are designed for each special distribution so that when FoF follows that distribution, the curve in the plot is a straight line, which is drawn as a heavy line in Figure 2. Therefore, how straight the curve is can be used to assess how well the distribution fits the data.

Species Richness via Extrapolation
Assuming the whole curve ξ(t) to be linear may be too ambitious. If species richness is our main concern, our aim is to estimate E(N 0 (t 0 )). It is convenient to concentrate only on the rare species. We classify species into two groups: rare species and abundant species. Assume that (B1) all species represented 0, 1, 2, and 3 times in time interval [0, t 0 ] are rare, and (B2) rare species follows LDR 1 distribution. 3 Observations n 1 (t 0 ), n 2 (t 0 ) and n 3 (t 0 ) are data from this model truncated at both ends. We do not know how many rare species are represented four times or more in [0, t 0 ] because the observed n j (t 0 ) for j ≥ 4 may include abundant species. It can be shown that for LDR 1 , when j = 1, 2, . . . and is independent on t 0 . When j = 1, we have The information about the rare species from N 1 (t 0 ), N 2 (t 0 ) and N 3 (t 0 ) is barely enough to make c estimable. When N 2 (t 0 ) > 0, a plug-in estimator of c iŝ Estimatorĉ * needs modification to take into account two inequality constraints: ) is finite, and (7.1) remains true when j = 0. We have Using estimatorĉ F and (7.2), we have the following estimator of E(N 0 (t 0 )).
3) From (B1), all unseen species are rare. EstimatorÊ(N 0 (t 0 )) depends only on rare species data N 1 (t 0 ), N 2 (t 0 ), and N 3 (t 0 ). Whenĉ F = 0,Ê(N 0 (t 0 )) reduces to Chao1 estimator [7] of E(N 0 (t 0 )), which is N 2 Chao1 estimator [7] is a lower bound estimator of D, and works well when the rare species have equal abundance, which corresponds to the LDR 1 model with c = 0. Our estimatorÊ * (D) is an extension of Chao1 estimator in the sense that we assume the rare species follow LDR 1 model, and estimate the parameter c using the FoF of rare species. It reduces to Chao1 estimator when c F = 0.
A better way to understand the estimatorÊ * (D) is to relate it to an extrapolation ofξ(t) because we can assess its suitability in the D1/D2 plot. Abundant species usually appear early. Species that show up late are likely rare. The rear part ofξ(t) depends mainly on rare species. Extrapolation is a technique to extend our knowledge about rare species to the unseen species. (B1) and (B2) imply that ξ(t) is approximately linear when t ≥ t 0 . 4 Consider two simple linear extrapolation methods in D1/D2 plot. The zeroth-order extrapolation useŝ is the first-order extrapolation ofξ(t) whenĉ F =ĉ * . We call (7.4) a modified first-order extrapolation. It is always true that If we estimate ψ(t 0 ) by its MLE N + (t 0 ) and ψ (1) (t 0 ) = E(N 1 (t 0 ))/t 0 by its plug-in estimator N 1 (t 0 )/t 0 , then we can estimate ψ(t) for all t > t 0 using (7.5) once an extrapolation rule for ξ(t) is chosen. Species richness is just the limiting value of this ψ(t). If zeroth-order extrapolation is used, the estimator of E(D) is Chao1 estimator. If modified first-order extrapolation rule is used, the estimator of E(D) isÊ * (D). As ξ(t) is nondecreasing, zeroth-order extrapolation is a lower bound of all reasonable extrapolations. It explains why Chao1 estimator estimates a lower bound of E(D). All ξ(t)'s that fulfill the sufficient condition in Proposition 2 are asymptotically linear because its derivative is nonnegative and non-increasing and thus must have a finite limit. Furthermore, for concave ξ(t), the modified first-order extrapolation is probably the first-order extrapolation. Therefore, under the sufficient condition in Proposition 2, modified first-order extrapolation is an adequate extrapolation curve when t 0 is large, andÊ * (D) should give a plausible estimate. Equation (7.5) is useful. If ξ(t) ≥ β + t for a β > 0 when t ≥ t * ≥ 0, then Similarly, if ξ(t) < β + γt for a β > 0 and 0 ≤ γ < 1 when t ≥ t * ≥ 0, then where the last expression has finite limit when t increases to infinity. These two facts can be used to detect the finiteness of E(D) in D1/D2 plot. If we can judge from theξ(t) in the D1/D2 plot that there are positive values β and t * such thatξ(t) ≥ β + t whenever t ≥ t * , then E(D) is likely infinite. On the other hand, ifξ(t) ≤ β + γt for a 0 ≤ γ < 1, it is reasonable to believe that E(D) is finite. To assist judging whether the rear part ofξ(t) has slope larger than or smaller than 1, it is helpful to add grid lines with slope 1 in the D1/D2 plot as demonstrated in Figure 3. We can assess the adequacy of Chao1 estimator through investigating how well N 1 (t 0 ), N 2 (t 0 ) and N 3 (t 0 ) fit a truncated Poisson distribution by testing If the null hypothesis is not rejected, Chao1 estimator gives reasonable estimate of E(D), otherwise it only estimates a lower bound of E(D). An unbiased estimator of 3E( The approximate p-value of the test is P (N (0, 1) ≥ T / V ar(T ) ), where N (0, 1) stands for standard normal distribution. We reject H 0 at significance level α when the p-value is less than α.

Rational First Derivative Ratio Family
When the curve in the D1/D2 plot is not linear, but concave, it is natural to model ξ(t) as a rational function. Quotient of two linear functions is too restrictive because it is in general asymptotically flat. It not only implies that E(D) is usually finite, but also that the rare species are homogeneously abundant. Therefore, we consider a ratio of a quadratic polynomial and a linear function. This form of ξ(t) is asymptotically linear with flexible slope. After simple manipulation, we obtain the following expression for ξ(t).
Definition: (Rational First Derivative Ratio Family) A first derivative ratio, ξ(t) belongs to the rational first derivative ratio family (denoted as RDR 1 ) if where b 2 and c 1 are positive parameters, and b 1 and c 2 are nonnegative parameters. For uniqueness we assume b 1 < b 2 . If b 1 = 0, we require c 1 < 1. This additional restriction is to ensure that ψ(t) is finite for all finite t.
To calculate the log-likelihood function using (4.4), we need to compute ψ(t 0 ) and ψ (k) (t 0 ) for all k such that n k (t 0 ) > 0. The following equality which can be proved by mathematical induction for k ≥ 1 is helpful, To use it, choose a value for t 0 . Given a = ψ (1) (t 0 ) which is a scale parameter, and the values of the parameters b 1 , b 2 , c 1 and c 2 , we can find ψ (2) (t 0 ) = −ψ (1) (t 0 )/ξ(t 0 ). Then apply the above recurrence relation for t = t 0 to compute all necessary ψ (k) (t 0 ) in (4.4). The ESAC is Thus E(D) < ∞ if and only if c 1 + c 2 > 1. It can be proved that ν({0}) = 0. We give the expression for q D ν in Appendix L.

Examples
Four real FoF data are presented in Table 2. Nonparametric analysis of the data can be found in Böhning and Schön [3], Lijoi et al. [31], Wang [49], Chee and Wang [12], Norris and Pollock [37], and Chiu and Chao [13]. In this section, we fit the data using the parametric models in Sections 5 and 8, demonstrate the use of various diagnostic plots, and compare different diversity estimates. Without loss of generality, we set t 0 = 1. The significance level of the tests is fixed to 5%.
The first data is the swine feces data which appeared and was analyzed in Chiu and Chao [13]. It is for the pooled contig spectra from seven non-medicated swine feces. The large n 1 (t 0 ) relative to other frequencies is viewed as a signal for sequencing errors. Chiu and Chao [13] proposed a nonparametric estimate of the singleton count basing on the other counts, and the difference of this estimate and the observed singleton count is interpreted as outcome of missequencing. An implicit assumption of this approach is that sequencing errors inflate only the Table 2 Four real FoF data and the MLE for the selected model using AIC singleton count, and all other frequency counts are unaffected. It is equivalent to claim that there are sequencing errors which create solely zero-rate species.
To investigate whether zero-rate species really exist, we draw the D1/D2 and D2/D3 plots with 95% pointwise confidence bands in panels (a) and (b) respectively in Figure 3. The approximate linear curve in both plots indicates that both LDR 1 and LDR 2 are reasonable models. The heavy dashed lines in panels (a) and (b) are the lines fitted by MLE under LDR 1 . The fitted line is close to the curve in D1/D2 plot, but is not so in the D2/D3 plot. As the dashed line lies inside the confidence bands in panel (b), the disagreement between the singleton count and other counts is not strong enough to reject that they come from the fitted LDR 1 . The Pearson's chi-square test statistic after grouping all cells with expected frequency less than 5 is 4.325 with 4 degrees of freedom. The estimated E(N 1 (t 0 )) under this LDR 1 is 8027.6 which is marginally larger than n 1 (t 0 ). AsÊ(N 1 (t 0 )) > n 1 (t 0 ), the MLE of ν({0}) under LDR 2 should be zero. There is no significant evidence for the existence of zero-rate species. The heavy green curve in panel (a) is the fitted curve under RDR 1 . The heavy green curve is very close to the D1/D2 curve. RDR 1 performs better than LDR 1 in terms of Akaike information criterion (AIC) (the AIC for LDR 1 is -136060.6, and that for RDR 1 is -136061). 5 In panel (a) blue grid lines with slope 1 are drawn. From Section 7, the slope of the rear part ofξ(t) larger (or smaller) than 1 is an indication that E(D) is infinite (or finite). Apparently the slope ofξ(t) has slope always larger than 1. We are confident that E(D) is infinite. The modified first-order extrapolation line forÊ * (D) (the purple line) and the zeroth-order extrapolation line for Chao1 estimator (the orange line) are drawn. Asĉ F > 1,Ê * (D) = ∞. The orange extrapolation line does not look good. Chao1 estimator does not perform satisfactorily.
The second data come from 9461 accident insurance policies issued by an insurance company. It was used in Böhning and Schön [3], Wang [49], and Chee and Wang [12]. The species corresponds to the policies, and the frequency count to the number of claims during a particular year. Theξ(t) in the D1/D2 plot for the accident data in panel (c) is concave but not far from linear. The heavy dashed line is the fitted line under LDR 1 . The p-value for the Pearson chi-square test for LDR 1 is 0.0979. The MLE of c is 1.1146 which is larger than 1. Therefore, the MLE of E(D) is infinite. Compared with the blue grid lines with slope 1, the average slope of the D1/D2 curve is close to one showing uncertainty about the finiteness of E(D). The heavy green curve in panel (c) is the fitted curve under RDR 1 . It is preferred to LDR 1 because it has smaller AIC (the AIC for LDR 1 and RDR 1 are -18691.95 and -18692.15 respectively). The MLE of E(D) under RDR 1 is 6354. It is less than the true value D = 9461, and is comparable to the nonparametric estimates presented in Chee and Wang [12] which ranges from 4016 to 7374. The purple line is the modified first-order extrapolation used byÊ * (D). As the slope of the purple line is less than one,Ê * (D) is finite. For this data,ĉ F = 0.4525 andÊ * (D) = 8249.2 which is closer to the true value. The orange extrapolation line for Chao1 estimator is also drawn. The difference of the two extrapolation lines is not small. The third data come from a CDNA library of the expressed sequence tags of tomato flowers. It was studied in Böhning and Schön [3] and Lijoi et al. [31]. The D1/D2 plot in panel (d) shows that LDR 1 model does not fit the data well (the p-value of the Pearson chi-square test is 0.0059). The curve looks like a Bernstein function. An RDR 1 model is fitted and the fitted curve is shown in the plot by a heavy green curve. The model fits the data well (Pearson chisquare statistic after grouping all cells with expected frequency less than 5 is 1.470 with 2 degrees of freedom). Asĉ 1 +ĉ 2 = 0.767 < 1, the estimated E(D) is infinite. This estimate of E(D) agrees with the finding when we compare the slope of the rear part of the curve with one under the assistance of the blue grid lines. The modified first-order and zeroth-order extrapolation lines are drawn in purple and orange respectively. The difference of the slopes of the lines is not small.
The fourth data is the bird abundance data for the Wisconsin route of the North American Breeding Bird Survey for 1995. Totally 645 birds from 72 species are recorded. The data was studied in Norris and Pollock [37] where a mixture of five Poisson models was fitted, and the estimated D is 76. Comparing with the blue grid lines, the slope of the D1/D2 curve is less than one. Species richness should be finite. Theξ(t) in the D1/D2 plot does not look like a straight line. However, the LDR 1 model is not rejected as the p-value of the Pearson's chi-square test is 0.116. The estimated E(D) is 124.50. We also fit a RDR 1 model to the data. The fitted line for this model is shown in the figure as a heavy green curve. The MLE of E(D) under RDR 1 is 80.7. LDR 1 is preferred to RDR 1 as its AIC is -25.63 which is less than that for RDR 1 which is -24.53. RDR 1 does not fit the rear part ofξ(t) well because of an abrupt change of slope around t = 0.5. After t = 0.6,ξ(t) looks quite linear. It suggests thatÊ * (D) may be a good estimate. For this data,Ê * (D) = 77.90, which is close to the Chao1 estimate 77.04.
We investigate the performance of various special graphical checks in Figure 4. If the curve in the plot is close to a straight line, we assume that the corresponding distribution fits the data. The red dashed curves are the 95% approximate pointwise confidence band. The five columns correspond to five graphical checks: namely the plot for zero-truncated Poisson distribution (check c = 0), for geometric distribution (check c = 0.5), for log-series distribution (check c = 1), for power law (check b = 0; it is the log(D)-log plot), and the log-log plot for power law. The first four rows of plots are for the four real FoF data. The last row is for data simulated from the specified distribution which the diagnostic plot is designed to check. The curves in the last row are close to a straight line as expected. The log-log plot in the last column does not perform satisfactorily. The first three graphs shows almost perfect line. Comparatively, the log(D)-log plot correctly declare discrepancy of power law to the four real data. The graphical check for log-series distribution correctly detects that acci-    Table 1. They check c = 0 (zero-truncated Poisson distribution), c = 0.5 (geometric distribution), c = 1 (log-series distribution) and b = 0 (log(D)-log plot for power law). The last column is the log-log plot used to check the power law. The two red dashed curves are the 95% approximate pointwise confidence band. There are five rows. The first four rows are for the real data: swine feces data, accident data, tomato flowers data and bird abundance data respectively. The last row is for data simulated from the  dent data follow closely to a log-series distribution (the MLE of c under LDR 1 is 1.1146). The curves for swine feces data and the tomato flowers data are concave. It implies that E(D) is infinite as pointed out at the end of Section 4. The plot for geometric distribution and zero-truncated Poisson plot look reasonable except for the accident data. Consider estimation of Hill numbers. In Table 3, we compare the model-based estimator developed in the paper, and Chao-Jost estimator proposed in Chao and Jost [10]. For our model-based estimator, 95% confidence intervals are constructed using parametric bootstrap method. Details are given in Appendix K. The difference between our estimators and Chao-Jost estimator is mild when q = 2. In other situations, the difference can be large. Sometimes the two confidence intervals do not overlap. The width of the 95% confidence interval for our parametric estimator is larger than that of the corresponding interval for Chao-Jost estimator. The interval estimate for Chao-Jost estimator is obtained basing on the assumption that D is finite and all unseen species have equal abundance. For the accident data, the true D is contained in the model-based 95% confidence interval, but not in that for the Chao-Jost estimator.
Consider estimation of species richness. In Table 4, we compareÊ * (D), Chao1 estimator, iChao1 estimator in Chiu et al. [14], and the abundance-based coverage estimator (ACE) in Chao and Lee [9] and Chao et al. [11]. We construct the 95% confidence intervals for E(D) basing onÊ * (D) using parametric bootstrap method. Details are given in Appendix K. EstimatorÊ * (D) usually gives larger estimate. For the bird abundance data, all estimators give similar point estimate. For the accident data,Ê * (D) gives the best estimate of the true D. For the remaining two data, the differences are huge. Our estimate is infinite, while other estimates are finite as assumed.
The difference betweenÊ * (D) and the other three methods can well be explained by the test on whether the observed N 1 (t 0 ), N 2 (t 0 ) and N 3 (t 0 ) fit a truncated Poisson distribution. The p-value of the test at the end of Section 7 for the bird abundance data is 0.374. For this data, all estimates are close to each others. The p-value for accident data is 0.040. The difference is significant, but not huge. The p-value of the other two data are smaller than 6×10 −6 implying that the deviation is huge. The 95% confidence interval for swine feces data is [∞, ∞], a much stronger signal when compared to the confidence interval in Table 3. ForÊ * (D), only the 95% confidence interval for bird abundance data has finite width. It has the lower endpoint close to the lower endpoint of other confidence intervals, but it has much larger upper endpoint.
EstimatorÊ * (D) gives wider confidence interval than that for Chao1 estimator because it uses a modified first-order extrapolation that requires the estimation of slope, whereas zeroth-order extrapolation does not have this requirement. Nevertheless, first-order extrapolation looks more rational when compared to zeroth-order extrapolation which can only estimate a lower bound.

ρ-appearance Design
Usually we collect all available information within the survey period. Under our framework, all useful information is in FoF. If labor saving is our concern, we may neglect some minor information, say halt recording a species as soon as its observed frequency reaches a fixed positive integer ρ in the study period [0, t 0 ]. In this case, the observation period for each species varies. The period is short for abundant species, and long for rare species. We call this design, the ρ-appearance design. This design places more emphasis on rare species than abundant species which is in line with the common understanding that information about the rare species is critical when our interest is in D. In bird survey, species can be identified by distant sightings or short bursts of song. Stop recording abundant species early helps the researcher concentrating more on the rare species. When ρ = ∞, we obtainñ(t 0 ). When ρ = 1, we record only the first appearance-time of each seen species. It is exactly the information available in the empirical SAC.
(10.1) A numerical advantage of (10.1) is that it does not require a general expression of ψ (k) (t) for all k ≥ 1. A small simulation experiment in Appendix N shows that the loss in information of ρ-appearance design when compared to the standard design is marked when ρ = 1, and minor when ρ = 4.
Given a parametric form of ψ(t), a traditional approach is to fit it to the empirical SAC by linear or non-linear least-squares method. Two differences between the MLE approach and the curve-fitting method are noteworthy. First, the MLE of ψ(t 0 ) is equal to n + (t 0 ) whenever ψ(t) has a free scale parameter, and it is not the case in the curve-fitting approach. Second, the MLE method fits the density function to the 1-appearance times, while the curve-fitting method fits a function to the empirical SAC. The curve-fitting methods do not take the interdependence among the points in the empirical SAC into consideration. Since such interdependence is present in species-time curves and Type I speciesarea curves [43], the curve-fitting approach is theoretically flawed. Although improvements can be made in curve-fitting approach through transformation and/or adding weights, maximum likelihood approach is preferred for its proven effectiveness under the assumption that the model is correct.
In certain situations, only the values of the empirical SAC at a finite set of times are available. For example, only the cumulative number of species observed after day 1, day 2, and so on are recorded. Suppose the observed N + ( i ) is n + ( i ) for i = 1, . . . , m with 0 = 0 < 1 < . . . < m = t 0 . The log-likelihood function is In the simulation study in Appendix O, MLE has smaller root mean squared relative error in extrapolation when compared to the curve-fitting method.
The distribution function of the 1-appearance times in time interval [0, t 0 ] is ψ(t)/ψ(t 0 ). This fact holds in general for any nondecreasing ψ(t) with ψ(0) = 0, including sigmoid function such as the cumulative Weibull function. If we assume that the 1-appearance times are independent, we can perform maximum likelihood inference conditional on n + (t 0 ) when a parametric form of ψ(t) is given. Full maximum likelihood calculation is possible when further assumption on the distribution of N + (t 0 ) is made. As the empirical SAC is proportional to the empirical distribution function of the 1-appearance times, statistical tools for empirical distribution function can be used. For example, we can apply the Dvoretzky-Kiefer-Wolfowitz inequality to construct confidence bands for the distribution function ψ(t)/ψ(t 0 ).

Discussion
A general framework, MPPP, is proposed for species abundance data. This framework has the following novel features: (1) it delineates the relation between two analytic tools -SAD and SAC, and an MPPP can be characterized by either one of them; (2) it characterizes the class of possible ESACs to be the class of Bernstein functions, giving solid theoretical support to the empirical observation that SACs are nondecreasing concave functions; (3) it allows the existence of zero-rate species; (4) it contains the LDR 1 model, a parametric model where the first derivative ratio of the ESAC is linear, which admits several graphical checks, and two generalizations, namely LDR 2 which includes zero-rate species, and RDR 1 model where the first derivative ratio is a rational function; (5) it admits a new species richness estimatorÊ * (D); (6) it admits a natural generalization of Hill numbers as measures of species diversity, which are well-defined even when the number of species is random and infinite; and (7) it allows inference to be performed when only the first ρ appearance times of a species is recorded, which we refer as ρ-appearance design.
Compared to the conventional curve-fitting approach to SAC, our framework has a more solid theoretical underpinning, as an MPPP is uniquely characterized by an ESAC that is a Bernstein function. In the study of species-area curve, power law is popular. Nevertheless, under our model, power law is an extreme case. Its E(D) and E(S(t)) for any positive t are infinite, and its PSAD has a heavy right tail. Even though curve fitting can give reasonable estimates, the MPPP framework provides a parametric generative model in which maximum likelihood estimates can be obtained. Matthews and Whittaker [34] suggests using maximum likelihood methods instead of least-squares approaches in SAD fitting. We would like to extend their advice to ESAC fitting.
One possible future direction is to apply MPPP in a nonparametric setting, which lessens the need for model selection. Another future direction is to understand the tradeoff that arises in the ρ-appearance design. A small ρ can reduce the sampling effort, at the expense of less accurate estimates of parameters. Investigating this tradeoff in a theoretic and empirical setting can provide guidance to the design.
Proposition A: Fix t ≥ 0, and assume Λ < ∞. Conditional on the event that there is at least one individual in the time interval [t, t + ] for > 0, then the probability that there is exactly one individual in [t, t + ] tends to 1, and the rate of its species converges in distribution to the distribution ν/Λ as → 0.
Proof. Let A be the event that there is at least one individual in the time interval [t, t + ], and B be the event that there is exactly one individual in the time interval [t, t + ] for > 0. The probability that a species with rate λ is observed during [t, t+ ] is 1−exp(−λ ). By the thinning property, the rates of the positiverate species observed during [t, t + ] form a Poisson process with intensity measure (1 − exp(−λ ))ν(dλ). Considering also the zero-rate species, the rates of the species observed during [t, t + ] form a Poisson process with intensity measure (1 − exp(−λ ))λ −1 ν(dλ) (where (1 − exp(−λ ))λ −1 = when λ = 0). Event A is that this Poisson process has at least one point.
Similarly, the rates of the species observed exactly one time during [t, t + ] form a Poisson process with intensity measure Event B is that this Poisson process has exactly one point. Thus which is equal to Λ(1 + o(1)) when → 0 because Λ is finite. Therefore, As probability cannot be larger than 1, it implies that P (B | A) → 1 as → 0. Given that exactly one individual appears in [t, t + ], let X be the rate of the species that the individual belongs and F be its distribution function. For any positive λ 0 , let C λ0 be the event that X is less than or equal to λ 0 . Then It follows that Therefore, X converges in distribution to the distribution ν/Λ as → 0.
For the case Λ = ∞, as → 0, the rate of the species observed in [t, t + ] diverges to infinity.
Proposition B: Fix t ≥ 0, and assume Λ = ∞. Conditional on the event that there is at least one individual observed in the time interval [t, t + ] for > 0, then the minimum rate among observed species converges in probability to ∞ as → 0.
Proof. Let A be the event that there is at least one individual in the time interval [t, t + ]. As in the case Λ < ∞ in Proposition A, for any fixed λ 0 ≥ 0, we have P (at least one individuals in [t, t + ] with rate ≤ λ 0 )

Appendix C: Equivalent definition of the MPPP
In the definition of the MPPP in the paper, the individuals of the zero-rate species and the individuals of the positive-rate species are generated separately.
Here we present an alternative definition where all individuals are generated in a unified manner.
Definition: (Equivalent definition of MPPP) An MPPP G is characterized by a nonzero species intensity measure ν, which is a measure over R ≥0 satisfying ∞ 0 min{1, λ −1 }ν(dλ) < ∞. Defineν to be a measure over R 2 ≥0 bẙ for any measurable set A ⊆ R 2 ≥0 . Generate (λ 1 , t 1 ), (λ 2 , t 2 ), . . . (a finite or countably infinite sequence) according to a Poisson process with intensity measureν. For each simulated (λ i , t i ), we generate a realization η i (independently across i) of a Poisson process with rate λ i , conditioned on the event that the first point is at time t i . (That is, it contains the point t i together with a Poisson process starting at time t i . If λ i = 0, then η i contains only one point t i .) Each η i stores the appearance times of one species with rate λ i . Finally, we take G = {η 1 , η 2 , . . .}.
This definition models the species that will eventually be observed in a study. The first appearance time for each of such species (i.e., the first point of each η i ) is explicitly included in the definition ofν.
The last expression holds also when Λ = ∞ (i.e., the above probability is zero) because from Proposition B in Appendix A, the rate of the future individual approaches infinity and thus that species should have been seen infinite number of times in [0, t 0 ].

Appendix H: Proof of Proposition 3
If an SAD is time invariant, then its probability generating function, h t (s) does not depend on t. From (4.3), there is a positive function g such that ψ(ty) = ψ(t)g(y) for y ∈ [0, ∞). Take logarithm on both sides, take derivative with respect to t, and then set t = 1. We have The solution of the above differential equation is ψ(y) = ψ(1)y ψ (1) (1)/ψ (1) . As ψ(t) is a Bernstein function, 0 < ψ (1) (1)/ψ(1) ≤ 1. Hence power law is the only law with p(t) independent on t.

For any positive integer
Clearly φ(t) ∈ LDR 2 but not in LDR 1 . Therefore, LDR 1 = LDR 2 . It can be shown that every element in LDR 2 has the form αt + ψ(t) for a ψ(t) ∈ LDR 1 . It means that LDR 2 is a mixture of zero-rate species and LDR 1 .
Appendix J: Pointwise Confidence Band for D2/D3 plot Similar to the confidence band for D1/D2 plot, an approximate 95% pointwise

Appendix K: Parametric Bootstrap Confidence Interval for Hill Numbers
We haveθ, an estimator of θ which is E(N 0 (t 0 )) or a Hill number of order q. We want to construct a 100(1 − α)% confidence interval for θ. Let B be the total number of bootstrap samples such that α(B + 1)/2 is an integer. In this paper, we use α = 0.05 and B = 2999. Asθ can be infinite, we avoid using methods that require arithmetic onθ, such as the basic method. The procedure to construct a parametric bootstrap confidence interval is as follows: For i = 1 to B do { Generate a bootstrap sample from a parametric model. Computeθ, which we denote asθ i basing on the bootstrap sample. } Sortθ 1 ,θ 2 , . . . ,θ B in ascending order. Denote the sortedθ 1 ,θ 2 , . . . ,θ B asθ (1) ≤ θ (2) ≤ . . . ≤θ (B) .
Appendix M: Proof of the log-likelihood function for ρ-appearance design Let Y i be the length of the observation period for species i, and J i be the observed frequency of species i in its observation period. For species i, we observe For a species with rate λ, the probability function of J is Therefore, the log-likelihood function is log(L(ψ | {y i , j i } i )) = −ψ(t 0 )+ ρ−1 j=1 n j (t 0 ) log(| ψ (j) (t 0 ) |)+ yi<t0 log(| ψ (ρ) (y i ) |).
Appendix N: Simulation experiment on the loss of information of the ρ-appearance design Let us reconsider the bird abundance data for the Wisconsin route of the North American Breeding Bird Survey for 1995 in Section 9. We choose ν(dλ) = γλf (λ | µ, σ)dλ where f (λ | µ, σ) is the density function of Lognormal(µ, σ 2 ) distribution. The parameter γ is the expected total number of species. From Section 3, the MLE of (µ, σ) is identical to the conditional maximum likelihood estimate of the corresponding Poisson-lognormal model [5]. The fitted lognormal mixing distribution is Lognormal(μ,σ 2 ) distribution withμ = 1.23 andσ = 1.30.
Let ω i (µ, σ) = P (Y = i) where Y is a Poisson-lognormal random variable with parameters µ and σ. We use the function "dpoilog" in R-package "poilog" [23] to compute this probability. The estimated γ isγ = n + (t 0 )/(1 − ω 0 (μ,σ 2 )) = 85.2. Without loss of generality, set t 0 = 1. We use this data to investigate the information loss of the ρ-appearance design. The ρ-appearance data are simulated from the data using the following procedure: Simulation procedure: Suppose species i has observed frequency m i in time [0, 1]. If m i < ρ, our data for this species is m i , the frequency of it in time [0, 1]. If m i ≥ ρ, we simulate the ρ-appearance time of the species, r i from Beta(ρ, m i + 1 − ρ) distribution, which is the distribution of the ρ order statistic of m i samples from the U (0, 1) distribution.
The mean of the estimate is close to that basing onñ(1). The standard deviation of the estimator decreases as ρ increases. From the simulation results, the standard deviation of the estimators when ρ = 1 is considerably worse than those when ρ = 2. Value ρ = 4 performs well for this data. The total number of species with frequency less than 4 is 33, around 46% of the seen species.
O. Simulation comparison of MLE method and curve-fitting method in extrapolation when finite number of points in the empirical SAC are available Without loss of generality, we set t 0 = 1. Our data is {n + (0.1), n + (0.2), . . . , n + (1)}. We consider three distributions: power law, log-series distribution and geometric distribution. The curve-fitting methods for the distributions are described below: 0* * When ρ = ∞, we always observe the full data, and the sample standard deviation (sd) of the estimator across simulation is zero.   (a) Power law: ψ(t) = τ t 1−1/c . For curve fitting, we regress log(n + (t)) on log(t).
The parameter τ = ψ(1) is the expected number of recorded species at time t 0 = 1. In the experiment, τ can take value 200 and 1000. The distribution parameter can take 6 values. For power law, the value of c can be 1.25, 1.5, 2, 3, 4 and 5. For log-series distribution, the value of b can be 0.01, 0.02, 0.03, 0.05, 0.1, and 0.2. For geometric distribution, the value of b can be 0.05, 0.1, 0.2, 0.4, 0.6 and 0.8. For each combination of parameters, we simulate 5000 data. The MLE and the curve-fitting estimate of ψ(2) and ψ(4) are found for each simulated data. We evaluate the performance of an estimator by the root mean squared relative error (RMSRE) (for estimatorθ for θ, RMSRE = n i=1 ((θ i − θ)/θ) 2 /n). The results of the simulation are presented in Tables 6, 7 and 8. The RMSRE for MLE is smaller than that for the curve-fitting method in this simulation study. Table 7 Simulation results for log-series distribution in extrapolation to t for fixed ψ(1) given ten equally spaced observations of SAC in time interval [0, 1]