Noncausal counting processes: A queuing perspective

Abstract: We introduce noncausal counting processes, defined by time-reversing an INAR(1) process, a non-INAR(1) Markov affine counting process, or a random coefficient INAR(1) [RCINAR(1)] process. The noncausal processes are shown to be generically time irreversible, and their calendar time dynamic properties cannot be replicated by existing causal models. In particular, they allow for locally bubble-like explosion, while at the same time preserving stationarity. Many of these processes also have a closed-form calendar time conditional predictive distribution, and allow for a simple queuing interpretation, similar to that of their causal counterparts.


Introduction
Recently, real-valued linear noncausal processes have raised much interest thanks to their ability to capture irreversible, bubble-like phenomena, which are widely observed in financial applications [see e.g. [26,20]]. However, these models are not suitable for (low) counting processes. To motivate the need for introducing "bubbles" for counting processes, we display in Figure 1 the hourly amount of rainfall, measured by a rain gauge as multiples of mm in an Indian city. To understand and predict (micro-)floods, such high-frequency, high-resolution rainfall data are much more useful than traditional data, which are often either too temporally aggregated (daily or monthly), or spatially aggregated across various sites and hence real-valued. We see that the hourly rainfall curve features cycles of "bubbles", or cloudbursts, characterized by a period of steady increase in rainfall intensity, analogous to the accumulation phase of a financial bubble, followed by a sharp decrease, which spells the end of the cloudburst and is the analog of the burst of the bubble.

In the aforementioned literature on real-valued time series, it has been shown that, unlike standard ARMA-type processes, a new class of noncausal processes can capture the bubble phenomenon very well. A similar problem exists for count time series, since the path of a standard, say, INAR(1) process [see equation (1) below for its definition] usually only features occasional abrupt positive jumps followed by steady decreases, which is the opposite of the phenomenon observed in Figure 1. The aim of our paper is to introduce the notion of noncausality to the counting process literature.

To fix the idea, let us first consider Markov, causal processes of the type:

X_{t+1} = Σ_{i=1}^{X_t} Z_{i,t+1} + ε_{t+1},    (1)

where the latent counts Z_{i,t+1} are independent of X_t = {X_t, X_{t−1}, ...}, whereas the shocks ε_{t+1} are i.i.d., independent of X_t and of the Z_{i,t+1}'s.
The decomposition (1) leads to different dynamics depending on the distributions selected for the Z_{i,t+1}'s and for ε_{t+1}:

a) If the Z_{i,t+1}'s are i.i.d. Bernoulli variables, then process (X_t) is integer-autoregressive of order 1 [INAR(1)] [see [30], [1]]. An INAR(1) process can be viewed as the decomposition of the total number of customers X_{t+1} into the sum of the new arrivals ε_{t+1} and the staying old customers Σ_{i=1}^{X_t} Z_{i,t+1} [see [33,29,38]]. The INAR(1) is the most famous counting process model, and special attention will be paid in the paper to this family, as well as to its queuing interpretation.

b) The counts Z_{i,t+1} can also be i.i.d. only, but non-binary. For instance, in [35,18], the Z_{i,t+1}'s have a geometric distribution and ε_{t+1} is negative binomial, whereas in the INARCH(1) model [[41]], the Z_{i,t+1}'s are Poisson. These processes are also called Galton-Watson processes in the probability literature [see e.g. [25]], where the term Σ_{i=1}^{X_t} Z_{i,t+1} can be interpreted as the number of descendants of the X_t existing customers, who can independently give rise to 0, 1, 2, ... descendants.

The major appeal of the INAR(1) model a) and its extension b) is the special form of the conditional probability generating function (p.g.f.) of X_{t+1} given X_t, which characterizes the transition distribution. Indeed, it is conveniently given by:

E[u^{X_{t+1}} | X_t] = E[u^{ε_{t+1}}] (E[u^{Z_{1,t+1}}])^{X_t},    (2)

which is an exponential affine function of the conditioning variable X_t. In particular (X_t) is a Markov process with respect to the filtration X_t. Such Markov processes are called affine (or compound autoregressive) [see [9]]. In the counting process context, [28] shows that they have tractable marginal and predictive distributions at any horizon.

c) Alternatively, one can also extend the INAR(1) model a) by relaxing the i.i.d. assumption on the Z_{i,t+1}'s and allowing them to be only conditionally i.i.d. given a stochastic probability parameter p_{t+1}, which is itself an i.i.d. sequence independent of ε_{t+1}, as well as of the past observations X_t. This model is called random coefficient INAR(1), or RCINAR(1) [see e.g. [44]]. It has a similar queuing interpretation to the INAR(1) process, but with a stochastic probability of leaving for current customers.
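To fix intuition on the causal benchmark, the thinning recursion (1) with Bernoulli Z's [case a)] can be simulated in a few lines. The Poisson shock distribution, the parameter values and the function name below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_inar1(n, p=0.5, lam=1.0, burn=500):
    """Simulate the causal recursion (1) with Bernoulli(p) counts Z_{i,t+1},
    so that the sum of the Z's is Binomial(X_t, p), and Poisson(lam) shocks."""
    x = np.empty(n + burn, dtype=np.int64)
    x[0] = rng.poisson(lam / (1.0 - p))  # start near the stationary mean
    for t in range(n + burn - 1):
        stayers = rng.binomial(x[t], p)          # old customers who stay
        x[t + 1] = stayers + rng.poisson(lam)    # plus new arrivals eps_{t+1}
    return x[burn:]

path = simulate_inar1(1000)
# The stationary mean solves mu = p mu + lam, i.e. mu = lam / (1 - p) = 2.
print(path.mean())
```

The simulated path exhibits the typical causal pattern described above: occasional upward jumps (large arrivals) followed by geometric-type decay.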
Equation (1) defines a dynamic factor model with a random number of factors ε_{t+1}, Z_{1,t+1}, ..., Z_{X_t,t+1} at date t. Since at each date t + 1 this number, i.e. X_t + 1, is strictly larger than the number of observables, equal to 1, neither the shocks ε_{t+1}, nor the variables Z_{i,t+1}, can be deterministically recovered (except at dates with X_t = 0). Nevertheless, even if the underlying factors are not recoverable, their distributions can be identified from the observation of (X_t) only. Model (1) is called causal, since X_{t+1} does not depend on the values of the future shocks ε_{t+2}, ε_{t+3}, ..., or the future latent factors Z_{i,t+2}, Z_{i,t+3}, ..., i varying. The noncausal counting process introduced in this paper can be defined simply as a Markov process whose reverse time dynamics satisfies (1).

Our main contributions are the following. First, we characterize all time reversible counting processes satisfying model (1). From a practical modeling perspective, these examples are to be discarded, since they cannot capture the (time) asymmetry of the bubble dynamics. Second, we show that when process (1) is time irreversible, its noncausal version is indeed suitable for capturing bubbles. In particular, when an affine process [including INAR(1)] is time reversed, the resulting noncausal process necessarily has non-affine dynamics. We then derive this conditional p.m.f. in closed form, and through various examples we demonstrate that it often features multiple modes, one near zero (corresponding to the burst of the bubble) and another away from zero (corresponding to the steady increase of the bubble). Third, we show that for the noncausal INAR(1), an alternative, equivalent specification is based on the aforementioned queuing interpretation, but with a different set of distributional assumptions. This new queuing model is specified through the departure cohorts, rather than the arrival cohorts. Under this noncausal specification, the number of arrivals is typically heteroscedastic across time and dependent on the current population size.
The rest of the paper is organized as follows. Section 2 studies time-reversed Markov counting processes, including their time (ir-)reversibility, as well as their calendar time predictive distributions. Section 3 relates the causal and noncausal INAR(1) models through the same infinite server queuing system. Section 4 concludes. Proofs, as well as some results for noncausal RCINAR(1) processes, are gathered in the Appendices.

Time reversing a Markov counting process
Let us now analyze irreversible noncausal processes and their calendar time dynamics. This provides new families of Markov dynamics for counting processes that are easy to analyze in closed form and are appropriate for capturing bubble phenomena.

Definition 1. The counting process (X_t) is noncausal affine if it has the representation:

X_t = Σ_{i=1}^{X_{t+1}} Z_{i,t} + ε̃_t,    (3)

where, as in model (1), the latent counts Z_{i,t} are i.i.d. (for i varying), independent of X_{t+1} = {X_{t+1}, X_{t+2}, ...}, and the shocks ε̃_t are i.i.d., independent of X_{t+1} and of the Z_{i,t}'s.
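Since the noncausal model is, by definition, the causal model (1) run in reverse time, a trajectory of such a process can be sketched by simulating a causal path forward and flipping the time index. The geometric shock distribution and the parameter values below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_noncausal_inar1(n, p=0.7, beta=2.0, burn=500):
    """Simulate a causal INAR(1) with geometric(mean beta) shocks forward,
    then flip the time index: the reversed path follows the noncausal model."""
    x = np.empty(n + burn, dtype=np.int64)
    x[0] = 0
    for t in range(n + burn - 1):
        eps = rng.geometric(1.0 / (1.0 + beta)) - 1  # geometric on {0, 1, ...}
        x[t + 1] = rng.binomial(x[t], p) + eps
    return x[burn:][::-1]  # reversal turns jump-then-decay into build-up-then-crash

path = simulate_noncausal_inar1(300)
print(path.mean())
```

In reversed time, the occasional jump-then-decay episodes of the causal path become periods of steady build-up followed by abrupt collapse, i.e. the bubble pattern described in the Introduction.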

Time reversibility
The noncausal process (3) can be interpreted as a counting process (1) observed in reverse time. Let us derive some general properties of such processes. First, we note that the Markov and stationarity properties are preserved under time reversal.

Lemma 1 (See Section 1 of [6]). If process (X_t) is stationary and Markov in calendar time, then it is also stationary and Markov in reverse time.
Thus the noncausal INAR(1) process is also Markov. But what is the form of its calendar time conditional distribution of X_{t+1} given X_t? For instance, if the reverse time dynamics is affine [including INAR(1)], could the calendar time dynamics also be affine, that is, can process (X_t) have both a representation of type (1) and another of type (3)? By the following lemma, this is equivalent to the time reversibility of the process:

Lemma 2 (See Proposition 3 of [19]). If the noncausal affine process (3) is also affine in calendar time, and is weakly ergodic in both time directions, then it is time reversible.
Let us now characterize the set of all affine time reversible processes. This result is important since, once these reversible examples are excluded, any other noncausal affine process will have non-affine calendar time dynamics. The following proposition shows that a time reversible affine process is:

• either an INAR(1) process with Poisson distributed ε_t;

• or an affine process in which the Z_{i,t}'s and ε_t have the p.g.f.s given in (4), respectively, where the parameters δ > 0, θ_1 > 0, θ_2 > −1 satisfy 0 < θ_1 − θ_2 < 1. This process has the NB stationary distribution.

Proof. See Appendix A.1.
The second reversible process above was introduced by [3]. The first equality in equation (4) defines a count distribution when parameter θ_2 is either positive, or between −1 and 0. If it is positive, then this distribution is a zero-inflated geometric distribution, with probability parameter p given by p = 1 − θ_2/θ_1. Moreover, in the special case where θ_2 = 0 and θ_1 ∈ [0, 1[, this process becomes the NBAR(1) process of [18], with geometric Z_{i,t}'s. By [18], Prop. 2, the h-step-ahead conditional distribution is also of the affine type (4). Thus a temporally aggregated NBAR(1) process, that is (Y_t) = (X_{ht}) where process (X_t) is NBAR(1), also belongs to the affine family (4).
Let us now show that the irreversible, noncausal dynamics defined by (3) has quite unique properties that distinguish it from standard causal models.

Calendar time dynamics of a noncausal INAR(1)
From now on we will focus on noncausal INAR(1) processes for which (ε̃_t) is not Poisson, i.e., (X_t) is irreversible. Since in practice X_{t+1} is only observed after X_t, the noncausal representation (3) is not directly usable for forecasting X_{t+1} given the observation of X_t. Let us derive the corresponding calendar time one-step-ahead conditional p.m.f. P(x_{t+1}|x_t) := P(X_{t+1} = x_{t+1}|X_t = x_t). By the Bayes formula, we have:

P(x_{t+1}|x_t) = P(x_t|x_{t+1}) P(x_{t+1}) / P(x_t),    (6)

where P(x_t), P(x_{t+1}) denote the stationary p.m.f. of the process (X_t) evaluated at x_t and x_{t+1}, respectively, and P(x_t|x_{t+1}) := P(X_t = x_t|X_{t+1} = x_{t+1}) is the one-step-backward conditional p.m.f. By simple convolution, we have:

P(x_t|x_{t+1}) = Σ_{z=0}^{min(x_t, x_{t+1})} C(x_{t+1}, z) p^z (1−p)^{x_{t+1}−z} P_{ε̃}(x_t − z),

where P_{ε̃} denotes the p.m.f. of ε̃_t. Thus, so long as the p.m.f.'s of both (ε̃_t) and (X_t) have closed form, so does the predictive distribution P(x_{t+1}|x_t). Moreover, this result can be extended in two important ways. First, we can also deduce a closed form expression for the predictive distribution P(x_{t+h}|x_t), for any horizon h larger than 1. Indeed, the Bayes formula can also be applied to the joint distribution of (X_t, X_{t+h}), and the h-step-backward (resp. forward) predictive distribution of a noncausal (resp. causal) INAR(1) process is known to allow for closed form expressions [see [28]]. Second, instead of focusing on the INAR(1) process, we can extend this result to affine Markov counting processes, whose transition and stationary distributions are also generically available in closed form [see [28]].
Let us now provide two examples of noncausal, irreversible INAR(1) processes.

Example 1: Noncausal geometric INAR(1)
We first consider the noncausal analog of the geometric INAR(1) process [see [31]]. This process has marginally a geometric distribution with p.g.f.:

E[u^{X_t}] = 1 / (1 + β(1−u)),

with mean β ∈ ]0, ∞[, or alternatively probability parameter 1/(1+β), which is not necessarily equal to p. Then the p.g.f. of ε̃_t is given by:

E[u^{ε̃_t}] = (1 + pβ(1−u)) / (1 + β(1−u)) = p + (1−p) / (1 + β(1−u)).

This is a zero-inflated geometric distribution with p.m.f.:

P(ε̃_t = 0) = p + (1−p)/(1+β),   P(ε̃_t = k) = (1−p) β^k / (1+β)^{k+1}, k ≥ 1.

The reverse time conditional p.m.f. then follows by binomial convolution, and the predictive p.m.f. has closed form by the Bayes formula (6).

Figure 3 compares the causal and noncausal conditional p.m.f.'s for two values i of the conditioning variable. In both panels, the two conditional p.m.f.'s coincide at the argument x = i. This is a direct consequence of the Bayes formula (6) and of the stationarity of the process. The noncausal conditional p.m.f. assigns more weight immediately to the right of i, as well as towards zero. In other words, under the noncausal model, process (X_t) has a larger probability of growing larger and larger, that is, the expansion of the bubble. On the other hand, it also has a larger probability of hitting zero, which corresponds to the collapse of the bubble.

Figure 4 displays the corresponding conditional expectations in calendar time (E[X_{t+1}|X_t = i]) and reverse time (E[X_t|X_{t+1} = i]), as functions of i. In reverse time, the conditional expectation E[X_t|X_{t+1} = i] = pi + E[ε̃_t] is affine in i, but this affine property no longer holds for the causal expectation E[X_{t+1}|X_t = i]. This is expected, since process (X_t) is irreversible. We also remark that the difference between the two conditional p.m.f.'s is much more marked than that between the two conditional expectations, suggesting the limits of the conditional expectation in summarizing the nonlinear dynamics.
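As a numerical sanity check, the predictive p.m.f. of this example can be assembled from the Bayes formula (6): the geometric marginal, the zero-inflated geometric shock, and the backward binomial convolution. The parameter values and the truncation order of the support below are illustrative choices:

```python
from math import comb

p, beta = 0.5, 2.0  # thinning parameter and marginal mean (illustrative values)

def marg(x):
    # Stationary geometric p.m.f., with p.g.f. 1 / (1 + beta (1 - u))
    return beta**x / (1.0 + beta) ** (x + 1)

def eps_pmf(k):
    # Zero-inflated geometric p.m.f. of the reverse-time shock eps~_t
    g = beta**k / (1.0 + beta) ** (k + 1)
    return p + (1.0 - p) * g if k == 0 else (1.0 - p) * g

def backward(x_t, x_next):
    # P(X_t = x_t | X_{t+1} = x_next): Binomial(x_next, p) survivors
    # convolved with the shock
    return sum(
        comb(x_next, z) * p**z * (1.0 - p) ** (x_next - z) * eps_pmf(x_t - z)
        for z in range(min(x_t, x_next) + 1)
    )

def predictive(x_next, x_t):
    # Bayes formula (6)
    return backward(x_t, x_next) * marg(x_next) / marg(x_t)

# By the law of total probability, the predictive p.m.f. sums to one
# (up to truncation of the support at 400).
total = sum(predictive(x, 5) for x in range(400))
print(total)
```

That the truncated sum is numerically one confirms the internal consistency of the marginal, backward and predictive distributions.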

Example 2: Noncausal discrete stable INAR(1) [DS-INAR(1)]
Let us now consider another noncausal INAR(1) model, in which the distribution of (ε̃_t) is discrete stable DS(β, ν) [see e.g. [39,7]], with p.g.f.:

E[u^{ε̃_t}] = exp(−β(1−u)^ν),    (7)

where the scale parameter β > 0 and the shape parameter ν ∈ ]0, 1[. The corresponding p.m.f. (8) is obtained by expanding this p.g.f. in powers of u. This distribution is such that the mean E[ε̃_t] is infinite [see [7]], and thus the resulting model is suitable for heavy-tailed counting process data. Figure 5 plots a trajectory of a noncausal DS-INAR(1) process with ν = 0.5, β = 0.05, and p = 0.5.

Let us first derive the stationary distribution of this process. We have:

E[u^{X_t}] = E[(1 − p(1−u))^{X_{t+1}}] E[u^{ε̃_t}].

By iterating, we get:

E[u^{X_t}] = Π_{i=0}^∞ exp(−β p^{iν} (1−u)^ν) = exp(−β(1−u)^ν / (1 − p^ν)),

which corresponds to the DS(β/(1 − p^ν), ν) distribution. Thus, by equation (8), the marginal p.m.f. (9) also has closed form.

Let us now plot the analog of Figure 3. Using equations (6), (8) and (9), we can compute the conditional p.m.f. P(x_{t+1}|X_t = i). Since a discrete stable distribution is heavy-tailed, we consider two large values for the conditioning observation, i = 10 and i = 25. These conditional p.m.f.'s are plotted in Figure 6. The comparison between the causal and noncausal conditional p.m.f.'s is quite similar to that for Figure 3. In particular, they have two and one local modes, respectively. Indeed, for a large value of X_t, the causal conditional distribution assigns a negligible value to the probability P[X_{t+1} > X_t], whereas, under the noncausal model, this probability is quite significant.

Other examples
Besides the two examples given in Section 2.2, in general the stationary p.m.f. of a (causal or noncausal) affine process has no known parametric form. However, under mild assumptions, this p.m.f. remains computable through inexpensive, closed form matrix operations [see [28]]. In this subsection, we briefly review and illustrate this method using two further examples of noncausal INAR(1) processes with respectively negative binomial and Poisson-Inverse Gaussian distributed shocks.

The stationary distribution
Let us first show under which conditions a general noncausal INAR(1) process has a simple p.g.f. We have:

log E[u^{X_t}] = Σ_{i=0}^∞ h(p^i u + 1 − p^i),    (10)

where h is the log p.g.f. of ε̃_t. Thus, if h has a simple Taylor expansion everywhere in [0, 1], we can expand each term h(p^i u + 1 − p^i) around u = 0 and obtain the Taylor expansion of log E[u^{X_t}]:

log E[u^{X_t}] = Σ_{j=0}^n c_j u^j + o(u^n),    (11)

where the coefficients c_j, j = 0, ..., n, can be computed in closed form, and we choose an order n such that the probability of X_t being larger than n is negligible. The p.m.f. of X_t is then obtained by expanding exp(Σ_{j=0}^n c_j u^j) in powers of u and collecting the coefficients. Although the above algorithm involves Taylor expansions, it is an exact formula, as it is obtained by matching two Taylor expansions.
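As an illustration of this algorithm, consider the case of geometric shocks of mean β, for which each term of (10) re-expands in closed form; the coefficients c_j and the exponential-of-series recursion can then be coded directly. The parameter values and truncation orders below are illustrative choices:

```python
import numpy as np

# Geometric(mean beta) shocks: h(v) = -log(1 + beta (1 - v)), so each term
# h(p^i u + 1 - p^i) = -log(1 + beta p^i) - log(1 - q_i u),
# with q_i = beta p^i / (1 + beta p^i).
p, beta, n = 0.5, 1.0, 200

i = np.arange(200)  # lags beyond this contribute negligibly (p^i tiny)
q = beta * p**i / (1.0 + beta * p**i)
c = np.empty(n + 1)
c[0] = -np.log(1.0 + beta * p**i).sum()
for j in range(1, n + 1):
    c[j] = (q**j).sum() / j   # coefficient of u^j in the log p.g.f.

# p.m.f. = Taylor coefficients of exp(sum_j c_j u^j), via the standard
# recursion g_k = (1/k) * sum_{j=1}^{k} j c_j g_{k-j}.
g = np.zeros(n + 1)
g[0] = np.exp(c[0])
for k in range(1, n + 1):
    g[k] = sum(j * c[j] * g[k - j] for j in range(1, k + 1)) / k

print(g.sum())                        # close to 1: a proper p.m.f.
print((np.arange(n + 1) * g).sum())   # close to beta / (1 - p), the stationary mean
```

Note that the recursion for g_k only uses c_1, ..., c_k, so the truncation at order n does not affect the first n probabilities, in line with the exactness of the algorithm.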
Finally, although equations (10) and (11) are derived for INAR(1) processes, the case of general affine counting processes with non-binary Z i,t+1 's is similar [see [28] for more details].
Let us now give the expression of the coefficients (c_j)_{j=0}^∞ for two additional examples of (causal or noncausal) INAR(1) processes.

Negative binomial (NB) shock
Let us assume that ε̃_t is NB distributed, with p.g.f.:

E[u^{ε̃_t}] = ( p_0 / (1 − (1−p_0)u) )^δ,

where the parameters δ > 0 and p_0 ∈ ]0, 1[. Then the p.g.f. of the stationary distribution of the process is:

E[u^{X_t}] = Π_{i=0}^∞ ( p_0 / (p_0 + (1−p_0) p^i (1−u)) )^δ.

Thus we have:

c_k = (δ/k) Σ_{i=0}^∞ q_i^k, with q_i = (1−p_0) p^i / (p_0 + (1−p_0) p^i), ∀k ≥ 1.

Poisson-inverse Gaussian (PIG) shock
The negative binomial distribution can be viewed as a mixture of Poisson distributions with a gamma mixing distribution. [43] introduced an alternative mixed Poisson distribution, with an inverse Gaussian mixing distribution. The resulting distribution is called PIG and is better suited for heavy-tailed count distributions. The causal version of the PIG INAR(1) process was introduced by [4]. Below we plot a simulated trajectory of the noncausal PIG INAR(1) process, in which the inverse Gaussian distribution has mean 0.5 and dispersion 10. The p.g.f. of the shock distribution, and hence, by equation (10), the p.g.f. of the stationary distribution, are available in closed form, the latter involving binomial coefficients.

Extremal behavior
Let us now show that the dynamics of a noncausal INAR(1) process can be totally different from that of its causal counterpart, especially for large current values of X_t. For illustration purposes, we re-focus on the noncausal DS-INAR(1) process introduced in Section 2.2.2.

Since the discrete stable distribution has an infinite mean, the noncausal conditional expectation E[X_t|X_{t+1}] does not exist. This might explain why the causal DS-INAR(1) process has never been proposed in the literature. Interestingly, the calendar time dynamics of a noncausal DS-INAR(1) process has thin tails, as shown in the next proposition.

Proposition 3. For fixed X_t, the conditional p.m.f. P(x_{t+h}|X_t) admits an explicit asymptotic equivalent when x_{t+h} goes to infinity, and the conditional moment E[X_{t+h}^φ|X_t] is finite for any power φ > 0 and any horizon h.
The finiteness of E[X_{t+h}|X_t] does not contradict the nonexistence of E[X_t]. Roughly speaking, the causal prediction E[X_{t+h}|X_t] is finite for any fixed horizon h, but increases to infinity as h goes to infinity. Figure 8 plots E[X_{t+1}|X_t = i] as a function of i. The values are computed numerically from those of the conditional p.m.f., by truncating the identity:

E[X_{t+1}|X_t = i] = Σ_{x=0}^∞ x P(x|X_t = i)

at a large, finite order. Contrary to Figure 4, we have not plotted the corresponding reverse time conditional expectation, as the latter is infinite. The conditional expectation seems to increase nearly linearly in i, with a slope larger than 1. This impression is confirmed by Proposition 4, which is the discrete analog of a result for (real-valued) noncausal linear alpha-stable AR(1) processes [see [6,20]], for which the convergence in equation (13) is simply replaced by an equality.

Proposition 4. For the noncausal DS-INAR(1) process, the conditional expectation is such that:

E[X_{t+1}|X_t = i] / i → p^{ν−1},    (13)

when i goes to infinity.

Thus, for large values of X_t, the conditional expectation is approximately linear, with an "asymptotic" autoregressive coefficient equal to p^{ν−1}, which is larger than 1, since ν, p ∈ [0, 1[. In particular, this ratio is different from the autoregressive coefficient p in reverse time, which is constant and smaller than 1. Therefore, conditional on X_t being large, X_{t+1} is on average even larger than X_t. Such a locally explosive feature cannot be replicated by a standard INAR(1) process, since in model (1) the autocorrelation coefficient is p < 1 and hence the corresponding limit of E[X_{t+1}|X_t = i]/i is p < 1. As shown in Figure 6, the conditional distribution of X_{t+1} given X_t has two (local) modes, one near zero and the other larger than the current value X_t. A further investigation reveals the following result:

Proposition 5. For a given ε > 0, when X_t goes to infinity,

P[ |X_{t+1}/X_t − 1/p| < ε | X_t ] → p^ν,   P[ X_{t+1}/X_t < ε | X_t ] → 1 − p^ν.

Thus, when the process is currently in a bubble episode with X_t large, the conditional distribution of the scaled future value X_{t+1}/X_t given X_t is approximately binary. With a fixed probability p^ν, the bubble continues to grow at the geometric rate 1/p, whereas with the fixed probability 1 − p^ν, the bubble collapses. This also echoes Proposition 4, which says that the conditional expectation grows asymptotically at the rate p^{ν−1} = p^ν × (1/p). Proposition 5 is linked to the recent literature on (linear) noncausal AR(1) processes, which are the real-valued analog of the noncausal DS-INAR(1) [[15], Prop. 2.2]. This similarity is due to the fact that the joint distribution of (X_t, X_{t+1}) is within the domain of attraction of a certain bivariate alpha-stable vector (Y_t, Y_{t+1}), where the process (Y_t) is a noncausal alpha-stable linear AR(1) process. Thus the joint distribution of (X_t, X_{t+1}) is asymptotically "close to" that of (Y_t, Y_{t+1}), for which the result of [15] applies.
The presence of several local modes also implies that standard point forecasting tools for time series of counts, based only on the conditional expectation and the (single) conditional mode [see [14]], can be misleading. Hence the importance of the closed form conditional p.m.f. derived in this paper. Whereas the marginal distribution of a (causal or noncausal) INAR(1) process is necessarily unimodal [see [39]], the conditional distribution of a noncausal INAR(1) process is often multimodal. This motivates the bubble detection by bivariate pattern recognition developed in [17].

The queuing interpretation
Besides its tractability for estimation and (nonlinear) prediction purposes, another advantage of the causal INAR(1) process is its intuitive, queuing interpretation. In this section, we show that a duality between arrival and departure events leads to a similar queuing interpretation for the noncausal INAR(1) process.

A Lexis diagram
We temporarily leave the laws of count variables unspecified, and focus on the deterministic relationships between these variables. We do not assume model (1) in this section.
Imagine a bar with an infinite capacity. It has existed since the infinite past and will stay open indefinitely. Time is discrete and Z-valued. This two-sided time axis is essential later on, when we study the reverse time properties of the queuing system. In particular, the date t = 0 is not the origin of the bar, which has been open since the infinite past. Moreover, the choice of this time origin is not important, as we will focus on stationary queuing systems.

When a customer arrives at date t, she is immediately served and counts as a customer. In other words, we have an infinite server queue. She can leave the bar at date t + 1 at the earliest, and she is no longer a customer at the date of departure. Hence the duration spent by each customer at the bar (or service time) is at least equal to one. We are interested in the counts of customers with various arrival and/or departure dates, as well as in the total number of customers at each date. To this end, for each couple (s, t) ∈ Z^2 such that s < t, we denote by η_{s,t} the number of customers arriving at date s and leaving at date t. This definition prioritizes neither of the two dates s, t, nor either of the two time directions. Indeed, for a given s, the sequence (η_{s,t})_t is indexed in calendar time t, which increases from s + 1 to +∞; on the other hand, for a given t, the sequence (η_{s,t})_s is indexed in reverse time, s decreasing from t − 1 to −∞. Let us now report these count variables in the following Lexis diagram [see e.g. [10]]:

(14)

This is an infinite, upper triangular array, in which variable η_{s,t} appears in the s-th row and the t-th column. In particular, only terms strictly above the diagonal s = t can be nonzero, due to the constraint s < t.

The arrival and departure cohorts
The doubly indexed sequence (η s,t ) will now serve as the building block of other count sequences allowing to follow the movements of customers by either arrival, or departure cohort.
The arrival cohorts. For a given date s and each posterior date t ≥ s, we denote by ε_s(t) the number of customers arriving at date s and staying at least until date t, with the convention that ε_s(s) = ε_s is the number of customers arriving at date s, regardless of their departure date. Here the index s is prioritized, whereas the integer t appears merely as the argument of the s-indexed process. We call index s the arrival cohort of these individuals, and ε_s the (initial) size of this cohort. The sequence ε_s(t), t ≥ s, is nonincreasing in t, and we have:

ε_s(t) − ε_s(t+1) = η_{s,t+1}, t ≥ s.    (15)

Thus this sequence converges to a nonnegative integer limit, and η_{s,t} is necessarily zero for large t. Hence, in each row of the Lexis diagram (14), there are at most a finite number of positive terms. Throughout the paper we assume that:

Assumption 1. For each given cohort s, the sequence ε_s(t) is equal to zero for sufficiently large t.

In other words, there are no indefinitely staying customers. Then equation (15) can be equivalently rewritten as:

ε_s(t) = Σ_{τ=t+1}^∞ η_{s,τ},    (16)

that is, those arriving at date s and staying until date t will ultimately leave at one of the following dates: t + 1, t + 2, .... In particular, taking s = t, the above equality yields:

ε_s = Σ_{τ=s+1}^∞ η_{s,τ}.    (17)

Equations (16) and (17) are easy to interpret using the Lexis diagram (14). Equation (17) says that the variables on the s-th row sum up to ε_s. Moreover, if we truncate this summation and keep only the terms on the right hand side (RHS) of η_{s,t} in the Lexis diagram, excluding η_{s,t}, we get ε_s(t) by equation (16).
The departure cohorts. Let us now consider the departure time t of the customers. More precisely, for a given date t and each s ≤ t, we denote by ε̃_t(s) the number of customers leaving at date t and having arrived before date s. This sequence is nondecreasing in s, and we have:

ε̃_t(s+1) − ε̃_t(s) = η_{s,t}, s ≤ t − 1.    (18)

Each column of the Lexis diagram also contains a finite number of nonzero terms, and similarly to Assumption 1, we assume that:

Assumption 2. For each given t, the sequence ε̃_t(s) goes to 0 when s goes back to −∞.

In other words, no customers have been in the bar since the infinite past. Thus, for small s, both ε̃_t(s) and hence also η_{s,t} are equal to 0, and we have:

ε̃_t(s) = Σ_{τ=−∞}^{s−1} η_{τ,t}.    (19)

In terms of the Lexis diagram, ε̃_t(s) is obtained by summing all the terms above η_{s,t}. In particular, when s = t, we get:

ε̃_t(t) = Σ_{τ=−∞}^{t−1} η_{τ,t}.    (20)

In the following, we denote by ε̃_t the RHS of the above equation, which is the total number of customers leaving at date t, that is, the size of the departure cohort t. It is equal to the sum of all the elements in the t-th column of the Lexis diagram.
An arrival/departure duality.
Let us now relate definitions (18), (19), (20) to equations (15), (16), (17). Instead of observing customers' movements in calendar time (CT), we can alternatively observe them in reverse time (RT), that is, by looking at the time-reversed video. In other words, for any t ∈ Z, the picture of the bar taken at CT date t is observed at RT date −t. Therefore:

• When a customer arrives at CT date s ∈ Z, s is also the earliest CT date at which she is at the bar. When the video is reversed, −s becomes her last RT date at the bar; hence she leaves the bar at RT date −s + 1.

• Similarly, when the customer leaves at CT date t ∈ Z, the last CT date at which she is at the bar is t − 1; when time is reversed, −t + 1 becomes her first RT date at the bar. In other words, she arrives at the bar at RT date −t + 1.
To summarize, we have the following duality table: reversing the time direction is equivalent to interchanging the arrivals and departures, taking the opposite of the dates, and then shifting all the dates by +1. As a consequence, the counts defined above, which concern the different CT departure cohorts, correspond to RT arrival cohorts. This explains the similarity between equations (18), (19), (20) on the one hand, and equations (15), (16), (17) on the other hand.

The current customer counts
It remains to count the current number of customers X_t. Decomposing the current customers according to their arrival cohorts t, t − 1, t − 2, ..., we have:

X_t = ε_t(t) + ε_{t−1}(t) + ε_{t−2}(t) + ... = Σ_{s≤t} ε_s(t).    (21)

In the Lexis diagram, ε_t(t) = ε_t is the sum of all the entries in the t-th row to the right of η_{t,t+1} (inclusively); ε_{t−1}(t) is the sum of all the entries in the (t−1)-th row to the right of η_{t−1,t+1} (inclusively), and so on. Thus we have:

X_t = Σ_{s≤t} Σ_{τ>t} η_{s,τ},    (22)

where the summation is with respect to both the arrival time s, with s ≤ t, and the departure time τ, with τ > t. In other words, X_t is the sum of all the entries to the northeast of η_{t,t+1}. Similarly, by counting the current customers according to their departure dates, we get:

X_t = Σ_{τ>t} ε̃_τ(t+1).    (23)

Linking the departure and arrival population sizes. There also exists a relationship between the sizes of the arrival cohorts (ε_t), those of the departure cohorts (ε̃_t), and (X_t):

ε̃_{t+1} + X_{t+1} = X_t + ε_{t+1},    (24)

that is, at date t + 1, the count of leaving customers, ε̃_{t+1}, plus the count of those currently in the bar, X_{t+1}, is equal to the previous customer count X_t plus the new arrivals ε_{t+1}.
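The accounting identities of this subsection are easy to check numerically on a finite window of the Lexis diagram; the Poisson entries and the cap on service times below are purely illustrative devices:

```python
import numpy as np

rng = np.random.default_rng(2)

# A finite window of the Lexis diagram: eta[s, t] = number of customers
# arriving at date s and leaving at date t > s.  Service times are capped
# at dmax periods here purely for illustration.
T, dmax = 60, 10
eta = np.zeros((T + dmax, T + dmax), dtype=np.int64)
for s in range(T):
    for d in range(1, dmax + 1):
        eta[s, s + d] = rng.poisson(0.5)

arrivals = eta.sum(axis=1)    # eps_s: size of arrival cohort s (row sums)
departures = eta.sum(axis=0)  # eps~_t: size of departure cohort t (column sums)

def customers(t):
    # X_t: everyone who arrived at s <= t and leaves at tau > t
    # (the "northeast" block of the Lexis diagram).
    return int(eta[: t + 1, t + 1 :].sum())

# Accounting identity: eps~_{t+1} + X_{t+1} = X_t + eps_{t+1}.
t = 30
print(customers(t + 1) + departures[t + 1] == customers(t) + arrivals[t + 1])
```

The check holds at every interior date of the window, since a customer present at date t either stays (counted in X_{t+1}) or leaves (counted in ε̃_{t+1}).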

C. Gouriéroux and Y. Lu
Summary. The four count sequences (η_{s,t})_{s,t}, (ε_s(t))_{s,t}, (ε̃_t(s))_{s,t}, (X_t)_t introduced so far are linked as follows:

• There is a one-to-one relationship between (η_{s,t})_{s,t} and (ε_s(t))_{s,t} [see eq. (15), (16)].

• There is a one-to-one relationship between (η_{s,t})_{s,t} and (ε̃_t(s))_{s,t} [see eq. (18), (19)].

• The process (X_t)_t is obtained by aggregating either (ε_s(t))_{s,t} [eq. (21)], or (ε̃_t(s))_{s,t} [eq. (23)].

In particular, as in model (1), it is impossible to recover the underlying latent counts from the knowledge of (X_t)_t alone.

The Markov property
Since model (1) is Markov, we will focus on stochastic specifications of the latent (ε_s(t)) under which the resulting process (X_t) defined by equation (21) is Markov, that is, such that:

P(X_t | X_{t−1}, X_{t−2}, ...) = P(X_t | X_{t−1}),    (25)

where the symbol P(·|·) refers to the conditional probability mass function (p.m.f.), that is, the conditional density with respect to the counting measure on N. Let us define ε_•(t−1) as the information set of the population sizes of the various arrival cohorts at date t−1:

ε_•(t−1) = {ε_s(t−1), s ≤ t−1},

and let F_{t−1} denote the information set of all realized arrival counts up to time t−1:

F_{t−1} = {ε_s(τ), s ≤ τ ≤ t−1}.

These information sets are ordered by:

σ(X_{t−1}) ⊂ σ(X_{t−1}, X_{t−2}, ...) ⊂ F_{t−1}.

Lemma 3. A sufficient condition for (X_t) to be Markov is that:

P(X_t | F_{t−1}) = P(X_t | X_{t−1}).    (26)
Intuitively, this lemma says that, if enlarging the information set from X_{t−1} to ε_•(t−1) does not improve the prediction of X_t, then enlarging it to an intermediate information set, such as the past history {X_{t−1}, X_{t−2}, ...}, should not provide extra useful information either. Let us now look for conditions for (26) to hold. Since, by (21), X_t is the sum of the ε_s(t), s varying, it is convenient to assume:

Assumption 3. For two different arrival cohorts s_1 ≠ s_2, the processes of population sizes (ε_{s_1}(s_1 + τ))_{τ≥0} and (ε_{s_2}(s_2 + τ))_{τ≥0} are i.i.d.

Under this assumption, ε_s(t) is independent of, and has the same distribution as, ε_{s+1}(t+1). Next, we also assume:

Assumption 4. Within each cohort s, the sequence of sizes (ε_s(t))_t is itself Markov, that is:

P[ε_s(t) | ε_s(t−1), ε_s(t−2), ...] = P_{s,t}[ε_s(t) | ε_s(t−1)].

In other words, the future size of an arrival cohort s at time t depends only on its current size, and not on the previous ones. We use the double subscript s, t in P_{s,t}(·|·), since by definition the conditional p.m.f. of ε_s(t) given ε_s(t−1) has support {0, 1, ..., ε_s(t−1)} and is thus time-inhomogeneous. Moreover, by Assumption 3, the function P_{s,t}(·|ε_s(t−1)) depends on s, t and ε_s(t−1) only through ε_s(t−1) and the time lag t − s.

Lemma 4. If the queuing system is such that Assumptions 3 and 4 and equation (26) are satisfied, then there exists p ∈ ]0, 1[ such that, for any s < t and integer n, the conditional distribution of ε_s(t) given ε_s(t−1) = n is binomial B(n, p).
Proof. See Appendix A.6.

Queuing interpretation of the INAR(1) process
The following proposition is a direct consequence of Lemma 4 above. It says that the INAR(1) process can indeed be embedded into the above Markov queuing system.

Proposition 6 (Arrival cohort-disaggregated representation). The solution to the causal INAR(1) model (1) has the disaggregate queuing representation (21), where the joint distribution of (ε_s(t)) is as follows:

1. For a given cohort s, the sequence (ε_s(t)) is Markov, and the conditional distribution of ε_s(t + 1) given ε_s(t) = n is B(n, p) with a fixed parameter p ∈ ]0, 1[. That is, ε_s(t + 1) = Σ_{i=1}^{ε_s(t)} Z_{i,s,t+1}, where the Z_{i,s,t+1} are i.i.d. Bernoulli B(1, p) and independent of ε_s(t), ε_s(t − 1), ....
2. The sequence (ε_s) is i.i.d., and the sequences (ε_s(t))_{t≥s} are independent for different s.
where the Z_{i,j,t} are mutually independent as i, j, t vary, Bernoulli distributed with parameter p^i, and independent of X_{t−1}, X_{t−2}, .... The main difference between representations (22) and (29) is that, in the former case, although each component ε_{t−i}(t) has the same distribution as Σ_{j=1}^{ε_{t−i}} Z_{i,j,t}, these terms are dependent within the same cohort, that is, when both t and i vary while t − i is kept constant.
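The cohort-disaggregated representation can be checked numerically: simulating the cohorts of Proposition 6 and aggregating them yields a path satisfying the causal INAR(1) constraint. This is a sketch under our own naming; `poisson` is a simple Knuth sampler, not a library call:

```python
import math
import random

random.seed(1)

p, lam, horizon = 0.6, 3.0, 300

def poisson(l):
    """Knuth's Poisson sampler (adequate for small l)."""
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

def thin(n, prob):
    """Binomial thinning: each of n individuals is kept with probability prob."""
    return sum(random.random() < prob for _ in range(n))

cohorts = {}            # cohorts[s] = eps_s(t): current size of cohort arrived at s
X, eps = [], []
for t in range(horizon):
    for s in list(cohorts):
        cohorts[s] = thin(cohorts[s], p)   # within-cohort Bernoulli(p) survival
    eps.append(poisson(lam))               # i.i.d. arrival cohort eps_t
    cohorts[t] = eps[-1]
    X.append(sum(cohorts.values()))        # X_t = sum over cohorts s of eps_s(t)

# Aggregation reproduces the causal INAR(1) constraint:
# the number of stayers X_{t+1} - eps_{t+1} lies between 0 and X_t.
for t in range(horizon - 1):
    assert 0 <= X[t + 1] - eps[t + 1] <= X[t]
```

Because the sum of independent binomial thinnings of the individual cohorts is a binomial thinning of the total, the aggregated path has the same dynamics as the INAR(1) recursion.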
The RCINAR(1) process has a queuing interpretation similar to that of the INAR(1) process, except that Assumption 3 is replaced by the assumption that the population counts ε_s(t), s varying, are conditionally independent given (ε_s(t − 1))_s and a time-varying factor p_t ∈ ]0, 1[. This is provided in the Online Appendix.

Queuing interpretation of the noncausal INAR(1) process
The queuing interpretation of the noncausal INAR(1) process is obtained by using the arrival–departure duality.

Proposition 7. The noncausal INAR(1) process has the disaggregate representation (23), where the joint distribution of (ε̃_t(s)), s ≤ t, is defined as follows:

1. For a given t, the sequence (ε̃_t(s))_{s≤t} is such that the terminal value ε̃_t(t) is equal to ε̃_t, whereas previous values are defined backward and recursively by ε̃_t(s − 1) = Σ_{i=1}^{ε̃_t(s)} Z*_{i,s,t}, where the Z*_{i,s,t} are i.i.d. Bernoulli B(1, p) and independent of ε̃_t(s).
2. The departure counts ε̃_s, s ∈ Z, are i.i.d., and independent of the Z*_{i,s,t}.

Let us consider a statistician observing the reverse time video: she first observes the total number of departures ε̃_t of the entire departure cohort, then tries to "trace back" the arrival times of these customers. Given that a customer is in the bar at a certain date s < t, she assigns probability p (resp. 1 − p) to the event that the customer was (resp. was not) already in the bar at the previous date.
This proposition is the noncausal analog of Proposition 6, and equation (23) can be viewed as the MA(∞) representation of the noncausal process, that is, the analog of (21). However, the two models differ fundamentally in terms of observability. Indeed, the variables on the RHS of (23) involve a departure date posterior to t; thus they are fundamentally unobservable at the present date t, even if all the movements of the queuing system are observed up to the present.
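The backward construction of Proposition 7 can be sketched as follows. All helper names are our own, and the infinite sum in (23) over future departure cohorts is truncated at T + K dates:

```python
import math
import random

random.seed(3)

p, lam, T, K = 0.6, 3.0, 100, 60   # K: truncation of the future departure cohorts

def poisson(l):
    """Knuth's Poisson sampler (adequate for small l)."""
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

def thin(n, prob):
    return sum(random.random() < prob for _ in range(n))

# Departure cohorts: the eps_tilde_u customers all leave right after date u.
# Working backward from the departure date, each of them was already present
# one period earlier with probability p.
size = {}                                # size[(u, s)] = eps_tilde_u(s)
for u in range(T + K):
    size[(u, u)] = poisson(lam)          # terminal value eps_tilde_u(u) = eps_tilde_u
    s = u
    while size[(u, s)] > 0 and s > -1:
        size[(u, s - 1)] = thin(size[(u, s)], p)
        s -= 1

X = [sum(size.get((u, t), 0) for u in range(t, T + K)) for t in range(T)]

# Each departure cohort's presence is nondecreasing in calendar time, and every
# customer departing right after t is present at t, so X_t >= eps_tilde_t(t).
for (u, s), v in size.items():
    if (u, s - 1) in size:
        assert size[(u, s - 1)] <= v
for t in range(T):
    assert X[t] >= size[(t, t)]
```

Note that computing X_t here requires knowing future departure cohorts, which is exactly the unobservability discussed above.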
The causal dynamics of the queuing system for a noncausal INAR(1) process

In Section 2.2, we have derived general formulas to study the causal dynamics of a noncausal INAR(1) process. Let us now look at the queuing interpretation of these causal dynamics.
In the noncausal (resp. causal) INAR(1) process, the departure (resp. arrival) process (ε̃_t) [resp. (ε_t)] is i.i.d. The arrival (resp. departure) process is then defined through equation (24). What properties does the arrival process possess? Can it also be i.i.d.? The following proposition says that this is generally not the case:

Proposition 8 (The dynamics of the arrival cohort size). In the noncausal INAR(1) process (1), the arrival process (ε_t) is also i.i.d. if and only if ε̃_t is Poisson P(λ) distributed for some positive constant λ. In this case, ε_t is also Poisson P(λ) distributed. Similarly, in the causal INAR(1) process (1), the departure process (ε̃_t) is also i.i.d. if and only if ε_t (and then also ε̃_t) is Poisson P(λ) distributed for some λ > 0.
Proof. See Appendix A.8.
This result complements Theorem 1 in [36], which focuses on the causal INAR(1) and says that (X_t) is time reversible if and only if ε_t is Poisson P(λ) distributed. The "if" part of the proposition is easily illustrated using the Lexis diagram: if ε_t is P(λ) distributed, then by the splitting property of the Poisson distribution, the variables η_t(t + 1), η_t(t + 2), ... in equation (17) are independent and Poisson distributed with parameters pλ, p²λ, ..., respectively. Thus ε̃_t defined in (20) is still Poisson, as a sum of independent Poisson variables.
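The Poisson splitting property invoked above can be illustrated numerically: thinning a Poisson(λ) count with probability p yields again a Poisson count, with mean and variance both equal to pλ. This is a simulation sketch, so the checks use tolerances:

```python
import math
import random

random.seed(2)

lam, p, n = 4.0, 0.5, 100_000

def poisson(l):
    """Knuth's Poisson sampler (adequate for small l)."""
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

# Thin n independent Poisson(lam) counts, keeping each unit with probability p.
kept = [sum(random.random() < p for _ in range(poisson(lam))) for _ in range(n)]

mean_kept = sum(kept) / n
var_kept = sum((k - mean_kept) ** 2 for k in kept) / n
# The thinned count is Poisson(p * lam): mean and variance close to p * lam = 2.
assert abs(mean_kept - p * lam) < 0.05
assert abs(var_kept - p * lam) < 0.1
```

Equality of mean and variance is the Poisson signature; for a non-Poisson input distribution the thinned counts would generally not inherit the input family, which is why the i.i.d. property of the departures fails outside the Poisson case.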
Note that in the continuous time queuing literature, models with non-i.i.d. arrival sequences have already been considered by various authors [see e.g. [12]]. Our model is fundamentally different, since none of these previous models specify the queuing system in reverse time.
Besides the existence of serial correlation, what else can be said about the arrival counting process (ε_t) and the departure decisions of existing customers of a noncausal, non-Poissonian INAR(1) process? Let us write X_{t+1} = ε_{t+1} + (X_{t+1} − ε_{t+1}), where process (X_t) is noncausal INAR(1) and process (ε_t) is defined by (24). This representation is analogous to representation (1), where X_{t+1} is decomposed into the sum of the number of newcomers and the number of old customers who stay. By equation (24), X_{t+1} − ε_{t+1} is equal to X_t − ε̃_{t+1}, which is nonnegative, no larger than X_t, and interpreted as the number of current customers who choose to stay at time t. Proposition 9 below summarizes the major differences between the calendar time dynamics of a noncausal INAR(1) and a causal INAR(1) process:

Proposition 9. For a noncausal INAR(1) process defined in Definition 1, the following statements are equivalent: i) ε̃_t is Poisson distributed, that is, (X_t) is a Poisson INAR(1) process; ii) (X_t) is time reversible; iii) the number of current customers who stay, X_{t+1} − ε_{t+1}, is binomial given X_t; iv) the number of arrivals ε_{t+1} is independent of X_t; v) X_{t+1} − ε_{t+1} and ε_{t+1} are independent given X_t.
Thus, contrary to a causal INAR(1) process, for which conditions iii), iv) and v) in Proposition 9 are all satisfied, in a non-Poisson, noncausal INAR(1) model the calendar time dynamics of (X_t) is not thinning-based, the arrival size process (ε_{t+1}) is state-dependent, that is, dependent on the previous customer count X_t, and is conditionally dependent on the number of staying old customers X_{t+1} − ε_{t+1}. In other words, this result is the queuing version of Lemma 2.
The distribution of the arrival count

Proposition 9 says that, in general, the arrival cohort size (ε_t) cannot have a Poisson stationary distribution. Its p.g.f. can be computed from equations (24) and (3), and two cases can be worked out explicitly. In the geometric case, equation (31) shows that (ε_t) is geometrically distributed. In the discrete stable case, equation (31) shows that the distribution of (ε_t) is DS(β(1 − p)^ν/(1 − p^ν), ν). Since (1 − p)^ν/(1 − p^ν) ≥ 1, the stationary distribution of ε_t has a larger scale parameter than the distribution of ε̃_t.
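The scale inequality (1 − p)^ν/(1 − p^ν) ≥ 1 for p, ν ∈ ]0, 1[ follows from the subadditivity of the concave map x ↦ x^ν, which gives (1 − p)^ν + p^ν ≥ 1. A quick grid check:

```python
# Verify (1 - p)**nu >= 1 - p**nu on a grid of p, nu in (0, 1), i.e. that the
# discrete stable scale parameter of eps_t exceeds that of eps_tilde_t.
for i in range(1, 20):
    for j in range(1, 20):
        p, nu = i / 20, j / 20
        assert (1 - p) ** nu / (1 - p ** nu) >= 1.0 - 1e-12
```

Equality holds only in the limit ν → 1, where the discrete stable distribution degenerates.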

Conclusion
In this paper we have introduced the concept of noncausality for counting processes, by time reversing an INAR(1), a Markov affine process, or an RCINAR(1). These noncausal processes are very often associated with a new, noncausal queuing interpretation. We have also seen that the resulting noncausal processes, which are generically time irreversible, can feature bubble-like calendar time dynamics characterized by a distinctive bimodal conditional distribution. This approach is completely different from the current literature, which often focuses on the autocovariance of the process. This latter linear measure of serial dependence should be used with care, especially since a noncausal process and its causal counterpart share the same autocovariance function, but can have completely different nonlinear dynamics.
Our theoretical analysis of Markov noncausal processes is only the first, but a necessary, step towards the empirical implementation of count models with noncausal features. While the parameter estimation of a noncausal INAR(1) process is rather straightforward (indeed, it suffices to estimate a standard causal INAR(1) model in reverse time), there are numerous other important issues that await further research. For instance, given that noncausal processes differ from their causal counterparts only when the process is irreversible, can we design (possibly nonparametric) tests of reversibility on count time series data, before any meaningful estimation is done? See e.g. [8] for similar tests applied to real-valued processes. In a similar spirit, what is the consequence of mis-specifying the causality of the process, that is, of estimating a causal INAR(1) process when the real data generating process is noncausal? See e.g. [21] for methods to distinguish between causal and noncausal models for real-valued processes. Finally, a more flexible possible extension concerns mixed causal-noncausal INAR(1) models. Recently, a mixed causal-noncausal autoregressive process has been introduced by [16], as a natural extension of both the noncausal AR(1) model of [20] and the standard causal AR(1) model. In the context of count processes, such an extension would include the causal and noncausal INAR(1) as special cases and would allow the bubble to burst gradually instead of abruptly, at a rate potentially different from its rate of explosion.
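The reverse-time estimation strategy mentioned above can be sketched as follows: simulate a causal Poisson INAR(1), reverse it to obtain a noncausal path, and recover p by conditional least squares on the re-reversed series. The helper names are ours; the CLS slope is a standard consistent estimator of p:

```python
import math
import random

random.seed(5)

p_true, lam, n = 0.5, 2.0, 20_000

def poisson(l):
    """Knuth's Poisson sampler (adequate for small l)."""
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

def thin(m, prob):
    return sum(random.random() < prob for _ in range(m))

# Simulate a causal Poisson INAR(1) path started from its stationary
# distribution P(lam / (1 - p)), then reverse it: the reversed path is,
# by definition, a noncausal INAR(1) process.
x = poisson(lam / (1 - p_true))
causal = []
for _ in range(n):
    x = thin(x, p_true) + poisson(lam)
    causal.append(x)
noncausal = causal[::-1]

# Estimating the noncausal model reduces to fitting a causal INAR(1) in
# reverse time: a conditional-least-squares slope on the re-reversed series.
y = noncausal[::-1]
ybar = sum(y) / len(y)
num = sum((y[t] - ybar) * (y[t - 1] - ybar) for t in range(1, len(y)))
den = sum((y[t - 1] - ybar) ** 2 for t in range(1, len(y)))
p_hat = num / den
assert abs(p_hat - p_true) < 0.05
```

In the Poisson case the causal and noncausal models coincide, so the example only illustrates the mechanics of reverse-time fitting; for non-Poisson innovations the full conditional likelihood, not just the CLS slope, would distinguish the two directions.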
• Or δ_2 < 0, in which case we get the negative binomial distribution. Then, using a similar calculation as in [9], we get the second reversible affine counting process mentioned in the Proposition.
Thus the updating step involves at most the computation of n coefficients.

A.3. Proof of Proposition 3
Let us first prove the proposition for h = 1. By [7], when k goes to infinity, the tail of the discrete stable distribution satisfies: On the other hand, for a given X_t and for large x_{t+1} such that x_{t+1} > x_t, the noncausal conditional p.m.f. is: where P_ε̃ is the p.m.f. of ε̃. For fixed x_t, the term P_ε̃(x_t − i) is uniformly bounded for varying i, whereas the binomial coefficient C(x_{t+1}, i) is bounded by x_{t+1}^{x_t}/x_t!. Thus the RHS of equation (a.36) is bounded by a polynomial in x_{t+1} of degree x_t when x_{t+1} goes to infinity. Thus, by Bayes' formula: Finally, let us explain why the above result remains true for higher horizons h. For instance, for h = 2, we have, by the iterated expectation formula: In other words, the joint distribution of (X_t, X_{t+2}) has the same parametric form as that of (X_t, X_{t+1}), but with a different probability parameter p². Thus the above result for h = 1 still applies to the case h = 2.

A.4. Proof of Proposition 4
Let us focus on the proof for h = 1. The formula for larger h is easily deduced by the law of iterated expectations. The joint p.g.f. of (X_t, X_{t+1}) is: Then we get, for all u between 0 and 1: (a.38) Similarly, by considering the marginal p.g.f. L(u) := E[u^{X_t}] = e^{−β(1−u)^ν/(1−p^ν)}, we get: (a.39) On the other hand, we can expand the LHS of (a.38) and (a.39) and get: (a.42) If we multiply the LHS of this equation by (pu + 1 − p), we get the RHS of (a.38), and if we multiply it by u, we get the RHS of (a.39). Thus it suffices to show that, when i goes to infinity, we have c_i/c_{i−1} → 1. This latter condition is satisfied since the radius of convergence of the expansion (a.42) is equal to 1.

A.5. Proof of Proposition 5
Let us first show that the distribution of (X_t, X_{t+1}) is within the domain of attraction of a certain bivariate alpha-stable vector (Y_t, Y_{t+1}), where process (Y_t) is a noncausal alpha-stable linear AR(1) process. For a deterministic sequence a_n that goes to infinity with n (at a rate to be determined later), we consider the scaled sample average: where (X_t^{(i)}, X_{t+1}^{(i)}), i = 1, ..., n, are i.i.d. copies of (X_t, X_{t+1}). By applying equation (a.37), the Laplace transform of this vector is such that: Straightforward but tedious algebra shows that, for a_n = n^{1/ν}, which goes to infinity in n, the RHS of the above equality converges to: (a.43) Thus the joint distribution of (X_t, X_{t+1}) belongs to the domain of attraction of an alpha-stable distribution with Laplace transform given by equation (a.43) [see e.g. [34]]. The spectral measure of the limiting distribution has two point masses, at (1, 0) and (p/√(1 + p²), 1/√(1 + p²)). We then apply Theorem 4 of [34] and conclude that process (X_t) satisfies Proposition 5.

A.6. Proof of Lemma 4
We have: where the first term E[u^{ε_t}] is factored out since, by Assumption 3, ε_t is independent of ε_•(t − 1), whereas the conditional p.g.f. of ε_{t−1}(t) + ε_{t−2}(t) + ··· becomes the product of conditional p.g.f.'s by conditional independence (see Assumption 4). By Assumption 3, the s-th element in the infinite product is a measurable function of ε_s(t − 1), u and t − s only, and by equation (21), these conditioning variables sum up to X_{t−1}. Thus, in order for the conditional p.m.f. P(X_t | ε_•(t − 1)) to depend on ε_•(t − 1) through X_{t−1} only, and for all possible values of ε_•(t − 1), there should exist a function of u, say g_1(u), as well as a function of u and t − s, say g_2(u, t − s), such that: E[u^{ε_s(t)} | ε_s(t − 1)] = g_1(u)^{ε_s(t−1)} g_2(u, t − s), that is, the s-th factor should be an exponential affine function of ε_s(t − 1), with a slope that can depend only on u, but not on t − s. Since the LHS of the above equation is a conditional p.g.f., the functions g_1 and g_2 are further constrained:

Footnote 17. Alternatively, we recognize that the limit (a.43) is the Laplace transform of (Y_t, Y_{t+1}), where (Y_t) is a linear noncausal alpha-stable AR(1) process defined by Y_t = pY_{t+1} + ξ_t, where the ξ_t are i.i.d. and follow an alpha-stable distribution with Laplace transform E[e^{−uξ_t}] = exp(−βu^ν) for all u > 0. By [15], Theorem 2.2, this latter process satisfies the proposition.
• First, by choosing ε_s(t − 1) = 0 (in which case ε_s(t) is almost surely zero as well), we get g_2(u, t − s) = 1. • Then we choose ε_s(t − 1) = 1 and conclude that g_1(·) is necessarily the p.g.f. of a count distribution. Thus, given ε_s(t − 1), the conditional distribution of ε_s(t) is the convolution of ε_s(t − 1) identical count distributions with p.g.f. g_1(·). But since ε_s(t) is bounded by ε_s(t − 1), the function g_1(·) must be the p.g.f. of a Bernoulli distribution, say B(1, p), i.e. g_1(u) = pu + 1 − p.
As a consequence, the conditional distribution of ε_s(t) given ε_s(t − 1) is binomial B(ε_s(t − 1), p).

A.8. Proof of Proposition 8
First, by the arrival–departure duality, it suffices to prove the proposition for the causal INAR(1) process, and for this, it suffices to show that ε̃_t and ε̃_{t+1} are independent if and only if ε_t is Poisson distributed. Let us compute their joint p.g.f. Using equation (1), we have: Under suitable regularity conditions (such as the continuity of g), the only solution of this functional equation is g(x + 1) = e^{λx} for some positive constant λ. Thus we deduce that ε̃_t and ε̃_{t+1} are independent if and only if g(u) = e^{λ(u−1)}, that is, if (X_t) is Poisson P(λ) distributed. Then, by stationarity, the p.g.f. of ε_t is: and ε_t is also Poisson P((1 − p)λ) distributed.

A.9. Proof of Proposition 9
The equivalence between i) and ii) is a consequence of [36]. Let us show that conditions i) and iii) are equivalent. Recall that X_{t+1} − ε_{t+1} = X_t − ε̃_{t+1}, and that X_t − ε̃_{t+1} and ε̃_{t+1} are independent by definition (3). Then, using a characterization theorem of the Poisson distribution [see the Theorem in [32]], X_{t+1} − ε_{t+1} = X_t − ε̃_{t+1} is binomial conditional on X_t if and only if X_t − ε̃_{t+1} and ε̃_{t+1} are Poisson, or equivalently, if and only if process (X_t) is Poisson INAR(1).
For the equivalence between conditions i) and iv), we have by definition: where the Z*_{i,t}'s are independent of X_{t+1}. On the other hand, we have: where ε̃_{t+1} is independent of X_{t+1}. Thus ε_{t+1} and X_t are independent if and only if their joint p.g.f. is multiplicatively separable. A simple calculation shows that this joint p.g.f. is equal to: By a similar argument as in Proposition 8, this latter p.g.f. is multiplicatively separable if and only if X_t is Poisson distributed.
Finally, let us prove the equivalence between conditions i) and v) by using the calendar time conditional p.m.f. [eq. (5)]. X_{t+1} − ε_{t+1} and ε_{t+1} are independent given X_t if and only if their joint conditional p.g.f. is separable. Using equation (a.45), we can easily check that this p.g.f. is equal to: By the same argument as in Appendix A.3, this p.g.f. is separable if and only if X_{t+1} is Poisson.

Online Appendix: Noncausal RCINAR(1) process
We have the following characterization concerning the reversibility of RCINAR(1) processes:

Proposition 10 (Characterization of time reversible RCINAR(1) processes). If the sequence (p_t) is i.i.d. and independent of (ε_t), then the RCINAR(1) process is time reversible if and only if p_t follows a beta Beta(α, β) distribution and ε_t is NB(β, θ) distributed. In this case, the marginal distribution of (X_t) is NB(α + β, θ), and the departure process (ε̃_t) is also i.i.d. NB(β, θ).

This process was introduced by [23] and applied to real data by [40]. If we let α + β go to infinity, while keeping the ratio α/(α + β) = p_0 and the mean of the NB(β, θ) distribution, βθ/(1 − θ) = λ_0, constant, then in the limit the beta (resp. NB) distribution tends to the point mass at p_0 [resp. the Poisson P(λ_0) distribution]. Hence we recover the Poisson INAR(1) process as a limit.

When the RCINAR(1) process is time irreversible, its reverse time dynamics differ from its calendar time dynamics. A noncausal RCINAR(1) process is then defined as a time-reversed RCINAR(1) process. We can give a queuing interpretation for the noncausal RCINAR(1) in a similar way as in Section 5.1 for the noncausal INAR(1); such an interpretation would be based on the stochastic specification of the sequence (ε̃_s(t)) instead of (ε_s(t)) and is omitted due to space constraints.
Proposition 11 (A queuing interpretation of the RCINAR(1) model). The solution to the causal RCINAR(1) model has the disaggregate queuing representation (21), where the joint distribution of (ε_s(t)) is as follows:

1. For a given s, the sequence (ε_s(t)) is Markov, and the conditional distribution of ε_s(t + 1) given ε_s(t) is binomial B(ε_s(t), p_t) conditionally on p_t ∈ ]0, 1[, where (p_t) is i.i.d. for t varying.
2. For any s_1 ≠ s_2, the variables ε_{s_1}(t + 1) and ε_{s_2}(t + 1) are independent given ε_{s_1}(t), ε_{s_2}(t) and p_t.
3. The sequence (ε_s)_s is i.i.d.
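A simulation sketch of the causal RCINAR(1) with the beta–negative binomial specification of Proposition 10 can be written as follows. The negative binomial is drawn through the Poisson–gamma mixture, and the helper names are ours:

```python
import math
import random

random.seed(7)

alpha, beta, theta, T = 2.0, 3.0, 0.4, 500

def poisson(l):
    """Knuth's Poisson sampler (adequate for small l)."""
    L, k, prod = math.exp(-l), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

def neg_binomial(stop, th):
    """NB(stop, th) drawn as a Poisson with gamma-distributed intensity."""
    return poisson(random.gammavariate(stop, th / (1 - th)))

def thin(n, prob):
    return sum(random.random() < prob for _ in range(n))

X = [neg_binomial(alpha + beta, theta)]     # NB(alpha + beta, theta) marginal
for _ in range(T):
    p_t = random.betavariate(alpha, beta)   # i.i.d. random thinning coefficient
    stay = thin(X[-1], p_t)                 # conditionally binomial given p_t, X_t
    X.append(stay + neg_binomial(beta, theta))
    assert 0 <= stay <= X[-2]
```

Reversing a path simulated this way gives a draw from the noncausal RCINAR(1), and in the reversible beta–NB case of Proposition 10 the two directions are indistinguishable in distribution.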
The proof is obvious and omitted. Finally, the queuing interpretation of the noncausal RCINAR(1) process is obtained in a similar way, by reversing the roles of the arrival and departure processes.

OA.1. Proof of Proposition 10
We follow the proof of [36] for INAR(1) processes. The "if" part of the proposition is straightforward, and we focus on the "only if" part. Suppose that process (X_t) is RCINAR(1). Then the conditional distribution of the process is: where the function P_ε is the p.m.f. of ε_t. Under reversibility, we have, for each i ∈ N:

P[(X_t, X_{t+1}, X_{t+2}, X_{t+3}) = (0, 1, i, 0)] = P[(X_t, X_{t+1}, X_{t+2}, X_{t+3}) = (0, i, 1, 0)], (a.47)

or equivalently, P(1|0)P(i|1)P(0|i) = P(i|0)P(1|i)P(0|1). We plug these expressions into equation (a.48) to get: or equivalently: (a.49) Similarly, by considering the paths (0, 2, i, 0) and (0, i, 2, 0) for an i ≥ 2, we get:

P(2|0)P(i|2)P(0|i) = P(i|0)P(2|i)P(0|2), (a.50)

or equivalently: Let us now expand the parentheses on both sides and check that the first terms (resp. the second terms) on both sides exactly cancel out due to equation (a.49). The equality between the third terms then shows that: (a.51) is constant as i varies. Let us denote by m_i the i-th integer moment of 1 − p_t. Then the term in equation (a.51) becomes: which should be constant in i. In other words, the sequence (m_i/(m_{i−1} − m_i))_i is an arithmetic sequence. If this sequence is constant, then the sequence of integer moments of 1 − p_t is geometric. Since this sequence of moments characterizes a distribution on [0, 1], we conclude that p_t follows a point mass distribution, that is, process (X_t) is indeed INAR(1). Otherwise, when the increment of this arithmetic sequence is strictly positive, we have m_i/(m_{i−1} − m_i) = (i − 1)c + d for positive constants c and d. In other words, the sequence of integer moments of 1 − p_t coincides with that of a beta distribution Beta(β, α); thus p_t follows a Beta(α, β) distribution. Let us finally get back to equation (a.49) and show that, in this case, the distribution of (ε_t) is negative binomial. Summing up this equation for i = 1, 2, ... yields:

βμ = (β + μ) P(1)/P(0), (a.52)

3888 C. Gouriéroux and Y. Lu

where μ is the expectation of ε_t. Thus equation (a.49) becomes: (a.53) By iteration, we deduce that (ε_t) follows a NB distribution with stopping parameter β and mean μ.