Martingale proofs of many-server heavy-traffic limits for Markovian queues

This is an expository review paper illustrating the ``martingale method'' for proving many-server heavy-traffic stochastic-process limits for queueing models, supporting diffusion-process approximations. Careful treatment is given to an elementary model -- the classical infinite-server model $M/M/\infty$, but models with finitely many servers and customer abandonment are also treated. The Markovian stochastic process representing the number of customers in the system is constructed in terms of rate-1 Poisson processes in two ways: (i) through random time changes and (ii) through random thinnings. Associated martingale representations are obtained for these constructions by applying, respectively: (i) optional stopping theorems where the random time changes are the stopping times and (ii) the integration theorem associated with random thinning of a counting process. Convergence to the diffusion process limit for the appropriate sequence of scaled queueing processes is obtained by applying the continuous mapping theorem. A key FCLT and a key FWLLN in this framework are established both with and without applying martingales.


Introduction
The purpose of this paper is to illustrate how to do martingale proofs of many-server heavy-traffic limit theorems for queueing models, as in Krichagina and Puhalskii [37] and Puhalskii and Reiman [53]. Even though the method is remarkably effective, it is somewhat complicated. Thus it is helpful to see how the argument applies to simple models before considering more elaborate models. For the more elementary Markovian models considered here, these many-server heavy-traffic limits were originally established by other methods, but new methods of proof have been developed. For the more elementary models, we show how key steps in the proof, a FCLT (see (59)) and a FWLLN (Lemma 4.3), can be done without martingales as well as with martingales. For the argument without martingales, we follow Mandelbaum and Pats [47,48], but without their level of generality and without discussing strong approximations.
Even though we focus on elementary Markov models here, we are motivated by the desire to treat more complicated models, such as the non-Markovian GI/GI/n models in Krichagina and Puhalskii [37], Puhalskii and Reiman [53], Reed [55] and Kaspi and Ramanan [34], and network generalizations of these, such as non-Markovian generalizations of the network models considered by Dai and Tezcan [14,15,16], Gurvich and Whitt [23,24,25] and references cited there. Thus we want to do more than achieve a quick proof for the simple models, which need not rely so much on martingales; we want to illustrate martingale methods that may prove useful for more complicated models.

The Classical Markovian Infinite-Server Model
We start by focusing on what is a candidate to be the easiest queueing model, the classic M/M/∞ model, which has a Poisson arrival process (the first M), i.i.d. exponential service times (the second M), independent of the arrival process, and infinitely many servers. Let the arrival rate be λ and let the individual mean service time be 1/µ. Afterwards, in §7.1, we treat the associated M/M/n/∞ + M (Erlang A or Palm) model, which has n servers, unlimited waiting room, the first-come first-served (FCFS) service discipline and Markovian customer abandonment (the final +M); customers abandon (leave) if they have to spend too long waiting in queue. The successive customer times to abandon are assumed to be i.i.d. exponential random variables, independent of the history up to that time. That assumption is often reasonable with invisible queues, as in telephone call centers. Limits for the associated M/M/n/∞ (Erlang C) model (without customer abandonment) are obtained as a corollary, by simply letting the abandonment rate be zero.
For the initial M/M/∞ model, let Q(t) be the number of customers in the system at time t, which coincides with the number of busy servers at time t. It is well known that Q ≡ {Q(t) : t ≥ 0} is a birth-and-death stochastic process and that Q(t) ⇒ Q(∞) as t → ∞, where ⇒ denotes convergence in distribution, provided that P (Q(0) < ∞) = 1, and Q(∞) has a Poisson distribution with mean E[Q(∞)] = λ/µ.
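The Poisson steady state can be checked numerically from the birth-and-death structure: since the birth rate is λ in every state and the death rate in state k is kµ, the candidate stationary distribution Poisson(λ/µ) must satisfy the detailed-balance equations λπ_k = (k + 1)µπ_{k+1}. A minimal sketch (the rates λ = 3, µ = 1.5 are arbitrary illustrations):

```python
import math

def poisson_pmf(k, a):
    # pmf of the Poisson distribution with mean a: e^{-a} a^k / k!
    return math.exp(-a) * a ** k / math.factorial(k)

def max_balance_gap(lam, mu, kmax=60):
    # largest violation of the detailed-balance equations
    # lam * pi_k = (k + 1) * mu * pi_{k+1} for the candidate pi = Poisson(lam/mu)
    a = lam / mu
    return max(abs(lam * poisson_pmf(k, a) - (k + 1) * mu * poisson_pmf(k + 1, a))
               for k in range(kmax))
```

The gap is zero up to floating-point error, confirming that the Poisson distribution with mean λ/µ is stationary for this birth-and-death process.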
We are interested in heavy-traffic limits in which the arrival rate is allowed to increase. Accordingly, we consider a sequence of models indexed by n and let the arrival rate in model n be

λ_n ≡ nµ, n ≥ 1. (1)

Let Q_n(t) be the number of customers in the system at time t in model n. By the observation above, Q_n(∞) has a Poisson distribution with mean E[Q_n(∞)] = λ_n/µ = n. Since the Poisson distribution approaches the normal distribution as its mean increases, we know that

(Q_n(∞) − n)/√n ⇒ N(0, 1) as n → ∞, (2)

where N(0, 1) denotes a standard normal random variable (with mean 0 and variance 1). However, we want to establish a limit for the entire stochastic process {Q_n(t) : t ≥ 0} as n → ∞. For that purpose, we consider the scaled processes X_n ≡ {X_n(t) : t ≥ 0} defined by

X_n(t) ≡ (Q_n(t) − n)/√n, t ≥ 0. (3)

To establish a stochastic-process limit for the scaled processes in (3), we have to be careful about the initial conditions. We will assume that X_n(0) ⇒ X(0) as n → ∞. In addition, we assume that the random initial number of busy servers, Q_n(0), is independent of the arrival process and the service times. Since the service-time distribution is exponential, the remaining service times of those customers initially in service have independent exponential distributions because of the lack-of-memory property of the exponential distribution.
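The normal approximation to the Poisson distribution invoked above is easy to check numerically. The sketch below (sample sizes and the seed are arbitrary choices) standardizes Poisson(n) samples exactly as the steady-state queue length is standardized:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000                                  # Poisson mean, playing the role of lambda_n / mu
samples = rng.poisson(lam=n, size=200_000)
z = (samples - n) / np.sqrt(n)              # standardized just like (Q_n(infty) - n) / sqrt(n)
```

For large n the standardized values have mean near 0, variance near 1, and roughly half the mass at or below 0, consistent with a N(0, 1) limit.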
The heavy-traffic limit theorem asserts that the sequence of stochastic processes {X n : n ≥ 1} converges in distribution in the function space D ≡ D([0, ∞), R) to the Ornstein-Uhlenbeck (OU) diffusion process as n → ∞, provided that appropriate initial conditions are in force; see Billingsley [8] and Whitt [69] for background on such stochastic-process limits.
Here is the theorem for the basic M/M/∞ model.

Theorem 1.1 (heavy-traffic limit in D for the M/M/∞ model) Consider the sequence of M/M/∞ models defined above. If X_n(0) ⇒ X(0) in R as n → ∞, then

X_n ⇒ X in D as n → ∞, (4)

where X is the OU diffusion process with infinitesimal mean m(x) = −µx and infinitesimal variance σ²(x) = 2µ. Alternatively, X satisfies the stochastic integral equation

X(t) = X(0) − µ ∫_0^t X(s) ds + √(2µ) B(t), t ≥ 0, (5)

where B ≡ {B(t) : t ≥ 0} is a standard Brownian motion. Equivalently, X satisfies the stochastic differential equation (SDE)

dX(t) = −µX(t) dt + √(2µ) dB(t), t ≥ 0. (6)
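The OU limit can be simulated directly from its SDE by an Euler-Maruyama scheme. This is only an illustrative sketch (step size, horizon and seed are arbitrary choices), but it shows the −µX drift pulling the process toward 0, with stationary variance σ²/(2µ) = 2µ/(2µ) = 1:

```python
import numpy as np

def simulate_ou(mu=1.0, x0=0.0, T=5.0, dt=0.005, n_paths=20_000, seed=42):
    """Euler-Maruyama scheme for the OU SDE dX = -mu * X dt + sqrt(2 mu) dB."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0)
    for _ in range(int(T / dt)):
        dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x - mu * x * dt + np.sqrt(2.0 * mu) * dB
    return x
```

For µ = 1 and horizon T = 5 the process is essentially stationary, so the terminal values are approximately N(0, 1).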
Much of this paper is devoted to four proofs of Theorem 1.1. It is possible to base the proof on the martingale functional central limit theorem (FCLT), as given in §7.1 of Ethier and Kurtz [19] and here in §8, as we will show in §9.2, but it is not necessary to do so. Instead, it can be based on the classic FCLT for the Poisson process, which is an easy consequence of Donsker's FCLT for sums of i.i.d. random variables, and the continuous mapping theorem. Nevertheless, the martingale structure can still play an important role. With that approach, the martingale structure can be used to establish stochastic boundedness of the scaled queueing processes, which we show implies required fluid limits or functional weak laws of large numbers (FWLLN's) for random-time-change stochastic processes, needed for an application of the continuous mapping theorem with the composition map. Alternatively (the third method of proof), the fluid limit can be established by combining the same continuous mapping with the strong law of large numbers (SLLN) for the Poisson process. It is also necessary to understand the characterization of the limiting diffusion process via (5) and (6). That is aided by the general theory of stochastic integration, which can be considered part of the martingale theory [19,58].

The QED Many-Server Heavy-Traffic Limiting Regime
We will also establish many-server heavy-traffic limits for Markovian models with finitely many servers, where the number n of servers goes to infinity along with the arrival rate in the limit. We will consider the sequence of models in the quality-and-efficiency-driven (QED) many-server heavy-traffic limiting regime, which is defined by the condition

(nµ − λ_n)/√n → βµ as n → ∞ (7)

for some constant β, −∞ < β < ∞.
This limit, in which the arrival rate and number of servers increase together according to (7), is just the right way so that the probability of delay converges to a nondegenerate limit (strictly between 0 and 1); see Halfin and Whitt [26]. We will also allow finite waiting rooms of size m_n, where the waiting rooms grow at rate √n as n → ∞, i.e., so that

m_n/√n → κ ≥ 0 as n → ∞. (8)
With the spatial scaling by √ n, as in (3), this scaling in (8) is just right to produce a reflecting upper barrier at κ in the limit process.
In addition, we allow Markovian abandonment, with each waiting customer abandoning at rate θ. We let the individual service rate µ and individual abandonment rate θ be fixed, independent of n. These modifications produce a sequence of M/M/n/m_n + M models (with +M indicating abandonment). The special cases of the Erlang A (abandonment), B (blocking or loss) and C (delay) models are obtained, respectively, by (i) letting m_n = ∞, (ii) letting m_n = 0, in which case the +M plays no role, and (iii) letting m_n = ∞ and θ = 0. So all the basic many-server Markovian queueing models are covered.
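The defining property of the QED regime, that the delay probability has a nondegenerate limit, can be illustrated numerically. The sketch below compares the exact Erlang C delay probability, computed by the standard Erlang B recursion, with the limiting delay probability [1 + βΦ(β)/φ(β)]^{-1} of Halfin and Whitt [26]; the particular values of β and n are arbitrary illustrations.

```python
import math
from statistics import NormalDist

def erlang_c(n, a):
    """Delay probability P(wait > 0) in the M/M/n model with offered load a = lambda/mu < n."""
    b = 1.0
    for k in range(1, n + 1):           # Erlang B (blocking) recursion
        b = a * b / (k + a * b)
    return n * b / (n - a * (1.0 - b))  # Erlang C formula in terms of Erlang B

def halfin_whitt_limit(beta):
    """Limiting QED delay probability [1 + beta * Phi(beta) / phi(beta)]^{-1}."""
    nd = NormalDist()
    return 1.0 / (1.0 + beta * nd.cdf(beta) / nd.pdf(beta))
```

Taking λ_n = nµ − βµ√n, i.e., offered load a = n − β√n, the exact delay probability approaches the limit as n grows.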
Here is the corresponding theorem for the M/M/n/m_n + M model.

Theorem 1.2 (heavy-traffic limit in D for the M/M/n/m_n + M model) Consider the sequence of M/M/n/m_n + M models defined above, with the scaling in (7) and (8). Let X_n be as defined in (3). If X_n(0) ⇒ X(0) in R as n → ∞, then X_n ⇒ X in D as n → ∞, where the limit X is the diffusion process with infinitesimal mean m(x) = −βµ − µx for x < 0 and m(x) = −βµ − θx for x > 0, infinitesimal variance σ²(x) = 2µ and reflecting upper barrier at κ. Alternatively, the limit process X is the unique (−∞, κ]-valued process satisfying the stochastic integral equation

X(t) = X(0) − βµt − ∫_0^t [µ(X(s) ∧ 0) + θ(X(s) ∨ 0)] ds + √(2µ) B(t) − U(t) (9)

for t ≥ 0, where B ≡ {B(t) : t ≥ 0} is a standard Brownian motion and U is the unique nondecreasing nonnegative process in D such that (9) holds and U increases only when X is at the barrier κ, i.e.,

∫_0^∞ 1_{X(t)<κ} dU(t) = 0. (10)
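A discretized sketch of the limit process in Theorem 1.2 can be obtained by an Euler scheme (parameters, step size and seed below are arbitrary illustrations). The drift is −βµ − µ(x ∧ 0) − θ(x ∨ 0), and the reflection at the upper barrier κ is imposed crudely by truncation:

```python
import numpy as np

def simulate_qed_limit(beta, mu, theta, kappa, T=8.0, dt=0.005, n_paths=20_000, seed=1):
    """Euler scheme for the reflected diffusion limit of the M/M/n/m_n + M queue."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    for _ in range(int(T / dt)):
        drift = -beta * mu - mu * np.minimum(x, 0.0) - theta * np.maximum(x, 0.0)
        x = x + drift * dt + np.sqrt(2.0 * mu * dt) * rng.normal(size=n_paths)
        x = np.minimum(x, kappa)  # crude reflection at the upper barrier kappa
    return x
```

When θ = µ and κ = ∞, the drift reduces to −µ(β + x), i.e., an OU process centered at −β, which provides a simple sanity check on the scheme.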

Literature Review
A landmark in the application of martingales to queues is the book by Brémaud [12]; contributions over the previous decade are described there. As reflected by the references cited in Krichagina and Puhalskii [37] and Puhalskii and Reiman [53], and as substantiated by Puhalskii in personal communication, Liptser was instrumental in developing the martingale method to prove limit theorems for queueing models, leading to diffusion approximations; e.g., see Kogan, Liptser and Smorodinskii [36] and §10.4 of Liptser and Shiryayev [44].
The specific M/M/∞ result in Theorem 1.1 was first established by Iglehart [29]; see Borovkov [9,10], Whitt [68], Glynn and Whitt [22], Massey and Whitt [50], Mandelbaum and Pats [47,48], Mandelbaum, Massey and Reiman [46] and Krichagina and Puhalskii [37] for further discussion and extensions. A closely related textbook treatment appears in Chapter 6 of Robert [57]. Iglehart applied a different argument; in particular, he applied Stone's [62] theorem, which shows for birth-and-death processes that it suffices to have the infinitesimal means and infinitesimal variances converge, plus other regularity conditions; see Garnett et al. [21] and Whitt [71] for recent uses of that same technique. Iglehart directly assumed finitely many servers and let the number of servers increase rapidly with the arrival rate; the number of servers increases so rapidly that it is tantamount to having infinitely many servers. The case of infinitely many servers can be treated by a minor modification of the same argument.
The corresponding QED finite-server result in Theorem 1.2, in which the arrival rate and number of servers increase together according to (7), which is just the right way so that the probability of delay converges to a nondegenerate limit (strictly between 0 and 1), was considered by Halfin and Whitt [26] for the case of infinite waiting room and no abandonment. There have been many subsequent results; e.g., see Mandelbaum and Pats [47,48], Srikant and Whitt [61], Mandelbaum et al. [46], Puhalskii and Reiman [53], Garnett et al. [21], Borst et al. [11], Jelenković et al. [31], Whitt [72], Mandelbaum and Zeltyn [49] and Reed [55] for further discussion and extensions.
Puhalskii and Reiman [53] apply the martingale argument to establish heavy-traffic limits in the QED regime. They establish many-server heavy-traffic limits for the GI/PH/n/∞ model, having a renewal arrival process (the GI), phase-type service-time distributions (the PH), n servers, unlimited waiting space, and the first-come first-served (FCFS) service discipline. They focus on the number of customers in each phase of service, which leads to convergence to a multi-dimensional diffusion process. One of our four proofs is essentially their argument.
Whitt [72] applied the same martingale argument to treat the G/M/n/m_n + M model, having a general stationary point process for an arrival process (the G), i.i.d. exponential service times, n servers, m_n waiting spaces, the FCFS service discipline and customer abandonment with i.i.d. exponential times to abandon; see Theorem 3.1 in [72]. Whitt [72] is primarily devoted to a heavy-traffic limit for the G/H_2^*/n/m_n model, having a special H_2^* service-time distribution (a mixture of an exponential and a point mass at 0), by a different argument, but there is a short treatment of the G/M/n/m_n + M model by the martingale argument, following Puhalskii and Reiman [53] quite closely. The martingale proof is briefly outlined in §5 there. The extension to general arrival processes is perhaps the most interesting contribution there, but that generalization can also be achieved in other ways.

Organization
The rest of this paper is organized as follows: We start in §2 by constructing the Markovian stochastic process Q representing the number of customers in the system in terms of rate-1 Poisson processes. We do this in two ways: (i) through random time changes and (ii) through random thinnings. We also give the construction in terms of arrival and service times used by Krichagina and Puhalskii [37] to treat the G/GI/∞ model. Section 3 is devoted to martingales. After reviewing basic martingale notions, we construct martingale representations associated with the different constructions. We justify the first two representations by applying, respectively: (i) optional stopping theorems where the random time changes are the stopping times and (ii) the integration theorem associated with random thinning of a counting process. The two resulting integral representations are very similar. These integral representations are summarized in Theorems 3.4 and 3.5. In §3 we also present two other martingale representations, one based on constructing counting processes associated with a continuous-time Markov chain, exploiting the infinitesimal generator, and the other, for G/GI/∞, based on the sequential empirical process (see (44)).
Section 4 is devoted to the main steps of the proof of Theorem 1.1 using the first two martingale representations. In §4.1 we show that the integral representation has a unique solution and constitutes a continuous function mapping R × D into D. In order to establish measurability, we establish continuity in the Skorohod [60] J_1 topology as well as the topology of uniform convergence over bounded intervals. As a consequence of the continuity, it suffices to prove FCLT's for the scaled martingales in these integral representations. For the first martingale representation, the scaled martingales themselves are random time changes of the scaled Poisson process. In §4.2 we show that a FCLT for the martingales based on this first representation, and thus the scaled queueing processes, can be obtained by applying the classical FCLT for the Poisson process and the continuous mapping theorem with the composition map.
To carry out the continuous-mapping argument above, we also need to establish the required fluid limit or functional weak law of large numbers (FWLLN) for the random time changes, needed in the application of the CMT with the composition map. In §4.3, following Mandelbaum and Pats [47,48], we show that this fluid-limit step can be established by applying the strong law of large numbers (SLLN) for the Poisson process with the same continuous mapping determined by the integral representation. If we do not use martingales, then we observe that it is easy to extend the stochastic-process limit to general arrival processes satisfying a FCLT.
But we can also use martingales to establish the FCLT and the FWLLN. For the FWLLN, the martingale structure can be exploited via the Lenglart-Rebolledo inequality to prove stochastic boundedness, first for the scaled martingales and then for the scaled queueing processes, which in turn can be used to establish the required FWLLN for the scaled random time changes (Lemma 4.2).
Since this martingale proof of the fluid limit relies on stochastic boundedness, which is related to tightness, we present background on these two important concepts in §5. For the proof of Theorem 1.1, we conclude there that it suffices to show that the predictable quadratic variations of the square-integrable martingales are stochastically bounded in R in order to have the sequence of scaled queueing processes {X n } be stochastically bounded in D.
We complete the proof of Theorem 1.1 in §6. In Lemma 5.9 and §6.1 we show that stochastic boundedness of {X n } in D implies the desired fluid limit or FWLLN needed for the scaled random-time-change processes needed in an application of the continuous mapping with the composition function. In §6.2 we complete the proof by showing that the predictable quadratic variation processes of the martingales are indeed stochastically bounded in R. In §6.3 we show that it is possible to remove a moment condition imposed on Q n (0), the initial random number of customers in the system, in the martingale representation; in particular, we show that it is not necessary to assume that E[Q n (0)] < ∞. Finally, in §6.4 we state the G/GI/∞ limit in Krichagina and Puhalskii [37] and show that the special case M/M/∞ is consistent with Theorem 1.1.
In §7 we discuss stochastic-process limits for other queueing models. In §7.1 we present a martingale proof of the corresponding many-server heavy-traffic limit for the M/M/n/∞ + M model. Corresponding results hold for the model without customer abandonment by setting the abandonment rate to zero. The proof is nearly the same as for the M/M/∞ model. The only significant difference is the use of the optional stopping theorem for multiple stopping times with respect to multiparameter martingales, as in Kurtz [40], §§2.8 and 6.2 of Ethier and Kurtz [19] and §12 of Mandelbaum and Pats [48]. We discuss the extension to cover finite waiting rooms in §7.2 and non-Markovian arrival processes in §7.3.
We state a version of the martingale FCLT from p. 339 of Ethier and Kurtz [19] in §8. In a companion paper, Whitt [73], we review the proof, elaborating on the proof in Ethier and Kurtz [19] and presenting alternative arguments, primarily based on Jacod and Shiryayev [30]. We present a more extensive review of tightness there. In §9 we show how the martingale FCLT implies both the FCLT for a Poisson process and the required FCLT for the scaled martingales arising in the second and third martingale representations.

Sample-Path Constructions
We start by making direct sample-path constructions of the stochastic process Q representing the number of customers in the system in terms of independent Poisson processes. We show how this can be done in two different ways. Afterwards, we present a different construction based on arrival and service times, which only exploits the fact that the service times are mutually independent and independent of the arrival process and the initial number of customers in the system (with appropriate common distribution assumptions).

Random Time Change of Unit-Rate Poisson Processes
We first represent arrivals and departures as random time changes of independent unit-rate Poisson processes. For that purpose, let A ≡ {A(t) : t ≥ 0} and S ≡ {S(t) : t ≥ 0} be two independent Poisson processes, each with rate (intensity) 1. We use the process A to generate arrivals and the process S to generate service completions, and thus departures. Let the initial number of busy servers be Q(0). We assume that Q(0) is a proper random variable independent of the two Poisson processes A and S.
The arrival process is simple. We obtain the originally-specified arrival process with rate λ by simply scaling time in the rate-1 Poisson process A; i.e., we use A λ (t) ≡ A(λt) for t ≥ 0. It is elementary to see that the stochastic process A λ is a Poisson process with rate λ.
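As a quick illustration of this time scaling, one can generate the jump times of a unit-rate Poisson process and divide them by λ; the resulting counts over [0, T] then behave like Poisson(λT) random variables. This is only a sketch; the parameter choices are arbitrary.

```python
import numpy as np

def scaled_arrival_times(lam, T, rng):
    """Jump times of A_lam(t) = A(lam * t) on [0, T], built from a unit-rate Poisson process A."""
    times = []
    s = rng.exponential(1.0)       # first jump of the unit-rate process A
    while s <= lam * T:
        times.append(s / lam)      # undo the time change t -> lam * t
        s += rng.exponential(1.0)
    return times
```

Averaging over replications, the number of jumps in [0, T] has mean and variance near λT, as it must for a rate-λ Poisson process.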
The treatment of service completions is more complicated. Let D(t) be the number of departures (service completions) in the interval [0, t]. We construct this process in terms of S by setting

D(t) ≡ S(µ ∫_0^t Q(s) ds), t ≥ 0, (11)

but formula (11) is more complicated than it looks. The complication is that Q(t) appearing as part of the argument inside S necessarily depends on the history {Q(s) : 0 ≤ s < t}, which in turn depends on the history of S, in particular, upon {S(µ ∫_0^s Q(u) du) : 0 ≤ s < t}. Hence formula (11) is recursive; we must show that it is well defined.
Of course, the idea is not so complicated: Formula (11) is a consequence of the fact that the intensity of departures at time s is µQ(s), where the number Q(s) of busy servers at time s is multiplied by the individual service rate µ. The function µ ∫_0^t Q(s) ds appearing as an argument inside S serves as a random time change; see §II.6 of Brémaud [12], Chapter 6 of Ethier and Kurtz [19] and §7.4 of Daley and Vere-Jones [17]. By the simple conservation of flow, which expresses the content at time t as initial content plus flow in minus flow out, we have the basic expression

Q(t) = Q(0) + A(λt) − S(µ ∫_0^t Q(s) ds), t ≥ 0. (12)

Lemma 2.1 (construction) The stochastic process {Q(t) : t ≥ 0} is well defined as a random element of the function space D by formula (12). Moreover, it is a birth-and-death stochastic process with constant birth rate λ_k = λ and linear state-dependent death rate µ_k = kµ.
Proof. To construct a bona fide random element of D, start by conditioning upon the random variable Q(0) and the two Poisson processes A and S. Then, with these sample paths specified, we recursively construct the sample path of the stochastic process Q ≡ {Q(t) : t ≥ 0}. By induction over the successive jump times of the process Q, we show that the sample paths of Q are right-continuous piecewise-constant real-valued functions of t. Since the Poisson processes A and S have only finitely many transitions in any finite interval w.p.1, the same is necessarily true of the constructed process Q. This sample-path construction follows Theorem 4.1 in Chapter 8 on p. 327 of Ethier and Kurtz [19]; the argument also appears in Theorem 9.2 of Mandelbaum et al. [48].

Finally, we can directly verify that the stochastic process Q satisfies the differential definition of a birth-and-death stochastic process. Let F_t represent the history of the system up to time t, that is, the sigma field generated by {Q(s) : 0 ≤ s ≤ t}. It is then straightforward that the infinitesimal transition rates are as claimed:

P(Q(t + h) − Q(t) = 1 | Q(t) = k, F_t) = λh + o(h),
P(Q(t + h) − Q(t) = −1 | Q(t) = k, F_t) = kµh + o(h)

as h ↓ 0 for each k ≥ 0, where o(h) denotes a function f such that f(h)/h → 0 as h ↓ 0. Ethier and Kurtz [19] approach this last distributional step by verifying that the uniquely constructed process is the solution of the local-martingale problem for the generator of the Markov process, which in our case is the birth-and-death process.

Some further explanation is perhaps helpful. Our construction above is consistent with the construction of the queue-length process as a Markov process, specifically, a birth-and-death process. There are other possible constructions. A different construction would be the standard approach in discrete-event simulation, with event clocks. Upon the arrival of each customer, we might schedule the arrival time of the next customer, by generating an exponential interarrival time. We might also generate the required service time of the current arrival.
With the simulation approach, we have information about the future state that we do not have with the Markov-process construction. The critical distinction between the different constructions involves the information available at each time. The information available is captured by the filtration, which we discuss with the martingale representations in the next section.
The random-time-change approach we have used here is natural when applying strong approximations, as was done by Mandelbaum, Massey and Reiman [46] and Mandelbaum and Pats [47,48]. They applied the strong approximation for a Poisson process, as did Kurtz [39]. A different use of strong approximations to establish heavy-traffic limits for non-Markovian infinite-server models is contained in Glynn and Whitt [22].
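The recursive construction in (12) translates directly into an event-driven algorithm: the internal clock of S runs at rate µQ(t), so between events we only need to track how much internal time remains until the next jump of S. The following is an illustrative sketch of this idea (not taken from [19] or [48]; parameter choices are arbitrary):

```python
import numpy as np

def construct_queue(lam, mu, q0, T, seed=0):
    """Sample path of Q(t) = Q(0) + A(lam t) - S(mu * int_0^t Q(u) du) on [0, T]."""
    rng = np.random.default_rng(seed)
    t, q = 0.0, q0
    next_arrival = rng.exponential(1.0 / lam)   # next jump of A(lam * .)
    residual_s = rng.exponential(1.0)           # internal time until the next jump of S
    times, values = [0.0], [q0]
    while t < T:
        # the internal clock mu * int_0^t Q(u) du runs at rate mu * q
        dt_depart = residual_s / (mu * q) if q > 0 else np.inf
        if next_arrival - t <= dt_depart:
            residual_s -= mu * q * (next_arrival - t)  # consume internal time
            t = next_arrival
            q += 1
            next_arrival = t + rng.exponential(1.0 / lam)
        else:
            t += dt_depart
            q -= 1
            residual_s = rng.exponential(1.0)
        times.append(t)
        values.append(q)
    return np.array(times), np.array(values)
```

A long-run time average of the constructed path should be near λ/µ, matching the Poisson steady state with mean λ/µ.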

Random Thinning of Poisson Processes
We now present an alternative construction, which follows §II.5 of Brémaud [12] and Puhalskii and Reiman [53]. For this construction, let A_λ and S_{µ,k} for k ≥ 1 be independent Poisson processes with rates λ and µ, respectively. As before, we assume that these Poisson processes are independent of the initial number of busy servers, Q(0). This will not alter the overall system behavior, because the service-time distribution is exponential. By the lack-of-memory property, the remaining service times are distributed as i.i.d. exponential random variables, independent of the elapsed service times at time 0.
We let all arrivals be generated from the arrival process A_λ; we let service completions from individual servers be generated from the processes S_{µ,k}. To be specific, let the servers be numbered. With that numbering, at any time we use S_{µ,k} to generate service completions from the busy server with the k th smallest index among all busy servers. (We do not fix attention on a particular server, because we want the initial indices to refer to the busy servers at all times.) Instead of (11), we define the departure process by

D(t) ≡ Σ_{k=1}^∞ ∫_0^t 1_{Q(s−)≥k} dS_{µ,k}(s), t ≥ 0, (13)

where 1_A is the indicator function of the event A; i.e., 1_A(ω) = 1 if ω ∈ A and 1_A(ω) = 0 otherwise. It is important that we use the left limit Q(s−) in the integrand of (13), so that the intensity of any service completion does not depend on that service completion itself; i.e., the functions Q(s−) and 1_{Q(s−)≥k} are left-continuous in s for each sample point. Then, instead of (12), we have

Q(t) = Q(0) + A_λ(t) − Σ_{k=1}^∞ ∫_0^t 1_{Q(s−)≥k} dS_{µ,k}(s), t ≥ 0. (14)

With this alternative construction, there is an analog of Lemma 2.1, proved in essentially the same way. In this setting, it is even more evident that we can construct a sample path of the stochastic process {Q(t) : t ≥ 0} by first conditioning on a realization of Q(0), A_λ and the processes S_{µ,k}, k ≥ 1. Even though this construction is different from the one in §2.1, it too is consistent with the Markov-process view. Consistent with most applications, we know what has happened up to time t, but not future arrival times and service times.
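The thinning construction also simulates directly: run a bank of independent rate-µ Poisson processes S_{µ,k} and accept a jump of S_{µ,k} at time s as a departure only when Q(s−) ≥ k. A sketch (the finite bank size K is a truncation for illustration only; the other parameters are arbitrary):

```python
import numpy as np

def construct_queue_thinning(lam, mu, q0, T, K=100, seed=3):
    """Sample path of Q via thinning: a jump of S_{mu,k} is a departure iff Q(s-) >= k."""
    rng = np.random.default_rng(seed)
    t, q = 0.0, q0
    next_arrival = rng.exponential(1.0 / lam)
    next_jump = rng.exponential(1.0 / mu, size=K)  # next jump time of each S_{mu,k}
    times, values = [0.0], [q0]
    while t < T:
        k = int(np.argmin(next_jump))              # index of the earliest service jump
        if next_arrival <= next_jump[k]:
            t = next_arrival
            q += 1
            next_arrival = t + rng.exponential(1.0 / lam)
        else:
            t = float(next_jump[k])
            if q >= k + 1:       # the indicator 1{Q(s-) >= k}; here k is 0-based
                q -= 1           # accepted (thinned-in) jump: a genuine departure
            next_jump[k] = t + rng.exponential(1.0 / mu)
        times.append(t)
        values.append(q)
    return np.array(times), np.array(values)
```

Departures then occur at total rate µQ(s−), just as in the random-time-change construction of §2.1, so the two constructions produce processes with the same law; the long-run time average is again near λ/µ.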

Construction from Arrival and Service Times
Now, following Krichagina and Puhalskii [37], we construct the sample paths of the process Q from the arrival and service times. This construction applies to the more general G/GI/∞ system; we will show how the approach applies to the special case M/M/∞; see §§3.7 and 6.4. See [37] for references to related work.
Let A(t) be the cumulative number of arrivals in the interval [0, t] and let τ_i be the time of the i th arrival. Let all the service times be mutually independent random variables, independent of the arrival process and the number, Q(0), of customers in the system at time 0 (before new arrivals). Let Q(0) be independent of the arrival process. Let the Q(0) initial customers have service times {η_j^0 : j ≥ 1} with cumulative distribution function (cdf) F_0. Let the new arrivals have service times from {η_i : i ≥ 1} with cdf F.
Then D(t), the number of customers that leave the system by time t, can be expressed as

D(t) = Σ_{j=1}^{Q(0)} 1_{η_j^0 ≤ t} + Σ_{i=1}^{A(t)} 1_{τ_i + η_i ≤ t}, t ≥ 0, (15)

and

Q(t) = Q(0) + A(t) − D(t) = Σ_{j=1}^{Q(0)} 1_{η_j^0 > t} + Σ_{i=1}^{A(t)} 1_{τ_i + η_i > t}, t ≥ 0. (16)

Since this construction does not require A to be Poisson (or even renewal) or the cdf's F and F_0 to be exponential, this approach applies to the non-Markovian G/GI/∞ model. However, the stochastic process Q itself is no longer Markov in the general setting. With Poisson arrivals, we can extend [37] to obtain a Markov process by considering the two-parameter process {Q(t, y) : 0 ≤ y ≤ t, t ≥ 0}, where Q(t, y) is the number of customers in the system at time t that have elapsed service times greater than or equal to y. (For simplicity in treating the initial conditions, let the initial customers have elapsed service times 0 at time 0. Since P(A(0) = 0) = 1, Q(t, t) is (w.p.1) the number of initial customers still in the system at time t.) Then

Q(t, y) = Σ_{j=1}^{Q(0)} 1_{η_j^0 > t} + Σ_{i=1}^{A(t−y)} 1_{τ_i + η_i > t}, 0 ≤ y ≤ t. (17)

With renewal (GI) arrivals (and the extra assumption that P(A(0) = 0) = 1), we can obtain a Markov process by also appending the elapsed interarrival time. Of course, there is an alternative to (17) if we add remaining times instead of elapsed times, but that information is less likely to be available as time evolves. Heavy-traffic limits for these two-parameter processes follow from the argument of [37], but we leave detailed discussion of these extensions to future work. For other recent constructions and limits in this spirit, see Kaspi and Ramanan [34] and references cited there.
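The indicator representation of Q(t) is convenient for simulation because no event-by-event recursion is needed. Here is a sketch for a system starting empty (it uses the standard fact that, given A(t) = n for a Poisson process, the arrival epochs are distributed as n i.i.d. uniform order statistics on [0, t]; the parameters are arbitrary):

```python
import numpy as np

def q_at_t(lam, mu, t, rng):
    """Q(t) for the M/M/infty model starting empty: arrivals in [0, t] still in service at t."""
    n = rng.poisson(lam * t)                      # A(t)
    tau = np.sort(rng.uniform(0.0, t, size=n))    # arrival epochs tau_i given A(t) = n
    eta = rng.exponential(1.0 / mu, size=n)       # i.i.d. service times with cdf F
    return int(np.sum(tau + eta > t))             # customers with tau_i + eta_i > t
```

Starting empty, E[Q(t)] = (λ/µ)(1 − e^{−µt}), and Q(t) is in fact Poisson distributed, so the sample mean and variance should nearly agree.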

Martingale Representations
For each of the sample-path constructions in the previous section, we have associated martingale representations. At a high level, it is easy to summarize how to treat the first two sample-path constructions, drawing only on the first two chapters of Brémaud [12]. We represent the random time changes as stopping times with respect to appropriate filtrations and apply versions of the optional stopping theorem, which states that random time changes of martingales are again martingales, under appropriate regularity conditions; e.g., see Theorem T2 on p. 7 of [12]. The problems we consider tend to produce multiple random time changes, making it desirable to apply the optional stopping theorem for multiparameter random time changes, as in §§2.8 and 6.2 of Ethier and Kurtz [19], but we postpone discussion of that more involved approach until Section 7.1.
For the random thinning, we instead apply the integration theorem for martingales associated with counting processes, as in Theorem T8 on p. 27 of Brémaud [12], which concludes that integrals of predictable processes with respect to martingales associated with counting processes produce new martingales, under regularity conditions.
We also present a third martingale representation, based on martingales associated with generators of Markov processes. That approach applies nicely here, because the queueing processes we consider are birth-and-death processes. Thus we can also apply martingales associated with Markov chains, as on pp. 5-6 and 294 of [12]. Finally, we present a fourth martingale representation associated with §2.3.

Martingale Basics
In this subsection we present some preliminary material on continuous-time martingales. There is a large body of literature providing background, including the books by: Brémaud [12], Ethier and Kurtz [19], Liptser and Shiryayev [44], Jacod and Shiryayev [30], Rogers and Williams [58,59], Karatzas and Shreve [33] and Kallenberg [35]. The early book by Brémaud [12] remains especially useful because of its focus on stochastic point processes and queueing models. More recent lecture notes by Kurtz [41] and van der Vaart [64] are very helpful as well.
Stochastic integrals play a prominent role, but we only need relatively simple cases. Since we will be considering martingales associated with counting processes, we rely on the elementary theory for finite-variation processes, as in §IV.3 of Rogers and Williams [58]. In that setting, the stochastic integrals reduce to ordinary Stieltjes integrals and we exploit integration by parts.
On the other hand, as can be seen from Theorems 1.1 and 1.2, the limiting stochastic processes involve stochastic integrals with respect to Brownian motion, which is substantially more complicated. However, that still leaves us well within the classical theory. We can apply the Ito calculus as in Chapter IV of [58], without having to draw upon the advanced theory in Chapter VI.
For the stochastic-process limits in the martingale setting, it is natural to turn to Ethier and Kurtz [19], Liptser and Shiryayev [44] and Jacod and Shiryayev [30]. We primarily rely on Theorem 7.1 on p. 339 of Ethier and Kurtz [19]. The Shiryayev books are thorough; Liptser and Shiryayev [44] focuses on basic martingale theory, while Jacod and Shiryayev [30] focuses on stochastic-process limits.
We will start by imposing regularity conditions. We assume that all stochastic processes X ≡ {X(t) : t ≥ 0} under consideration are measurable maps from an underlying probability space (Ω, F, P) to the function space D ≡ D([0, ∞), R) endowed with the standard Skorohod J_1 topology and the associated Borel σ-field (generated by the open subsets), which coincides with the usual σ-field generated by the coordinate projection maps; see §§3.3 and 11.5 of [69].
Since we will be working with martingales, a prominent role is played by the filtration (history, family of σ-fields) F ≡ {F_t : t ≥ 0} defined on the underlying probability space (Ω, F, P). (We have the containments F_{t_1} ⊆ F_{t_2} ⊆ F for all t_1 < t_2.) As is customary (see p. 1 of [44]), we assume that all filtrations satisfy the usual conditions: (i) they are right-continuous and (ii) they are complete (F_0, and thus each F_t, contains all P-null sets of F).
We will assume that any stochastic process X under consideration is adapted to the filtration; i.e., X is F-adapted, which means that X(t) is F_t-measurable for each t. These regularity conditions guarantee desirable measurability properties, such as progressive measurability: the stochastic process X is progressively measurable with respect to the filtration F if, for each t ≥ 0, the map (s, ω) → X(s, ω) from [0, t] × Ω to R is measurable with respect to the product σ-field B([0, t]) ⊗ F_t, where B([0, t]) is the Borel σ-field on [0, t]; see p. 5 of [33] and Section 1.1 of [44]. In turn, progressive measurability implies measurability of X(t), regarding X(t) as a map from the product space, and implies that integrals such as ∫_0^t X(s) ds are well defined with probability 1 (w.p.1) with respect to the underlying probability measure P for each t ≥ 0.
It is often important to have a stronger property than the finite-moment condition E[|X(t)|] < ∞ for all t: uniform integrability (UI); see p. 286 of [12] and p. 114 of [59]. We remark that UI implies, but is not implied by, sup_{t≥0} E[|X(t)|] < ∞. A word of warning is appropriate, because the books are not consistent in their assumptions about integrability. A stochastic process X ≡ {X(t) : t ≥ 0} may be called integrable if E[|X(t)|] < ∞ for all t or if the stronger (18) holds. This variation occurs with square-integrable processes, defined below. Similarly, the basic objects may be taken to be martingales, as defined above, or might instead be UI martingales, as on p. 20 of [44].

The stronger UI property is used in preservation theorems, i.e., theorems implying that stopped martingales, and stochastic integrals with respect to martingales, remain martingales. In order to get this property when it is not at first present, the technique of localizing is applied. We localize by introducing associated stopped processes, where the stopping is done with stopping times. A nonnegative random variable τ is an F-stopping time if stopping at or before t depends only on the history up to time t, i.e., if {τ ≤ t} ∈ F_t for each t ≥ 0. For any class C of stochastic processes, we define the associated local class C_loc as the class of stochastic processes {X(t) : t ≥ 0} for which there exists a sequence of stopping times {τ_n : n ≥ 1} such that τ_n → ∞ w.p.1 as n → ∞ and the associated stopped processes {X(τ_n ∧ t) : t ≥ 0} belong to class C for each n, where a ∧ b ≡ min {a, b}. We obtain the class of local martingales when C is the class of martingales. Localizing expands the scope: since stopped martingales remain martingales, all martingales are local martingales. Localizing is important, because the stopped processes not only are martingales but can be taken to be UI martingales; see p. 21 of [44] and §IV.12 of [58]. The UI property is needed for the preservation theorems.
As is customary, we will also be exploiting predictable stochastic processes, which we will take to mean processes with left-continuous sample paths; see p. 8 of [12] and §1.2 of [30] for the more general definition and additional discussion. The general idea is that a stochastic process X ≡ {X(t) : t ≥ 0} is predictable if its value at t is determined by the values at times prior to t. But we can simplify by working with stochastic processes with sample paths in D. In the setting of D, we can obtain a left-continuous process by either (i) considering the left-limit version {X(t−) : t ≥ 0} of a stochastic process X in D (with X(0−) ≡ X(0)) or (ii) considering a stochastic process in D with continuous sample paths. Once we restrict attention to stochastic processes with sample paths in D, we do not need the more general notion, because the left-continuous version is always well defined. If we allowed more general sample paths, that would not be the case.
We will be interested in martingales associated with counting processes, adapted to the appropriate filtration. These counting processes will be nonnegative submartingale processes. Thus we will be applying the following special case of the Doob-Meyer decomposition.
Theorem 3.1 (special case of the Doob-Meyer decomposition) If Y is a nonnegative submartingale with sample paths in D and Y is adapted to a filtration F ≡ {F_t}, then there exists an F-predictable process A, called the compensator of Y (or the dual predictable projection), such that A has nonnegative nondecreasing sample paths, E[A(t)] < ∞ for each t, and M ≡ Y − A is an F-martingale. The compensator is unique in the sense that the sample paths of any two versions must be equal w.p.1.
Proof. See §1.4 of Karatzas and Shreve [33]. The DL condition in [33] is satisfied because of the assumed nonnegativity; see Definition 4.8 and Problem 4.9 on p. 24. For a full account, see §VI.6 of [58].

Quadratic Variation and Covariation Processes
A central role in the martingale approach to stochastic-process limits is played by the quadratic-variation and quadratic-covariation processes, as can be seen from the martingale FCLT stated here in §8. That in turn depends on the notion of square-integrability. We say that a stochastic process X ≡ {X(t) : t ≥ 0} is square integrable if E[X(t)^2] < ∞ for all t ≥ 0. Again, to expand the scope, we can localize, and focus on the class of locally square-integrable martingales. Because we can localize to get the square-integrability, the condition is not very restrictive.
If M is a square-integrable martingale, then M^2 ≡ {M(t)^2 : t ≥ 0} is necessarily a submartingale with nonnegative sample paths, and thus satisfies the conditions of Theorem 3.1. The predictable quadratic variation (PQV) of a square-integrable martingale M, denoted by ⟨M⟩ ≡ {⟨M⟩(t) : t ≥ 0} (the angle-bracket process), is the compensator of the submartingale M^2; i.e., the stochastic process ⟨M⟩ is the unique nondecreasing nonnegative predictable process such that E[⟨M⟩(t)] < ∞ for each t and M^2 − ⟨M⟩ is a martingale with respect to the reference filtration. (Again, uniqueness holds to the extent that any two versions have the same sample paths w.p.1.) Not only does square integrability extend by localizing, but Theorem 3.1 has a local version; see p. 375 of [58]. As a consequence, the PQV is well defined for any locally square-integrable martingale.
Given two locally square-integrable martingales M_1 and M_2, the predictable quadratic covariation ⟨M_1, M_2⟩ can be defined by polarization, ⟨M_1, M_2⟩ ≡ (⟨M_1 + M_2⟩ − ⟨M_1⟩ − ⟨M_2⟩)/2; see p. 48 of [44]. It can be characterized as the unique (up to equality of sample paths w.p.1) predictable process of finite variation such that M_1 M_2 − ⟨M_1, M_2⟩ is a martingale. We will also be interested in another quadratic variation of a square-integrable martingale M, the so-called optional quadratic variation (OQV) [M] (the square-bracket process). The square-bracket process is actually more general than the angle-bracket process, because the square-bracket process is well defined for any local martingale, as opposed to only locally square-integrable martingales; see §§IV.18, IV.26 and VI.36-37 of [58]. The following is Theorem 37.8 on p. 389 of [58]: for a local martingale M, with ∆M(t) ≡ M(t) − M(t−), the jump at t, [M](t) = ⟨M^c⟩(t) + Σ_{0<s≤t} (∆M(s))^2, t ≥ 0, where M^c is the continuous local-martingale part of M. There is also an alternative definition of the OQV, as a limit of sums of squared increments; see Theorem 5.57 in §5.8 of [64]:
In particular, [M](t) = lim_{n→∞} Σ_{i≥1} [M(t_{n,i}) − M(t_{n,i−1})]^2, where t_{n,i} ≡ t ∧ (i2^{−n}) and the mode of convergence for the limit as n → ∞ is understood to be in probability. The limit is independent of the way that the time points t_{n,i} are selected within the interval [0, t], provided that t_{n,i} > t_{n,i−1} and that the maximum difference t_{n,i} − t_{n,i−1} for points inside the interval [0, t] goes to 0 as n → ∞.
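As an illustration of this discretized-sum characterization, the following sketch (our own, not from the paper; all function names are invented) simulates the compensated rate-1 Poisson martingale M(t) = N(t) − t and checks numerically that the sum of squared increments over a fine dyadic grid is close to the sum of squared jumps, which here equals the number of jumps N(t).

```python
import random

random.seed(7)

def poisson_jump_times(horizon):
    """Jump times of a rate-1 Poisson process on [0, horizon]."""
    t, jumps = 0.0, []
    while True:
        t += random.expovariate(1.0)
        if t > horizon:
            return jumps
        jumps.append(t)

def realized_qv(jumps, t, n):
    """Sum of squared increments of M(s) = N(s) - s over the dyadic grid
    {i * t * 2**-n : i = 0, ..., 2**n}, approximating [M](t)."""
    def M(s):
        return sum(1 for u in jumps if u <= s) - s
    pts = [i * t / 2 ** n for i in range(2 ** n + 1)]
    vals = [M(p) for p in pts]
    return sum((b - a) ** 2 for a, b in zip(vals, vals[1:]))

horizon = 10.0
jumps = poisson_jump_times(horizon)
# Each unit jump contributes (delta M)^2 = 1, while the drift contributes
# only O(2**-n) in total, so the realized QV should be close to N(t).
approx = realized_qv(jumps, horizon, 14)
```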
Unfortunately, these two quadratic-variation processes ⟨M⟩ and [M] associated with a locally square-integrable martingale M, and the more general covariation processes ⟨M_1, M_2⟩ and [M_1, M_2], are somewhat elusive, since the definitions are indirect; it remains to exhibit these processes. We will exploit our sample-path constructions in terms of Poisson processes above to identify appropriate quadratic-variation and covariation processes in the following subsections.
Fortunately, however, the story about structure is relatively simple in the two cases of interest to us: (i) when the martingale is a compensated counting process, and (ii) when the martingale has continuous sample paths. The story in the second case is easy to tell: when M is continuous, ⟨M⟩ = [M], and this (predictable and optional) quadratic-variation process itself is continuous; see §VI.34 of [58]. This case applies to Brownian motion and our limit processes. For standard Brownian motion B, ⟨B⟩(t) = [B](t) = t, t ≥ 0.
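The Brownian case can be checked numerically as well. The following sketch (ours, not the paper's) discretizes a standard Brownian motion path on [0, 1] and verifies that the realized quadratic variation is close to t = 1.

```python
import random

random.seed(0)

n, t = 200_000, 1.0
dt = t / n
# Simulate Brownian increments and sum their squares: the realized
# quadratic variation over a fine grid should be close to t, matching
# <B>(t) = [B](t) = t for standard Brownian motion.
qv = sum(random.gauss(0.0, dt ** 0.5) ** 2 for _ in range(n))
```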

Counting Processes
The martingales we consider for our pre-limit processes will be compensated counting processes. By a counting process (or point process), we mean a stochastic process N ≡ {N(t) : t ≥ 0} with nondecreasing nonnegative-integer-valued sample paths in D and N(0) = 0. We say that N is a unit-jump counting process if all jumps are of size 1. We say that N is non-explosive if N(t) < ∞ w.p.1 for each t ≥ 0. As discussed by Brémaud [12], the compensator of a non-explosive unit-jump counting process is typically (under regularity conditions!) a stochastic process with sample paths that are absolutely continuous with respect to Lebesgue measure, so that the compensator A can be represented as the integral A(t) = ∫_0^t X(s) ds, t ≥ 0, where X is a nonnegative process that is adapted to the filtration F. When the compensator has such an integral representation, the integrand X is called the stochastic intensity of the counting process N.
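To illustrate the compensator relation E[N(t)] = E[∫_0^t X(s) ds] in the simplest setting, here is a sketch (ours; the helper name is invented) that simulates a counting process with a deterministic intensity by Lewis-Shedler thinning of a dominating Poisson process, and compares the Monte Carlo mean count to the compensator.

```python
import math, random

random.seed(1)

def thinned_count(intensity, bound, horizon):
    """Count points of a process with the given deterministic intensity,
    obtained by thinning a rate-`bound` Poisson process."""
    t, n = 0.0, 0
    while True:
        t += random.expovariate(bound)
        if t > horizon:
            return n
        if random.random() < intensity(t) / bound:
            n += 1

intensity = lambda t: 2.0 + math.sin(t)   # bounded above by 3
horizon = 5.0
# Compensator A(t) = int_0^t X(s) ds; since M = N - A is a mean-zero
# martingale, E[N(t)] = A(t) here (A is deterministic in this example).
compensator = 2.0 * horizon + 1.0 - math.cos(horizon)
mean_count = sum(thinned_count(intensity, 3.0, horizon)
                 for _ in range(4000)) / 4000
```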
We will apply the following extension of Theorem 3.1. We first observe that the conditions of Lemma 3.1 imply that N is a nonnegative F-submartingale, so that we can apply Theorem 3.1. We use the extra conditions to get more. We prove Lemma 3.1 in the Appendix.

First Martingale Representation
We start with the first sample-path construction in §2.1. As a regularity condition, here we assume that E[Q(0)] < ∞. We will show how to remove that condition later in §6.3. It could also be removed immediately if we chose to localize.
Here is a quick summary of how martingales enter the picture. The Poisson processes A(t) and S(t) underlying the first representation of the queueing model in §2.1, as well as the new processes A(λt) and S(µ ∫_0^t Q(s) ds) there, have nondecreasing nonnegative sample paths. Consequently, they are submartingales with respect to appropriate filtrations (histories, families of σ-fields). Thus, by subtracting the compensators, we obtain martingales. Each martingale M so constructed turns out to be square integrable, so that M^2 − ⟨M⟩ is again a martingale, where ⟨M⟩ is the predictable quadratic variation, which in our context coincides with the compensator.
In constructing this representation, we want to be careful about the filtration (history, family of σ-fields). Here we will want to use the filtration F ≡ {F_t : t ≥ 0} defined in (19), augmented by including all null sets.
The processes in (20) will be proved to be F-martingales, where here A refers to the arrival process. Hence, instead of (12), we have the alternate martingale representation (21). In applications of Theorem 3.1 and Lemma 3.1, it remains to show that the conditions are satisfied and to identify the compensator. The following lemma fills in that step for a random time change of a rate-1 Poisson process by applying the optional stopping theorem. At this step, it is natural, as in §12 of Mandelbaum and Pats [48], to apply the optional stopping theorem for martingales indexed by directed sets (Theorem 2.8.7 on p. 87 of Ethier and Kurtz [19]) associated with multiparameter random time changes (§6.2 on p. 311 of [19]), but here we can use a more elementary approach. For supporting theory at this point, see Theorem 17.24 and Proposition 7.9 of Kallenberg [35] and §7.4 of Daley and Vere-Jones [17]. We use the multiparameter random time change in §7.1.
Proof. Since the sample paths of I are continuous, it is evident that S ∘ I is a unit-jump counting process. By condition (22), it is non-explosive. In order to apply the optional stopping theorem, we localize, stopping at level m. Since {S(x) − x : x ≥ 0} is an F^I-martingale (e.g., see p. 7 of [12] or p. 61 of [19]) and the stopped time changes are bounded stopping times, the optional stopping theorem implies that the associated stopped process S ∘ I_m − I_m is a martingale, so that I_m is the compensator of S ∘ I_m. Since we have the moment conditions in (22), we can let m ↑ ∞ and apply the monotone convergence theorem with conditioning, as on p. 280 of Brémaud, to deduce the martingale property for all m, so that I is the compensator of S ∘ I. Lemma 3.1 then implies the square-integrability and identifies the quadratic-variation processes.
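To make the random-time-change construction concrete, here is a small event-driven simulation (a sketch of ours; parameter choices and names are not the paper's). By the memoryless property, simulating the M/M/∞ birth-and-death dynamics realizes the representation Q(t) = Q(0) + A(λt) − S(µ ∫_0^t Q(s) ds) with A and S independent rate-1 Poisson processes; the standard transient mean E[Q(t)] = (λ/µ)(1 − e^{−µt}) for Q(0) = 0 provides a numerical check.

```python
import math, random

random.seed(2)

def mm_inf(lam, mu, horizon, q0=0):
    """Event-driven M/M/inf sample of Q(horizon).  Arrivals are jumps of
    A(lam * t); departures are jumps of S(mu * int_0^t Q(s) ds)."""
    t, q = 0.0, q0
    while True:
        rate = lam + mu * q
        t += random.expovariate(rate)
        if t > horizon:
            return q
        if random.random() < lam / rate:
            q += 1   # arrival
        else:
            q -= 1   # departure
print

reps = 2000
mean_q = sum(mm_inf(10.0, 1.0, 8.0) for _ in range(reps)) / reps
target = 10.0 * (1.0 - math.exp(-8.0))   # transient mean (lam/mu)(1 - e^{-mu t})
```

Note the invented stray `print` above should be removed; the check is simply that `mean_q` is close to `target`.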
To apply Lemma 3.2 to our M/M/∞ queueing problem, we need to verify the finite-moment conditions in (22). For that purpose, we use a crude inequality derived from the representation (12).

Lemma 3.3 (crude inequality) Given the representation (12), Q(t) ≤ Q(0) + A(λt) for all t ≥ 0 w.p.1.
Now we want to show that our processes in (20) actually are martingales with respect to the filtration in (19). To do so, we will apply Lemma 3.2. However, to apply Lemma 3.2, we will first alter the filtration. In order to focus on the service completions in the easiest way, we initially condition on the entire arrival process, and consider the filtration F^1 ≡ {F^1_t : t ≥ 0} in (26), augmented by including all null sets. Then, as in the statement of Lemma 3.2, we consider the associated filtration F^1_I ≡ {F^1_{I(t)} : t ≥ 0}. Finally, we are able to obtain the desired martingale result with respect to the desired filtration F in (19). In particular, suppose that I is defined by (27), with Q in (12) and the filtration being F^1 in (26). Then the conditions of Lemma 3.2 are satisfied, so that S ∘ I − I is a square-integrable F^1_I-martingale with F^1_I-compensator I in (27). As a consequence, S ∘ I − I is also a square-integrable F-martingale with F-compensator I in (27) for the filtration F in (19).
Proof. First, we can apply the crude inequality in (25) to establish the required moment conditions. Then, by virtue of (12) and the recursive construction in Lemma 2.1, for each t ≥ 0, I(t) in (27) is a stopping time relative to F^1_x for all x ≥ 0, i.e., {I(t) ≤ x} ∈ F^1_x for all x ≥ 0. (This step is a bit tricky: to know I(t), we need to know Q(s), 0 ≤ s < t, but, by (2.2), that depends on I(s), 0 ≤ s < t; the recursive construction resolves the apparent circularity.) Since {S(x) − x : x ≥ 0} is a martingale with respect to F^1 and the moment conditions in (22) are satisfied, we can apply Lemma 3.2 to deduce that {S(I(t)) − I(t) : t ≥ 0} is a square-integrable martingale with respect to the filtration F^1_I ≡ {F^1_{I(t)} : t ≥ 0}, augmented by including all null sets, and that I in (27) is both the compensator and the predictable quadratic variation. Finally, since the process representing the arrivals after time t, i.e., the stochastic process {A(t + s) − A(t) : s ≥ 0}, is independent of Q(s), 0 ≤ s ≤ t, by virtue of the recursive construction in Lemma 2.1 (and the assumption that A is a Poisson process), we can replace the filtration F^1_I by the smaller filtration F in (19). That completes the proof.

We now introduce the corresponding processes associated with the sequence of models indexed by n, as in (28). The filtrations change with n in the obvious way. We now introduce the scaling, just as in (3), and let the scaled martingales M_{n,1} and M_{n,2} be defined as in (29) and (30). Then, from (28)-(30), we get an integral representation for the scaled processes. We now summarize this martingale representation, depending upon the index n, as the implication of the analysis above.

Theorem 3.4 (first martingale representation for the scaled processes) The scaled process X_n satisfies the integral representation (32), where M_{n,i} are given in (29) and (30). These processes M_{n,i} are square-integrable martingales with respect to the filtrations F_n ≡ {F_{n,t} : t ≥ 0}, defined as the model-n analog of (19) and augmented by including all null sets. Their associated predictable quadratic variations are ⟨M_{n,1}⟩(t) = λ_n t/n, t ≥ 0, and ⟨M_{n,2}⟩(t) as given in (33).

Note that X_n appears on both sides of the integral representation (32): X_n(t) appears on the left, while X_n(s) for 0 ≤ s ≤ t appears on the right.
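As a numerical sanity check on the compensator and PQV claims (our own sketch, not part of the paper), one can track the departure counting process D and its compensator I(t) = µ ∫_0^t Q(s) ds along simulated M/M/∞ paths: M = D − I should have mean zero, and its variance should be close to E[I(t)] = E[⟨M⟩(t)].

```python
import random

random.seed(3)

def departures_and_compensator(lam, mu, horizon, q0):
    """Track the departure count D(t) and its compensator
    I(t) = mu * int_0^t Q(s) ds along one M/M/inf path."""
    t, q, d, integral = 0.0, q0, 0, 0.0
    while True:
        rate = lam + mu * q
        dt = random.expovariate(rate)
        if t + dt > horizon:
            integral += q * (horizon - t)
            return d, mu * integral
        integral += q * dt
        t += dt
        if random.random() < lam / rate:
            q += 1
        else:
            q -= 1
            d += 1

n, horizon = 50, 2.0
samples = [departures_and_compensator(n * 1.0, 1.0, horizon, n)
           for _ in range(1500)]
ms = [d - i for d, i in samples]            # martingale values M(horizon)
mean_m = sum(ms) / len(ms)
mean_i = sum(i for _, i in samples) / len(samples)
var_m = sum((m - mean_m) ** 2 for m in ms) / (len(ms) - 1)
```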
In §4.1 we show how to work with this integral representation.

Second Martingale Representation
We can also start with the second sample-path construction and obtain another integral representation of the form (32). Now we start with the martingales in (34), so that, instead of (12), we have the alternate representation (35), where M_A and M_S are square-integrable martingales with respect to the filtration in (36), again augmented by the null sets. Notice that this martingale representation is very similar to the martingale representation in (21). The martingales in (21) and (35) are different and the filtrations in (19) and (36) are different, but the predictable quadratic variations are the same and the form of the integral representation is the same. Thus, there is an analog of Theorem 3.4 in this setting. We now provide theoretical support for the claims above. First, we put ourselves in the setting of Lemma 3.1.
Proof. It is immediate that Y is a counting process with unit jumps, but there is some question about integrability. To establish integrability, we apply the crude inequality (24). Given Lemmas 3.1 and 3.5, it only remains to identify the compensator of the counting process Y, which we call Ỹ since A is used to refer to the arrival process. For that purpose, we can apply the integration theorem, as on p. 10 of Brémaud [12]. But our process Y in (37) is actually a sum involving infinitely many Poisson processes, so we need to be careful.
Proof. As indicated above, we can apply the integration theorem on p. 10 of Brémaud, but we have to be careful because Y involves infinitely many Poisson processes. Hence we first consider the first n terms in the sum. With that restriction, since the integrand is an indicator function for each k, we consider the integral of a bounded predictable process with respect to the martingale {Σ_{k=1}^n (S_{µ,k}(t) − µt) : t ≥ 0}, which is a martingale of "integrable bounded variation," as required (and defined) by Brémaud. As a consequence, we obtain (39). Then, given that E[Y(t)] < ∞, we can apply the monotone convergence theorem to each of the two terms in (39) in order to take the limit as n → ∞ to deduce that E[Ỹ(t)] < ∞ and that M_S itself, as defined in (34), is an F-martingale, which implies that the compensator of Y in (37) is indeed given by (38).
By this route we obtain another integral representation for the scaled processes of exactly the same form as in Theorem 3.4. As before in (28)-(31), we introduce the sequence of models indexed by n. The martingales and filtrations are slightly different, but in the end the predictable quadratic variation processes are essentially the same.
Theorem 3.5 (second martingale representation for the scaled processes) The scaled process X_n satisfies an integral representation of the same form as (32), where M_{n,i} are given in (30), but instead of (29), the martingales are the scaled versions of those in (34). These processes M_{n,i} are square-integrable martingales with respect to the filtrations obtained by scaling (36), augmented by including all null sets. Their associated predictable quadratic variations are ⟨M_{n,1}⟩(t) = λ_n t/n, t ≥ 0, and ⟨M_{n,2}⟩(t) as given in (41), where E[⟨M_{n,2}⟩(t)] < ∞ for all t ≥ 0 and n ≥ 1 and lim_{n→∞} ⟨M_{n,1}⟩(t) = lim_{n→∞} λ_n t/n = µt. The associated optional quadratic variations are [M_{n,1}](t) = A_{λ_n}(t)/n, t ≥ 0, and [M_{n,2}](t) = D_n(t)/n, the scaled number of departures.

Third Martingale Representation
We can also obtain a martingale representation for the stochastic process Q by exploiting the fact that Q is a birth-and-death process. We have the basic representation of Q in terms of the arrival process A and the departure process D, as in (12). Since Q is a birth-and-death process, we can apply the Lévy and Dynkin formulas, as on p. 294 of Brémaud [12], to obtain martingales associated with various counting processes associated with Q, including the counting processes A and D. Of course, A is easy, but the Dynkin formula immediately yields the desired martingale for D, where the compensator of D is just as in the first two martingale representations, i.e., as in (20), (27), (34) and (38); see pp. 6 and 294 of [12]. We thus again obtain a martingale representation of the form (21) and (35). Here, however, the filtration can be taken to be the internal filtration generated by the Markov process Q itself, augmented by the null sets. The proof of Theorem 1.1 is then the same as for the second representation, which will be by an application of the martingale FCLT in §8.
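For concreteness, here is the generator calculation behind the Dynkin-formula martingale, in our own notation (a sketch; λ is the arrival rate and µ the individual service rate):

```latex
% Generator of the M/M/\infty birth-and-death process Q:
(\mathcal{L}f)(q) = \lambda\,[f(q+1) - f(q)] + \mu q\,[f(q-1) - f(q)],
    \qquad q \in \{0, 1, 2, \ldots\}.
% Dynkin's formula: for suitable f (localizing where necessary),
M_f(t) \equiv f(Q(t)) - f(Q(0)) - \int_0^t (\mathcal{L}f)(Q(s))\,ds
    \quad \text{is a martingale.}
% Taking f(q) = q gives (\mathcal{L}f)(q) = \lambda - \mu q, so that
Q(t) - Q(0) - \lambda t + \mu \int_0^t Q(s)\,ds \quad \text{is a martingale,}
% recovering the compensators \lambda t and \mu \int_0^t Q(s)\,ds
% for the arrival and departure counting processes, respectively.
```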

Fourth Martingale Representation
In this section, we present the martingale representation for the construction in terms of arrival and service times in §2.3, but without any proofs. Consider a sequence of G/GI/∞ queues indexed by n and let Q_n(0), Q_n, A_n and D_n be the corresponding quantities in the nth queueing system, just as defined in §2.3. For any cdf F, let the associated complementary cdf (ccdf) be F^c ≡ 1 − F.
Given representation (17), the insight of Krichagina and Puhalskii [37] is to write the process Q_n as in (43), where the key term involves a sequential empirical process (a random field, having two parameters), defined in (44), so that (45) holds. The division by n in (44) provides a law-of-large-numbers (LLN) or fluid-limit scaling. To proceed, we define associated queueing processes with LLN scaling. In particular, define the normalized processes Q̄_n ≡ {Q_n(t)/n : t ≥ 0} and Ā_n ≡ {A_n(t)/n : t ≥ 0}. For our general arrival process, we assume that Ā_n(t) → a(t) ≡ µt w.p.1 as n → ∞. For the M/M/∞ special case, that follows from (1).
Next write equation (45) as (46). Now we introduce stochastic processes with central-limit-theorem (CLT) scaling; in particular, let the CLT-scaled processes be defined as in (47). Then the process Q_n in (43) can be written as in (48), with the component processes in (49) and (50). In contrast to previous representations, note that, except for Q_n(0), which can be regarded as known, Q_n(s) for s < t does not appear on the right-hand side of representation (48). Instead of the integral representations in Theorems 3.4 and 3.5, here we have a direct expression for Q_n(t) in terms of other model elements, but we will see that some of these model elements in turn do have integral representations.
By equations (46) and (48), we obtain the representation (51) for the LLN-scaled process Q̄_n. From equation (51), we can prove the following FWLLN. We remark that we could allow more general limit functions a for the LLN-scaled arrival process.
Let the scaled process X_n be defined by (53). If q(t) = 1 for all t ≥ 0, then (53) coincides with (3). By equations (53), (51) and (52), we obtain the following theorem for the scaled processes.
Theorem 3.7 (fourth martingale representation for the scaled processes) The scaled process X n in (53) has the representation where M n,1 and M n,2 are defined as in (49) and (50), respectively.
The situation is more complicated here, because the processes M_{n,1} and M_{n,2} in (49) and (50) are not naturally martingales for the G/GI/∞ model, or even the M/M/∞ special case, with respect to the obvious filtration, but they can be analyzed by martingale methods; in particular, associated martingales can be exploited to establish stochastic-process limits. The proof of the FCLT for the processes X_n in (53) (see §6.4) exploits semimartingale decompositions (and related martingale properties) of the two-parameter process U_n ≡ {U_n(t, x) : t ≥ 0, 0 ≤ x ≤ 1} defined by U_n(t, x) ≡ n^{−1/2} Σ_{i=1}^{⌊nt⌋} (1{ζ_i ≤ x} − x), where the ζ_i are independent and uniformly distributed on [0, 1]. Extending Bickel and Wichura [7], Krichagina and Puhalskii [37] proved that the sequence of processes {U_n : n ≥ 1} converges in distribution to the Kiefer process U in D([0, ∞), D([0, 1])). For properties of Kiefer processes, we refer to Csörgő and Révész [13] and Khoshnevisan [35]. The importance of the Kiefer process for infinite-server queues was evidently first observed by Louchard [45].
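A quick simulation illustrates the sequential empirical process and its Kiefer-process limit (our own sketch; names invented). At equal arguments the Kiefer covariance gives Var U(t, x) = t x(1 − x), which the Monte Carlo variance of U_n(t, x) should match.

```python
import random

random.seed(4)

def U_n(n, t, x):
    """Sequential empirical process
    n**-0.5 * sum_{i <= floor(nt)} (1{zeta_i <= x} - x)
    for i.i.d. uniform zeta_i on [0, 1]."""
    k = int(n * t)
    s = sum((1.0 if random.random() <= x else 0.0) - x for _ in range(k))
    return s / n ** 0.5

n, t, x, reps = 400, 1.0, 0.3, 4000
vals = [U_n(n, t, x) for _ in range(reps)]
var_hat = sum(v * v for v in vals) / reps
# Kiefer-process limit: Var U(t, x) = t * x * (1 - x).
kiefer_var = t * x * (1 - x)
```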
The process U_n has a semimartingale decomposition (see Chapter IX of Jacod and Shiryaev [30]): for each t, U_n(t, x) is the sum of a square-integrable martingale in x, relative to the filtration F_n(x) = ∨_{i≤⌊nt⌋} F_i(x), and a predictable process of finite variation. Hence V_n(t, x) = U_n(a_n(t), F(x)) can be written accordingly, as in (55). In closing this subsection, we remark that an associated representation holds for the two-parameter process Q(t, y) in (17). Let the associated scaled two-parameter process be defined by X_n(t, y) ≡ √n (Q̄_n(t, y) − q(t, y)), t ≥ 0, where Q̄_n(t, y) ≡ Q_n(t, y)/n and Q̄_n ⇒ q as n → ∞.

Corollary 3.1 (associated representation for the scaled two-parameter processes) Paralleling (54), the scaled process in (56) has the representation (57), where, paralleling (49) and (50), the component processes are the corresponding two-parameter analogs.

Main Steps in the Proof of Theorem 1.1
In this section we indicate the main steps in the proof of Theorem 1.1, starting from one of the first three martingale representations in the previous section. First, in §4.1 we show that the integral representation appearing in both Theorems 3.4 and 3.5 has a unique solution, so that it constitutes a continuous function from D to D. Next, in §4.2 we show how the limit can be obtained from the functional central limit theorem for the Poisson process and the continuous mapping theorem, but a fluid limit (Lemma 4.2 or Lemma 4.3) remains to be verified. In §4.3 we show how the proof can be completed without martingales by directly establishing that fluid limit. In §§5-6 we show how martingales can achieve the same result. In §6.4 we indicate how to complete the proof with the fourth martingale representation.

Continuity of the Integral Representation
We apply the continuous-mapping theorem (CMT) with the integral representations in (32) and (40) in order to establish the desired convergence; for background on the CMT, see §3.4 of [69]. In subsequent sections we will show that the scaled martingales converge weakly to independent Brownian motions, as in (59), where B_1 and B_2 are two independent standard Brownian motions, from which an application of the CMT with subtraction yields the limit (60), M_{n,1} − M_{n,2} ⇒ √(2µ) B, where B is a single standard Brownian motion.

We then apply the CMT with the function f : D × R → D taking (y, b) into x determined by the integral representation (61). In the pre-limit, the role of the function y in (61) is played by M_{n,1} − M_{n,2} ≡ {M_{n,1}(t) − M_{n,2}(t) : t ≥ 0} in (60), while b is played by X_n(0). In the limit, the role of y in (61) is played by the limit √(2µ) B in (60), while b is played by X(0). (The constant b does not play an essential role in (61); it is sometimes convenient when we want to focus on the solution x as a function of the initial conditions.)

For our application, the limiting stochastic process in (60) has continuous sample paths. Moreover, the function f in (61) maps continuous functions into continuous functions, as we show below. Hence, it suffices to show that the map f : D × R → D is measurable and continuous at continuous limits. Since the limit is necessarily continuous as well, the required continuity follows from continuity when the function space D appearing in both the domain and the range is endowed with the topology of uniform convergence on bounded intervals. However, if we only establish such continuity, then that leaves open the issue of measurability. It is significant that the σ-field on D generated by the topology of uniform convergence on bounded intervals is not the desired customary σ-field on D, which is generated by the coordinate projections or by any of the Skorohod topologies; see §11.5 of [69] and §18 of Billingsley [8].
We prove measurability with respect to the appropriate σ field on D (generated by the J 1 topology) by proving continuity when the function space D appearing in both the domain and the range is endowed with the Skorohod J 1 topology. That implies the required measurability. At the same time, of course, it provides continuity in that setting.
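To make the mapping (y, b) → x concrete, here is a simple forward-Euler construction (our own sketch, with invented names; we assume the representation has the form x(t) = b + y(t) − µ ∫_0^t x(s) ds, matching (32)). With y ≡ 0 the solution is explicitly x(t) = b e^{−µt}, which gives a check.

```python
import math

def solve_integral_rep(y, b, h, T, n):
    """Forward-Euler construction of the solution of
    x(t) = b + y(t) - int_0^t h(x(s)) ds on a grid of n steps."""
    dt = T / n
    integral = 0.0
    x = b + y(0.0)
    for i in range(1, n + 1):
        integral += h(x) * dt   # left-endpoint rule for the integral term
        x = b + y(i * dt) - integral
    return x

mu, b, T = 1.0, 2.0, 3.0
x_T = solve_integral_rep(lambda t: 0.0, b, lambda v: mu * v, T, 10_000)
exact = b * math.exp(-mu * T)
```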
We now establish the basic continuity result. We establish a slightly more general form than needed here in order to be able to treat other cases. In particular, we introduce a Lipschitz function h : R → R; i.e., we assume that there exists a constant c > 0 such that |h(x_1) − h(x_2)| ≤ c|x_1 − x_2| for all x_1, x_2 ∈ R, as in (62). We apply the more general form to treat the Erlang A model in §7.1. Theorem 7.3 in §7.2 involves an even more general version in which h : D → D.

Theorem 4.1 (continuity of the integral representation) Consider the integral representation
x(t) = b + y(t) − ∫_0^t h(x(s)) ds, t ≥ 0, (63)
where h : R → R satisfies h(0) = 0 and is a Lipschitz function as defined in (62). The integral representation in (63) has a unique solution x, so that the integral representation constitutes a function f : D × R → D mapping (y, b) into x ≡ f(y, b). In addition, the function f is continuous provided that the function space D (in both the domain and range) is endowed with either: (i) the topology of uniform convergence over bounded intervals or (ii) the Skorohod J_1 topology. Moreover, if y is continuous, then so is x.
Proof. If y is a piecewise-constant function, then we can directly construct the solution x of the integral representation by an inductive construction, just as in Lemma 2.1. Since any element y of D can be represented as the limit of piecewise-constant functions, where the convergence is uniform over bounded intervals, using endpoints that are continuity points of y, we can then extend the function f to arbitrary elements of D, exploiting continuity in the topology of uniform convergence over bounded intervals, shown below. Uniqueness follows from the fact that the only function x in D satisfying the inequality |x(t)| ≤ c ∫_0^t |x(s)| ds, t ≥ 0, is the zero function, which is a consequence of Gronwall's inequality, which we re-state in Lemma 4.1 below in the form needed here. For the remainder of the proof, we apply Gronwall's inequality again. We introduce the norm ||x||_T ≡ sup_{0≤t≤T} |x(t)|.
First consider the case of the topology of uniform convergence over bounded intervals. We need to show that, for any ε > 0, there exists a δ > 0 such that ||x_1 − x_2||_T < ε when |b_1 − b_2| + ||y_1 − y_2||_T < δ, where (y_i, x_i) are two pairs of functions satisfying the relation (63). From (63) and the Lipschitz condition (62), we have
|x_1(t) − x_2(t)| ≤ |b_1 − b_2| + ||y_1 − y_2||_T + c ∫_0^t |x_1(s) − x_2(s)| ds, 0 ≤ t ≤ T,
so that Gronwall's inequality (Lemma 4.1) yields ||x_1 − x_2||_T ≤ (|b_1 − b_2| + ||y_1 − y_2||_T) e^{cT}. Hence it suffices to let δ = εe^{−cT}.

We now turn to the Skorohod J_1 topology; see §§3.3 and 11.5 and Chapter 12 of [69] for background. To treat this non-uniform topology, we will use the fact that the function x is necessarily bounded; that is proved later in Lemma 5.5. We want to show that x_n → x in D([0, ∞), R, J_1) when b_n → b in R and y_n → y in D([0, ∞), R, J_1). For y given, let the interval right endpoint T be a continuity point of y. Then there exist increasing homeomorphisms λ_n of the interval [0, T] such that ||y_n − y ∘ λ_n||_T → 0 and ||λ_n − e||_T → 0 as n → ∞. Moreover, it suffices to consider homeomorphisms λ_n that are absolutely continuous with respect to Lebesgue measure on [0, T], having derivatives λ̇_n satisfying ||λ̇_n − 1||_T → 0 as n → ∞. The fact that the topology is actually unchanged is a consequence of Billingsley's equivalent complete metric d_0 on pp. 112-114 of Billingsley [8]. Hence, for y given, let M ≡ sup_{0≤t≤T} |x(t)|. Since h in (62) is Lipschitz, we can again apply Gronwall's inequality, paralleling the uniform case, using the bound M and the convergence λ̇_n → 1.
Finally, for the inheritance of continuity, note from (63) that x(t) − y(t) = b − ∫_0^t h(x(s)) ds, which is continuous in t because x is bounded over [0, T]; hence x is continuous if y is continuous. In our case we can simply let h(s) = µs, but we will need the more complicated function h in (63) and (62) in §7.1. To be self-contained, we now state a version of Gronwall's inequality; see p. 498 of [19]. See §11 of [46] for other versions of Gronwall's inequality.

Lemma 4.1 (version of Gronwall's inequality) Suppose that
0 ≤ x(t) ≤ ε + M ∫_0^t x(s) ds, t ≥ 0,
for some positive finite ε and M. Then x(t) ≤ ε e^{Mt}, t ≥ 0.

It thus remains to establish the limit in (59). Our proof based on the first martingale representation in Theorem 3.4 relies on a FCLT for the Poisson process and the CMT with the composition map. The application of the CMT with the composition map requires a fluid limit, which requires further argument. That is contained in subsequent sections.
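To be self-contained, here is a short proof of Lemma 4.1 (our own sketch of the standard argument):

```latex
% Proof sketch of the stated Gronwall inequality.
Let $G(t) \equiv \epsilon + M \int_0^t x(s)\,ds$, so that $x(t) \le G(t)$.
Then $G'(t) = M x(t) \le M G(t)$, whence
\[
\frac{d}{dt}\bigl(e^{-Mt} G(t)\bigr)
  = e^{-Mt}\bigl(G'(t) - M G(t)\bigr) \le 0 .
\]
Integrating gives $e^{-Mt} G(t) \le G(0) = \epsilon$, so that
$x(t) \le G(t) \le \epsilon e^{Mt}$ for all $t \ge 0$.
```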

Poisson FCLT Plus the CMT
As a consequence of the last section, it suffices to show that the scaled martingales converge, as in (59). From the martingale perspective, it is natural to achieve that goal by directly applying the martingale FCLT, as in §7.1 of Ethier and Kurtz [19], and as reviewed here in §8, and that works. In particular, the desired limit (59) follows from Theorems 3.4 and 8.1 (ii) (or Theorems 3.5 and 8.1 (ii)) plus Lemma 4.2 below. Lemma 4.2 shows that the scaled predictable-quadratic-variation processes in (33) and (41) converge to the appropriate deterministic limits, as required in the martingale FCLT. However, starting with the first martingale representation in Theorem 3.4, we do not need to apply the martingale FCLT. Instead, we can justify the martingale limit in (59) by yet another application of the CMT, using the composition map associated with the random time changes, in addition to a functional central limit theorem (FCLT) for scaled Poisson processes. Our approach also requires establishing a limit for the sequence of scaled predictable quadratic variations associated with the martingales, so the main steps of the argument become the same as when applying the martingale FCLT.
The FCLT for Poisson processes is a classical result. It is a special case of the FCLT for a renewal process, appearing as Theorem 17.3 in Billingsley [8]. It and its generalizations are also discussed extensively in [69]; see §§6.3, 7.3, 7.4, 13.7 and 13.8. The FCLT for a Poisson process can also be obtained via a strong approximation, as was done by Kurtz [39], Mandelbaum and Pats [47,48] and Mandelbaum, Massey and Reiman [46]. Finally, the FCLT for a Poisson process itself can be obtained as an easy application of the martingale FCLT, as we show in §8.
We start with the scaled Poisson processes defined in (65). Since A and S are independent rate-1 Poisson processes, we have the following basic FCLT.

Theorem 4.2 (FCLT for independent Poisson processes) If A and S are independent rate-1 Poisson processes, then

(M_{A,n}, M_{S,n}) ⇒ (B_1, B_2) in D² as n → ∞, (66)

where M_{A,n} and M_{S,n} are the scaled processes in (65), while B_1 and B_2 are independent standard Brownian motions.
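Although the display (65) was lost above, the standard scaling is M_{A,n}(t) ≡ (A(nt) − nt)/√n. Under that assumption, the following simulation sketch (ours, not the paper's) checks the one-dimensional marginal of Theorem 4.2 at a fixed time: M_{A,n}(t) should be approximately N(0, t), matching B_1(t).

```python
import numpy as np

rng = np.random.default_rng(0)

def scaled_poisson_marginal(t, n, size, rng):
    """Sample M_{A,n}(t) = (A(nt) - nt)/sqrt(n), using A(nt) ~ Poisson(nt)."""
    counts = rng.poisson(lam=n * t, size=size)
    return (counts - n * t) / np.sqrt(n)

# At t = 2 the limit B_1(2) is N(0, 2): mean 0 and variance 2.
samples = scaled_poisson_marginal(t=2.0, n=10_000, size=50_000, rng=rng)
print(samples.mean(), samples.var())
```

The sample mean should be near 0 and the sample variance near t = 2, consistent with the Gaussian limit.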
We can prove the desired limit in (59) for both martingale representations, but we will only give the details for the first martingale representation in Theorem 3.4. In order to get the desired limit in (59), we introduce a deterministic and a random time change. For that purpose, let e : [0, ∞) → [0, ∞) be the identity function in D, defined by e(t) ≡ t for t ≥ 0, and let the associated time changes be defined as in (67). We will establish the following fluid limit, which can be regarded as a functional weak law of large numbers (FWLLN). Here, and frequently later, we have convergence in distribution to a deterministic limit; that is equivalent to convergence in probability; see p. 27 of [8].
Lemma 4.3 (FWLLN) The fluid limit in (68) holds, where Φ_{S,n} is defined in (67).
For that purpose, it suffices to establish another, more basic fluid limit for the stochastic process defined in (69). Let ω be the function in D that is identically 1: ω(t) ≡ 1 for all t ≥ 0.
We thus have the following result.

Lemma 4.4 (all but the fluid limit) If the limit in (70) holds, then the limit in (59) holds, as required to complete the proof of Theorem 1.1.
Proof. From the limit in (66), the desired fluid limit in (68) and Theorem 11.4.5 of [69] (the CMT with the composition map), the composed processes converge as n → ∞, and the limit can be identified by basic properties of Brownian motion.

It thus remains to establish the key fluid limit in Lemma 4.3. In the next section we show how to do that directly, without martingales, by applying the continuous mapping provided by Theorem 4.1 in the fluid scale or, equivalently, by applying Gronwall's inequality again. We would stop there if we only wanted to analyze the M/M/∞ model, but in order to illustrate other methods used in Krichagina and Puhalskii [37] and Puhalskii and Reiman [53], we also apply martingale methods. Thus, in the subsequent four sections we show how to establish that fluid limit using martingales. Here is an outline of the remaining martingale argument: we first show that the fluid limit follows from stochastic boundedness of the scaled queueing processes in D; that stochastic boundedness follows in turn from stochastic boundedness of the scaled martingales, which itself follows from stochastic boundedness of their predictable quadratic variations. This alternate route to the fluid limit is much longer, but all the steps might be considered well known. We remark that the fluid limit seems to be required by all of the proofs, including the direct application of the martingale FCLT.

Fluid Limit Without Martingales
In this section we prove Lemma 4.3 without using martingales. We do so by establishing a stochastic-process limit in the fluid scale which is similar to the corresponding stochastic-process limit with the more refined scaling. This is a standard line of reasoning for heavy-traffic stochastic-process limits; e.g., see the proofs of Theorems 9.3.4, 10.2.3 and 14.7.4 of Whitt [69]. The specific argument here follows §6 of Mandelbaum and Pats [47]. With this approach, even though we exploit the martingale representations, we do not need to mention martingales at all. We are only applying the continuous mapping theorem.
By essentially the same reasoning as in §3.4, we obtain a fluid-scale analog of (31) and (32), displayed in (75)–(76), with M*_{n,i}(t) defined in (29). Notice that the fluid limit in (77) for the fluid-scaled process X̄_n is equivalent to the desired conclusion of Lemma 4.3; hence we will prove the fluid limit in (77). The assumed limit in (4) implies that X̄_n(0) ⇒ 0 in R as n → ∞. We can apply Theorem 4.1, or directly Gronwall's inequality in Lemma 4.1, to deduce the desired limit (77) once we establish the following lemma.
Proof of Lemma 4.5. We can apply the SLLN for the Poisson process, which is equivalent to the more general functional strong law of large numbers (FSLLN); see §3.2 of [70]. (Alternatively, we could apply the FWLLN, which is a corollary to the FCLT.) First, the SLLN for the Poisson process states that sup_{0≤t≤T} |n^{−1}A(nt) − t| → 0 w.p.1 as n → ∞ for each T with 0 < T < ∞. We thus can treat M̄_{n,1} directly. To treat M̄_{n,2}, we combine (80) with the crude inequality in (25) and the representation in (75)–(76) in order to obtain the desired limit (79). To elaborate, the crude inequality in (25) implies that, for any T_1 > 0, there exists T_2 such that

P( µ n^{−1} ∫_0^{T_1} Q_n(s) ds > T_2 ) → 0 as n → ∞.
That provides the key, because it confines the random time argument of the scaled Poisson process in M̄_{n,2} to the bounded interval [0, T_2] with probability approaching 1, so that the FSLLN applies there as well.
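As a numerical sketch (our illustration, assuming the standard FSLLN scaling sup_{0≤t≤T} |n^{−1}A(nt) − t| → 0), the uniform error can be computed exactly on a simulated path, because the supremum is attained just before or just after jump epochs:

```python
import numpy as np

rng = np.random.default_rng(1)

def fslln_error(n, T, rng):
    """Return sup_{0<=t<=T} |A(nt)/n - t| for one simulated rate-1 Poisson path A."""
    # Generate enough unit-rate interarrival times to cover [0, nT] w.h.p.
    arrivals = np.cumsum(rng.exponential(1.0, size=int(2 * n * T) + 100))
    arrivals = arrivals[arrivals <= n * T]
    k = np.arange(1, len(arrivals) + 1)
    over = np.max(k / n - arrivals / n)          # just after the k-th jump
    under = np.max(arrivals / n - (k - 1) / n)   # just before the k-th jump
    return max(over, under, T - len(arrivals) / n)

err = fslln_error(100_000, 1.0, rng)
print(err)
```

For n = 100,000 the error is of order 1/√n, i.e., a few thousandths, illustrating the fluid collapse used in the proof.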

Tightness
As indicated at the end of §4.2, we can also use a stochastic-boundedness argument in order to establish the desired fluid limit. Since stochastic boundedness is closely related to tightness, we start by reviewing tightness concepts. In the next section we apply the tightness notions to stochastic boundedness. The next three sections contain extra material not really needed for the current proofs. Additional material on tightness criteria appears in Whitt [73].
We work in the setting of a complete separable metric space (CSMS), also known as a Polish space; see §§13 and 19 of Billingsley [8], §§3.8–3.10 of Ethier and Kurtz [19] and §§11.1 and 11.2 of [69]. (The space D^k ≡ D([0, ∞), R)^k is made a CSMS in a standard way, and the space of probability measures on D^k then becomes a CSMS as well.) Key concepts are: closed, compact, tight, relatively compact and sequentially compact. We assume knowledge of metric spaces and compactness in metric spaces.

Definition 5.1 (tightness) A set A of probability measures on a metric space S is tight if, for all ε > 0, there exists a compact subset K of S such that P(K) > 1 − ε for all P in A. A set of random elements of the metric space S is tight if the associated set of their probability laws on S is tight. Consequently, a sequence {X_n : n ≥ 1} of random elements of the metric space S is tight if, for all ε > 0, there exists a compact subset K of S such that P(X_n ∈ K) > 1 − ε for all n ≥ 1.
Since a continuous image of a compact subset is compact, we have the following lemma.

Lemma 5.1 (preservation of tightness under continuous maps) Suppose that {X_n : n ≥ 1} is a tight sequence of random elements of the metric space S and that f : S → S′ is a continuous function mapping S into another metric space S′. Then {f(X_n) : n ≥ 1} is a tight sequence of random elements of S′.

Proof. As before, let ∘ be used for composition: (f ∘ g)(x) ≡ f(g(x)). For any function f : S → S′ and any subset A of S, A ⊆ f^{−1}(f(A)). Let ε > 0 be given. Since {X_n : n ≥ 1} is a tight sequence of random elements of the metric space S, there exists a compact subset K of S such that P(X_n ∈ K) > 1 − ε for all n ≥ 1. Then f(K) will serve as the desired compact set in S′, because P(f(X_n) ∈ f(K)) ≥ P(X_n ∈ K) > 1 − ε for all n ≥ 1.

We next observe that on products of separable metric spaces tightness is characterized by tightness of the components; see §11.4 of [69].
Lemma 5.2 (tightness on product spaces) Suppose that {(X n,1 , . . . , X n,k ) : n ≥ 1} is a sequence of random elements of the product space S 1 × · · · × S k , where each coordinate space S i is a separable metric space. The sequence {(X n,1 , . . . , X n,k ) : n ≥ 1} is tight if and only if the sequence {X n,i : n ≥ 1} is tight for each i, 1 ≤ i ≤ k.
Proof. The implication from the random vector to the components follows from Lemma 5.1, because the component X_{n,i} is the image of the projection map π_i : S_1 × ··· × S_k → S_i taking (x_1, …, x_k) into x_i, and the projection map is continuous. Going the other way, we use the fact that the complement of A_1 × ··· × A_k in the product space is contained in the union of the sets π_i^{−1}(S_i \ A_i), for all subsets A_i ⊆ S_i. Thus, for each i and any ε > 0, we can choose a compact K_i such that P(X_{n,i} ∉ K_i) < ε/k for all n ≥ 1. We then let K_1 × ··· × K_k be the desired compact set for the random vector; we have P((X_{n,1}, …, X_{n,k}) ∈ K_1 × ··· × K_k) ≥ 1 − Σ_{i=1}^k P(X_{n,i} ∉ K_i) > 1 − ε for all n ≥ 1.

Tightness goes a long way toward establishing convergence because of Prohorov's theorem. It involves the notions of sequential compactness and relative compactness.

Definition 5.2 (relative compactness and sequential compactness) A subset A of a metric space S is relatively compact if every sequence {x_n : n ≥ 1} from A has a subsequence that converges to a limit in S (which necessarily belongs to the closure Ā of A).
We can now state Prohorov's theorem; see §11.6 of [69]. It relates compactness of sets of measures to compact subsets of the underlying sample space S on which the probability measures are defined.

Theorem 5.1 (Prohorov's theorem) A subset of probability measures on a CSMS is tight if and only if it is relatively compact.
We have the following elementary corollaries.

Corollary 5.1 (convergence implies tightness) If X_n ⇒ X as n → ∞ for random elements of a CSMS, then the sequence {X_n : n ≥ 1} is tight.

Corollary 5.2 (individual probability measures) Every individual probability measure on a CSMS is tight.
As a consequence of Prohorov's theorem, we have the following method for establishing convergence of random elements: first show that the sequence is tight, and then show that all convergent subsequences have the same limit. In other words, once we have established tightness, it only remains to show that the limits of all converging subsequences must be the same; with tightness, we only need to uniquely determine the limit. When proving Donsker's theorem, it is natural to uniquely determine the limit through the finite-dimensional distributions. Convergence of all the finite-dimensional distributions is not enough to imply convergence on D, but it does uniquely determine the distribution of the limit; see pp. 20 and 121 of Billingsley [8] and Example 11.6.1 in [69].
This approach is applied to prove the martingale FCLT stated in §8; see [73]. In the martingale setting it is natural instead to use the martingale characterization of Brownian motion, originally established by Lévy [42] and proved via Itô's formula by Kunita and Watanabe [38]; see p. 156 of Karatzas and Shreve [33], together with various extensions, such as to continuous processes with independent Gaussian increments, as in Theorem 1.1 on p. 338 of Ethier and Kurtz [19]. A thorough study of martingale characterizations appears in Chapter 4 of Liptser and Shiryayev [44] and in Chapters VIII and IX of Jacod and Shiryayev [30].
We have not discussed conditions to have tightness; they are reviewed in [73].

Stochastic Boundedness
We start by defining stochastic boundedness and relating it to tightness. We then discuss situations in which stochastic boundedness is preserved. Afterwards, we give conditions for a sequence of martingales to be stochastically bounded in D involving the stochastic boundedness of appropriate sequences of R-valued random variables. Finally, we show that the FWLLN follows from stochastic boundedness.

Connection to Tightness
For random elements of R and R^k, stochastic boundedness and tightness are equivalent, but tightness is stronger than stochastic boundedness for random elements of the function spaces C and D (and the associated product spaces C^k and D^k).

Definition 5.3 (stochastic boundedness in R^k) A sequence {X_n : n ≥ 1} of random elements of R^k is stochastically bounded if, for all ε > 0, there exists a constant c < ∞ such that P(|X_n| > c) < ε for all n ≥ 1.

Definition 5.4 (stochastic boundedness in D^k) For a function x in D^k, let ||x||_T ≡ sup_{0≤t≤T} |x(t)|. A sequence {X_n : n ≥ 1} of random elements of D^k is stochastically bounded in D^k if the sequence of real-valued random variables {||X_n||_T : n ≥ 1} is stochastically bounded in R for each T > 0.

For random elements of D^k, tightness is a strictly stronger concept than stochastic boundedness: tightness of {X_n} in D^k implies stochastic boundedness, but not conversely; see §15 of Billingsley [8]. However, stochastic boundedness is sufficient for us, because it alone implies the desired fluid limit.

Preservation
We have the following analog of Lemma 5.2, which characterizes stochastic boundedness for sequences of random elements of D^k in terms of stochastic boundedness of the associated sequences of components.

Lemma 5.3 (stochastic boundedness on product spaces) The sequence {(X_{n,1}, …, X_{n,k}) : n ≥ 1} is stochastically bounded in D^k if and only if the sequence {X_{n,i} : n ≥ 1} is stochastically bounded in D for each i, 1 ≤ i ≤ k.

Proof. Assume that we are using the maximum norm on product spaces. We can apply Lemma 5.2 after noticing that ||(x_1, …, x_k)||_T = max_{1≤i≤k} ||x_i||_T for each element (x_1, …, x_k) of D^k. Since other norms are equivalent, the result applies more generally.
Lemma 5.4 (stochastic boundedness in D k for sums) Suppose that Y n (t) ≡ X n,1 (t) + · · · + X n,k (t), t ≥ 0, for each n ≥ 1, where {(X n,1 , . . . , X n,k ) : n ≥ 1} is a sequence of random elements of the product space D k ≡ D × · · · × D. If {X n,i : n ≥ 1} is stochastically bounded in D for each i, 1 ≤ i ≤ k, then the sequence {Y n : n ≥ 1} is stochastically bounded in D.
Note that the converse is not true: we could have k = 2 with X_{n,2}(t) = −X_{n,1}(t) for all n and t. In that case we have Y_n(t) = 0 for all t, regardless of whether the component sequences are stochastically bounded.
We now provide conditions for the stochastic boundedness of integral representations such as (32).

Lemma 5.5 (stochastic boundedness for integral representations) Suppose that
where h is a Lipschitz function as in (62) and (X_n(0), (Y_{n,1}, …, Y_{n,k})) is a random element of R × D^k for each n ≥ 1. If the sequences {X_n(0) : n ≥ 1} and {Y_{n,i} : n ≥ 1} are stochastically bounded (in R and D, respectively) for 1 ≤ i ≤ k, then the sequence {X_n : n ≥ 1} is stochastically bounded in D.

Stochastic Boundedness for Martingales
We now provide ways to get stochastic boundedness for sequences of martingales in D from associated sequences of random variables. Our first result exploits the classical submartingale-maximum inequality; e.g., see p. 13 of Karatzas and Shreve [33]. We say that a function f : R → R is even if f(−t) = f(t) for all t.

Lemma 5.6 (SB from the maximum inequality) Suppose that, for each n ≥ 1, M_n ≡ {M_n(t) : t ≥ 0} is a martingale (with respect to a specified filtration) with sample paths in D. Also suppose that, for each T > 0, there exists an even nonnegative convex function f : R → R with first derivative f′(t) > 0 for t > 0 (e.g., f(t) ≡ t²), a positive constant K ≡ K(T, f) and an integer n_0 ≡ n_0(T, f, K) such that E[f(M_n(T))] ≤ K for all n ≥ n_0. Then the sequence of stochastic processes {M_n : n ≥ 1} is stochastically bounded in D.
Proof. Since any set of finitely many random elements of D is automatically tight (Theorem 1.3 of Billingsley [8]), it suffices to consider n ≥ n_0. Since f is continuous and f′(t) > 0 for t > 0, we have t > c if and only if f(t) > f(c) for t > 0; since f is even, f(M_n(t)) = f(|M_n(t)|) for all t, 0 ≤ t ≤ T. Since the moments E[f(M_n(T))] are finite and f is convex, the stochastic process {f(M_n(t)) : 0 ≤ t ≤ T} is a submartingale for each n ≥ 1, so that we can apply the submartingale-maximum inequality to get

P( sup_{0≤t≤T} |M_n(t)| > c ) = P( sup_{0≤t≤T} f(M_n(t)) > f(c) ) ≤ E[f(M_n(T))]/f(c) ≤ K/f(c)

for all n ≥ n_0. Since f(c) → ∞ as c → ∞, we have the desired conclusion.

We now establish another sufficient condition for stochastic boundedness of square-integrable martingales by applying the Lenglart–Rebolledo inequality; see p. 66 of Liptser and Shiryayev [44] or p. 30 of Karatzas and Shreve [33].
Lemma 5.7 (Lenglart–Rebolledo inequality) Suppose that M ≡ {M(t) : t ≥ 0} is a locally square-integrable martingale with predictable quadratic variation ⟨M⟩ ≡ {⟨M⟩(t) : t ≥ 0}. Then, for all c > 0 and d > 0,

P( sup_{0≤t≤T} |M(t)| ≥ c ) ≤ d/c² + P( ⟨M⟩(T) ≥ d ). (81)

As a consequence, we have the following criterion for stochastic boundedness of a sequence of square-integrable martingales.

Lemma 5.8 (SB criterion in terms of the PQVs) Suppose that, for each n ≥ 1, M_n is a square-integrable martingale with predictable quadratic variation ⟨M_n⟩. If the sequence {⟨M_n⟩(T) : n ≥ 1} is stochastically bounded in R for each T > 0, then the sequence {M_n : n ≥ 1} is stochastically bounded in D.

Proof. Given ε > 0 and T > 0, first choose d such that P(⟨M_n⟩(T) > d) < ε/2 for all n ≥ 1. Then, for that determined d, choose c such that d/c² < ε/2. By the Lenglart–Rebolledo inequality (81), these two inequalities imply that P( sup_{0≤t≤T} |M_n(t)| > c ) < ε.

FWLLN from Stochastic Boundedness
We will want to apply stochastic boundedness in D to imply the desired fluid limit in Lemmas 4.2 and 4.3. The fluid limit corresponds to a functional weak law of large numbers (FWLLN).
Lemma 5.9 (FWLLN from stochastic boundedness in D^k) Let {X_n : n ≥ 1} be a sequence of random elements of D^k. Let {a_n : n ≥ 1} be a sequence of positive real numbers such that a_n → ∞ as n → ∞. If the sequence {X_n : n ≥ 1} is stochastically bounded in D^k, then

X_n/a_n ⇒ η in D^k as n → ∞, (82)

where η(t) ≡ (0, 0, …, 0), t ≥ 0.
Proof. As specified in Definition 5.4, stochastic boundedness of the sequence {X_n : n ≥ 1} in D^k corresponds to stochastic boundedness of the associated sequence {||X_n||_T : n ≥ 1} in R for each T > 0. By Definition 5.3, stochastic boundedness in R is equivalent to tightness. It is easy to verify directly that we then have tightness (or, equivalently, stochastic boundedness) for the associated sequence {||X_n||_T/a_n : n ≥ 1} in R. By Prohorov's theorem, Theorem 5.1, tightness on R (or any CSMS) is equivalent to relative compactness. Hence consider a convergent subsequence {||X_{n_k}||_T/a_{n_k} : k ≥ 1} of the sequence {||X_n||_T/a_n : n ≥ 1} in R: ||X_{n_k}||_T/a_{n_k} ⇒ L as k → ∞. It suffices to show that P(L = 0) = 1; then all convergent subsequences will have the same limit, which implies convergence to that limit. For that purpose, consider the associated subsequence {||X_{n_k}||_T : k ≥ 1} in R. It too is tight, so by Prohorov's theorem again, it too is relatively compact. Thus there exists a convergent subsubsequence: ||X_{n_{k_l}}||_T ⇒ L′ in R. It follows immediately that

||X_{n_{k_l}}||_T / a_{n_{k_l}} ⇒ 0 in R as l → ∞. (83)

This can be regarded as a consequence of the generalized continuous mapping theorem (GCMT), Theorem 3.4.4 of [69], which involves a sequence of continuous functions: consider the functions f_n : R → R defined by f_n(x) ≡ x/a_n, n ≥ 1, and the limiting zero function f : R → R defined by f(x) ≡ 0 for all x. It is easy to see that f_n(x_n) → f(x) ≡ 0 whenever x_n → x in R. Thus the GCMT implies the limit in (83). Consequently, the limit L we found for the subsequence {||X_{n_k}||_T/a_{n_k} : k ≥ 1} must actually be 0. Since that must be true for all convergent subsequences, we must have the claimed convergence in (82).
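A toy numerical illustration of Lemma 5.9 (our sketch, not part of the proof), with k = 1 and a_n = √n: the uniform norms of Donsker-scaled random walks are stochastically bounded (they approximate sup_{0≤t≤1} |B(t)|), so dividing by the additional factor a_n drives the processes to the zero function.

```python
import numpy as np

rng = np.random.default_rng(4)

def sup_norm(n, rng):
    """||X_n||_1 for X_n(t) = S(floor(nt))/sqrt(n), with S a +/-1 random walk."""
    path = np.cumsum(rng.choice([-1.0, 1.0], size=n)) / np.sqrt(n)
    return np.abs(path).max()

n = 10_000
norms = np.array([sup_norm(n, rng) for _ in range(200)])
# ||X_n||_1 is O(1), while ||X_n/a_n||_1 = ||X_n||_1/sqrt(n) is tiny.
print(norms.mean(), (norms / np.sqrt(n)).max())
```

The first number stabilizes near E[sup_{0≤t≤1} |B(t)|], while the second is smaller by the factor 1/√n, illustrating the collapse in (82).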

Completing the Proof of Theorem 1.1
In the next three subsections we complete the proof of Theorem 1.1 by the martingale argument, as outlined at the end of §4.2. In §6.1 we show that the required fluid limit in Lemma 4.3 follows from the stochastic boundedness of the sequence of stochastic processes {X_n : n ≥ 1}. In §6.2 we finish the proof of the fluid limit by proving that the associated predictable quadratic variation processes are stochastically bounded. Finally, in §6.3 we show how to remove the condition on the initial conditions.

Fluid Limit from Stochastic Boundedness in D
We now show how to apply stochastic boundedness to imply the desired fluid limit in Lemma 4.3. Here we simply apply Lemma 5.9 with a_n ≡ √n for n ≥ 1 to our particular sequence of stochastic processes {X_n : n ≥ 1} in (3) or (32).

Lemma 6.1 (application to queueing processes) Let X_n be the random elements of D defined in (3) or (32). If the sequence {X_n : n ≥ 1} is stochastically bounded in D, then the fluid limit in (84) holds, where ω(t) ≡ 1, t ≥ 0.

Proof. As a consequence of Lemma 5.9, from the stochastic boundedness of {X_n} in D we obtain X_n/√n ⇒ η in D, where η is the zero function defined above. This limit is equivalent to (84), by (29), (30) and (59).

It thus remains to establish the stochastic boundedness of {X_n}. First, by Lemma 5.8, the stochastic boundedness of the two sequences of PQV random variables implies stochastic boundedness in D of the two sequences of scaled martingales {M_{n,1} : n ≥ 1} and {M_{n,2} : n ≥ 1} in (30). That, in turn, under condition (4), implies the stochastic boundedness in D of the sequence of scaled queue-length processes {X_n : n ≥ 1} in (32) and (40), by Lemma 5.5. Finally, by Lemma 5.9, the stochastic boundedness of {X_n} in D implies the required fluid limit in (84), which was just what was needed in Lemmas 4.2 and 4.3.

Stochastic Boundedness of the Quadratic Variations
We have observed at the end of the last section that it only remains to establish the stochastic boundedness of the two sequences of PQV random variables {⟨M_{n,1}⟩(t) : n ≥ 1} and {⟨M_{n,2}⟩(t) : n ≥ 1} for any t > 0. For each n ≥ 1, the associated stochastic processes {⟨M_{n,1}⟩(t) : t ≥ 0} and {⟨M_{n,2}⟩(t) : t ≥ 0} are the predictable quadratic variations (compensators) of the scaled martingales M_{n,1} and M_{n,2} in (33). We note that the stochastic boundedness of {⟨M_{n,1}⟩(t) : n ≥ 1} is trivial because these random variables are deterministic; it remains to treat {⟨M_{n,2}⟩(t) : n ≥ 1}.

Proof. It suffices to apply the crude inequality in Lemma 3.3.

The Initial Conditions
The results in § §3-6.1 have used the finite moment condition E[Q n (0)] < ∞, but we want to establish Theorem 1.1 without this condition. In particular, note that this moment condition appears prominently in Lemmas 3.4 and 3.5 and in the resulting final martingale representations for the scaled processes, e.g., as stated in Theorems 3.4 and 3.5.
However, for the desired Theorem 1.1, we do not need to directly impose this moment condition. We do need the assumed convergence of the scaled initial conditions in (4), but we can circumvent the moment condition by defining bounded initial conditions that converge to the same limit; for that purpose, we can work with modified bounded initial conditions Q̃_n(0). Then, for each n ≥ 1, we use Q̃_n(0) and X̃_n(0) instead of Q_n(0) and X_n(0). We then obtain Theorem 1.1 for these modified processes: X̃_n ⇒ X in D as n → ∞. However, P(X_n ≠ X̃_n) → 0 as n → ∞. Hence, we have X_n ⇒ X as well.

Limit from the Fourth Martingale Representation
In this section we state the FCLT limit for the general G/GI/∞ queue stemming from the fourth martingale representation in §3.7, but we omit proofs and simply refer to Krichagina and Puhalskii [37]. (We do not call the FCLT a diffusion limit because the limit is not a diffusion process.) Even in the M/M/∞ special case, the limit process has a different representation from the representation in Theorem 1.1. We will show that the two representations are actually equivalent.
Theorem 6.1 (FCLT from the fourth martingale representation) Let X_n be defined in (53) and let Â_n be defined in (46) and (47). If X_n(0) ⇒ X(0) in R and Â_n ⇒ Z in D, where Z is a process with continuous sample paths, then X_n ⇒ X in D as n → ∞, where X is the limit process in (87). For M/M/∞ queues, Z = √µ B, where B is a standard Brownian motion, and the limit process then has the explicit representation given in (87).

Remark 6.1 A corresponding limit holds for the two-parameter processes X_n ≡ {X_n(t, y)} in Corollary 3.1 by a minor variation of the same argument.
Connection to Theorem 1.1. We now show that the two characterizations of the limit X in (87) and (5) are equivalent for the M/M/∞ special case. For that purpose, express the process X in (87) as a sum of terms, the first of which we denote by Z_1. Clearly, Z_1(t) can be rewritten, and by the solution to the linear SDE, §5.6 in [33], it can be expressed as a stochastic integral against a standard Brownian motion B.
Recall (see §5.6.B of [33]) that the Brownian bridge W⁰ is the unique strong solution to the one-dimensional SDE

dW⁰(t) = −(W⁰(t)/(1 − t)) dt + dB₂(t), W⁰(0) = 0, 0 ≤ t < 1,

where B₂ is a second independent standard Brownian motion. So we can write W⁰ explicitly in terms of B₂, and it follows that the corresponding term in (87) can be re-expressed accordingly. Paralleling (90), the Kiefer process U is related to the Brownian sheet W, so that, for t ≥ 0 and x ≥ 0, we obtain a corresponding representation. Next, by similar reasoning, it can be shown that the remaining term admits a similar re-expression. By (88), (89), (91) and (93), X can be written as a sum of stochastic integrals, where B and B₂ are independent standard Brownian motions. Let B̃ be the sum of the last three components in (94). It is evident that the process B̃ is a continuous Gaussian process with mean 0 and, for s < t, a covariance that can be computed directly.
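The explicit formula behind "so we can write" is standard (see §5.6.B of Karatzas and Shreve [33]); as a sketch:

```latex
% Brownian bridge SDE (Karatzas--Shreve, \S 5.6.B):
dW^0(t) = -\frac{W^0(t)}{1-t}\,dt + dB_2(t), \qquad W^0(0) = 0, \quad 0 \le t < 1,
% with explicit strong solution
W^0(t) = (1-t)\int_0^t \frac{dB_2(s)}{1-s}, \qquad 0 \le t < 1,
% a mean-zero Gaussian process with covariance
\operatorname{Cov}\bigl(W^0(s), W^0(t)\bigr) = s(1-t), \qquad 0 \le s \le t \le 1.
```

The covariance follows from the Itô isometry: (1−s)(1−t)∫_0^s (1−u)^{−2} du = s(1−t).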

Other Models
In this section we discuss how to treat other models closely related to our initial M/M/∞ model. We consider the Erlang-A model in §7.1; we also consider limits for the waiting time there. We consider finite waiting rooms in §7.2. Finally, we indicate how to treat general non-Poisson arrival processes in §7.3.

Erlang A Model
In this section we prove the corresponding many-server heavy-traffic limit for the M/M/n/∞ + M (Erlang-A or Palm) model in the QED regime. As before, the arrival rate is λ and the individual service rate is µ. Now there are n servers and an unlimited waiting room with the FCFS service discipline. Customer times to abandon are i.i.d. exponential random variables with mean 1/θ; thus each customer waiting in queue abandons at a constant rate θ. The Erlang-C model, which arises when there is no abandonment (θ = 0), is covered as a special case of the result below.
Let Q(t) denote the number of customers in the system at time t, either waiting or being served. It is well known that the stochastic process Q ≡ {Q(t) : t ≥ 0} is a birth-and-death stochastic process with constant birth rate λ_k = λ and state-dependent death rate µ_k = (k ∧ n)µ + (k − n)⁺θ, k ≥ 0, where a ∧ b ≡ min{a, b}, a ∨ b ≡ max{a, b} and (a)⁺ ≡ a ∨ 0 for real numbers a and b.
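The birth-and-death rates above translate directly into code; this small sketch (the function names are ours, not the paper's) simply evaluates λ_k and µ_k:

```python
def birth_rate(k, lam):
    """Erlang-A birth rate: arrivals occur at rate lambda in every state k."""
    return lam

def death_rate(k, n, mu, theta):
    """Erlang-A death rate mu_k = (k ^ n) mu + (k - n)^+ theta:
    min(k, n) busy servers each complete service at rate mu, and the
    (k - n)^+ waiting customers each abandon at rate theta."""
    return min(k, n) * mu + max(k - n, 0) * theta

# With n = 10 servers and k = 15 customers, 5 are waiting:
print(death_rate(15, 10, 1.0, 0.5))  # 10*1.0 + 5*0.5 = 12.5
```

Setting theta = 0 recovers the Erlang-C death rates, in line with the special case noted above.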
As in Theorem 1.1, the many-server heavy-traffic limit involves a sequence of Erlang-A queueing models. As before, we let this sequence be indexed by n, but now this n coincides with the number of servers. Thus we are now letting the number of servers be finite, but letting that number grow. At the same time, we let the arrival rate increase: the arrival rate in model n is λ_n, and we stipulate that λ_n grows with n, while the individual service rate µ and abandonment rate θ are held fixed. Let ρ_n ≡ λ_n/nµ be the traffic intensity in model n. We stipulate that

√n (1 − ρ_n) → β as n → ∞, (95)

where β is a (finite) real number. That is equivalent to assuming that

λ_n = nµ − βµ√n + o(√n) as n → ∞, (96)

as in (7). Conditions (95) and (96) are known to characterize the QED many-server heavy-traffic regime; see Halfin and Whitt [26] and Puhalskii and Reiman [53]. The many-server heavy-traffic limit theorem for the Erlang-A model was proved by Garnett, Mandelbaum and Reiman [21], exploiting Stone [62]. See Zeltyn and Mandelbaum [74] and Mandelbaum and Zeltyn [49] for extensions and elaboration. For related results for single-server models with customer abandonment, see Ward and Glynn [65,66,67].
Here is the QED many-server heavy-traffic limit theorem for the M/M/n/∞ + M model.

Theorem 7.1 (QED limit for the Erlang-A model) Let X_n be as defined in (3). If X_n(0) ⇒ X(0) in R as n → ∞, then X_n ⇒ X in D as n → ∞, where X is the diffusion process with infinitesimal mean m(x) = −βµ − µx for x < 0 and m(x) = −βµ − θx for x > 0, and infinitesimal variance σ²(x) = 2µ. Alternatively, the limit process X satisfies an equivalent stochastic integral equation, or SDE, driven by a standard Brownian motion B.

Proof. In the rest of this section we very quickly summarize the proof; the argument mostly differs little from what we did before. Indeed, if we use the second martingale representation, as in §§2.2 and 3.5, then there is very little difference. However, if we use the first martingale representation, as in §§2.1 and 3.4, then there is a difference, because now we want to use the optional stopping theorem for multiparameter random time changes, as in §§2.8 and 6.2 of Ethier and Kurtz [19]. That approach follows Kurtz [40], which draws on Helms [28], and has been applied in §12 of Mandelbaum and Pats [48]. To illustrate this alternate approach, we use the random-time-change approach here.
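Since the displays for the integral equation and the SDE were lost above, here is a reconstruction that simply encodes the stated infinitesimal mean m(x) and variance σ²(x) = 2µ; treat it as a sketch consistent with the theorem rather than a quotation:

```latex
% SDE encoding m(x) = -\beta\mu - \mu x \ (x<0), \ -\beta\mu - \theta x \ (x>0),
% and \sigma^2(x) = 2\mu:
dX(t) = -\bigl[\beta\mu + \mu\,(X(t)\wedge 0) + \theta\,(X(t)\vee 0)\bigr]\,dt
        + \sqrt{2\mu}\,dB(t), \qquad t \ge 0,
% or, in integral form,
X(t) = X(0) - \int_0^t \bigl[\beta\mu + \mu\,(X(s)\wedge 0)
        + \theta\,(X(s)\vee 0)\bigr]\,ds + \sqrt{2\mu}\,B(t).
```

Setting θ = µ makes the drift linear, recovering the OU limit of the M/M/∞ model with a shifted mean.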
Just as in §2.1, we can construct the stochastic process Q in terms of rate-1 Poisson processes. In addition to the two Poisson processes A and S introduced before, we now have an extra rate-1 Poisson process R used to generate abandonments. Instead of (12), here we have the representation in (97). Paralleling (14), we have the martingale representation (98)–(100) for t ≥ 0, with the filtration F ≡ {F_t : t ≥ 0} defined for t ≥ 0 in terms of the three Poisson processes and augmented by including all null sets. We now want to justify the claims in (98)–(100). Just as before, we can apply Lemmas 3.1 and 3.3 to justify this martingale representation, but now we need to replace Lemmas 3.2 and 3.4 by corresponding lemmas involving the optional stopping theorem with multiparameter random time changes, as in §§2.8 and 6.2 of [19]. We now sketch the approach: starting from the three-parameter filtration, we obtain the required joint convergence first in D³ as n → ∞, and then in D. Finally, the CMT with the integral representation (103) and Theorem 4.1 completes the proof. Note that the limiting Brownian motion associated with R does not appear, because Φ_{R,n} is asymptotically negligible; that is why the infinitesimal variance is the same as before.

Finite Waiting Rooms
We can also obtain stochastic-process limits for the number of customers in the system in the associated M/M/n/0 (Erlang-B), M/M/n/m_n and M/M/n/m_n + M models, which have finite waiting rooms. For the Erlang-B model, there is no waiting room at all; for the other models there is a waiting room of size m_n in model n, where m_n is allowed to grow with n so that m_n/√n → κ ≥ 0 as n → ∞, as in (8). The QED many-server heavy-traffic limit was stated as Theorem 1.2.
The proof can be much the same as in §7.1. The idea is to introduce the finite waiting room via a reflection map, as in §§3.5, 5.2, 13.5, 14.2, 14.3 and 14.8 of Whitt [69], corresponding to an upper barrier at κ, but the reflection map here is more complicated than for single-server queues and networks of such queues, because it is not applied to a free process. We use an extension of Theorem 4.1 constructing a mapping from D × R into D², taking the model data into the content function and the upper-barrier regulator function.

Theorem 7.3 (a continuous integral representation with reflection) Consider the modified integral representation in (111), where x(t) ≤ κ, h : R → R satisfies h(0) = 0 and is a Lipschitz function as in (62), and u is a nondecreasing nonnegative function in D such that (111) holds and u increases only when x is at the barrier κ. The modified integral representation in (111) has a unique solution (x, u), so that it constitutes a bona fide function, with x ≡ f_1(y, b) and u ≡ f_2(y, b). In addition, the function (f_1, f_2) is continuous provided that the product topology is used for product spaces and the function space D (in both the domain and range) is endowed with either: (i) the topology of uniform convergence over bounded intervals or (ii) the Skorohod J_1 topology. Moreover, if y is continuous, then so are x and u.
Proof. We only show the key step, for which we follow the argument in §3 of Mandelbaum and Pats [48] and §4 of Reed and Ward [56]; see these sources for additional details and references. The idea is to combine classical results for the conventional one-dimensional reflection map, as in §§5.2 and 13.5 of [69], with a modification of Theorem 4.1. Let (φ_κ, ψ_κ) be the one-sided reflection map with upper barrier at κ, so that φ_κ(y) = y − ψ_κ(y), with φ_κ(y) being the content function and ψ_κ(y) being the nondecreasing regulator function; see §§5.2 and 13.5 of [69]. We observe that the map in (111) can be expressed as x = φ_κ(w) and u = ψ_κ(w), where w is the corresponding "free" process, obtained from the integral representation without the reflection. This lets us represent the desired map as the composition of the maps (φ_κ, ψ_κ) and the map ξ taking (y, b) into w. The argument to treat ξ is essentially the same as in the proof of Theorem 4.1, but we need to make a slight adjustment; we could apply it directly if we had h : D → D in Theorem 4.1. Recall that φ_κ is Lipschitz continuous on D([0, t]) for each t with the uniform norm, ||φ_κ(y_1) − φ_κ(y_2)||_t ≤ 2||y_1 − y_2||_t, with modulus 2 independent of t. Hence, paralleling (64), we obtain a Gronwall-type bound for each t > 0, and we can apply Gronwall's inequality in Lemma 4.1 to establish (Lipschitz) continuity of the map ξ on D([0, T]) × R. Combining this with the known (Lipschitz) continuity of the reflection map (φ_κ, ψ_κ), we have the desired continuity for the overall map in the uniform topology. We can extend to the J_1 topology as in the proof of Theorem 4.1.

Now that we understand how we are treating the finite waiting rooms, the QED many-server heavy-traffic limit theorem is as stated in Theorem 1.2. This modification alters the limiting diffusion process in Theorem 7.1 only by the addition of a reflecting upper barrier at κ for the sequence of models with waiting rooms of size m_n, where κ = 0 for the Erlang-B model. When κ = 0, X is a reflected OU (ROU) process. Properties of the ROU process are contained in Ward and Glynn [66].
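The one-sided reflection map (φ_κ, ψ_κ) used in the proof admits a simple explicit form on discretized paths. The following sketch (our code, using the standard formulas ψ_κ(y)(t) = sup_{0≤s≤t} (y(s) − κ)⁺ and φ_κ = y − ψ_κ, and assuming y(0) ≤ κ) illustrates it:

```python
import numpy as np

def reflect_upper(y, kappa):
    """One-sided reflection with an upper barrier at kappa:
    psi(t) = sup_{0<=s<=t} (y(s) - kappa)^+ is the nondecreasing regulator
    and phi = y - psi is the content, which stays at or below kappa
    (when y[0] <= kappa); psi increases only when phi is at the barrier."""
    y = np.asarray(y, dtype=float)
    psi = np.maximum.accumulate(np.maximum(y - kappa, 0.0))
    phi = y - psi
    return phi, psi

y = np.array([0.0, 0.5, 1.2, 0.8, 1.5, 0.9])
phi, psi = reflect_upper(y, kappa=1.0)
print(phi)  # content, held at or below the barrier 1.0
print(psi)  # nondecreasing regulator
```

Note that psi increases exactly at the epochs where the free path y exceeds its running record above κ, which is the complementarity condition in Theorem 7.3.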
Proofs for the two cases κ > 0 and κ = 0 by other methods are contained in §4.5 of Whitt [72] and Theorem 4.1 of Srikant and Whitt [61]. General references on reflection maps are Lions and Sznitman [43] and Dupuis and Ishii [18].
Proof of Theorem 1.2. We briefly sketch the proof. Instead of (97), here we have a corresponding representation for t ≥ 0, in which U_n(t) is the number of arrivals in the time interval [0, t] when the system is full in model n, i.e., when Q_n(t) = n + m_n. To connect to Theorem 7.3, it is significant that U_n can also be represented as the unique nondecreasing nonnegative process such that Q_n(t) ≤ n + m_n, (113) holds and U_n increases only when Q_n = n + m_n. We now construct a martingale representation, just as in (98)–(102); the natural extension of Theorem 7.2 is Theorem 7.4, giving the martingale representation for the scaled processes. By combining Theorems 7.3 and 7.4, we obtain the joint convergence

(X_n, V_n) ⇒ (X, U) in D² as n → ∞,

for X_n and V_n in (116) and (117), where the vector (X, U) is characterized by (9) and (10). That implies Theorem 1.2 stated in §1.
Remark 7.1 (correction) The argument in this section follows Whitt [72], but provides more detail. We note that the upper-barrier regulator processes are incorrectly expressed in formulas (5.2) and (5.8) of [72].

General Non-Markovian Arrival Processes
In this section, following §5 of Whitt [72], we show how to extend the many-server heavy-traffic limit from M/M/n/m_n + M models to G/M/n/m_n + M models, where the arrival processes are allowed to be general stochastic point processes satisfying a FCLT. They could be renewal processes (GI) or even more general arrival processes. The limit of the arrival-process FCLT need not have continuous sample paths. (As noted at the end of §4.3, this separate argument is not needed if we do not use martingales.) Let A_n ≡ {A_n(t) : t ≥ 0} be the general arrival process in model n and let Â_n be the associated scaled arrival process, defined in (120). We assume that Â_n ⇒ Â in D as n → ∞. We also assume that, conditional on the entire arrival process, model n evolves as the Markovian queueing process with i.i.d. exponential service times and i.i.d. exponential times until abandonment. Thus, instead of Theorem 7.4, we have
Theorem 7.5 (first martingale representation for the scaled processes in the G/M/n/m_n + M model) Consider the family of G/M/n/m_n + M models defined above, evolving as a Markovian queue conditional on the arrival process. If m_n < ∞, then the scaled processes have the martingale representation in which Â_n is the scaled arrival process in (120), M_{n,i} are the scaled martingales in (102) and V_n(t) ≡ U_n(t)/√n, t ≥ 0, for U_n in (113)-(115).
The scaled processes admit regular conditional probabilities: P(X_n ∈ · | Â_n = z) is a probability measure on D for each z ∈ D, and we can regard P(X_n ∈ B | Â_n = z) as a measurable function of z in D for each Borel set B in D, where P(X_n ∈ B) = ∫_D P(X_n ∈ B | Â_n = z) dP(Â_n = z).
A similar regular-conditional-probability representation holds for the limiting pair (X, Â).
A minor modification of the previous proof of Theorem 1.2 establishes that X_n^{ζ_n} ⇒ X^ζ in D whenever ζ_n → ζ in D; i.e., for each continuous bounded real-valued function f on D, E[f(X_n^{ζ_n})] → E[f(X^ζ)] whenever ζ_n → ζ in D. Now fix a continuous bounded real-valued function f and let h_n(z) ≡ E[f(X_n^z)] and h(z) ≡ E[f(X^z)] for z ∈ D. Since we have regular conditional probabilities, we can regard the functions h_n and h as measurable functions from D to R (depending on f).
We are now ready to apply the generalized continuous mapping theorem, Theorem 3.4.4 of [69]. Since h_n and h are measurable functions such that h_n(ζ_n) → h(ζ) whenever ζ_n → ζ, and since Â_n ⇒ Â in D, we have h_n(Â_n) ⇒ h(Â) as n → ∞. Since the function f used in (126) and (127) is bounded, these random variables are bounded, so convergence in distribution implies convergence of moments. Hence, for that function f, E[f(X_n)] = E[h_n(Â_n)] → E[h(Â)] = E[f(X)] as n → ∞. Since this convergence holds for all continuous bounded real-valued functions f on D, we have shown that X_n ⇒ X, as claimed.

The Martingale FCLT
We now turn to the martingale FCLT. For our queueing stochastic-process limits, it is of interest because it provides one way to prove the FCLT for a Poisson process in Theorem 4.2 and because we can base our entire proof of Theorem 1.1 on the martingale FCLT. However, the gain in the proof of Theorem 1.1 is not so great.
We now state a version of the martingale FCLT for a sequence of local martingales {M n : n ≥ 1} in D k , based on Theorem 7.1 on p. 339 of Ethier and Kurtz [19], hereafter referred to as EK. Another important reference is Jacod and Shiryayev [30], hereafter referred to as JS. See Section VIII.3 of JS for related results; see other sections of JS for generalizations.
We will state a special case of Theorem 7.1 of EK in which the limit process is multi-dimensional Brownian motion. However, the framework always produces limits with continuous sample paths and independent Gaussian increments. Most applications involve convergence to Brownian motion. Other situations are covered by JS, from which we see that proving convergence to discontinuous processes is more complicated.
The key part of each condition below is the convergence of the quadratic covariation processes. Condition (i) involves the optional quadratic-covariation (square-bracket) processes [M_{n,i}, M_{n,j}], while condition (ii) involves the predictable quadratic-covariation (angle-bracket) processes ⟨M_{n,i}, M_{n,j}⟩. Recall from §3.2 that the square-bracket process is more general, being well defined for any local martingale (and thus any martingale), whereas the associated angle-bracket process is well defined only for any locally square-integrable martingale (and thus any square-integrable martingale). Thus the key conditions below are the assumed convergence of the quadratic-covariation processes in conditions (130) and (133). The other conditions (129), (131) and (132) are technical regularity conditions. There is some variation in the literature concerning the extra technical regularity conditions; e.g., see Rebolledo [54] and JS.
Let J be the maximum-jump function, defined for any x ∈ D and T > 0 by J(x, T) ≡ sup{|x(t) − x(t−)| : 0 < t ≤ T}.
Theorem 8.1 (multidimensional martingale FCLT) For n ≥ 1, let M_n ≡ (M_{n,1}, . . . , M_{n,k}) be a local martingale in D^k with respect to a filtration F_n ≡ {F_{n,t} : t ≥ 0} satisfying M_n(0) = (0, . . . , 0). Let C ≡ (c_{i,j}) be a k × k covariance matrix, i.e., a nonnegative-definite symmetric matrix of real numbers.
Assume that one of the following two conditions holds:
(i) The expected value of the maximum jump in M_n is asymptotically negligible, i.e., lim_{n→∞} E[J(M_{n,i}, T)] = 0 (129) for each T > 0 and each i, and [M_{n,i}, M_{n,j}](t) ⇒ c_{i,j} t in R as n → ∞ (130) for each t > 0 and for each (i, j); or
(ii) The maximum jumps in the angle-bracket processes and in M_n are asymptotically negligible, i.e., lim_{n→∞} E[J(⟨M_{n,i}, M_{n,i}⟩, T)] = 0 (131) and lim_{n→∞} E[J(M_{n,i}, T)²] = 0 (132) for each T > 0 and each i, and ⟨M_{n,i}, M_{n,j}⟩(t) ⇒ c_{i,j} t in R as n → ∞ (133) for each t > 0 and for each (i, j).

Conclusion:
If indeed one of the conditions (i) or (ii) above holds, then M_n ⇒ M in D^k as n → ∞, (134) where M is a continuous k-dimensional (0, C)-Brownian motion, having mean vector E[M(t)] = (0, . . . , 0) and covariance matrix E[M(t)M(t)^tr] = Ct for t ≥ 0, where, for a matrix A, A^tr is the transpose.
Of course, a common simple case arises when C is a diagonal matrix; then the k component marginal one-dimensional Brownian motions are independent. When C = I, the identity matrix, M is a standard k-dimensional Brownian motion, with independent one-dimensional standard Brownian motions as marginals.
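To make the covariance structure of the limit concrete, a (0, C)-Brownian motion can be simulated as M(t) = L B(t), where C = L L^tr is a Cholesky factorization and B is a standard k-dimensional Brownian motion, since then E[M(t)M(t)^tr] = L E[B(t)B(t)^tr] L^tr = C t. The following simulation sketch is ours, with an illustrative choice of C:

```python
import numpy as np

def bm_with_covariance(C, T=1.0, steps=200, seed=0):
    """Simulate one path of k-dimensional (0, C)-Brownian motion on [0, T]
    as M = B L^tr, where C = L L^tr and B is standard k-dim BM."""
    rng = np.random.default_rng(seed)
    k = C.shape[0]
    L = np.linalg.cholesky(C)
    dt = T / steps
    dB = rng.normal(0.0, np.sqrt(dt), size=(steps, k))
    B = np.cumsum(dB, axis=0)
    return B @ L.T                     # rows are M(t_i) = L B(t_i)

C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
# Monte Carlo check: the sample covariance of M(T) should be near C * T
samples = np.array([bm_with_covariance(C, T=1.0, steps=200, seed=s)[-1]
                    for s in range(2000)])
est = np.cov(samples.T)
assert np.allclose(est, C, atol=0.25)
```

When C is diagonal, L is diagonal as well, so the components of M are independent, matching the remark above.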
At a high level, Theorem 8.1 says that, under regularity conditions, convergence of martingales in D is implied by convergence of the associated quadratic covariation processes. At first glance, the result seems even stronger, because we need convergence of only the one-dimensional quadratic covariation processes for a single time argument. However, that is misleading, because the stronger weak convergence of these quadratic covariation processes in D^{k²} is actually equivalent to the weaker required convergence in R for each t, i, j in conditions (130) and (133); see [73].

Applications of the Martingale FCLT
In this section we make two applications of the preceding martingale FCLT. First, we apply it to prove the FCLT for the scaled Poisson process, Theorem 4.2. Then we apply it to provide a third proof of Theorem 1.1. In the same way we could obtain alternate proofs of Theorems 1.2 and 7.1.

Proof of the Poisson FCLT
We now apply the martingale FCLT to prove the Poisson FCLT in Theorem 4.2. To do so, it suffices to consider the one-dimensional version in D, since the Poisson processes are mutually independent. Let the martingales M_n ≡ M_{A,n} be as defined in (65), i.e., M_n(t) ≡ n^{-1/2}(A(nt) − nt), t ≥ 0, where A is a rate-1 Poisson process. Since ⟨M_n⟩(t) = t, t ≥ 0, which is deterministic and continuous, both (131) and (133) hold trivially. The jumps of M_n are of size n^{-1/2}, so (129) and (132) hold as well. Finally, [M_n](t) = n^{-1}A(nt) and, by the SLLN for a Poisson process, A(nt)/n → t w.p.1 as n → ∞ for each t > 0, so (130) holds. Hence both conditions (i) and (ii) in Theorem 8.1 are satisfied, with C = c_{1,1} = 1.
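As a quick numerical sanity check of this scaling (ours, not part of the proof): the terminal value M_n(t) = n^{-1/2}(A(nt) − nt) is a centered and scaled Poisson(nt) random variable, so its mean should be near 0 and its variance near t = ⟨M_n⟩(t):

```python
import numpy as np

def scaled_poisson_martingale(n, t, reps, seed=0):
    """Sample M_n(t) = (A(nt) - n t)/sqrt(n) for a rate-1 Poisson process A,
    using A(nt) ~ Poisson(n t)."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(n * t, size=reps)
    return (counts - n * t) / np.sqrt(n)

vals = scaled_poisson_martingale(n=10_000, t=2.0, reps=50_000)
# E[M_n(t)] = 0 and Var[M_n(t)] = t, matching <M_n>(t) = t
assert abs(vals.mean()) < 0.05
assert abs(vals.var() - 2.0) < 0.1
```

The histogram of `vals` is also close to a N(0, t) density, consistent with convergence to Brownian motion at the fixed time t.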

Completing the Proof of Theorem 1.1
The bulk of this paper has consisted of a proof of Theorem 1.1 based on the first martingale representation in Theorem 3.4, which in turn is based on the representation of the service-completion counting process as a random time change of a rate-1 Poisson process, as in (12). A second proof in §4.3 established the fluid limit directly. In this subsection we present a third proof of Theorem 1.1 based on the second martingale representation in Theorem 3.5, which in turn is based on a random thinning of rate-1 Poisson processes, as in (14). This third proof also applies to the third martingale representation in §3.6, which is based on constructing martingales for counting processes associated with the birth-and-death process {Q(t) : t ≥ 0} via its infinitesimal generator.
Starting with the second martingale representation in Theorem 3.5 (or the third martingale representation in §3.6), we cannot rely on the Poisson FCLT to obtain the required stochastic-process limit (M_{n,1}, M_{n,2}) ⇒ (√µ B_1, √µ B_2) in D² as n → ∞ in (59). However, we can apply the martingale FCLT for this purpose, and we show how to do that now. As in §9.1, we can apply either condition (i) or (ii) in Theorem 8.1, but it is easier to apply (ii), so we do. The argument looks more complicated because we have to establish the two-dimensional convergence in (134) in D², since the scaled martingales M_{n,1} and M_{n,2} are not independent. Fortunately, however, they are orthogonal, by virtue of the following lemma. We still need to establish the two-dimensional limit in (134), but orthogonality makes that not difficult.
We say that two locally square-integrable martingales M_1 and M_2 with respect to the filtration F are orthogonal if the process M_1 M_2 is a local martingale with M_1(0)M_2(0) = 0. Since M_1 M_2 − ⟨M_1, M_2⟩ is a local martingale, orthogonality implies that ⟨M_1, M_2⟩(t) = 0 for all t.
Lemma 9.2 (quadratic covariation of stochastic integrals with respect to martingales) Suppose that M_1 and M_2 are locally square-integrable martingales with respect to the filtration F, while C_1 and C_2 are locally bounded F-predictable processes. Then [∫ C_1 dM_1, ∫ C_2 dM_2](t) = ∫_0^t C_1(s)C_2(s) d[M_1, M_2](s), t ≥ 0, and similarly for the angle-bracket processes.
As a consequence of the orthogonality provided by Lemma 9.1, we have [M_{n,1}, M_{n,2}](t) = 0 and ⟨M_{n,1}, M_{n,2}⟩(t) = 0 for all t and n for the martingales in (134), which in turn come from Theorem 3.5. Thus the orthogonality trivially implies that [M_{n,1}, M_{n,2}](t) ⇒ 0 and ⟨M_{n,1}, M_{n,2}⟩(t) ⇒ 0 in R as n → ∞ for all t ≥ 0. We then have ⟨M_{n,i}, M_{n,i}⟩(t) ⇒ c_{i,i} t = µt in R as n → ∞ for each t and i = 1, 2, by (41) in Theorem 3.5 and Lemma 4.2, as in the previous argument used in the first proof of Theorem 1.1 in §§4.1-6.2. As stated above, the bulk of the proof is thus identical. By additional argument, we can also show that [M_{n,i}, M_{n,i}](t) ⇒ c_{i,i} t = µt in R as n → ∞.
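A small Monte Carlo illustration of orthogonality (ours; independent Poisson streams stand in for the thinned streams of Theorem 3.5): with compensated processes M_1(t) = N_1(t) − λ_1 t and M_2(t) = N_2(t) − λ_2 t built from independent streams, the streams never jump simultaneously, and E[M_1(t) M_2(t)] = ⟨M_1, M_2⟩(t) = 0:

```python
import numpy as np

rng = np.random.default_rng(7)
t, lam1, lam2, reps = 3.0, 1.0, 1.5, 100_000
# Compensated counting processes at time t from independent Poisson streams:
# M_1(t) = N_1(t) - lam1*t and M_2(t) = N_2(t) - lam2*t
m1 = rng.poisson(lam1 * t, size=reps) - lam1 * t
m2 = rng.poisson(lam2 * t, size=reps) - lam2 * t
# Orthogonality: the sample mean of M_1(t) M_2(t) should be near 0
cov = float(np.mean(m1 * m2))
assert abs(cov) < 0.1
```

Each marginal sample variance is near λ_i t, the individual predictable quadratic variation, while the cross term vanishes, as the lemma requires.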
We have just shown that (133) holds. It thus remains to show that the other conditions in Theorem 8.1 (ii) are satisfied. First, since we have a scaled unit-jump counting process, condition (132) holds by virtue of the scaling in Theorem 3.5 and (30). Next, (131) holds trivially because the predictable quadratic-variation processes ⟨M_{n,1}⟩ and ⟨M_{n,2}⟩ are continuous. Hence this third proof is complete.
In closing this section, we observe that this alternate method of proof also applies to the Erlang-A model in §7.1 and the generalization with finite waiting rooms in Theorem 1.2.
A_m(t) : t ≥ 0} is also an F-martingale. Adding, we see that {M_m(t)² − A_m(t) : t ≥ 0} is an F-martingale for each m. Thus, for each m, the predictable quadratic variation of M_m is ⟨M_m⟩(t) = A_m(t), t ≥ 0. Now we can let m ↑ ∞ and apply Fatou's lemma to conclude that M itself is square integrable. We can then apply the monotone convergence theorem in the conditioning framework, as on p. 280 of [12], to obtain the corresponding martingale property in the limit as well, so that M² − A is indeed a martingale. Of course that implies that ⟨M⟩ = A, as claimed. We get [M] = N from Lemma A.2, as noted at the beginning of the proof.
We remark that there is a parallel to Lemma A.2 for the angle-bracket process, applying to cases in which the compensator is not continuous. In contrast to Lemma 3.1, we now do not assume that E[N(t)] < ∞, so we need to localize. If, in addition, the compensator A is continuous, then ⟨N − A⟩ = A.
Proof. For (143), we exploit the fact that ⟨N − A⟩ is the compensator of [N − A]; see p. 377 of Rogers and Williams [58] and §5.8 of van der Vaart [64]. The third term on the right in (137) is predictable and thus its own compensator. The compensators of the first two terms in (137) are obtained by replacing N by its compensator A. See Problem 3 on p. 60 of Liptser and Shiryayev [44].