Statistical consistency of the data association problem in multiple target tracking

Abstract: Simultaneous tracking of multiple moving objects extracted from an image sequence is an important problem which finds numerous applications in science and engineering. In this article we conduct an investigation of the theoretical properties of a statistical model for tracking such moving objects, or targets. This tracking model allows for birth, death, splitting and merging of targets, and uses a Markov model to decide the times at which such events occur. This model also assumes that the track traveled by each target behaves like a Gaussian process. The estimated tracking solution is obtained via maximum likelihood. One of the contributions of this article is to establish the almost sure consistency of the solution to the data association problem obtained from these maximum likelihood tracking estimates. A major technical challenge for proving this consistency result is to identify the correct track (data association) amongst a group of similar (but incorrect) track proposals that result from various combinations of target birth, death, splitting and/or merging. This consistency property of the tracking estimates is empirically verified by numerical experiments. To the best of our knowledge, this is the first time that a comprehensive study is performed for the large sample properties of a multiple target tracking method. In addition, the issue of how to quantify the confidence of a tracking estimate is also addressed.


Introduction
Multiple target tracking has application to many scientific problems. It is important in radar and signal processing, air traffic control, robot vision, GPS-based navigation, biomedical engineering, and video surveillance, to name a few. Typically the objects of interest (i.e., the targets) are captured in the form of an image sequence. Each image frame contains the locations and perhaps other attribute information (such as sizes and shapes) of the targets. The goal of tracking is to recover this information and use it to help reconstruct the tracks that the targets traveled. Quite often the following two-step strategy is adopted to solve this problem. The first step is to extract the locations and/or other attributes of the targets from each image frame. There is no unified solution for this first step, as different targets need different methods for extraction; e.g., human faces and missiles require very different target recognition methods to detect their appearances in an image. Once the target coordinates are located, the second step is to link these coordinates together in such a way that coordinates of the same target detected in different image frames are connected together as a reconstruction of the path that this target traveled. This article assumes that the target coordinates have already been extracted from the image sequence and focuses on the second step of coordinate linking. This second step is also known as the data association problem.
The work discussed in this paper was motivated by the practical need for tracking (i) storm activities captured in radar images and (ii) vortices generated in turbulence fields. In these two tracking applications, the splitting and merging of targets is quite common. A promising method for tracking such, and other kinds of, merging and splitting targets was proposed by Storlie, Lee, Hannig & Nychka (2009). This tracking method models target locations with Gaussian processes, and estimates their tracks using a maximum likelihood approach. Movies presenting the results obtained by applying this tracking method to the storm activities and the vortices applications can be viewed at http://www.stat.unm.edu/~storlie/tracking/. In this article we complement the work of Storlie et al. (2009) by studying the asymptotic properties of their solution to the data association problem. We show that under certain regularity conditions this solution is strongly consistent. While this result is useful in and of itself, it is also our hope that the method of proof adopted here could be valuable for studying the large sample properties of solutions to other tracking problems as well. To the best of our knowledge, this is the first time that a comprehensive study is performed for the large sample properties of a multiple target tracking method.

A first description of the tracking problem
As mentioned before, in this article we assume that the coordinates (x, y) of the targets have been extracted from every frame of the image sequence. We do not know, however, which of the (x, y) locations corresponds to which target. The goal of a tracking method, then, is to take the location data extracted over time and recover the track of each target. A track is defined to be the (x(t), y(t)) coordinates of a target at each time t during the image sequence.
To illustrate the idea further, consider the tracking problem depicted in Figure 1, where we assume that there are 4 targets at each time step. The locations of the targets in this figure are simulated from the model to be described in Section 2. We are however ignoring the possibility of birth and death of targets as well as splitting and merging for the time being. The targets are free to change position from one time step (or image) to the next. The data association problem is to determine which temporal set of locations corresponds to one particular target. In other words, the goal is to sequentially connect the targets in the bottom left plot to form tracks. Since this is simulated data, the corresponding solution is known and the correct tracks are given in the bottom right plot. In many instances the observed locations of the targets include a non-negligible measurement error. In this case, it is typical to first solve the data association problem. The observations forming a track can then be smoothed to obtain the estimated paths of the solution. Many statistical approaches to the target tracking problem have been studied in the engineering literature over the past thirty years; e.g., see the two very comprehensive books Bar-Shalom, Li & Kirubarajan (2001) and Blackman & Popoli (1999), and the references given therein. Most of these methods employ a statistical model to describe the motion of the targets to be tracked. Usually a Gaussian state space model is assumed, and the "best" estimates are defined as the set of tracks that maximizes the likelihood of the model. Note that we use the term likelihood to mean the unconditional likelihood of the data, as is typical in the statistics literature, as opposed to the conditional likelihood, which is what is often meant in the tracking community. Virtually no attempts have been made to investigate the theoretical properties of these maximum likelihood estimates, although various practical algorithms have been proposed for computing them. The two most widely used algorithms are the Multiple Hypothesis Tracking algorithm of Reid (1979) and the Joint Probabilistic Data Association algorithm developed by Fortmann, Bar-Shalom & Scheffe (1983). See also Mori, Chong, Tse & Wishner (1986) for a good general mathematical framework for this problem. Most recently, a new class of filtering methods based on particle filtering or sequential Monte Carlo (Gordon, Salmond & Smith (1993), Kitagawa (1996), Liu & Chen (1998), Doucet, Godsill & Andrieu (2000) and Doucet, de Freitas & Gordon (2001)) have been developed. Still other methods, such as the probability hypothesis density method (Mahler (2003) and Vo, Singh & Doucet (2005)), approximate the likelihood by propagating only the posterior expectation instead of the entire distribution through to subsequent times. Since the major focus of this article is to present a thorough theoretical analysis of the tracking estimates provided by the method of Storlie et al. (2009), the algorithmic issue of how to compute the estimates of target location will not be further discussed.
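To give a concrete feel for the data association step, the following toy sketch links detections across frames with a deliberately simplistic greedy nearest-neighbor rule. This is only an illustration of the combinatorial nature of the problem, not the maximum likelihood method studied in this article; the names `greedy_associate` and `frames` are ours.

```python
import numpy as np

def greedy_associate(frames):
    """Link detections across frames by greedy nearest-neighbor matching.

    frames: list of (m, 2) arrays of (x, y) detections, one array per frame,
    all with the same number m of detections (no birth/death/split/merge).
    Returns an (n_frames, m) integer array `assign` where assign[j, i] is
    the row of frames[j] matched to track i.
    """
    n, m = len(frames), frames[0].shape[0]
    assign = np.zeros((n, m), dtype=int)
    assign[0] = np.arange(m)                 # tracks start at frame 0
    for j in range(1, n):
        prev = frames[j - 1][assign[j - 1]]  # last known position of each track
        taken = set()
        for i in range(m):
            d = np.linalg.norm(frames[j] - prev[i], axis=1)
            d[list(taken)] = np.inf          # each detection is used once
            k = int(np.argmin(d))
            taken.add(k)
            assign[j, i] = k
    return assign
```

For slowly moving, well-separated targets this heuristic recovers the correct association; the likelihood-based method of Storlie et al. (2009) is designed for the much harder settings (births, deaths, splits, mergers) where such greedy rules fail.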
Data association asymptotics 1231

Merging and splitting of targets
The example shown in Figure 1 is often a simplistic representation of reality. Many applications will not have the same number of targets in each image. For example, imperfect detection and occlusion will lead to missing targets in some images. There can also be false alarms, or clutter. Furthermore, some targets may appear for the first time or disappear permanently in the middle of the image sequence. We will call these events birth and death, respectively.
In addition, this article is motivated by the scientific need of tracking merging and splitting targets, such as storms or vortices. Figure 2 illustrates a more realistic example of an actual tracking problem motivating this work. These images are from a two-dimensional turbulence simulation of freely decaying vortices. The white objects are centers of vorticity rotating in a clockwise direction, whereas the black vortices have the opposite rotation. Vortices of the same spin will merge together as they move close to each other. There is a good example of a merger, between times 8 and 9, of two white vortices that are left of center and below center in the images.
In the example in Figure 2, there will be birth, death, and merging. In practice, the vortices also need to be identified with some target extraction procedure, and there is no perfect method for doing this. This leads to false alarm observations and missing observations in some time windows. The ability to deal with these issues is important for a practical tracking system. For the description of a tracking procedure that allows for birth, death, splitting, merging, false alarms, and missing observations, see Storlie et al. (2009). In this presentation, we are only interested in studying the asymptotic behavior of the estimator proposed in Storlie et al. (2009) as observations become more and more frequent. To this end, the asymptotic analysis is greatly simplified if false alarms are ignored. Studying the asymptotic properties under this simplification provides insight into the more complex situation when missing observations and false alarms are present. For example, false alarms can be thought of intuitively as targets that last only for a short time frame. If this is the case, the results presented here would still apply to the actual targets, even if false alarms are present.
Merging and splitting of targets can be common in radar applications as well, though in a slightly different context. That is, when two targets are close together, resolution limits may prevent them from being simultaneously detected. The detection method will then return only one (or even no) observation for these two targets. This certainly poses additional difficulties and challenges. Although this is perceived as a very important issue by Daum (1994) and Blackman (2004), we are unaware of any satisfactory solution to this. Most existing methods for tracking merging targets are not well defined in terms of an overall probabilistic model (Trunk & Wilson, 1981; Chang & Bar-Shalom, 1984; Koch & van Keuk, 1997; Genovesio & Olivo-Marin, 2004).
Another limitation of most existing work on tracking methodology is the lack of theoretical understanding. Some notable exceptions are the work of Cong & Hong (1999) and Li & Jing (2003), which study the numerical convergence properties of their optimization algorithms, and the work of Chen, Li & Bar-Shalom (2004), which provides some theoretical justification for their method to choose the correct number of targets as the number of image frames goes to infinity. Also, for a somewhat different class of tracking problems, Hall & Rau (2000), Hall, Peng & Rau (2001) and Hall, Qiu & Rau (2007) provide tracking solutions with theoretical support. Despite these various pieces of work, however, many important theoretical problems remain unsolved. For example, under what conditions can we obtain a consistent tracking estimate as the time increment between observations goes to zero? At what rate does this convergence take place? The novel contribution made in this article is to, via analyzing the method of Storlie et al. (2009), provide a first attempt at addressing these questions. While it is not surprising to expect the existence of a perfect tracking method when the number of observations increases to infinity, it is not obvious that the MLE of our tracking model converges almost surely to the correct solution. The current paper establishes this convergence result.
The rest of this article is organized as follows. Section 2 summarizes the major ingredients of the tracking method of Storlie et al. (2009). The almost sure convergence of the maximum likelihood solution of this tracking method is established in Section 3. These properties are then illustrated via numerical experiments in Section 4. Concluding remarks and possible future work are offered in Section 5, while technical details are deferred to the appendix.
The tracking model of Storlie, Lee, Hannig & Nychka

In this section, for completeness, we summarize the multiple target tracking model proposed by Storlie et al. (2009), whose theoretical properties are to be studied. This model has a continuous time stochastic component that describes (i) the events that occur and (ii) the locations of the targets to be tracked. The tracking estimate for the targets is obtained by using the model likelihood given in Appendix A.
Define a path, (X(t), Y(t)), to be the coordinates of a target at time t > 0. We observe the targets at discrete times t = (t_0, t_1, . . ., t_n). We assume a two-dimensional path, but the following could easily allow for paths in ℜ^3. We wish to model the path of a target by a two-dimensional Gaussian process. The complication is that, for the following reasons, we may not be able to observe the target at all times: 1. it will exist in the future, but does not exist yet (birth); 2. it no longer exists (death); 3. it broke off into 2 new targets (splitting); and 4. it combined with another target (merger).
The tracking model to be studied below has two parts, which we will refer to as (i) the Event Model and (ii) the Location Model. The Event Model describes how and when targets come into existence and terminate, while the Location Model describes how an existing target travels around.
Before describing the Event Model, note again that in many practical tracking problems there may be missing observations and false alarms. Also, additional target attribute variables, such as size and shape, may be available. For the description of a more complete model that allows for all of these complications, see Storlie et al. (2009). In the following we omit these features in order to simplify the asymptotic analysis. As mentioned, we can gain a lot of insight about the asymptotic behavior of the solution to the data association problem in this simplified case, and project this understanding to more complex scenarios (i.e., those with false alarms and missing observations).

Event Model
The Event Model is a continuous time Markov chain model that is very similar to a birth and death process. Four types of events can occur: births, deaths, splits, and mergers. The rates at which these events happen are λ_b, N(t)λ_d, N(t)λ_s, and (N(t) − 1)λ_m, respectively, where N(t) is the number of targets in existence at time t. It is assumed that the initial number of targets, N_0 = N(0), follows a Poisson(λ_0) distribution.
The following notation will be used to describe the Event Model:

U_{b,j} = number of births in the interval [t_j, t_{j+1}),
U_{d,j} = number of deaths in the interval [t_j, t_{j+1}),
U_{s,j} = number of splits in the interval [t_j, t_{j+1}),
U_{m,j} = number of mergers in the interval [t_j, t_{j+1}). (1)

We will write U_b = (U_{b,1}, . . ., U_{b,n}) and similarly for U_d, U_s, and U_m. Also, the collection of N_0 and the U's will be denoted by U. Each target, regardless of its status (e.g., alive or dead), will be uniquely identified by a positive integer starting from 1. We shall call such integers indices. The initial targets alive at time t_1 are arbitrarily labeled with indices 1 through N_0. The following actions are taken whenever any one of the four possible events happens. When there is a birth, the new target is given the next available index. For example, if there are already 10 targets in the model (some currently alive, some possibly dead), these targets would have been labeled uniquely with indices from 1 to 10, and the new target will be given an index of 11. When there is a death, all targets that are still alive are equally likely to be selected as the one that dies. When there is a split, all of the living targets are equally likely to be the parent, and the children are given the next two available indices. Finally, for merging events, all possible pairs of living targets are equally likely to be the parents, and the child is given the next available index.
Notice that the assumption that all targets are equally likely to be parents in a merger appears to be at odds with the principle that only close targets are eligible to merge together. We will rectify this issue in the Location Model to be described in Section 2.2. In short, the locations of the parents of a merger are conditioned to be "close" to each other right before the merger. This shifts the burden of enforcing the property that "only close targets merge together" to the Location Model. This leads to an important simplification of the likelihood calculation, since the Location Model depends on the Event Model but not vice versa. On the other hand, this arrangement leads to complications in studying the theoretical properties of our tracking algorithm due to the loss of the Markov property.
We will specify which targets were involved in the events by

V_{b,j} = the collection of indices of targets that were born in the interval [t_j, t_{j+1}),
V_{d,j} = the collection of indices of targets that died in the interval [t_j, t_{j+1}),
V_{s,j} = the collection of triplets (i_1, i_2, i_3), where i_1 is the index of the parent and i_2, i_3 are the children, for every split in the interval [t_j, t_{j+1}),
V_{m,j} = the collection of triplets (i_1, i_2, i_3), where i_1, i_2 are the indices of the parents and i_3 is the child, for every merger in the interval [t_j, t_{j+1}). (2)

Let V_b = (V_{b,1}, . . ., V_{b,n}) and similarly for V_d, V_s, and V_m. The collection of all the V's will be denoted by V. Lastly, it should be noted that this is a hidden Markov model in the sense that we do not actually observe the variables U and V from the data. Predicting these variables is part of the tracking problem. This will be described further in Section 2.3.
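The Event Model is straightforward to simulate forward in time as a continuous time Markov chain. The sketch below is our own minimal illustration of the chain described above, using illustrative rate values; the function and variable names (`simulate_events`, `lam_b`, etc.) are not from the paper.

```python
import numpy as np

def simulate_events(t_end, lam0=2.0, lam_b=0.5, lam_d=0.1,
                    lam_s=0.1, lam_m=0.1, seed=0):
    """Simulate the Event Model on [0, t_end].  Events occur at rates
    lam_b (birth), N(t)*lam_d (death), N(t)*lam_s (split) and
    (N(t)-1)*lam_m (merger), where N(t) is the number of living targets;
    N(0) ~ Poisson(lam0).  Indices are assigned as in the text: every new
    target gets the next available index.  Returns the event log and the
    indices alive at t_end."""
    rng = np.random.default_rng(seed)
    n0 = rng.poisson(lam0)
    alive = list(range(1, n0 + 1))        # initial targets: indices 1..N0
    next_idx, t, events = n0 + 1, 0.0, []
    while True:
        n = len(alive)
        rates = np.array([lam_b, n * lam_d, n * lam_s, max(n - 1, 0) * lam_m])
        t += rng.exponential(1.0 / rates.sum())   # holding time of the chain
        if t >= t_end:
            break
        kind = rng.choice(4, p=rates / rates.sum())
        if kind == 0:                             # birth
            alive.append(next_idx); next_idx += 1
        elif kind == 1:                           # death: uniform over living
            alive.remove(rng.choice(alive))
        elif kind == 2:                           # split: parent -> 2 children
            alive.remove(rng.choice(alive))
            alive += [next_idx, next_idx + 1]; next_idx += 2
        else:                                     # merger: 2 parents -> 1 child
            i, j = sorted(rng.choice(len(alive), size=2, replace=False))
            alive.pop(int(j)); alive.pop(int(i))
            alive.append(next_idx); next_idx += 1
        events.append((t, ["birth", "death", "split", "merge"][kind]))
    return events, alive
```

Note that the death, split, and merger rates vanish when too few targets are alive, so the chain never attempts an impossible event.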

Location Model
When a target is determined to exist by the Event Model, the observed path of the i th target, (X_i(t), Y_i(t)), will be modeled by a Gaussian process. The Gaussian process is commonly used in tracking applications because it is mathematically straightforward to work with and yet models the paths well in most applications. Target paths are assumed to run their course independently of other targets unless they are required to split or merge as determined by the Event Model. The dependencies introduced by splitting and merging are described below.
The distribution of X_i(t) will be defined below for the three cases of a target resulting from (i) birth, (ii) merger, and (iii) split. The distribution of Y_i(t) is similar, with the obvious changes in notation and parameters, and is independent of X_i(t) given the event variables U and V.
Let the x component of the location of the i th target at time t be denoted by X_i(t). Also denote the time of initiation of the i th target by ξ_i. If the i th target exists at the first observation time t_1, then it is assumed that ξ_i = t_1. The observed location is then governed by

X_i(t) = X_i(ξ_i) + H_i(t − ξ_i) + σ_i G_i(t − ξ_i),  t ≥ ξ_i, (3)

where H_i(t) is a smooth function corresponding to the target location, G_i(t) is a continuous mean zero Gaussian process describing random fluctuations such as errors of measurement, and H_i(0) = G_i(0) = 0. The initial position X_i(ξ_i) will depend on whether the target resulted from a birth, merger, or split. These cases are described in Sections 2.2.1 to 2.2.4 below. The model in (3) is designed for G_i(t) to be a Brownian motion. If we change the model for G_i(t), we may wish to change (3) accordingly. For example, if we use integrated Brownian motion, we may want to add an initial velocity term instead of assuming that it is equal to zero.
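A discretely observed path from this Location Model is easy to simulate when G_i is a Brownian motion, since Brownian increments over disjoint intervals are independent Gaussians. The sketch below is ours (the names `simulate_path` and `drift` are assumptions, not the paper's notation).

```python
import numpy as np

def simulate_path(times, x0, drift, sigma, seed=0):
    """Sample a discretely observed path x0 + H(t - birth) + sigma*B(t - birth),
    where H is a smooth mean function (H(0) = 0) and B is standard Brownian
    motion.  `times` are the observation times, with times[0] the birth time;
    `drift` is a callable giving H.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    t = np.asarray(times, dtype=float)
    dt = np.diff(t)
    # Brownian increments over disjoint intervals: independent N(0, sigma^2 dt)
    steps = rng.normal(0.0, sigma * np.sqrt(dt))
    b = np.concatenate([[0.0], np.cumsum(steps)])
    return x0 + drift(t - t[0]) + b
```

For example, `simulate_path(np.linspace(0, 1, 101), 0.0, lambda s: 2 * s, 0.5)` gives a path with constant drift 2 and Brownian fluctuation scale 0.5; setting `sigma = 0` recovers the deterministic mean path exactly.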

Initial conditions for a target resulting from a birth event
Suppose that the i th target resulted from a birth. It is assumed that the initial position is Gaussian; specifically, X_i(ξ_i) ∼ N(µ_{X0}, σ²_{X0}). For many problems, it may also seem reasonable to use a uniform distribution to model the initial location X_i(ξ_i). However, very often the likelihood of a uniform distribution can be satisfactorily mimicked by sufficiently increasing the variance of a normal distribution. Thus we will keep the original Gaussian assumption for mathematical convenience.

Initial conditions for a target resulting from a merging event
Now suppose that the i th target is initiated from a merger. Let p_i be a vector containing the indices of the two targets that merge together to create the i th target. If there is no size measurement made on the targets, which we are assuming here, then we can let the initial position of the child be the simple average of the positions of the parents at the time of merger plus a noise term ψ_{m,i}. We model ψ_{m,i} ∼ N(0, σ²_{Xm}) with a small σ²_{Xm}, so that the new target location is likely to be close to the average of the parents. Figure 3 displays a physical representation of this. Therefore we have

X_i(ξ_i) = (X_{p_{i,1}}(ξ_i) + X_{p_{i,2}}(ξ_i))/2 + ψ_{m,i}. (4)

Parent locations at the time of a merging event
Notice that in our modeling so far, the two parent targets are not required to be close to each other at the time of a merging event. To ensure that the parents move close to each other before merging, the difference between the locations of the parents at the time of merger is conditioned to be small. This is achieved as follows.
Let d = (d_1, d_2, d_3) be a vector containing the indices of the three targets involved in a merging event, where d_1 and d_2 are the parents and d_3 is the index of the child. Let D be the difference between the observed locations of the two parents at the time of merger plus a noise term,

D = X_{d_1}(ξ_{d_3}) − X_{d_2}(ξ_{d_3}) + ψ_d, (5)

where ψ_d ∼ N(0, σ²_{X_d}) is independent of the targets. If σ_{X_d} is small, then it is likely that ψ_d is small in absolute value. If we then condition the model for X_{d_1} and X_{d_2} on the event D = 0, this ensures that the parents are only a small distance ψ_d apart at the time of the merging event. In Figure 3 once again, we see a merging event with a possible realization of ψ_d.
In general, there will be N_m = Σ_{j=1}^n U_{m,j} merging events during the time window [t_1, t_n]. We will condition the target paths on all of these mergers in a manner similar to that above. This is described more precisely as follows. Let D_i be the D from (5) and ψ_{d,i} the corresponding ψ_d for the i th merging event, i = 1, . . ., N_m. In a manner similar to a Brownian Bridge, the paths for (X_1, . . ., X_M) are then conditioned on the event {(D_1, . . ., D_{N_m}) = (0, . . ., 0)}, where M is the total number of targets that existed before time t_n.
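This bridge-style conditioning is ordinary multivariate Gaussian conditioning. The sketch below shows the mechanics on a single merger with hypothetical numbers: two parents with marginal means 0 and 3 at the merge time are pulled together once we condition on D = 0. The function `condition_on_zero` and the specific covariance values are ours, for illustration only.

```python
import numpy as np

def condition_on_zero(mu, cov, idx):
    """Condition a joint Gaussian N(mu, cov) on the coordinates in `idx`
    being exactly zero.  Returns the conditional mean and covariance of the
    remaining coordinates, by the standard multivariate-normal formula."""
    idx = np.asarray(idx)
    keep = np.setdiff1d(np.arange(len(mu)), idx)
    s_kk = cov[np.ix_(keep, keep)]
    s_kd = cov[np.ix_(keep, idx)]
    s_dd = cov[np.ix_(idx, idx)]
    w = s_kd @ np.linalg.inv(s_dd)
    return mu[keep] + w @ (0.0 - mu[idx]), s_kk - w @ s_kd.T

# Hypothetical single merger: coordinates are (X_d1, X_d2, D) at the merge
# time, with D = X_d1 - X_d2 + psi_d and Var(psi_d) = 0.01 (small).
mu = np.array([0.0, 3.0, -3.0])
cov = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, -1.0],
                [1.0, -1.0, 2.01]])
mean_cond, cov_cond = condition_on_zero(mu, cov, [2])
```

Here `mean_cond` is approximately (1.49, 1.51): conditioning on D = 0 forces the parents' locations to nearly coincide at the merge time, exactly the effect described above, while also shrinking their conditional variances.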

Initial conditions for a target resulting from a splitting event
Suppose that the i th target is initiated by a splitting event. To keep notation consistent with that for mergers, let p_i be a vector of length one that contains the index of the parent of the i th target. The initial location of a target resulting from a split is given by

X_i(ξ_i) = X_{p_i}(ξ_i) + ψ_{s,i}, (6)

where ψ_{s,i} ∼ N(0, σ²_{Xs}). Similar to the model for a merger, the initial position of a new target from a split is the same as that of the parent plus some error. It is assumed that σ²_{Xs} is small, so that the new targets are likely to appear close to where the parent split.

The tracking estimate
Here we formally define our estimand and our method for estimating it. The setup for this estimation problem is as follows. We collect data at times t_1, . . ., t_n. At each time t_j there are m_j observations. Let Z_{ij} be the i th observation at time t_j, i = 1, . . ., m_j. Each Z_{ij} is a vector of the location values for a target. Further denote by Z(t) the collection of observations at time t, so that Z(t_j) = (Z_{1,j}, . . ., Z_{m_j,j}), and let Z = {Z_{ij} : j = 1, . . ., n, i = 1, . . ., m_j} be the collection of observations at all observed times.
From our data Z, we need to decide which target to assign each observation to. Note that each observation can be assigned to only one target and each target can have only one observation assigned to it. We create the variable p_{ij} to be the index of the target from which observation Z_{ij} originated. Let P = {p_{ij} : j = 1, . . ., n, i = 1, . . ., m_j}.
Now for a given Z, the indices contained in P specify the tracks of each target. To completely specify a solution to the tracking problem, in addition to P we must also specify the events (births, deaths, splits, and mergers) that occurred, with the variables U and V defined in (1) and (2). Thus the variables U and V, together with P, denote a solution to the data association problem, or the tracking solution. As mentioned before, the estimated paths can then be obtained by smoothing the observed locations of each track. We will denote our estimate of the tracking solution (U, V, P) by (Û, V̂, P̂).

Calculating (Û, V̂, P̂)
Assume for the moment that the parameters in the model described in Section 2 are known quantities. We will consider the estimation of these parameters later. We adopt the Gelfand style for density notation (Gelfand, 1990): [X] denotes the probability density function of the random variable X, [X](x) denotes [X] evaluated at x, and [X | Y] denotes the conditional density of X given Y. Notice that the evaluation [X | Y](x) is a random variable (i.e., a function of the random variable Y). Finally, we will use [X | Y](x | y) to denote the evaluation of the conditional density for some observed value of the random variable Y = y. To obtain our tracking estimate, we will compute an approximation to the conditional density of (U, V, P) given the data Z = z,

[U, V, P | Z](u, v, p | z). (7)

Note that this is also a probability mass function, since the variables (U, V, P) are discrete. From this it is natural to define our tracking estimate as

(Û, V̂, P̂) = argmax_{(u, v, p)} [U, V, P | Z](u, v, p | z). (8)

We can also interpret [U, V, P | Z](Û, V̂, P̂ | z) as the probability that (Û, V̂, P̂) is the correct solution given the data Z = z.
We now consider the calculation of the density in (7). With the one-to-one mapping g : (P, Z) → (X, Y, Z), for a given Z the information contained in P and (X, Y) is the same. Let g* : (P, Z) → (X, Y) be the function g without the last variable in its output. In Storlie et al. (2009) it is shown that (7) can be written as

[U, V, P | Z](u, v, p | z) = [U, V, X, Y](u, v, g*(p, z)) / Σ_j [U, V, X, Y](u_j, v_j, g*(p_j, z)), (9)

where {(u_j, v_j, p_j) : j = 1, 2, . . .} is an enumeration of the possible tracking solutions. Of course, in practice it may not be possible to exhaust all possible enumerations, but the solution (u, v, p) that maximizes (9) is the same one that maximizes the numerator.
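Given (unnormalized) log-likelihood values for an enumerated set of candidate solutions, normalizing as in (9) and taking the maximizer is mechanically simple. The sketch below is ours (the name `map_solution` is an assumption) and uses the log-sum-exp trick, since likelihoods of long tracks underflow quickly in floating point.

```python
import numpy as np

def map_solution(log_liks):
    """Given unnormalized log-likelihoods for an enumeration of candidate
    tracking solutions (the numerator of (9) for each candidate), return the
    index of the maximizer and the normalized posterior probabilities.
    Subtracting the max before exponentiating avoids underflow."""
    ll = np.asarray(log_liks, dtype=float)
    probs = np.exp(ll - ll.max())
    probs /= probs.sum()
    return int(ll.argmax()), probs
```

Because the normalizing constant is shared, the argmax of the normalized probabilities coincides with the argmax of the raw log-likelihoods, mirroring the remark that maximizing (9) and maximizing its numerator are equivalent.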
In Appendix A a closed form approximation is derived for the model likelihood [U, V, X, Y]. It is important to note that the estimator defined in (8) uses (9) with this approximation for [U, V, X, Y] rather than the actual density [U, V, X, Y]. In fact, the actual density would not be tractable to work with. Thus, all of the main results in Section 3 assume that (Û, V̂, P̂) is calculated with the approximate likelihood, which is explicitly presented in Equations (A1)-(A5). There are several optimization methods available to search for this maximum; see Blackman & Popoli (1999) for example.
Confidence Sets: We can also use (9) to obtain an approximation to the distribution of the possible tracking solutions given the data Z. This is much more informative than just the estimate (Û, V̂, P̂). For example, we could construct a confidence set of solutions such that the probability that the correct solution is among the solutions in the set is 100(1 − α)%. We could also calculate the probability that a given observation belongs to a certain target, or the probability that two targets merged, etc.
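One simple way to build such a confidence set is to sort the candidate solutions by their probabilities under (9) and keep the smallest set whose cumulative mass reaches 1 − α. The sketch below assumes the probabilities have already been computed; the name `confidence_set` is ours.

```python
import numpy as np

def confidence_set(probs, alpha=0.05):
    """Smallest set of candidate tracking solutions whose posterior
    probabilities (from (9)) sum to at least 1 - alpha.  Returns the
    indices of the included solutions, most probable first."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]          # most probable candidates first
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, 1.0 - alpha)) + 1
    return order[:k].tolist()
```

For instance, with candidate probabilities (0.7, 0.2, 0.06, 0.04) and α = 0.05, the set consists of the first three candidates, whose combined mass 0.96 exceeds 0.95.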
Up until this point, we have ignored the issue of estimating the model parameters. If the tracking application requires the estimation of such parameters, one could do so with maximum likelihood or another suitable estimation method. The problem is that, before any tracking solution (U, V, P) is specified, we would not know the values of the variables that should go into these estimates. We therefore allow each of the solutions considered in (8) to have its own parameter estimates. This will necessarily give incorrect solutions an overly optimistic likelihood and bias the distribution given in (9). We can limit the amount of this bias by setting reasonable bounds for the parameter estimates. This can be done very effectively in many practical problems, since prior information, such as a possible range for the number of targets and their velocities, is typically available to the researcher. In the next section we show that as the sampling rate approaches infinity, the estimate (8) eventually equals the correct solution, even when the model parameters are estimated.

Main results: Asymptotic properties
In this section we show the strong consistency of the estimator (Û, V̂, P̂) defined by (8). First we highlight that our estimator assumes the observations are produced from the model described in Section 2, and that the estimator itself is constructed with the likelihood approximation given in (A1)-(A5).
We begin with the following notation. For each k = 1, 2, . . . we collect from the process the observations at times t^k_1 < t^k_2 < · · ·, where the time increments shrink as k increases. Further let t^k_{i,j}, j = 1, . . ., n^k_i denote the j th time that the i th target is observed, where n^k_i is the number of times the i th target is observed. Also let ∆t^k_{i,j} = t^k_{i,j+1} − t^k_{i,j}. At times it will be convenient to write t_{i,j} = t^k_{i,j}, ∆t_{i,j} = ∆t^k_{i,j} and n_i = n^k_i, keeping in mind that these are still functions of k.
We will assume that we are using a Brownian motion model for the error component G_i(t) in (3), (4) and (6). In addition, we will assume that the variance scalars satisfy σ²_i = σ² for all i. The estimator we will use for σ² is

σ̂² = (1/N) Σ_i Σ_j [(X_i(t_{i,j+1}) − X_i(t_{i,j}))² / ∆t_{i,j}] I_{E_{i,j}}, (10)

where N = Σ_{i=1}^m (n_i − 1) is the total number of consecutive differences from all tracks, I_A is the 0-1 indicator function of the event A, and the event E_{i,j} restricts the size of the corresponding normalized squared increment. The indicator in (10) makes the estimator more robust. Essentially, it prevents extreme observations from biasing the variance estimate if the tracking estimate has incorrectly connected tracks. It will also be important to exclude extreme observations from the estimator even when the tracks are correctly specified, as we will see. Following arguments similar to those in the proof of the Strong Law of Large Numbers, it is straightforward to show that (10) is a consistent estimator of σ² when the tracks are correctly specified; see Lemma 8. The conditions needed for Theorem 1 are as follows.

Condition 5. The likelihood [U, V, X, Y] is calculated according to the approximation given in (A1)-(A5), which treats H_i(t) + σG_i(t) as a scaled Brownian motion for all targets.

Condition 6. The parameter estimates are confined to a compact set such that λ_0, λ_b, λ_d, λ_s, and λ_m are greater than zero and all the variance components of the Location Model are greater than 0.
Condition 7. The variance parameters σ²_i and η²_i for the random process components of X_i(t) and Y_i(t) satisfy σ²_i = σ² and η²_i = η² for all i. The estimates for σ² and η² are given by (10).

Condition 8. The estimate (Û, V̂, P̂) given in (8) is restricted to solutions with fewer than M < ∞ targets, with M > m, where m is the number of targets in the correct solution (U, V, P). The estimate (Û, V̂, P̂) is further restricted so that consecutive observations in a track must satisfy a gating bound determined by some positive constant K_2.
The following theorem uses the propositions in the appendix to show that the tracking solution is estimated correctly in the limit.
Theorem 1. Assume Conditions 1-8. Let (U, V, P)_k be the sequence of correct tracking solutions. Let (Û, V̂, P̂)_k be given by (8), restricted by Condition 8 for each k. Then there exists a K such that (Û, V̂, P̂)_k = (U, V, P)_k for all k > K almost surely (a.s.).
The conclusion in Theorem 1 implies that for a given ω in a set with probability 1, there exists a K(ω) such that the estimate will equal the correct solution for all k > K(ω). This also implies that the parameters in the model can be estimated consistently, provided that the estimators used are consistent for their respective parameters when the tracks are correctly specified. Some possible estimators are given in Storlie et al. (2009).
Theorem 1 is proved in Appendix F by computing the likelihood ratio of any possible sequence of alternatives (Ũ, Ṽ, P̃)_k (i.e., any sequence of solutions not equal to (U, V, P)_k for each k) to that of the correct solution sequence (U, V, P)_k. We then show that the supremum of this ratio converges to zero.
Remark 1. In practical terms, a consistent solution to the data association problem implies that we know without error the correct number of targets, as well as the number and times of all births, deaths, splits, and mergers. The location of each target at each time is still not known exactly, but it can be inferred with standard smoothing techniques since it is known exactly which observations correspond to which target.
Remark 2. When k = ∞ it is clear that the target-to-track associations can be resolved, since (X_i(t), Y_i(t)) is a continuous process. However, for finite k it is not always clear how to achieve a good estimate of (U, V, P) in such a way that the estimate is also consistent. These results show that (Û, V̂, P̂)_k is one such estimate.
Remark 3. It is only for the purpose of deriving a likelihood [U, V, X, Y] for the estimator in (8) that we assume a scaled Brownian motion model for H_i(t) + σG_i(t) from (3), or equivalently that H_i(t) = 0, as stated in Condition 5. Theorem 1 still allows the actual motion of the target to be any H_i(t) satisfying Condition 3. This is due to the Cameron-Martin theorem (Theorem 18.22 of Kallenberg, 2002), which, under Conditions 3 and 4, implies that the distributions of H_i(t) + σG_i(t) and of σG_i(t) are mutually absolutely continuous. Since in the proof of the main theorem we are only interested in events of probability one, this allows us to prove the result under the assumption that H_i(t) = 0. See Appendix F for more details.
Remark 4. Condition 8 contains a mild regularity condition to ensure that the number of targets is not allowed to increase without bound in the estimation process. That is, some finite bound on the number of targets needs to be specified a priori; this has no effect on the result, provided the bound is greater than the true number of targets present in the actual event. In practice, we can make this bound some large number that we are sure exceeds the number of targets present in the case at hand. The second part of Condition 8 is basically a form of screening, or "gating" as it is called in the tracking literature. It simply prevents us from entertaining very unlikely matches.
Remark 5. Recall again that the estimate (Û, V̂, P̂) defined by (8) is constructed with the approximation to the model likelihood given in (A1)-(A5). Thus the results above hold for this estimator and not necessarily for the hypothetical estimator constructed using the true likelihood. We do not study the estimator based on the true likelihood for two reasons. First, the form of this estimator is not tractable, making it very difficult to study analytically. Second, from a practical perspective, the consistent estimate (Û, V̂, P̂) studied here can be calculated in a reasonable amount of time, while the true-likelihood-based estimator would require a large number of likelihood evaluations to be numerically approximated, making its use practically infeasible.
Remark 6. The result of Theorem 1 does not apply directly to situations with missed detections and false alarms. However, if we assume a spatial Poisson process, independent over time, for false alarms, and a Bernoulli process, independent in time, for missed detections, as is often done in the tracking literature, this would not seem to affect the result (at least intuitively). Therefore, we conjecture that the same result holds even in this more complex case. Adding these features, however, would substantially complicate the proof, so we leave this result for future work.
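The clutter model conjectured in this remark can be sketched as follows. This is a minimal illustration, not part of the paper's model: the function name `corrupt_frame`, the uniform spatial distribution of false alarms, and the default rates are all assumptions made for the example.

```python
import numpy as np

def corrupt_frame(points, region, lam_fa=2.0, p_miss=0.1, rng=None):
    """Add clutter to one frame of detections, as conjectured in Remark 6.

    points : (n, 2) array of true target locations in this frame
    region : ((xmin, xmax), (ymin, ymax)) observation window
    lam_fa : expected number of false alarms (spatial Poisson, taken
             uniform over the window here for simplicity)
    p_miss : per-target missed-detection probability (Bernoulli in time)
    """
    rng = np.random.default_rng() if rng is None else rng
    # missed detections: each true point survives independently
    kept = points[rng.random(len(points)) > p_miss]
    # false alarms: Poisson count, uniform locations in the window
    n_fa = rng.poisson(lam_fa)
    (x0, x1), (y0, y1) = region
    fa = np.column_stack([rng.uniform(x0, x1, n_fa),
                          rng.uniform(y0, y1, n_fa)])
    return np.vstack([kept, fa])
```

Because both corruption mechanisms are independent across frames, the frame-by-frame structure that the consistency argument exploits is preserved, which is the intuition behind the conjecture.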

Simulated data results
This section presents results from numerical experiments conducted to give a numerical example of Theorem 1 from Section 3. For all of these simulations the data, Z, is assumed to come from the model given in Section 2. The parameters used to simulate the different cases are as follows: mean initial X location (µ_X0 = −113), variance of initial X location (σ²_X0 = 100), variance scalar for the Brownian motions in X location (σ²_i = 0.1 for all i), variance of the difference in X location between parent and child after a split (σ²_Xs = 0.5), variance of the difference in X location between parent and child after a merger (σ²_Xm = 0.125), variance of the difference in X location between parents at the time of merger (σ²_Xd = 1), mean initial Y location (µ_Y0 = 37.5), variance of initial Y location (σ²_Y0 = 100), variance scalar for the Brownian motions in Y location (η²_i = 0.1 for all i), variance of the difference in Y location between parent and child after a split (σ²_Ys = 0.5), variance of the difference in Y location between parent and child after a merger (σ²_Ym = 0.125), and variance of the difference in Y location between parents at the time of merger (σ²_Yd = 1), where µ_X0, σ²_X0, . . ., σ²_Yd are defined precisely in Section 2.2. The event model parameters were set as mean number of initial targets (λ_0 = 4), rate of birth (λ_b = 0.1), rate of death (λ_d = 0.02), rate of split (λ_s = 0.06), and rate of merger (λ_m = 0.08). For these simulations the random location component H_i(t) is an integrated Brownian motion, and the error component G_i(t) is a Brownian motion, for all targets. The three cases studied here use three different time increments: Δt = 1.0, Δt = 0.5, and Δt = 0.1. For Δt = 1.0 we collect observations at times 0.0, 1.0, . . ., 9.0; for Δt = 0.5 we collect observations at times 0, 0.5, . . ., 9.0; etc. For each of the three cases we obtained the results of N = 100 realizations.
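The per-target location model used in these simulations can be sketched for a single target's x-coordinate: an initial N(µ_X0, σ²_X0) position, an integrated Brownian motion for H_i(t), and a Brownian motion with variance scalar σ²_i for G_i(t), observed on the grid 0, Δt, . . ., 9. The function name and the unit scale of the Brownian motion driving H are illustrative choices (the paper's scaling for H is not reproduced here), and births, deaths, splits and mergers are omitted.

```python
import numpy as np

def simulate_x_track(dt=0.5, T=9.0, mu0=-113.0, s2_0=100.0,
                     s2_bm=0.1, rng=None):
    """Simulate the x-coordinate of one target on the grid 0, dt, ..., T.

    H(t) is an integrated Brownian motion (driven by a unit-scale BM,
    an illustrative choice) and G(t) is a Brownian motion with variance
    scalar s2_bm, matching the simulation settings of this section.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(T / dt)) + 1
    # driving Brownian motion for H, built from independent increments
    dB = rng.normal(0.0, np.sqrt(dt), n - 1)
    B = np.concatenate([[0.0], np.cumsum(dB)])
    # left-Riemann approximation of the integrated Brownian motion
    H = np.concatenate([[0.0], np.cumsum(B[:-1] * dt)])
    # Brownian error component with Var(G(t)) = s2_bm * t
    G = np.concatenate([[0.0],
                        np.cumsum(rng.normal(0.0, np.sqrt(s2_bm * dt), n - 1))])
    x0 = rng.normal(mu0, np.sqrt(s2_0))
    return x0 + H + G
```

Simulating with dt = 0.5 produces the 19 observation times 0, 0.5, . . ., 9.0 used in the middle case of the experiment.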
Recall that the aim of this numerical exercise is to empirically demonstrate the consistency of the tracking estimate (Û, V̂, P̂). Therefore we are interested in comparing the correct solution with other possible solutions that have high likelihood values. We use the following idea to find a set of high-likelihood solutions that also contains the correct solution. First, we start with the correct solution and then consider making different changes to it. These changes are the same as those mentioned in the proof of Theorem 1, e.g., breaking apart a split into two deaths and a birth, breaking a track apart to form a birth and a death, etc. It is demonstrated in the proof of Theorem 1 that any other solution can be obtained by making a sequence of changes of this type.
The algorithm provided below is not intended as an algorithm for estimating unknown tracking solutions, as it requires knowledge of the correct solution. However, it is sufficient for our purposes here, which is to show that eventually the correct solution has a high likelihood relative to other possible solutions. For an algorithm to solve the data association problem in practice, see Storlie et al. (2009). Algorithm 1 proceeds as follows.
Step I: begin with the correct solution.
Step II: consider all possible changes of the types below to the solution:
1. breaking a track into two track segments.
2. labeling a merger as two deaths and a birth.
3. labeling a split as a death and two births.
4. connecting a death with a birth to make one track.
5. labeling two deaths and a birth as a merger.
6. labeling a death and two births as a split.
Step III: of the solutions resulting from these changes, keep only solutions (Ũ, Ṽ, P̃) such that L(Ũ, Ṽ, P̃) > L(U, V, P)/M, where L(·) is the conditional density given the data.
Step IV: repeat Steps II-III on the solutions obtained from Step III until the top K solutions remain unchanged.
When Algorithm 1 converges, we have a set of solutions with high relative likelihood, and we are also assured that the correct solution is in this set. The estimate (Û, V̂, P̂) is the solution from this set with the highest likelihood. Notice that Algorithm 1 does not directly consider switching observations between tracks, but this possibility is covered by breaking two tracks apart at the same time and then reconnecting them via Steps II-1 and II-4.
The simulation results for each of the time increments are given as the columns of Table 1. From Table 1 we can see that the estimation improves substantially as Δt gets smaller. We see a dramatic improvement in the number of correct estimates: the percentage goes from 51.0% for the Δt = 1.0 case, to 84.0% for the Δt = 0.5 case, to 92.0% for the Δt = 0.1 case. The 95% confidence sets also have reasonable coverage for all sample sizes. It is worth noting that these results actually illustrate a convergence in probability to the correct solution, i.e., P((Û, V̂, P̂) = (U, V, P)) → 1, which is guaranteed by the stronger result of Theorem 1. From Theorem 1 we know that the 8 realizations in the Δt = 0.1 simulation that still have incorrect estimates would eventually be estimated correctly if we made Δt small enough.

Conclusions & further work
In this paper we have provided theoretical justification for the multiple target tracking method developed in Storlie et al. (2009). We have given sufficient conditions under which the estimate converges to the correct solution almost surely. Our theoretical analysis revealed the importance of using a robust estimate of the variance component for the random process G_i(t). The theoretical results were then demonstrated by simulation.
One important direction for future work is to generalize these results to more complicated multiple target tracking problems. For example, one could examine the asymptotic properties of the estimate when using a more complicated (and more realistic) model, such as integrated Brownian motion, in the likelihood calculation. Also, it would be of much practical interest to investigate the properties of the estimate under such extensions.

Appendix B: Derivation of event density
It is difficult to calculate the exact density of (U_{b,j}, U_{d,j}, U_{s,j}, U_{m,j} | N(t_j)), as these counts are dependent on each other. The rate of death, λ_d N(t), for example, changes whenever there is a birth, death, split or merger. Let U_j = U_{b,j} + U_{d,j} + U_{s,j} + U_{m,j}. The exact distribution of (U_{b,j}, U_{d,j}, U_{s,j}, U_{m,j}) would require us to sum over all permutations of the order in which the U_j events could happen in the interval [t_j, t_{j+1}). For each of these permutations, we would have to calculate the probability that the sum of U_j independent exponential random variables with respective (generally different) rates is less than Δt_j = t_{j+1} − t_j. Instead, we approximate this probability by assuming that the rate of occurrence of events stays constant during the interval [t_j, t_{j+1}). Specifically, we assume that the rate of each of the events during the interval is λ̄_{b,j} = λ_b, λ̄_{d,j} = λ_d N(t_j), λ̄_{s,j} = λ_s N(t_j) and λ̄_{m,j} = λ_m N(t_j) for birth, death, splitting, and merging, respectively.
With this approximation, the variables (U_{b,j}, U_{d,j}, U_{s,j}, U_{m,j}) are independent, and P(U_{d,j} = u), for example, is the probability that the sum of u i.i.d. exponential random variables with rate λ̄_{d,j} is less than Δt_j. This is the same as the Poisson density with parameter λ̄_{d,j}Δt_j evaluated at u. Hence we have P(U_{d,j} = u) = exp(−λ̄_{d,j}Δt_j)(λ̄_{d,j}Δt_j)^u / u!.
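Under the frozen-rate approximation, the joint probability of the four event counts factors into four Poisson densities. A small sketch (the function name is illustrative; the default rates are the values used in the simulation section):

```python
from math import exp, factorial

def event_count_probs(u, dt, N_tj, lam_b=0.1, lam_d=0.02,
                      lam_s=0.06, lam_m=0.08):
    """Approximate P(U_{b,j}=u_b, U_{d,j}=u_d, U_{s,j}=u_s, U_{m,j}=u_m).

    Rates are frozen over [t_j, t_{j+1}) at lam_b, lam_d*N(t_j),
    lam_s*N(t_j) and lam_m*N(t_j), so the four counts are independent
    Poisson variables with those rates times dt.
    """
    def pois(k, rate):
        # Poisson density with parameter rate*dt evaluated at k
        return exp(-rate * dt) * (rate * dt) ** k / factorial(k)

    ub, ud, us, um = u
    return (pois(ub, lam_b) * pois(ud, lam_d * N_tj)
            * pois(us, lam_s * N_tj) * pois(um, lam_m * N_tj))
```

For example, with N(t_j) = 4 and Δt_j = 1, the probability of no events at all is exp(−(0.1 + 0.08 + 0.24 + 0.32)) = exp(−0.74), the product of the four Poisson zero probabilities.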
Now consider the variables (V_{b,i}, V_{d,i}, V_{s,i}, V_{m,i}). Under the same approximation that N(t) is constant during the interval [t_j, t_{j+1}), an analogous expression holds.

Appendix C: Derivation of location density
Recall that, only for the purpose of evaluating the likelihood in (7), we use the assumption that H_i(t) + σG_i(t) is a Brownian motion, or equivalently that H_i(t) = 0, as stated in Condition 5. In this case X_i(t) is normally distributed for all t, and the observed locations of all targets at all time points have a multivariate normal distribution. Recall that the times at which the i-th target is observed are denoted by t_i = (t_{i,1}, . . ., t_{i,n_i}), and write X_i = (X_i(t_{i,1}), . . ., X_i(t_{i,n_i})) and X = (X_1, . . ., X_m). Then X ∼ N(µ_X, Σ_X).
Recall from Section 2.2 that this mean and covariance depend on the times of initiation, ξ, of the targets. The event variables U and V do not specify the exact values of ξ_i and ζ_i; they do, however, specify the intervals between observations in which they fall. In the following, if it is known that ξ_i is in the interval (t_j, t_{j+1}), we set ξ_i = t_j + Δt_j/2.
Since µ_X and Σ_X depend on the exact values of ξ, this is an approximation to the true density. To get the exact density, we would have to integrate over the joint distribution of X and ξ, given that the ξ_i's are in their respective intervals. This would have to be done numerically and would not be feasible in practice. If the Δt_j are sufficiently small, though, this approximation will be quite close to the true density.
Also recall from Section 2.2 that we then need to condition X on the random variable D and evaluate this density at D = 0. Let d_i be the vector d defined in the last paragraph of Section 2.2.2 for the i-th merging event, i = 1, . . ., N_d, and let D_i denote the corresponding difference for the i-th merging event. The random variable D is jointly Gaussian with X. For the collection of both X and D we have (X, D)′ ∼ N(µ, Σ). The conditional distribution of X given D = 0, which we will simply call the distribution of X from this point forward, is given by the density [X | D = 0](x) = φ(x; µ*, Σ*), where φ(x; µ*, Σ*) is the multivariate normal density with mean µ* and covariance Σ* obtained from the usual Gaussian conditioning formulas. In Section 3 we assume a Brownian motion for the random process G_i(t) in (3). Using the Markov property, the means and covariances in (A5) can be well approximated by (A8) and (A9) below. If G_i(t) is modeled by an integrated Brownian motion, the corresponding approximations for the means and covariances are given in Storlie et al. (2009).
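The conditioning step is the standard Gaussian formula: for (X, D)′ ∼ N(µ, Σ), the law of X given D = 0 has mean µ* = µ_X + Σ_XD Σ_DD⁻¹(0 − µ_D) and covariance Σ* = Σ_XX − Σ_XD Σ_DD⁻¹ Σ_DX. A minimal numerical sketch (function name illustrative):

```python
import numpy as np

def condition_on_D0(mu, Sigma, nx):
    """Conditional distribution of X given D = 0 for (X, D)' ~ N(mu, Sigma).

    mu, Sigma : mean and covariance of the stacked vector (X, D)'
    nx        : dimension of X; the remaining coordinates are D
    Returns (mu_star, Sigma_star) from the standard conditioning formulas.
    """
    mu_x, mu_d = mu[:nx], mu[nx:]
    S_xx = Sigma[:nx, :nx]
    S_xd = Sigma[:nx, nx:]
    S_dd = Sigma[nx:, nx:]
    K = S_xd @ np.linalg.inv(S_dd)           # regression of X on D
    mu_star = mu_x + K @ (0.0 - mu_d)        # shift toward the D = 0 slice
    Sigma_star = S_xx - K @ S_xd.T           # variance reduction from D
    return mu_star, Sigma_star
```

For instance, with scalar X and D, µ = (1, 1)′ and Σ = [[2, 1], [1, 1]], conditioning on D = 0 yields µ* = 0 and Σ* = 1.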

Appendix D: Technical Lemmas
Here we present the lemmas used in the proofs of Appendix E. All of the results in this section assume the Brownian motion model for G_i(t) in (3). We begin with the Law of the Iterated Logarithm for two-dimensional Brownian motion. The proof of Lemma 1 for Brownian motion in d dimensions can be found in Kallenberg (2002). Let ‖·‖ denote the Euclidean norm.
Lemma 1 (Law of the Iterated Logarithm). For B a Brownian motion in ℜ²,
lim sup_{t→∞} ‖B(t)‖ / (2t log log t)^{1/2} = 1 a.s.
Lemma 2 is Lévy's Modulus of Continuity for two-dimensional Brownian motion. Its proof follows the same general argument as for the one-dimensional case given in Revuz & Yor (1999).
Lemma 2 (Lévy's Modulus of Continuity). For B a Brownian motion in ℜ²,
lim sup_{h→0} sup_{0≤t≤1−h} ‖B(t + h) − B(t)‖ / (2h log(1/h))^{1/2} = 1 a.s.
For the purposes of comparing likelihoods, we need a convenient form for the location density. This will require the following lemma. To write the overall location density, each observation is conditioned on the previous observations and all of the D_j variables from the mergers; we then take the product of all these conditional densities. Let X_i = (X_{i,1}, . . ., X_{i,n_i}) and let F_{i,j} = (X_{i,1}, . . ., X_{i,j}, X_{i−1}, . . ., X_1, D_1, . . ., D_{N_m}, U, V). As usual, let t_{i,j} be the j-th time at which the i-th target is observed. Notice that we can write the x component of the likelihood as in (A8), so we will need a convenient expression for [X_{i,j} | F_{i,j−1}].
Lemma 3. Under Conditions 1-4 of Theorem 1, and further assuming that H_i(t) = 0, for j = 2, . . ., n_i the distribution of X_{i,j} given F_{i,j−1} is Gaussian, with the mean and variance given below.

Proof. We can write the density of X_{i,j} given F_{i,j−1} as in (A10). For notational convenience, drop the subscript i on t_{i,j}, letting t_j = t_{i,j}. The conditional distribution of X_{i,j} given X_{i,j−1} = x_{i,j−1} and X_i(ζ) = x_{i,ζ} can be shown to be Gaussian. Also, the conditional distribution of X_i(ζ) given F_{i,j−1} = f_{i,j−1} is Gaussian, with mean and variance denoted by µ_{x_ζ} and σ²_{x_ζ}. Notice that we never observe X_i(ζ), so this conditional distribution has variance σ²_{x_ζ} > 0. We can then write the integral in (A10) as a Gaussian convolution of these two densities. Following some tedious algebra, the resulting expression is recognizable as the density of a normal distribution with mean x_{j−1} + O(Δt_j) and variance σ²Δt_j − O(Δt_j)². Now using Condition 1 gives the desired result.
The following lemma is a version of the main theorem presented in Shepp (1966). We say that two measures µ_1 and µ_2 are equivalent if they have the same sets of measure zero, and denote this µ_1 ∼ µ_2. This theorem gives necessary and sufficient conditions for the measure µ_X imposed by a Gaussian process X(t) on [0, T] to be equivalent to the measure µ_B imposed by a Brownian motion on [0, T]. Let m(t) = EX(t) and γ(s, t) = E[(X(s) − m(s))(X(t) − m(t))].
Lemma 4 (Shepp's Theorem). Assume that (∂/∂s)γ(s, t) is continuous for 0 ≤ s ≤ T, s ≠ t. Then µ_X ∼ µ_B if and only if (A11) holds and there exists a function k ∈ L² for which (A12) holds. The function k is unique and is given by k(t) = m′(t) for almost every t.
We will now use Shepp's Theorem to show that the measure imposed by a Brownian motion conditioned on {D_j : 1 ≤ j ≤ N_m} is equivalent to µ_B.
Lemma 5. Assume that G_i(t) is the Brownian motion associated with path i in (3). Consider the measure μ̄ imposed by G_i given (D_1 = 0, . . ., D_{N_m} = 0), where D_j is the difference between parent paths at the time of merger, plus an error, as defined in (A4). Then μ̄ ∼ µ_B.
Proof. Without loss of generality (WLOG) we assume that the birth and death times are ξ_i = 0 and ζ_i = T, so that G_i is defined on [0, T]. Let m(t) and γ(s, t) be the mean and covariance functions corresponding to μ̄. We need to first show that (∂/∂s)γ(s, t) is continuous, and then show that conditions (A11) and (A12) hold. The second condition is trivially satisfied since m(t) = 0, so we just need to show continuity and (A11).
Let s ≤ t and let Σ be the covariance matrix of the vector (G_i(s), G_i(t), D_1, . . ., D_{N_m})′. Also let Σ_1 and Σ_2 be the covariance matrices of (G_i(s), G_i(t))′ and (D_1, . . ., D_{N_m})′, respectively. Finally, let Σ_12 be the matrix whose elements are the pairwise covariances of the elements of (G_i(s), G_i(t))′ with those of (D_1, . . ., D_{N_m})′. Define the functions f_j(t) = Cov(G_i(t), D_j), j = 1, . . ., N_m. The matrix Σ_2 is positive definite, so let v_1, . . ., v_{N_m} and λ_1, . . ., λ_{N_m} be its eigenvectors and eigenvalues, respectively. We claim that f_j(t) = c_j t for some constant c_j. To see this, notice that D_j = X_{d_j,1}(ζ_{d_j,1}) − X_{d_j,2}(ζ_{d_j,2}) + ψ_j, where the vector d_j is defined in (A4), so that X_{d_j,1} and X_{d_j,2} are the parents of a merger and X_{d_j,3} is the child. Now consider a path X_j. If X_j did not result from a birth, then we can write X_j(t) as the position of its parent(s) at time ξ_j plus G_j(t). We can continue breaking the parent paths up in this same way until X_j(t) is expressed as a linear combination of terms c_k G_{j_k}(t) plus Y_j, where the c_k are constants and Y_j is a sum of random variables such as the ψ_{m,k}'s, ψ_{s,k}'s, and X_k(ξ_k)'s (initial locations of paths resulting from birth), all of which are independent of G_i. All of the G_{j_k} are also independent of G_i unless j_{k′} = i for some k′, in which case the covariance of G_i(t) with X_j(t) is linear in t. If there is no k′ with j_{k′} = i, then Cov(G_i(t), X_j) = 0. Hence f_j(t) = c_j t for some constant c_j, possibly equal to zero.
This means that (∂/∂s)γ(s, t) is continuous unless s = t, and the resulting bound satisfies (A11), which completes the proof.
The next lemma is needed to compare the location densities of two different sequences of solutions (Û, V̂, P̂)_k and (Ũ, Ṽ, P̃)_k. It gives a bound for the ratio of two location densities for two track segments with different F_{i,j} variables. Let F_i(t) = F_{i,j′} for j′ = max{j : t_{i,j} ≤ t}.

Lemma 6. Assume Conditions 1-4 of Theorem 1, and further that H_i(t) = 0. Let Θ be the set of all pairs of tracking solution sequences, θ̂_k = (Û, V̂, P̂)_k and θ̃_k = (Ũ, Ṽ, P̃)_k, that have one of the differences 1a, 1b, 2, 3, 5a, 5b or 6b from the propositions. Consider a track segment (x_1, . . .) for all k, for some i_1 and i_2. If X̂_{i_1}(t_{j′}) and X̃_{i_2}(t_{j′}) are both the first observation of a track in their respective solutions, then (A15) holds for some constants a and b which depend on ω.
If neither X̂_{i_1}(t_{j′}) nor X̃_{i_2}(t_{j′}) is the first observation of a track in their respective solutions, then (A16) and (A17) hold for some constants a and b which depend on ω.
Proof. For (A15), consider the case where target i_1 is a birth. Then the relevant density value is at most the mode of the normal density for the initial position of a target resulting from birth, and this is true for any θ̃. Similar bounds hold if target i_1 is the result of a split or a merger. Let C_1 be the maximum of the three quantities in (A18) and (A19). Now, for any Brownian path on a finite interval, P(sup_{0≤t≤T} |B(t)| < ∞) = 1. Hence for every ω in a set with probability 1, sup_{0≤t≤T} |B(t)| ≤ M(ω) < ∞, so we know that x_1 is less than a constant for all ω. Also, [X̃_{i_2}(t_{j′}) | F̃_{i_1}(t_{j′−1})](x) > 0 for −∞ < x < ∞, since in any of the three cases (target i_2 resulting from birth, split, or merger) it has support over the entire real line. Taking the infimum gives the first result. Now consider the ratios in (A16) and (A17). We only need to show (A16), since they are equivalent if in (A17) we relabel x_1 as x_2 and so on. By Lemma 3 we can write the ratio in (A16) as in (A20), where all of the O(h) terms are possibly different functions that tend to zero like ch as k → ∞.
We will need to bound how different the estimates σ̂² and σ̃² can be. Notice that under the conditions of any of the propositions, the sets of variables D̂ and D̃ used to calculate σ̂² and σ̃² can differ by at most one variable. So all of the D̂_{i,j} = D̃_{i,j} except possibly one; suppose for the moment that Ñ = N̂ − 1 and that D̂²_{i′,j′} does not appear in the estimate σ̃². Now consider the quantity A_j in (A20). The track segment in question comes from a solution satisfying the conditions of one of Propositions 1a, 1b, 2, 3, 5a, 5b, 6a, or 6b, so all of the tracks are correct track segments. Hence all of the (x_{j−1}, x_j) pairs are consecutive observations from a scaled Brownian path. The first term in the exponent is O(1) a.s. by the strong law of large numbers. The second term is less than C_2 log log k a.s. by the Law of the Iterated Logarithm and the fact that n_k ≤ ck for some constant c by Condition 1. The third term is less than C_3 (k/N̂) log log k a.s., also by the Law of the Iterated Logarithm and the fact that N̂ ≤ ck and n_k ≤ ck for some constant c. The last term is O(1) because we are adding up fewer than k terms which are all O(k⁻¹). But in the propositions we restrict the solutions to have at most M tracks in total. This means that N̂ ≥ ck for some constant c for any solution, since at least one track must be accumulating observations as k → ∞. Thus the supremum is bounded by a(log k)^b for some constants a and b. Combining this with (A21) gives the desired result.
Lemma 7 states that the probability that any two of the two-dimensional Brownian motion paths intersect at any time t in a finite interval is zero. This lemma is an immediate consequence of Proposition 1.4.1 on page 353 of Khoshnevisan (2002).
Lemma 7. Consider any two (x, y) paths (X_1, Y_1) and (X_2, Y_2) from the location model in Section 2.2, where G_1(t) and G_2(t) are independent Brownian motions. Then the two paths coincide at some time t in a finite interval with probability zero.
Lastly, Lemma 8 states that our robust estimate of the Brownian motion variance term is consistent.
Lemma 8. Assume Conditions 1-4 of Theorem 1, and further that H_i(t) = 0. The estimate σ̂² given by (10) is strongly consistent as k → ∞ when applied to any sequence of tracking solutions (Û, V̂, P̂)_k that has fewer than M tracks and whose tracks are made up entirely of correct track segments.
Proof. Write σ̂²_k as a normalized sum over increments, where N̂ = Σ^m_{i=1} n^k_i. Under the assumption that all of the tracks are correct track segments, the terms are independent, with Z_j i.i.d. ∼ N(0, σ²). If we let (Z*_j)² = Z²_j I_{E_{i,j}}, then we can write σ̂²_k in terms of the partial sums S*_N. There are a finite number of tracks in (Û, V̂, P̂), which means N̂ ≥ c_1 k for some positive constant c_1, so N̂ → ∞ as k → ∞. Hence it is sufficient to show that (1/N)S*_N → σ² as N → ∞ a.s. Let u_n = ⌊αⁿ⌋ for α > 1, which is fixed for now. We first show convergence along the subsequence u_n: by Chebyshev's Inequality the sum in (A25) is finite, so by the first Borel-Cantelli Lemma the subsequence converges for all ε > 0, and taking a union over positive rational ε gives almost sure convergence along u_n. This is true for all α > 1. Now take α ↓ 1 along rationals, and we have the desired limit.

Corollary 1. The estimate σ̂² given by (10) is strongly consistent as k → ∞ when applied to the correct sequence of tracking solutions, (U, V, P)_k.
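The convergence in Lemma 8 is easy to check numerically under our assumed stand-in for the truncation event: a cutoff growing like √(2 log N) discards a vanishing fraction of the tail, so the truncated average still converges to σ². The function name, the MAD-style pilot scale, and the cutoff form are illustrative, not the paper's definitions.

```python
import numpy as np

def truncated_sigma2(z, sigma_hint=None):
    """Average of z_j^2 after discarding extreme terms, cf. (10) and Lemma 8.

    z : array with z_j ~ N(0, sigma^2) when the tracks are correct.
    The truncation level grows like sqrt(2 log N), so the bias from
    discarding the tails vanishes as N -> infinity (our assumed
    stand-in for the event E_{i,j}).
    """
    N = z.size
    # robust pilot scale: median(|Z|) / 0.6745 is roughly sigma for
    # Gaussian data, and is insensitive to a few wild terms
    scale = np.median(np.abs(z)) / 0.6745 if sigma_hint is None else sigma_hint
    thr = scale * np.sqrt(2.0 * np.log(N))
    keep = np.abs(z) <= thr
    return np.sum(z[keep] ** 2) / np.sum(keep)
```

With N = 200000 standard normal draws the estimate lands within a couple of percent of σ² = 1, illustrating the strong-law behavior the proof formalizes.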
Proof. Immediate consequence of Lemma 8.
Appendix E: Propositions
Any alternative solution differs from the correct one through a sequence of changes of the following types:
1. incorrectly breaking a track into two track segments.
2. incorrectly labeling a merger as two deaths and a birth.
3. incorrectly labeling a split as a death and two births.
4. incorrectly connecting a death with a birth to make one track.
5. incorrectly labeling two deaths and a birth as a merger.
6. incorrectly labeling a death and two births as a split.
The propositions in this section deal with each of the six possible differences between solutions listed above. Each one assumes something about two sequences of tracking estimates (Û, V̂, P̂)_k and (Ũ, Ṽ, P̃)_k. Propositions 1a and 1b deal with the first difference listed above. They basically say that asymptotically it is not beneficial to break a correct target track into separate tracks. Proposition 1a considers breaking the track at a fixed time, while Proposition 1b considers breaking the track at an arbitrary time.
Proposition 1a. Assume Conditions 1-4, and further that H_i(t) = 0. Let Θ be the set of all pairs of tracking solution sequences, θ̂_k = (Û, V̂, P̂)_k and θ̃_k = (Ũ, Ṽ, P̃)_k, that have the following property. All of the tracks that make up θ̂_k and θ̃_k are correct track segments, and θ̃_k differs from θ̂_k at every k only by breaking a correct target track labeled i_1 from θ̂_k into two tracks, by incorrectly specifying the death of target i_1 and the birth of target i_2 during a fixed time interval [t^k_{j′}, t^k_{j′+1}) for each k. Then the corresponding likelihood ratio bound holds for some positive constant c which depends on ω.
Proof. WLOG assume that i_1 = 1 and i_2 = m̃, where m̃ is the number of tracks in (Ũ, Ṽ, P̃)_k; see Figure 4. Note that this implies m̂ = m̃ − 1, where m̂ is the number of tracks in (Û, V̂, P̂)_k. From (9) of Section 2.3, the ratio of the densities in the proposition can be written as in (A26). Write X_i = (X_{i,1}, . . ., X_{i,n_i}) and F_{i,j} = (X_{i,1}, . . ., X_{i,j}, X_{i−1}, . . ., X_1, D_1, . . ., D_{N_m}), with the D_j being the difference variables resulting from mergers as given in (A4). By convention let F_{i,0} = (X_1, . . ., X_{i−1}, D_1, . . ., D_{N_m}). Then let F̂_{i,j} and F̃_{i,j} denote the F_{i,j} variable for the solutions (Û, V̂, P̂)_k and (Ũ, Ṽ, P̃)_k, respectively. Notice that we can write the x component of the likelihood for (Û, V̂, P̂)_k given Z as a product of such conditional densities, for example. So we can write the ratio of the x location densities in (A26) by breaking it apart into tracks 2, . . ., m̃ − 1, which the two solutions have in common, and then handling tracks 1 and m̃ separately. But for i = 2, . . ., m̃ − 1 and j = 1, . . ., ñ_i, we have X̃_{i,j} = X̂_{i′,j} for some i′ in 1, . . ., m̂ − 1. The numbers of observations n̂_i = ñ_{i′} for all of these tracks are the same as well. Also X̂_{1,j} = X̃_{1,j} for j = 1, . . ., ñ_1 and X̂_{1,ñ_1+j} = X̃_{m̃,j} for j = 2, . . ., ñ_{m̃}.
So by Lemma 6 the first, second and third terms of (A27) are no more than a^m̃ (log k)^{m̃b} for some uniform constants a and b. Since m̃ ≤ M by Condition 8, we obtain the bound in (A28). Notice that the term in the numerator of (A28) is the density of the first observation of a new track, and hence for all (Ũ, Ṽ, P̃)_k it must be smaller than the mode of the normal density with variance σ²_X0. Also, the denominator can be written out by utilizing Lemma 3. The same is true for Y, and it is independent of X; recall that the variance parameter of the Brownian motion for Y_i is denoted by η_i. In the resulting bound, B(t) is a Brownian motion or a Brownian motion conditioned on the D_j's of Lemma 5; in either case, because of Lemma 5, any path properties of Brownian motion will apply. Also, all of the track segments in (Û, V̂, P̂)_k are correct, so by Lemma 8, σ̂² → σ². Using this fact along with (A31) on D, we obtain an expression involving independent Brownian motions B and B̃. Now remember that [t^k_{j′}, t^k_{j′+1}) is a fixed interval for each k. Assume first that B is a Brownian motion (not conditioned), and write the resulting process as a Brownian motion B*. We can now apply Lemma 1 to D, which gives (A33). Note that B* may have been a different Brownian motion for each k, but there is a countable number of them, so the sets of probability zero can be joined. If B(t) was a Brownian motion conditioned on the D_j's, then the result in (A33) is still true, since the path set for B(t) is the same as that of Brownian motion.

Proposition 1b. Assume Conditions 1-4, and further that H_i(t) = 0. Let Θ be the set of all pairs of tracking solution sequences, θ̂_k = (Û, V̂, P̂)_k and θ̃_k = (Ũ, Ṽ, P̃)_k, that have the following property. All of the tracks that make up θ̂_k and θ̃_k are correct track segments, and θ̃_k differs from θ̂_k at every k only by breaking a correct target track labeled i_1 from θ̂_k into two tracks, by incorrectly specifying the death of target i_1 and the birth of target i_2 during an arbitrary time interval [t^k_{j′}, t^k_{j′+1}). Then the corresponding likelihood ratio bound holds for some positive constant c which depends on ω and some ε > 0 which can be made arbitrarily small.
Proof. This proof follows exactly the same logic as the proof of Proposition 1a, the only difference being that t^k_{j′} is now an arbitrary time. Hence the only change is to apply Lemma 2 instead of Lemma 1 to (A32). So for the D of (A30) we end up with a bound involving some constant C′ and an arbitrarily small ǫ > 0. This along with (A30) gives us (A40), for some constant C″ and ε > ǫ which can still be taken to be arbitrarily small.
The event model contribution is the same as in Proposition 1a, and this along with (A40) gives the desired result.

Propositions 2 and 3 deal with the second and third differences listed earlier. These propositions say that it is not beneficial asymptotically to break apart a correctly specified merging or splitting event.
Proposition 2. Assume Conditions 1-4, and further that H_i(t) = 0. Let Θ be the set of all pairs of tracking solution sequences, θ̂_k = (Û, V̂, P̂)_k and θ̃_k = (Ũ, Ṽ, P̃)_k, that have the following property. All of the track segments that make up θ̂_k and θ̃_k are correct track segments, and θ̃_k differs from θ̂_k at every k only by relabeling a correct merging event of targets i_1 and i_2 into target i_3 as the deaths of targets i_1 and i_2 and the birth of target i_3. Then the corresponding likelihood ratio bound holds for some positive constant c which depends on ω.
Proof. WLOG assume that i_1 = 1, i_2 = 2 and i_3 = 3. The relabeling described in the proposition is portrayed in Figure 5. Here we have that for i = 1, . . ., m and j = 1, . . ., n̂_i, the locations X̂_{i,j} = X̃_{i′,j} for some i′. The numbers of observations n̂_i = ñ_{i′} for all of these tracks are the same as well. Hence by Lemma 6 the ratio of the x location densities in (A26) is no more than a^m(log k)^{mb} for some constants a and b, and since there are fewer than M tracks in the solutions (Û, V̂, P̂)_k and (Ũ, Ṽ, P̃)_k, we have sup. Of course the same is true for Y as well, so we have sup. For the event model contribution, the event model for (Ũ, Ṽ, P̃)_k has 2 more events in [t^k_{j′}, t^k_{j′+1}) than that for (Û, V̂, P̂)_k. As in (A38) we then have sup. This along with (A41) gives us the desired result.
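The size of the event-model cost of those 2 extra events can be made concrete with a toy stand-in. The paper's event model is Markovian; assuming, purely for illustration, that events arrive as a Poisson process with rate `lam`, each additional event posited in an interval of width 1/k multiplies the likelihood by a factor of order lam/k, so two extra events cost O(k^{-2}):

```python
from math import log, exp

def extra_event_log_penalty(k, lam=0.5, n_extra=2):
    """Toy stand-in for the event-model ratio: positing n_extra additional
    events in an interval of width 1/k, each occurring with probability of
    order lam/k, costs roughly n_extra * log(lam/k) on the log scale."""
    return n_extra * log(lam / k)
```

With `n_extra = 2` the penalty factor is (lam/k)^2, i.e. the O(k^{-2}) order that makes the relabeled solution lose asymptotically; `lam` here is a hypothetical rate, not a parameter of the paper's model.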
Proposition 3. Assume Conditions 1-4, and further that H_i(t) = 0. Let Θ be the set of all pairs of tracking solution sequences, θ̂_k = (Û, V̂, P̂)_k and θ̃_k = (Ũ, Ṽ, P̃)_k, that have the following property. All of the tracks that make up θ̂_k and θ̃_k are correct track segments, and θ̃_k differs from θ̂_k at every k only by relabeling a correct splitting event of target i_1 into targets i_2 and i_3 as the death of target i_1 and the births of targets i_2 and i_3. Then, for some positive constant c which depends on ω.
Proof. This is symmetric with respect to the difference between solutions in Proposition 2. The proof is therefore identical.
We will say that two targets labeled i_1 and i_2 at times t_1 and t_2 are distinct if their labels are different in the correct solution (U, V, P); that is, they are not the same physical target. Proposition 4 deals with the fourth difference listed earlier. It assumes that (Û, V̂, P̂)_k has the death of a target labeled i_1 and the birth of a target labeled i_2 in the same interval [t^k_{j′}, t^k_{j′+1}). These two events are not necessarily labeled correctly. However, target i_1 at time t^k_{j′} is assumed to be distinct from target i_2 at time t^k_{j′+1}. The solution (Ũ, Ṽ, P̃)_k then connects these two track segments, which is not consistent with the correct solution; see Figure 6. Proposition 4 then says that there can be no differences of this type eventually.
Proposition 4. Assume Conditions 1-4, and further that H_i(t) = 0. Let Θ be the set of all pairs of tracking solution sequences, θ̂_k = (Û, V̂, P̂)_k and θ̃_k = (Ũ, Ṽ, P̃)_k, that have the following property. The sequences θ̂_k and θ̃_k differ at every k only by joining the birth and death of two targets i_1 and i_2, which are distinct, into one target track. Then Θ is the empty set.
Proof. WLOG assume that i_1 = 1 and i_2 = 2. The observations that get incorrectly joined together in a track in the solution (Ũ, Ṽ, P̃)_k are (X̂_1(t^k_{j′}), Ŷ_1(t^k_{j′})) and (X̂_2(t^k_{j′+1}), Ŷ_2(t^k_{j′+1})). Now these are observations from distinct targets in the correct solution, so by Lemma 7 and the continuity of Brownian paths we know that the distance between them stays bounded away from zero. So by Condition 8 there cannot eventually be a solution that connects these two observations in the same track. Hence such a sequence (Ũ, Ṽ, P̃)_k does not exist.
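The geometric fact driving this proof can be checked with a quick simulation. In the sketch below (all parameters, e.g. the initial separation `sep`, are illustrative assumptions), two independent one-dimensional Brownian tracks keep a gap of order `sep` between the observations that an incorrect join must bridge, while the consecutive gap within one track shrinks with the mesh:

```python
import numpy as np

rng = np.random.default_rng(2)

def gaps(k, sep=10.0):
    """Gap an incorrect join must bridge vs. a within-track gap, for two
    independent Brownian tracks born a distance sep apart, sampled at mesh 1/k."""
    dt = 1.0 / k
    x1 = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=k))        # target 1
    x2 = sep + np.cumsum(rng.normal(0.0, np.sqrt(dt), size=k))  # target 2
    j = k // 2
    join_gap = abs(x2[j] - x1[j - 1])    # obs of target 1, then obs of target 2
    within_gap = abs(x1[j] - x1[j - 1])  # consecutive obs of the same target
    return join_gap, within_gap
```

As k grows the within-track gap falls below any threshold of the kind imposed by Condition 8, while the joining gap stays bounded away from zero, so the join is eventually forbidden.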
Note that this is the only place where we use the second part of Condition 8. This is a very reasonable assumption to make, since it only prevents us from forming discontinuous paths. It seems, however, that the likelihood should prevent us from doing this anyway. We do indeed believe that this is the case, but we need to develop tighter bounds in the formulation of Lemma 6 before we can remove the second part of Condition 8.
The next proposition deals with the fifth difference listed earlier. It basically says that it is not advantageous asymptotically to take two actual deaths and a birth and merge them together into a single merging event.
We will say that an event in (Û, V̂, P̂)_k corresponds to an event in (U, V, P)_k if they happen to the same target in the same interval. For example, if target 1 dies in the interval [t^k_{j′}, t^k_{j′+1}) in (Û, V̂, P̂)_k, and in (U, V, P)_k target 1 merges with target 2 in that interval, then the death of target 1 in (Û, V̂, P̂)_k corresponds to the merger of target 1 with target 2 in (U, V, P)_k. Also, two events in a solution are distinct if they are not the same event.
Proposition 5a. Assume Conditions 1-4, and further that H_i(t) = 0. Let Θ be the set of all pairs of tracking solution sequences, θ̂_k = (Û, V̂, P̂)_k and θ̃_k = (Ũ, Ṽ, P̃)_k, that have the following property. All of the tracks that make up θ̂_k and θ̃_k are correct track segments, and θ̃_k differs from θ̂_k at every k only by declaring a merging event in place of two deaths and a birth, for two targets i_1 and i_2 that died in (Û, V̂, P̂)_k together with a target i_3 that was born in (Û, V̂, P̂)_k. It is further assumed that the deaths of targets i_1 and i_2 and the birth of target i_3 may be incorrectly specified in (Û, V̂, P̂)_k, but at least two of these three events must correspond to two distinct events in the correct solution (U, V, P)_k. Then the likelihood of θ̃_k is eventually 0.

Proof. WLOG let i_1 = 1, i_2 = 2, and i_3 = 3. The relabelings described in the proposition are portrayed in Figure 7. Let the unknown times at which the two distinct events occur be denoted τ_1 and τ_2, and assume WLOG that τ_1 ≤ τ_2. Since these are the times of actual events from the event model, τ_1 < τ_2 with probability 1.
Hence by Condition 1 there is a K such that for all k > K there will be a sample time, t^k_{j′}, in the interval (τ_1, τ_2). For this sample time, one of the targets involved in the proposed merging event in (Ũ, Ṽ, P̃)_k will be missing, unless targets 1 and 2 merge before τ_1. But missing observations are not allowed, and both targets still exist in (Û, V̂, P̂)_k at time t^k_{j′}, so this would violate the hypothesis. Thus the likelihood of (Ũ, Ṽ, P̃)_k given Z_k is 0 eventually.

Proposition 5b is very similar to Proposition 5a, but now there is no restriction that any of the two deaths and a birth in (Û, V̂, P̂)_k correspond to events in (U, V, P)_k. Thus they can be at arbitrary times. In this case there may be an advantage to combining these three events into a merger by switching to the alternative (Ũ, Ṽ, P̃)_k. This will not be a problem, however, as we will see that there would have to be too many other negative differences before (Ũ, Ṽ, P̃)_k could make use of any possible advantage it may gain from the difference in Proposition 5b.

Proposition 5b. Assume Conditions 1-4, and further that H_i(t) = 0.
Let Θ be the set of all pairs of tracking solution sequences, θ̂_k = (Û, V̂, P̂)_k and θ̃_k = (Ũ, Ṽ, P̃)_k, that have the following property. All of the tracks that make up θ̂_k and θ̃_k are correct track segments, and θ̃_k differs from θ̂_k at every k only by declaring a merging event in place of two deaths and a birth, for two targets i_1 and i_2 that died in (Û, V̂, P̂)_k together with a target i_3 that was born in (Û, V̂, P̂)_k. Then

Proof. This differs from the previous proposition in that the deaths of targets i_1 and i_2 and the birth of target i_3 in (Û, V̂, P̂)_k do not necessarily correspond to events in the correct solution. WLOG assume that i_1 = 1, i_2 = 2 and i_3 = 3. Here we have the reverse case of Proposition 2, so again the corresponding bound applies.

All of the differences described above involve correct track segments and hence fit into the conditions of their respective propositions. In actuality, there can be no difference 4's eventually by Proposition 4 anyway. So there is a sequence of solutions that starts with (U, V, P)_k and passes through several incorrect solutions to arrive at (Ũ, Ṽ, P̃)_k. Each element of this sequence has one and only one of the differences described above from the previous element. We shall write this sequence as (U, V, P)_k, (U_1, V_1, P_1)_k, (U_2, V_2, P_2)_k, . . ., (U_l, V_l, P_l)_k, (Ũ, Ṽ, P̃)_k.
We can write the likelihood ratio of any incorrect solution (Ũ, Ṽ, P̃)_k to that of the correct solution (U, V, P)_k as in (A43). Let Θ be the set of all tracking solution sequences satisfying Condition 8 eventually; that is, they have no more than M tracks, and they restrict the distance between consecutive observations in a track to be less than c(log k)^{-1}. We claim that the correct sequence of solutions (U, V, P)_k is in this set.
Obviously (U, V, P)_k has no more than M tracks. For the difference between consecutive observations in any of the tracks, Lemma 2 shows that all of the consecutive differences in the correct solution will eventually be smaller than c(log k)^{-1}. Also note that by Proposition 4, there can be no sequence (Ũ, Ṽ, P̃)_k ∈ Θ that has any difference 4's from the correct solution.
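This claim about consecutive differences can be spot-checked numerically. The following sketch (the constant c = 1 and the threshold form 1/log k are assumptions made for illustration, since the exact threshold belongs to Condition 8 as stated elsewhere in the paper) compares the largest consecutive gap of a Brownian path on the mesh 1/k with that threshold:

```python
import numpy as np
from math import log, sqrt

rng = np.random.default_rng(1)

def max_consecutive_gap(k, sigma2=1.0):
    """Largest distance between consecutive observations of one track:
    a Brownian path with variance parameter sigma2, sampled at mesh 1/k."""
    increments = rng.normal(0.0, sqrt(sigma2 / k), size=k)
    return float(np.abs(increments).max())
```

By the modulus of continuity of Brownian motion, this maximum behaves like sqrt(2 log k / k), which tends to 0 far faster than 1/log k, so the correct solution satisfies the threshold eventually.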
Let Θ′ = Θ\{(U, V, P)_k}. We need to show that the supremum over (Ũ, Ṽ, P̃)_k ∈ Θ′ of the ratio in (A43) converges to 0 as k → ∞. By Propositions 1a, 1b, 2, 3, 5a, and 6a, any of the ratios in (A43) that have differences 1a, 1b, 2, 3, 5a, and 6a are ≤ O(k^{-2}(log k)^c). However, if any of the terms in (A43) have differences 5b or 6b, they can be as big as ck^2(log k)^c. If there are too many of these differences then the ratio may not converge to 0. We will then have to consider how these differences could be applied to obtain (Ũ, Ṽ, P̃)_k. We will first consider how there can be one difference 5b or 6b in an interval [t^k_{j′}, t^k_{j′+1}), and then consider multiple differences.
Suppose exactly one difference 5b was applied to (U_i, V_i, P_i)_k during the interval [t^k_{j′}, t^k_{j′+1}) to obtain (U_{i+1}, V_{i+1}, P_{i+1})_k. For difference 5b we must merge together two deaths and a birth, of which no two of these three events can correspond to distinct actual events in (U, V, P)_k; otherwise this would be difference 5a. The three events must also be in the same interval [t^k_{j′}, t^k_{j′+1}), for otherwise, in a manner similar to that of Proposition 5a, [(U, V, P)_k | Z]((U_{i+1}, V_{i+1}, P_{i+1})_k) / [(U, V, P)_k | Z]((U_i, V_i, P_i)_k) = 0 eventually. So before we can apply difference 5b, we must first use differences 1a, 1b, 2, or 3 to create at least two of the three events (two deaths and a birth) in (U_i, V_i, P_i)_k. Notice that we cannot use differences 2 and 3 together to create these events, since then two of the three events would correspond to distinct events in (U, V, P)_k.
So there are exactly five ways difference 5b can be applied. In what follows, let N* denote the total number of applications of differences 5b and/or 6b.
1. We could use a correctly labeled death in (U_i, V_i, P_i)_k. We would then still need a birth and a death. This would require using at least N* applications of difference 1a in the interval [t^k_{j′}, t^k_{j′+1}) previously in our sequence of solutions. So the contribution of the N* differences 5b and/or 6b to the ratio of (A43) is no more than (R_{1a})^{N*}(R_{5b})^{N*} = O(k^{-N*}(log k)^{2N*c}).
2. We could use a correctly labeled birth in (U_i, V_i, P_i)_k. We would then still need at least N* deaths. This would require using at least N* applications of difference 1a previously. So the contribution of differences 5b and/or 6b to the ratio is again no more than (R_{1a})^{N*}(R_{5b})^{N*} = O(k^{-N*}(log k)^{2N*c}).

3. We could use difference 2 to get two deaths and a birth previous to (U_i, V_i, P_i)_k, but we would still need N* − 1 more births, which would require N* − 1 applications of difference 1a previously. This makes the contribution of differences 5b and/or 6b to the ratio no more than R_2 · (R_{1a})^{N*−1}(R_{5b})^{N*} = O(k^{−(N*−1)}(log k)^{2N*c}).

4. We could use difference 3 to get a death and two births previous to (U_i, V_i, P_i)_k, but we would still need at least N* − 1 more deaths. Hence we would need to apply difference 1a at least N* − 1 times previous to (U_i, V_i, P_i)_k to get the other deaths. This makes the contribution to the ratio no more than R_3 · (R_{1a})^{N*−1}(R_{5b})^{N*} = O(k^{−(N*−1)}(log k)^{2N*c}).
5. Lastly, we could apply the differences 5b and/or 6b in an arbitrary time interval. Let N_5 and N_6 be the numbers of applications of difference 5b and difference 6b respectively, so that N_5 + N_6 = N*. Let N_b and N_d be the minimum numbers of births and deaths needed, respectively. Notice that N_b = N_5 + 2N_6 and N_d = 2N_5 + N_6. The minimum number of applications of difference 1b that we would need is min_{N_5+N_6=N*} {N_b ∨ N_d} = min_{N_5+N_6=N*} {(N_5 + 2N_6) ∨ (2N_5 + N_6)}. This means we need at least ⌈(5/4)N*⌉ applications of difference 1b previous to (U_i, V_i, P_i)_k, which means the contribution of the N* differences 5b and/or 6b to the ratio is no more than (R_{1b})^{(5/4)N*}(R_{5b})^{N*} ≤ Ck^{−(1/2+ǫ)N*}(log k)^{N*c} = O(k^{−(1/2+ε)N*}).

Fig 1. Illustration of the tracking problem.

Fig 5. (Ũ, Ṽ, P̃)_k incorrectly breaks apart a merger into two deaths and a birth.
5 + N 6 = N * .Let N b and N d be the minimum number of births and deaths needed respectively.Notice that N b = N 5 +2N 6 and N d = 2N 5 +N 6 .The minimum number of applications of difference 1b that we would need is minN 5 ,N 6 N5+N6=N * {N b ∨ N d } = min N 5 ,N 6 N5+N6=N * {(N 5 + 2N 6 ) ∨ (2N 5 + N 6 )} .This minimum is achieved when N 5 = N * The events of birth, death, splitting, and merging are distributed according to the Event Model of Section 2.1.The error component, G i (t), of the observed location model in Section 2.2, is a Brownian motion for all targets.

Table 1
Results of 100 realizations from a Brownian motion model. % Est Correct is the percentage of times that (Û, V̂, P̂) was equal to the correct solution (U, V, P). % Births Correct, % Deaths Correct, % Splits Correct, and % Mergers Correct are the percentages of all birth, death, splitting, and merging events, respectively, in the simulation that were labeled correctly by the estimate. % Falling in 95% CS is the percentage of times that the 95% confidence set (formed for each of the 100 data realizations) contained the correct solution (U, V, P).

Fig 8. Break (U, V, P)_k into track segments and connect them to get (Ũ, Ṽ, P̃)_k.