Uniform Confidence Band for Optimal Transport Map on One-Dimensional Data

We develop a statistical inference method for an optimal transport map between distributions on the real line, with uniform confidence bands. The concept of optimal transport (OT) is used to measure distances between distributions, and OT maps are used to construct such distances. OT has been applied in many fields in recent years, and its statistical properties have attracted much interest. In particular, since the OT map is a function, statistical inference based on the uniform norm is significant for visualization and interpretation. In this study, we derive a limit distribution of the uniform norm of the estimation error of the OT map, and then develop a uniform confidence band based on it. In addition to our limit theorem, we develop a bootstrap method with kernel smoothing, establish its validity, and guarantee the asymptotic coverage probability of the confidence band. Our proof is based on the functional delta method and the representation of OT maps on the reals.


Introduction
We consider the framework of optimal transport (OT) and develop a statistical inference method for the OT map used in this concept. Specifically, we consider the Monge problem: for probability measures μ and ν on R, we study the optimization problem

min_{T ∈ T(μ,ν)} ∫ h(x − T(x)) dμ(x),   (1)

where h : R → R is a nonnegative convex function (h(x) = |x| or h(x) = x² in most applications) and T(μ,ν) is the set of measurable maps T : R → R such that μ(T⁻¹(A)) = ν(A) for any measurable A ⊂ R. Let T_0 ∈ T(μ,ν) be an OT map, i.e., a minimizer of the problem (1). Our goal is statistical inference on the OT map from samples: we construct a confidence band for T_0 based on samples generated independently from μ and ν, respectively. To this end, we develop an estimator for the OT map and derive a limiting distribution of its estimation error.
OT is a general framework to measure the distance between probability measures, and it has attracted attention in a wide range of fields related to data analysis, such as social science, statistics, machine learning, and image analysis [ACB17, Gal18, TGR21]. For general textbooks, see [V+09, PC+19, Vil21]. The formulation of OT is given by the Monge problem (1) or the Kantorovich problem related to the Wasserstein distance [Kan60], in which the OT map plays an important role. OT plays a major role in modern data science due to its flexible extensibility and adaptability to high-dimensional data.
Statistical analysis for OT has played an important role in the usage of OT with finite samples, as it is useful in assessing the uncertainty of estimated elements of the OT framework. The most representative analysis examines the sample complexity of estimation methods for OT. One of the typical interests is the problem of estimating the sum of optimized transport costs, which corresponds to the Wasserstein distance or its variants, and numerous studies have proposed optimal estimation methods [WB19, GCB+19, NWR22]. In recent years, several studies [HR21, DGS21] have developed estimation methods for OT maps and also investigated their theoretical accuracy. On the other hand, statistical inference for OT, i.e., statistical tests and confidence sets, has also attracted attention. In addition to inference methods on OT costs and distances [dBL19, SM18, OI23], [GKRS22a, SGK23, MBNWW23] propose inference methods on OT maps in several settings.
Despite its importance, statistical inference for OT maps is still a developing issue, even when the sample is one-dimensional. This is because OT maps are functions, and hence it is nontrivial to handle their uncertainty properly in a function space. In particular, when investigating the global shape of an OT map (e.g., monotonicity) in some application fields, it is necessary to examine errors of the OT map estimator in the uniform norm sense, which is a nontrivial challenge.
In this paper, we develop a framework of confidence bands for an OT map between two unknown one-dimensional distributions. Specifically, we develop a uniform confidence band that covers the OT map at all input points in its support. A uniform confidence band has the following advantages. First, it allows us to study the global shape of the OT map to be estimated. This advantage cannot be achieved by arranging pointwise confidence intervals, which generally results in highly discontinuous curves. Second, it can be visualized on a plot and hence is easy to interpret, in contrast to confidence intervals based on the L²-norm, which cannot be easily visualized. Our approach is to use kernel smoothing on the distribution functions, and then perform a bootstrap scheme with kernel smoothing on the estimated distributions in order to construct a confidence band. As a theoretical contribution, we derive the asymptotic distribution of the estimation error in order to validate our confidence band.
Our contributions can be summarized as follows:
• We develop statistical inference on OT maps by proposing uniform confidence bands that are computationally tractable. Coverage probabilities of the proposed bands are theoretically validated, and our simulations support the validity.
• The uniform confidence bands allow us to study global properties of functions, which makes it possible to investigate complex hypotheses on OT maps. Our real-data experiments with population data demonstrate that our method rigorously tests a hypothesis on whether the population pyramid changes faster, by examining the difference in shape between the OT map and a linear function.
As a technical contribution, we derive a simple asymptotic representation of the OT map estimation error as a linear sum of estimation errors for cumulative distribution functions. From this, we can apply the functional delta method to the representation, thereby obtaining the asymptotic distribution of the OT map estimation error.

Related Studies
Optimal transport has been actively studied in recent years. For a comprehensive review or textbook, see [V+09, PC+19, Vil21]. It is known that there are several variations of the OT problem, such as the entropic regularization [Cut13] or a low-dimensional projection named slicing [BRPP15]. Since there are numerous papers on OT, we refer to only a few that are relevant to our study.
Many studies have examined the sample complexity of the problem of estimating elements of OT from observed samples.
A typical estimation target is a sum of optimized transport costs. In the setting of one-dimensional data, [BL19] gives a comprehensive analysis of the statistical properties of OT costs. [Dud69, WB19, MNW21] studied the estimation of the cost and reported that the optimal convergence rate of the estimation error deteriorates with the dimension of the data, i.e., there is a curse of dimensionality. Related to this curse, [NWR22, GCB+19, LFH+20, Lei20] reported that the curse can be reduced by introducing the low-dimensional projection or the entropic regularization to the OT problem. The estimation error of an OT map is also a target of intense study. [HR21] has shown that the estimation error of OT maps also suffers from the curse of dimensionality of data, as is the case for the sum of costs. Similarly, entropic regularization and other techniques have been shown to mitigate this rate [DGS21, MBNWW21, PNW21, DNWP22].
For statistical inference, it is also a concern to derive limiting distributions of the error of estimators. Importantly, when data are multi-dimensional, it is not easy to obtain the limiting distributions relevant to the OT problem. Therefore, statistical inference is performed for the OT problem when the data are one-dimensional or discrete, or when some types of regularization are introduced. For a limiting distribution of estimating a sum of transportation costs, [MC98, RGTC17, DBCAMRR99, DBGU05] studied the one-dimensional case, and [SM18, BCP19, TSM19, KTM20, OI23] studied the discrete data case. [MNW19, dBGSL24, dBSLNW23, GKRS22b, MBW22, IOH22] investigated a case with the regularization. [DR23] constructs a confidence interval based on the OT distance in the problem of estimating density functions. This situation is similar to the limiting distribution of estimating OT maps. [GKRS22a] developed a statistical inference method on an OT map for the regularized OT problem case, and [SGK23] derived a limiting distribution of an estimator for an OT map with a setting of semi-discrete data. [MBNWW23] investigated the availability of limiting distributions in the multi-dimensional case and derived a pointwise convergence to a limiting distribution under certain conditions.

Notation
A cadlag function is a right continuous function whose limits from the left exist everywhere. D[a, b] denotes the space of all cadlag functions f : [a, b] → [−1, 1] equipped with the uniform norm. For a topological space Ω, BL_1(Ω) denotes the space of Lipschitz functions h : Ω → [−1, 1] with Lipschitz constants bounded by one. ∥·∥ denotes the Euclidean norm. ∥f∥_∞ = sup_x |f(x)| denotes the sup-norm. ℓ^∞(F) denotes the set of all uniformly bounded real functions on F. δ_x is the Dirac measure at x. For an event A, 1{A} is the indicator function.
For any non-negative integer k and open set D ⊆ R, C^k(D) is the space of all bounded continuous real-valued functions that are k-times differentiable on D. This can be extended to the notion of Hölder space: C^β(D) is the space of functions f ∈ C^{⌊β⌋}(D) (⌊β⌋ is the integer part of β) whose ⌊β⌋-th derivative is Hölder continuous with exponent β − ⌊β⌋, that is, there is some constant C > 0 such that for all x, y ∈ D, |f^{(⌊β⌋)}(x) − f^{(⌊β⌋)}(y)| ≤ C|x − y|^{β−⌊β⌋} holds. We say that a function f is (continuously) differentiable on [a, b] if there is a small ε > 0 such that f is (continuously) differentiable on (a − ε, b + ε).

Paper Organization
Section 2 gives the optimal transport problem and its associated statistical inference problem. Section 3 develops a uniform confidence band of the OT map as our proposed methodology. Section 4 shows theoretical guarantees of the developed confidence band with an outline of the proof. Section 5 conducts a numerical simulation to show the experimental validity of the proposed confidence band. Section 6 handles a real data analysis to demonstrate the usefulness of our method. Section 7 finally concludes our study.

Optimal Transport Problem with Samples
We consider the setup with d = 1. Let F_μ and F_ν be the distribution functions of μ and ν, respectively. We assume that F_μ is continuous, which guarantees that a solution to the OT problem (1) has the form (see e.g. [Vil21, Remark 2.19]):

T_0 = F_ν^{-1} ∘ F_μ.   (2)

Throughout the paper, we only consider the values of the OT map T_0 on a closed interval [a, b]. For simplicity, we abuse notation and treat T_0 and members of T(μ,ν) as functions on [a, b]. Suppose that we observe m i.i.d. samples X_1, ..., X_m ∼ μ and n i.i.d. samples Y_1, ..., Y_n ∼ ν. For simplicity, we assume that m and n follow the asymptotics that often appear in nonparametric two-sample tests: with some λ ∈ (0, 1),

m/(m + n) → λ as m, n → ∞.   (3)

Our goal is to infer the true OT map T_0 using the samples. Specifically, we construct an estimator of T_0 based on the samples, as well as the following statistical inference.
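On the real line, the OT map is the composition of a quantile function and a distribution function, T_0 = F_ν^{-1} ∘ F_μ (in standard notation). As a minimal illustration, the following sketch computes a plug-in estimate of this map from two samples; the distributions N(0, 1) and Gamma(5, 0.5) mirror the simulation in Section 5, and all names here are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy samples: x from the source measure mu = N(0, 1),
# y from the target measure nu = Gamma(shape=5, scale=0.5).
x = rng.normal(0.0, 1.0, size=2000)
y = rng.gamma(5.0, 0.5, size=2000)

def empirical_ot_map(t, x, y):
    """Plug-in estimate of T_0(t) = F_nu^{-1}(F_mu(t)) via empirical CDF/quantile."""
    # empirical distribution function of the x-sample evaluated at t
    f_mu_t = np.searchsorted(np.sort(x), t, side="right") / len(x)
    # empirical quantile of the y-sample at that probability level
    return np.quantile(y, np.clip(f_mu_t, 0.0, 1.0))

t_grid = np.linspace(-2.0, 2.0, 5)
t_hat = np.array([empirical_ot_map(t, x, y) for t in t_grid])
```

Because both distribution functions are monotone, the estimated map is nondecreasing; the kernel-smoothed estimator of Section 3 replaces these empirical step functions with smooth ones.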

Statistical Inference via Confidence Band
We aim to develop an estimator for the OT map T_0 and conduct statistical inference for T_0 from the observations. Specifically, with a pre-specified α ∈ (0, 1), we will develop a set C(α) = {[l(x), u(x)] | x ∈ [a, b]} with some functions l(·), u(·) depending on the X-samples and Y-samples, and show that the band covers T_0 with probability tending to 1 − α as m, n → ∞ with the limit ratio (3).
Importantly, our interest is a confidence band in the sup-norm sense: rigorously, our theory will show that T_0(x) ∈ [l(x), u(x)] holds for every x ∈ [a, b] simultaneously with asymptotic probability 1 − α. The sup-norm is useful for intuitive analysis because it allows visualization in a functional space, whereas several common norms, e.g., the L²-norm, cannot be visualized on a plot.

Methodology
In this section, we describe our methodology for constructing a confidence band for the OT map. First, we propose a kernel estimator for the OT map (Section 3.1). We then construct a confidence band based on this estimator and a bootstrap scheme (Section 3.2).

Kernel Estimator for OT Map
Suppose that we observe the samples X_1, ..., X_m ∼ μ and Y_1, ..., Y_n ∼ ν. We estimate density and distribution functions from the samples, then use them to construct an estimator for the OT map T_0.
First, we define a kernel density estimator. As a preparation, we define a kernel function K : R → R, which should be a positive function satisfying ∫ K(x) dx = 1. There are several common choices: the Gaussian kernel, the Epanechnikov kernel, and so on (for an overview, see [Sil86]). Then, we define the density estimators f̂_μ and f̂_ν by kernel smoothing of the two samples, and let F̂_μ and F̂_ν denote the corresponding smoothed distribution functions. Since f̂_μ and f̂_ν are strictly positive by the positivity of the kernel K, F̂_μ and F̂_ν are strictly increasing functions and hence invertible.
We define an estimator for the OT map T_0 using the kernel estimators. Using the form (2), we define the kernel-smoothed estimator of T_0:

T̂_{m,n} = F̂_ν^{-1} ∘ F̂_μ.

This estimator has several advantages. First, we can achieve a smooth estimation, and hence a smooth confidence band, for any input x. Second, we can guarantee the asymptotic validity of the uniform confidence band using smoothness. Third, as will be shown later, we can develop a bootstrap method with kernel smoothing for practical use and show its convergence. We will compare this kernel-based approach with a pointwise estimator using empirical distributions in Remark 2.
These conditions are widespread in density estimation using kernel methods. Regarding the order of the kernel, for example, [Tsy08] describes a method to construct kernels of arbitrary order using Legendre polynomials. The setup of the bandwidth parameter is commonly used as well, which is designed to balance bias and variance in density function estimation. See [Tsy08] for details.
Remark 1 (Choice of kernels and bandwidth). The choice of the kernel K and bandwidth h in Assumption 1 is one of the methods used to implement under-smoothing, which is common in constructions of confidence bands, as summarized in Section 5.7 of [Was06]. In our design, we increase the required order of the kernel K from β to β + 1/2 while keeping the bandwidth at n^{-1/(2β+1)}, the usual choice that leads to optimal rates of convergence in kernel density estimation. This approach with larger-order kernels follows Corollary 2 of [GN07], which in turn allows us to utilize the functional delta method in the proof of our main theorem below.
We further impose the following assumption on a density function.

Assumption 2. The density f_ν of ν is strictly positive on an interval containing [T_0(a), T_0(b)], and f_ν has bounded variation.

We remark that Assumption 2 is more flexible than assuming that f_ν is positive on all of R, as it allows distributions that are supported on a proper subset of R, such as gamma or chi-squared distributions.

Construction of Confidence Band
We develop our methodology to construct a confidence band for T_0 based on the estimator T̂_{m,n}.

Bahadur Representation of Estimation Error
Our basic strategy is a Gaussian approximation of the estimation error T̂_{m,n}(x) − T_0(x). Specifically, we investigate the scaled estimation error (4) and show that it converges in distribution to a limiting Gaussian process as m, n → ∞, where σ(x) is the standard deviation of T̂_{m,n}(x) − T_0(x). To achieve the Gaussian limit of (4), the following asymptotic linear form, named the Bahadur representation, plays an important role:

Proposition 1 (Bahadur Representation). Suppose that Assumptions 1 and 2 hold. Then, for any x ∈ [a, b], the asymptotic linear representation (5) holds.

Since we assume that ν has a density f_ν that is nowhere zero on the relevant interval, the representation holds for all x ∈ [a, b]. With this asymptotic linear representation, we can guarantee the existence of Gaussian processes in the limit. See Theorem 3 below for a formal result.
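In standard notation (T_0 = F_ν^{-1} ∘ F_μ, with target density f_ν and kernel-smoothed estimators F̂_μ, F̂_ν), the linearization underlying Proposition 1 can be sketched as follows; this is a reconstruction from the surrounding definitions, not the paper's exact display:

```latex
\widehat{T}_{m,n}(x) - T_0(x)
  = \frac{\bigl(\widehat{F}_\mu(x) - F_\mu(x)\bigr)
        - \bigl(\widehat{F}_\nu(T_0(x)) - F_\nu(T_0(x))\bigr)}
         {f_\nu\bigl(T_0(x)\bigr)}
  + o_P\bigl((m \wedge n)^{-1/2}\bigr).
```

The two terms in the numerator come from perturbing F_μ and F_ν in the composition F_ν^{-1} ∘ F_μ, respectively; since the X-samples and Y-samples are independent, these two terms are independent, which is what yields two independent Brownian bridges in the limit.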

Plugin-Estimator for Standard Deviation
To use the limiting Gaussian process in practice, we need to derive several quantities. First, we consider the standard deviation σ(x) of T̂_{m,n}(x) − T_0(x) appearing in (4). In view of (5), we consider the corresponding form of the standard deviation, in which f_ν(T_0(x)) = f_ν(F_ν^{-1}(F_μ(x))) plays an important role by the continuity of F_ν. In practice, we do not know F_μ, F_ν, and f_ν, so it is impossible to compute σ(x). Instead, we estimate σ(x) by the plug-in estimator σ̂(x). Note that the plug-in estimator always exists by the positivity of f̂_ν. The following result shows its consistency:

Lemma 2. Suppose that Assumptions 1 and 2 hold. Then, σ̂(x) converges uniformly to σ(x).

Bootstrap Approach with Kernel Smoothing and Confidence Band
We approximate the Gaussian process to which (4) converges by a distribution generated by a bootstrap method. Specifically, we develop a bootstrap method with kernel smoothing, which generates new samples from the estimated distribution functions F̂_μ and F̂_ν obtained with the smooth kernels. In the bootstrap scheme, we sample X*_1, ..., X*_m ∼ F̂_μ and Y*_1, ..., Y*_n ∼ F̂_ν, and define bootstrap distribution functions F̂*_μ and F̂*_ν from these samples. Then, we consider the bootstrap estimator of the OT map T_0,

T̂*_{m,n} = (F̂*_ν)^{-1} ∘ F̂*_μ.

Note that X*_i and Y*_j are not subsamples of the dataset, but are generated from F̂_μ and F̂_ν. This approach is more suitable when we apply the functional delta method to validate the confidence band in our proof.
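A minimal sketch of this smoothed-bootstrap scheme follows, under simplifying assumptions that are not the paper's exact choices: a logistic kernel (so both the smoothed CDF and the kernel noise have closed forms) and an unstudentized band T̂ ± q_{1−α} in place of the paper's band scaled by σ̂(x).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(0.0, 1.0, n)          # sample from mu
y = rng.gamma(5.0, 0.5, n)           # sample from nu
h = n ** (-1 / 3.5)                  # illustrative bandwidth

def smoothed_cdf(t, data, h):
    z = np.clip((np.atleast_1d(t)[:, None] - data[None, :]) / h, -50.0, 50.0)
    return (1.0 / (1.0 + np.exp(-z))).mean(axis=1)

def ot_map(t, xs, ys):
    """Kernel-smoothed map F_nu_hat^{-1}(F_mu_hat(t)) via grid inversion."""
    grid = np.linspace(ys.min() - 1.0, ys.max() + 1.0, 600)
    return np.interp(smoothed_cdf(t, xs, h), smoothed_cdf(grid, ys, h), grid)

t_grid = np.linspace(-1.5, 1.5, 41)
T_hat = ot_map(t_grid, x, y)

# Bootstrap with kernel smoothing: sampling from the *estimated* distribution
# amounts to resampling the data and adding kernel-distributed noise of scale h.
B, sups = 100, []
for _ in range(B):
    x_star = rng.choice(x, n) + h * rng.logistic(size=n)
    y_star = rng.choice(y, n) + h * rng.logistic(size=n)
    sups.append(float(np.max(np.abs(ot_map(t_grid, x_star, y_star) - T_hat))))

q = float(np.quantile(sups, 0.95))   # bootstrap quantile of the sup error
lower, upper = T_hat - q, T_hat + q  # simplified uniform band
```

The key point this sketch shares with the paper's scheme is that the bootstrap samples are drawn from the smoothed estimates, not resampled from the raw data, which matches the functional delta method argument used in the proof.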
Using the distribution of the bootstrap estimator T̂*_{m,n}, we derive quantiles of the distribution of the bootstrap version of the estimation error. Let P_X and P_Y denote the conditional probabilities given X_1, ..., X_m and Y_1, ..., Y_n, respectively. For any α ∈ (0, 1), we define the corresponding quantile under P_{X,Y} := P_X ⊗ P_Y. Then, we propose the bootstrap confidence band (7). Note that except for the bandwidth h, this confidence band is computed in a data-driven way. Also, we will later propose a method to select h based on the observed samples.
Remark 2 (Comparison with a pointwise confidence interval). Another natural approach is to construct a pointwise confidence interval for the OT map by plugging in the empirical distributions. To complement the main study, we present in Section A.4 a methodology for constructing a pointwise confidence interval, along with a proof of its asymptotic validity and its empirical evaluations. Compared to this pointwise approach, our main proposed kernel-based method has several advantages. First, it produces smooth confidence bands, which benefit interpretability. In contrast, pointwise confidence intervals based on the empirical distributions are riddled with discontinuities and can be difficult to interpret. Second, the uniform confidence band can be used to infer the OT map across the whole domain. For instance, a 95%-level uniform confidence band has asymptotically a 95% chance of covering the true OT map over the entire domain. In contrast, a 95%-level pointwise confidence interval is not guaranteed to contain the whole map; it covers the true OT map only at each point, with asymptotic probability 95% there.
Remark 3 (Relation to ROC curves). We discuss the relation of the confidence band for OT maps to that for ROC (receiver operating characteristic) curves. ROC curves have the similar form F_ν ∘ F_μ^{-1}, and their confidence analysis has been developed by [HHZ08]. As a point of distinction between our study and the existing work, we develop a confidence band whose width differs for each input x ∈ [a, b]. Rigorously, we have introduced the standard deviation σ(x) and its estimator, so our confidence band achieves adaptive width at each input x. This is in contrast to the confidence band of [HHZ08] for ROC curves, which has a constant width independent of x.

Theoretical Result
In this section, we present the main theoretical contributions of this paper, namely the bootstrap consistency (Theorem 3) and the asymptotic validity of the confidence band C(α) (Corollary 4). We first state the theorems in Section 4.1, and then provide an outline of the proof of Theorem 3 in Section 4.2. The full proofs of the theorems can be found in Appendices C and D.

Validity of Confidence Band
We start with a consistency result for the bootstrap estimator. Notice the suprema on the left-hand sides of (8) and (9), which are essential for obtaining a uniform confidence band of T_0.
Theorem 3 (Bootstrap consistency). Suppose that the distribution functions F_μ and F_ν satisfy Assumptions 1 and 2. Then, there are independent Brownian bridges G_1 and G_2 such that the convergences (8) and (9) hold as m, n → ∞ and m/(m + n) → λ, where the limit is a Gaussian process with unit variance at every point.
This theorem implies that the supremum values of both the scaled estimation error by the kernel estimator and the estimation error by the bootstrap estimator converge in distribution to the supremum of the same Gaussian process. In essence, the convergence (8) is the intermediate result on the estimation error of the kernel method, and the convergence (9) additionally provides the convergence of the bootstrap method.
Based on this result, we show the asymptotic validity of the proposed confidence band:

Corollary 4 (Asymptotic Validity of Bootstrap Confidence Band). Consider the proposed confidence band C(α) in (7). Then, under Assumptions 1 and 2, the bootstrap confidence band C(α) is asymptotically uniformly consistent at level 1 − α, that is, the probability that T_0(x) lies in the band for all x ∈ [a, b] tends to 1 − α as m, n → ∞ and m/(m + n) → λ.
This result shows that our confidence bands are asymptotically valid in a uniform sense, that is, the OT map T_0 is included in our confidence band for every input x simultaneously, with probability approaching 1 − α.

Proof Outline of Theorem 3
Our proof consists of two parts: the first is the convergence of the distributions estimated by the kernel method and the developed bootstrap method, and the second is the convergence obtained by an application of the functional delta method. The details are described below.

Convergence of Estimated Distributions
As a preparation, we first derive limiting Gaussian processes for the distribution estimators F̂_μ, F̂_ν, F̂*_μ, and F̂*_ν. We first describe the analysis of the distributions F̂*_μ and F̂*_ν obtained by the bootstrap method with kernel smoothing. Rigorously, the central limit theorem in [KMT76] shows that there exists a Brownian bridge G_1 such that the corresponding convergence holds as m → ∞. This shows that the error F̂*_μ(x) − F̂_μ(x) of the bootstrap with kernel smoothing converges to the Brownian bridge with the proper scaling in the uniform norm sense. For technical reasons in the further proof, we also derive the convergence in the sense of the bounded Lipschitz metric. Here, we denote by E_X the expectation with respect to P_X. Similarly, we can obtain an analogous limiting statement for the error F̂*_ν(y) − F̂_ν(y) of the other bootstrap distribution.
Next, we also analyze the error of the estimated distributions F̂_μ and F̂_ν by the kernel method. We apply the seminal analysis on the convergence of kernel convolutions [GN07] and obtain the corresponding joint convergence as m, n → ∞, where G_1 and G_2 are independent Brownian bridges.

Functional Delta Method
We study the convergence of the estimator T̂*_{m,n} by using the above convergence results for the distributions and the representation (2) of the OT map. To this aim, we use the functional delta method (see Appendix B for a brief exposition).
Formally, we define a functional φ such that T̂*_{m,n} = φ(F̂*_μ, F̂*_ν), T̂_{m,n} = φ(F̂_μ, F̂_ν), and T_0 = φ(F_μ, F_ν). Then, we make the first-order approximation φ(F̂_μ, F̂_ν) − φ(F_μ, F_ν) ≈ φ′(F̂_μ − F_μ, F̂_ν − F_ν). Thus, it is important to first derive the Hadamard derivative φ′.

Lemma 5. Let [a, b] satisfy the conditions in Proposition 6. Then the functional φ is Hadamard differentiable with derivative φ′.

We slightly extend this derivative for the design of confidence bands by defining a functional that incorporates the scaling. Using Lemma 5, we derive its derivative. Note that we have added the term σ(·), which determines the scale of the confidence band.
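For the functional φ(F, G) = G^{-1} ∘ F, the Hadamard derivative at (F_μ, F_ν) takes the following form; this is a hedged reconstruction consistent with the first-order expansion, not the paper's exact statement:

```latex
\varphi'_{(F_\mu, F_\nu)}(h_1, h_2)(x)
  = \frac{h_1(x) - h_2\bigl(T_0(x)\bigr)}{f_\nu\bigl(T_0(x)\bigr)},
  \qquad x \in [a, b].
```

Substituting the perturbations h_1 = F̂_μ − F_μ and h_2 = F̂_ν − F_ν recovers the Bahadur-type linearization of the estimation error.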
Finally, we apply the functional delta method (Lemma 10 in the Appendix) and study the limit of the estimation error of the bootstrap estimator T̂*_{m,n}, where the limiting process is defined in (10). By a similar argument, we also prove that the estimation error T̂_{m,n} − T_0 of the kernel estimator converges to the same Gaussian process. In addition, we provide evaluations of several plug-in estimators such as σ̂(·), and thereby obtain the statement of Theorem 3.

Simulation design
To support our asymptotic validity results, we perform a Monte Carlo simulation with 1,000 iterations to evaluate the coverage probabilities of the confidence bands.

Table 1: Evaluations of our (1 − α)-level uniform confidence bands of the optimal transport map from N(0, 1) to Gamma(5, 0.5), based on 1,000 pairs of samples from each distribution. The table displays the median of average widths and the coverage probabilities of the confidence bands on [−2.5, 2.5].
To evaluate our confidence bands, we estimate the coverage probability as the proportion of 1,000 runs in which T_0(x) is contained in the confidence band for all x ∈ [−2.5, 2.5]. Additionally, we assess the confidence bands by calculating the median of the bands' average widths.
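The counting logic behind this uniform coverage estimate can be sketched as follows. The bands here are synthetic stand-ins (a hypothetical monotone truth plus noise, with a fixed half-width), used only to isolate the "contained at every grid point" criterion; they are not the paper's bootstrap bands.

```python
import numpy as np

rng = np.random.default_rng(4)
t_grid = np.linspace(-2.5, 2.5, 101)
T0 = np.exp(t_grid / 2.0)            # hypothetical "true" monotone map (stand-in)

def covers(lower, upper, truth):
    """Uniform coverage: the truth must lie inside the band at ALL grid points."""
    return bool(np.all((lower <= truth) & (truth <= upper)))

# Stand-ins for the 1,000 simulated bands: a noisy estimate with half-width 0.4.
hits = []
for _ in range(1000):
    center = T0 + rng.normal(0.0, 0.1, size=T0.size)
    hits.append(covers(center - 0.4, center + 0.4, T0))
coverage = float(np.mean(hits))      # proportion of runs with uniform coverage
```

Note that a single excursion of the truth outside the band at any grid point counts the whole run as a miss; this is what makes uniform coverage stricter than pointwise coverage.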

Result and discussion
The median of per-iteration average widths and the coverage probabilities over x ∈ [−2.5, 2.5] for β = 0.5 and n = 200, 700, and 2000 are shown in Table 1. From the table, we can see that the coverage probabilities approach the nominal probabilities 1 − α, and the widths become smaller as n increases. In particular, when n = 2000 and 1 − α = 0.90 or 0.95, the coverage probabilities are slightly larger than the nominal probabilities.
The plots in Figure 1 illustrate the coverage probability and the median of average width as functions of n. These plots lead us to the same conclusion: as n increases, the average coverage probabilities approach the nominal probabilities, and the width of the band decreases.
We now examine the uniform confidence bands of a specific sample with β = 0.5, 1.0, and 1.4 (recall from Assumption 1 that β + 0.5 must be less than the order of the Gaussian kernel, which is 2). The plots of the true optimal transport maps, their kernel estimates, and the uniform confidence bands are shown in Figure 2. We observe that for x > 1.8, the estimated transport map (the red curve) remains significantly distant from the true transport map (the black curve). This disparity arises from the heavier right tail of Gamma(5, 0.5) compared to that of N(0, 1). Consequently, there is an inadequate number of sample points in the right tail of N(0, 1) to estimate the transport map between the two distributions.
As β increases, the kernel bandwidths increase and the confidence bands become smoother. Note that if the actual density functions are rougher than C^β(R), the kernel estimate and the confidence band might be too smooth.

Real data analysis
As an application, we use our confidence bands to assess the uncertainty of our estimate of the transport map from the distribution of ages at death in 2001 to that in 2021. The data on ages at death from 12 countries were taken from the Human Mortality Database [Max23]. For each country, let us simply denote the dataset from the year 2001 by X, and that from the year 2021 by Y. Let |X| = m and |Y| = n. Assume that the observed ages at death in 2001 and 2021 are sampled from two separate continuous probability distributions. Then there is some uncertainty in our estimate due to randomness in the sampling.
To construct the estimators and confidence bands at level 1 − α, we use the Gaussian kernel. Our choices of bandwidths are guided by our theory in Section 3. Recall from Assumption 1 that h ≈ n^{-1/(2β+1)}, where β + 0.5 must be less than the order of the kernel. Since the Gaussian kernel is of second order, any β < 1.5 is permissible. In particular, we choose β = 1.25 so that 2β + 1 = 3.5; this leads to our choices of kernel bandwidths h_X = 2m^{-1/3.5} for X, and h_Y = 2n^{-1/3.5} for Y. With these bandwidths, we use the method in Section 3 to construct a kernel estimate of the optimal transport map and a 95% uniform confidence band for each country.
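The bandwidth arithmetic above can be sketched directly; the sample sizes m and n below are placeholders for illustration, not the actual HMD counts.

```python
# Bandwidth rule from Section 3: h ~ (sample size)^(-1/(2*beta+1)) with
# beta = 1.25, so that 2*beta + 1 = 3.5; the factor 2 follows the paper's choice.
m, n = 50_000, 60_000          # placeholder sample sizes for 2001 and 2021
beta = 1.25
h_X = 2 * m ** (-1 / (2 * beta + 1))   # bandwidth for the 2001 sample X
h_Y = 2 * n ** (-1 / (2 * beta + 1))   # bandwidth for the 2021 sample Y
```

Since the exponent is fixed, the larger sample receives the smaller bandwidth, as the bias-variance balance in kernel density estimation requires.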
The plots of our estimates and confidence bands for the 12 countries are shown in Figure 3. The kernel estimate of the optimal transport map is the black curve, showing the correspondence between ages at death in 2001 and 2021. The identity function is the red dashed line. The estimate lying above the identity function indicates a shift in mortality toward higher ages. From these estimates and the confidence bands, we observe the most significant age shifts in Portugal (from 30 to 45) and Japan (from 30 to 40). Other countries also show minor age shifts, except for Norway, New Zealand, Luxembourg, and the USA, where we cannot confidently assert an upward shift in ages.

Conclusion
In this paper, we develop a method to construct uniform confidence bands for optimal transport maps based on two samples from two unknown continuous distributions. First, we use a kernel to estimate the densities, and then we use a bootstrap with kernel smoothing to construct the confidence bands. We show that our confidence bands are asymptotically valid, meaning that they contain the actual OT map on an interval with a probability that approaches the nominal coverage probability. We perform simulations to verify the validity of our confidence bands. As an application, we apply our confidence bands to analyze the shift in life expectancy across 12 countries from the year 2001 to 2021. There are a couple of directions for future research. Firstly, our delta method and bootstrap procedure rely on first-order approximations. Exploring higher-order approximations would be an improvement worth considering. Secondly, our choice of kernel bandwidth directly follows from the theory of kernel density estimation, which, in practice, may not be sample-efficient in achieving consistency. This raises another research problem of finding a bandwidth search procedure that can achieve consistency more efficiently than the one presented in this paper.

Appendix A: Pointwise Confidence Intervals via the Empirical Distributions
In certain situations, one might wish to construct a confidence interval for the value of the optimal transport map at a specific point. Our approach to constructing such an interval follows closely that of the confidence bands, as the interval can be seen as a specific instance of a confidence band that covers only a single point. One key distinction is that, with only a single point, there is no need to estimate the standard deviation, which consequently removes the need for kernel density estimation.

A.2. Bahadur representation
Proposition 6 (Bahadur Representation of 1D Transport Map). Suppose that there exists an interval [a, b] such that (i) F_ν is continuously differentiable on the interval. Then the representation (12) holds. Moreover, the o_P(1) term does not depend on the choice of x.
The conditions on F_ν imply that T_0 is continuous on the interval. We derive an asymptotic representation of the estimation error T̂_{m,n}(x) − T_0(x).

From this, we use Taylor's formula again to obtain the expansion of the first term on the right of (13). Combining (14) and (15) with the continuity of the involved compositions, we conclude the convergence in the Hadamard sense.

A.3. Gaussian Approximation Theorem
With the Bahadur representation (12), we can easily develop a Gaussian approximation of the estimation error. If F_ν and an interval [a, b] satisfy the conditions in Proposition 6, we obtain the representation (16) of the error on [a, b]. Note that the two processes are independent, since {X_i}_{i=1}^m and {Y_j}_{j=1}^n are mutually independent.
We consider Gaussian limits of the terms in (16). In view of Proposition 6, Donsker's theorem tells us that there exist two independent Brownian bridges G_1 and G_2 such that the corresponding convergences of D[a, b] functions hold under the uniform norm. Recalling the sample-size condition m/(m + n) → λ, we have the convergence to a Gaussian process. We remark that the covariance functions of G_1 ∘ F_μ and G_2 ∘ F_ν can be written explicitly: for a Brownian bridge G and distribution function F, Cov(G(F(s)), G(F(t))) = F(min(s, t)) − F(s)F(t).

A.5. Simulation
To validate our asymptotic intervals, we perform a Monte Carlo simulation with the same design as in Section 5.1 for three sample sizes: n = 200, 700, and 2000. For each x ∈ [−2.5, 2.5], the estimated coverage probability at x is the proportion of 1,000 runs in which our confidence interval at x contains T_0(x). Table 2 displays the summary statistics for the pointwise coverage probabilities and the medians of the average widths of the intervals across x ∈ [−2.5, 2.5].
For small sample sizes ($n = 200$ and $n = 700$), the minimum coverage probabilities are much smaller than the nominal probabilities $(1 - \alpha)$, whereas the maximum coverage probabilities are close to the nominal probabilities. As the sample size increases, both the minimum and maximum coverage probabilities converge to the nominal probabilities. As expected, the widths of the confidence intervals decrease as $n$ increases and as $1 - \alpha$ decreases. In Figure 4, we plot the average coverage probability and the median of the average width as functions of $n$. From the plots, we see that the average coverage probability converges to the nominal probability as $n$ increases. However, the convergence is very slow for $1 - \alpha = 0.90$ and $0.95$. As discussed in Section 5.2, this is attributed to the insufficient number of sample points from the right tail of $N(0, 1)$, which hinders the estimation of the transport map between the two distributions.
The cause of the sub-optimal coverage probability becomes more apparent when examining the individual confidence intervals (Figure 5). We can see that for larger values of $x$, our plug-in estimator (the red curves) remains significantly distant from the true transport map (the black curves). Consequently, this discrepancy causes the confidence intervals to fail to capture the transport map accurately.
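A compressed version of such a coverage experiment can be sketched as follows. For speed, it uses the limiting standard deviation $\sqrt{F_\mu(x)(1 - F_\mu(x))(1/m + 1/n)}/f_\nu(T_0(x))$ with the true density plugged in, whereas in practice the unknown quantities would be estimated (e.g. by kernel smoothing as in the main text); the sample sizes, the number of runs, and the evaluation point are scaled down and illustrative.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):  # standard normal density
    return exp(-z * z / 2.0) / sqrt(2.0 * pi)

rng = np.random.default_rng(3)
m = n = 500
x0, true_t = 0.0, 1.0      # mu = N(0, 1), nu = N(1, 1): T0(x) = x + 1
u = Phi(x0)                # F_mu(x0), entering the asymptotic variance
sd = sqrt(u * (1.0 - u) * (1.0 / m + 1.0 / n)) / phi(true_t - 1.0)  # f_nu(T0(x0))

runs, hits = 300, 0
for _ in range(runs):
    xs = rng.normal(0.0, 1.0, m)
    ys = rng.normal(1.0, 1.0, n)
    # plug-in map at x0: empirical nu-quantile of the empirical mu-CDF
    t_plug = np.quantile(ys, np.searchsorted(np.sort(xs), x0, side="right") / m)
    if abs(t_plug - true_t) <= 1.96 * sd:   # nominal 95% pointwise interval
        hits += 1
coverage = hits / runs
print(coverage)
```

Near the center of the distribution the empirical coverage is close to nominal; the degradation described above appears only in the tails, where the plug-in estimator is poor.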
The functional delta method provides a way to turn the weak convergence of a sequence of stochastic processes $r_n(\mathbb{X}_n - T)$ into that of $r_n(\phi(\mathbb{X}_n) - \phi(T))$. The idea is to realize that the latter can be written as $r_n(\phi(T + r_n^{-1} h_n) - \phi(T))$, where $h_n = r_n(\mathbb{X}_n - T)$. With a sufficient differentiability condition on $\phi$, we expect this sequence to converge to $\phi'_T(h)$ as $n \to \infty$. To obtain the convergence rigorously, we need the notion of Hadamard differentiability.

Definition 1 (Hadamard Derivative). A function $\phi : \mathbb{D} \to \mathbb{E}$ is Hadamard differentiable at $T \in \mathbb{D}$ if there exists a continuous linear map $\phi'_T : \mathbb{D} \to \mathbb{E}$ such that for any $h_t, h \in \mathbb{D}$ satisfying $h_t \to h$ in $\mathbb{D}$ as $t \to 0$, we have
$$\lim_{t \to 0} \frac{\phi(T + t h_t) - \phi(T)}{t} = \phi'_T(h).$$
In this case, we say that $\phi'_T$ is the Hadamard derivative of $\phi$ at $T$. If $\phi$ is Hadamard differentiable, then one can obtain the convergence of the stochastic processes under $\phi$; this is essentially the statement of the functional delta method.
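As a concrete worked instance of Definition 1 (a standard computation from empirical process theory, included here as an illustration), consider the quantile map $\phi(F) = F^{-1}(p)$ for a fixed $p \in (0, 1)$, which underlies the representation $T_0 = F_\nu^{-1} \circ F_\mu$ of the 1D OT map.

```latex
% Hadamard derivative of the quantile map \phi(F) = F^{-1}(p), assuming F has a
% positive derivative f at x_0 = F^{-1}(p) and h_t \to h uniformly, h continuous.
x_t := (F + t h_t)^{-1}(p)
\quad\Longrightarrow\quad
F(x_t) = p - t\, h_t(x_t).
% Taylor-expanding F around x_0 and using h_t(x_t) \to h(x_0):
x_t = x_0 - \frac{t\, h(x_0)}{f(x_0)} + o(t),
\qquad\text{so}\qquad
\frac{\phi(F + t h_t) - \phi(F)}{t} \;\longrightarrow\;
\phi'_F(h) = -\,\frac{h\big(F^{-1}(p)\big)}{f\big(F^{-1}(p)\big)}.
```

Composing this map with $F_\mu$ and applying the chain rule for Hadamard derivatives yields the derivative of the OT-map functional.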

B.2. Functional delta method for bootstrap
In this work, we use the bootstrap procedure to estimate the distribution of stochastic processes of the type $r_n(\phi(\mathbb{X}_n) - \phi(T))$. Let $\mathbb{P}_n$ be the empirical distribution based on the sample $X_1, \ldots, X_n$, and let $\mathbb{X}_n^*$ be the bootstrap estimator based on a bootstrap sample $X_1^*, \ldots, X_n^* \sim \mathbb{P}_n$. In view of (22), the asymptotic consistency of the bootstrap distribution requires the convergences in (23), where $\xrightarrow{P \mid X_n}$ denotes convergence in distribution conditionally given $X_1, \ldots, X_n$, in probability. This can be formally written in terms of the bounded Lipschitz metric, where the conditional expectation is taken with respect to the bootstrap sampling and $\mathrm{BL}_1(\mathbb{E})$ is the space of Lipschitz functions $h : \mathbb{E} \to [-1, 1]$ with Lipschitz constants bounded by one. The first convergence in (23) can be obtained via the functional delta method, and the second convergence can be obtained using the following lemma, in which $\mathbb{G}_2$ is another independent Brownian bridge.
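To make the bootstrap objects concrete, the following sketch implements a plain (non-studentized) sup-norm bootstrap band for the 1D plug-in map. It omits the kernel-smoothing studentization used in the main text, and all names and tuning constants are illustrative.

```python
import numpy as np

def plug_in_map(grid, xs, ys):
    """Plug-in 1D OT map on a grid: empirical nu-quantile composed with the mu-ECDF."""
    u = np.searchsorted(np.sort(xs), grid, side="right") / len(xs)
    return np.quantile(ys, u)

def bootstrap_band(grid, xs, ys, alpha=0.10, n_boot=200, seed=0):
    """Uniform band from the (1 - alpha)-quantile of the bootstrap sup-norm deviation."""
    rng = np.random.default_rng(seed)
    center = plug_in_map(grid, xs, ys)
    sups = np.empty(n_boot)
    for b in range(n_boot):
        xs_b = rng.choice(xs, len(xs), replace=True)
        ys_b = rng.choice(ys, len(ys), replace=True)
        sups[b] = np.max(np.abs(plug_in_map(grid, xs_b, ys_b) - center))
    q = np.quantile(sups, 1.0 - alpha)
    return center - q, center + q

rng = np.random.default_rng(4)
xs = rng.normal(0.0, 1.0, 400)   # sample from mu = N(0, 1)
ys = rng.normal(1.0, 1.0, 400)   # sample from nu = N(1, 1); T0(x) = x + 1
grid = np.linspace(-1.0, 1.0, 21)
lo, hi = bootstrap_band(grid, xs, ys)
```

The bootstrap sup-norm deviation plays the role of the limiting supremum of the Gaussian process, and its empirical quantile calibrates a band that is uniform over the grid.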
Step 2. Convert the convergences in the uniform norm to those in the bounded Lipschitz metric.
Recall that $\mathrm{BL}_1(\ell^\infty(\mathcal{F}))$ denotes the space of Lipschitz functions $h : \ell^\infty(\mathcal{F}) \to [-1, 1]$ with Lipschitz constants bounded by one. For any $h \in \mathrm{BL}_1(\ell^\infty(\mathcal{F}))$, we can bound the difference of the values of $h$ by the uniform distance between its arguments. Denoting by $E_\mu$ the expectation with respect to $\mu$, Jensen's inequality then yields the bound in (25), where we used the fact that $h$ is bounded by one in the second-to-last inequality. From (25), the bound converges to zero for any arbitrary $h$; therefore, by taking the supremum over all $h \in \mathrm{BL}_1(\ell^\infty(\mathcal{F}))$, there is a Brownian bridge $\mathbb{G}_1$ such that the supremum over $h \in \mathrm{BL}_1(\ell^\infty(\mathcal{F}))$ of the corresponding differences of expectations vanishes. Similarly, letting $E_\nu$ be the expectation with respect to $\nu$, there is a Brownian bridge $\mathbb{G}_2$ with the analogous property.

Step 3. Prove the convergence of the sequence $\sqrt{\frac{mn}{m+n}}\,\big(\hat{F}^*_{\mu,m} - \hat{F}_{\mu,m},\, \hat{F}^*_{\nu,n} - \hat{F}_{\nu,n}\big)$ in the bounded Lipschitz metric. We will show the convergence (28). Denoting by $E_{m,n}$ the expectation with respect to the empirical distribution $\mathbb{P}_{m,n}$, we decompose the relevant expectation accordingly. Using (26) and (27), the last expression converges $\mu \otimes \nu$-a.s. to zero uniformly over $h \in \mathrm{BL}_1(\ell^\infty(\mathcal{F})^2)$ as $m, n \to \infty$. Thus taking the supremum over such $h$ yields (28) as desired.
In addition, under the required conditions on $F_\mu$, $F_\nu$, and $T_0$, we can use the corollary to obtain (29).

Step 4. Apply the delta method for bootstrap. With (28) and (29), we can now apply the delta method for bootstrap (see Appendix B.2 for a brief exposition). Define the functional $\Psi : (F_\mu, F_\nu) \mapsto F_\nu^{-1} \circ F_\mu$. We recall from (11) that the Hadamard derivative of $\Psi$ at $(F_\mu, F_\nu)$ is
$$\Psi'_{(F_\mu, F_\nu)}(h_1, h_2) = \frac{h_1 - h_2 \circ T_0}{f_\nu \circ T_0}.$$
Using Lemma 10 on $\Psi$, the sequence weakly converges to the corresponding limit process; this is where we use Lemma 2. For arbitrarily small $\epsilon, \delta, \delta' > 0$, there are sufficiently large $m, n$ such that, with $\mu \otimes \nu$-probability greater than $1 - \epsilon$, the following inequalities hold simultaneously: first,

Fig 1: Plots of the coverage probabilities (left) and the median of average widths (right) of the simulated uniform confidence bands on $[-2.5, 2.5]$ as functions of the sample size $n$.

Fig 3: Analysis of the distribution shifts in ages of death from the year 2001 to 2021 using our uniform confidence bands.

Fig 4: Plots of the average of the coverage probabilities (left) and the median of average widths (right) of the simulated pointwise confidence intervals over $[-2.5, 2.5]$ as functions of the sample size $n$.