Log-concave density estimation with symmetry or modal constraints

We study nonparametric maximum likelihood estimation of a log-concave density function $f_0$ which is known to satisfy further constraints, where either (a) the mode $m$ of $f_0$ is known, or (b) $f_0$ is known to be symmetric about a fixed point $m$. We develop asymptotic theory for both constrained log-concave maximum likelihood estimators (MLE's), including consistency, global rates of convergence, and local limit distribution theory. In both cases, we find the MLE's pointwise limit distribution at $m$ (either the known mode or the known center of symmetry) and at a point $x_0 \ne m$. Software to compute the constrained estimators is available in the R package \verb+logcondens.mode+. The symmetry-constrained MLE is particularly useful in contexts of location estimation. The mode-constrained MLE is useful for modal regression. The mode-constrained MLE can also be used to form a likelihood ratio test for the location of the mode of $f_0$. These problems are studied in separate papers. In particular, in a separate paper we show that, under a curvature assumption, the likelihood ratio statistic for the location of the mode can be used for hypothesis tests or confidence intervals that do not depend on either tuning parameters or nuisance parameters.


Introduction and overview
The classes of log-concave densities on $\mathbb{R}$ (and on $\mathbb{R}^d$) are of great importance in statistics for a variety of reasons, including their many natural closure properties: closure under convolution, affine transformations, convergence in distribution, and marginalization. Densities in these classes are also unimodal, and the classes serve as important nonparametric generalizations of the class of Gaussian distributions.
Nonparametric estimation in the unconstrained classes of log-concave densities has developed rapidly in the past 10-15 years. Existence of maximum likelihood estimators for log-concave densities on $\mathbb{R}$ was provided by Walther (2002), while Pal, Woodroofe and Meyer (2007) established consistency. Dümbgen and Rufibach (2009) gave rates of convergence in certain uniform metrics, and provided efficient algorithms based on "active set" methods (see also Dümbgen and Rufibach (2011)). Balabdaoui, Rufibach and Wellner (2009) established pointwise limit distribution theory for the MLE's, while Doss and Wellner (2016) established rates of convergence of the MLE in the Hellinger metric. There has also been rapid progress in estimation of log-concave densities on $\mathbb{R}^d$; see e.g. Cule, Samworth and Stewart (2010), Dümbgen, Samworth and Schuhmacher (2011), Seregin and Wellner (2010), and Han and Wellner (2016).
Interesting uses of the unconstrained log-concave MLE's in more complicated models, mostly in mixture modeling and clustering, have been considered by Chang and Walther (2007), Eilers and Borgdorff (2007), and Walther (2009).
On the other hand, for a number of important statistical problems it is of great interest to understand estimation in several sub-classes of the class of all log-concave densities on $\mathbb{R}$.
• For testing that a log-concave density on $\mathbb{R}$ is symmetric about a known point, for example 0, we need to know how to estimate the log-concave density both with and without the constraint of symmetry.
• For the basic problem of estimation of location with a symmetric error density, it is important to know how to estimate a symmetric log-concave density with mode (and median and mean) equal to 0.
• For inference about the mode of a log-concave density it is necessary to understand how to estimate a log-concave density with a known mode $m$ (but without the constraint of symmetry).
Once the properties of nonparametric estimators within these sub-classes are understood, the estimators can be used to develop statistical methods with known properties for other, more complex statistical problems. For example, the basic procedures we study here can be viewed as building blocks for, among others:
(a) Testing the hypothesis of symmetry of a log-concave density.
(b) Estimation of the location of a symmetric log-concave density.
(c) Inference about the mode of a log-concave density.
(d) Nonparametric modal regression (as in Chen et al. (2016), but using log-concavity).
(e) Semiparametric estimation in mixture models based on symmetric log-concave distributions; see e.g. Balabdaoui and Doss (2018), Pu and Arias-Castro (2017), and Eilers and Borgdorff (2007).
(f) Modal clustering (as in Chacón (2018a), but using log-concavity).
(g) Estimation of a spherically symmetric multivariate log-concave density, as pursued in Xu and Samworth (2017).
(h) Inference about the center of an elliptical multivariate distribution based on the assumption of a log-concave underlying shape.
Thus our focus here is on estimation of a log-concave density in two important sub-classes. Let $\mathrm{LC}$ denote the class of all log-concave densities on the real line $\mathbb{R}$. The two sub-classes we study here are: (1) the class $\mathrm{LC}(m) = \mathrm{LC}_m$ of all log-concave densities with mode at a fixed number $m$; and (2) the class $\mathrm{SLC}(0) = \mathrm{SLC}_0$ of all log-concave densities symmetric about $0$.
We let $\hat f_n^0$ denote the maximum likelihood estimator of $f_0 \in \mathrm{LC}_m$ based on an i.i.d. sample $X_1, \ldots, X_n$ from $f_0$; and we let $\hat g_n^0$ denote the maximum likelihood estimator of $g_0 \in \mathrm{SLC}_0$, based on an i.i.d. sample from $g_0$.
We rely on the methods and properties developed here for the subclass LC(m) to derive new inference procedures for the mode in Doss and Wellner (2019). The sub-class SLC(0) has already been used in Balabdaoui and Doss (2018) to study semiparametric mixture models. The methods developed here for SLC(0) are also being used in an on-going study by Laha (2019) of efficient estimation of a location parameter in the classical semiparametric symmetric location model with the (very natural) assumption of a symmetric log-concave error distribution. Methodology based on modes or local maxima of nonparametrically estimated functions has seen a resurgence in recent years; see, e.g., Chen et al. (2016), Chen, Genovese and Wasserman (2015), and Qiao and Polonik (2016). A recent survey on estimation and inference for the mode and on mode-based methodology is given by Chacón (2018b).
Thus our main goals here are the following: (a) to show that the mode-constrained MLE's $\hat f_n^0 \in \mathrm{LC}(m)$ and $\hat g_n^0 \in \mathrm{SLC}(0)$ exist and to provide useful characterizations thereof. Here is a brief summary of the paper: In Section 2 we show that the constrained estimators exist and satisfy useful characterizations. Section 3 provides plots of the constrained estimators and provides comparisons to each other and to the unconstrained maximum likelihood estimators $\hat f_n \in \mathrm{LC}$. In Section 4 we summarize results concerning consistency and global rates of convergence, while Section 5 addresses local rates of convergence and limiting distributions at fixed points. Section 6 summarizes some problems and difficulties concerning extensions to higher dimensions. All the proofs are given in Sections 7 and 8.
Many of our theorems have parts labeled "A", "B", and "C." In general the "A parts" of results here have been proved by other authors (as noted in the theorem statements), and the "B parts" were proved (for the most part) in the University of Washington Ph.D. dissertation of the first author, Doss (2013b). The "C parts" are new findings by the present authors, whose proofs are in some cases (as noted in the text near the corresponding results) related to proofs developed by Balabdaoui and Doss (2018).

Maximum likelihood estimator finite sample properties: unconstrained and mode-constrained

Notation and terminology
Several classes of concave functions will play a central role in this paper. In particular, we let
$$\mathcal{C} := \{\varphi : \mathbb{R} \to [-\infty, \infty) \mid \varphi \text{ is concave, closed, and proper}\} \qquad (2.1)$$
and, for any fixed $m \in \mathbb{R}$, we let $\mathcal{C}_m$ denote the class of functions in $\mathcal{C}$ with mode at $m$. We also let $\mathcal{SC}_0$ denote the class of functions in $\mathcal{C}$ that are symmetric about $0$. Here proper and closed concave functions are as defined in Rockafellar (1970), pages 24 and 50. We will follow the convention that all concave functions $\varphi$ are defined on all of $\mathbb{R}$ and take the value $-\infty$ off of their effective domains, where $\mathrm{dom}(\varphi) := \{x : \varphi(x) > -\infty\}$ (Rockafellar (1970), page 40). The classes of unconstrained and constrained log-concave densities are then $\mathrm{LC} := \{e^{\varphi} : \int e^{\varphi}\,d\lambda = 1, \varphi \in \mathcal{C}\}$, $\mathrm{LC}_m := \{e^{\varphi} : \int e^{\varphi}\,d\lambda = 1, \varphi \in \mathcal{C}_m\}$, and $\mathrm{SLC}_0 := \{e^{\varphi} : \int e^{\varphi}\,d\lambda = 1, \varphi \in \mathcal{SC}_0\}$, where $\lambda$ is Lebesgue measure on $\mathbb{R}$. We let $X_1, \ldots, X_n$ be the observations, independent and identically distributed with density $f_0$ with respect to Lebesgue measure. Here we assume throughout that $f_0 \in \mathrm{LC}$, and frequently that $f_0 = e^{\varphi_0} \in \mathrm{LC}_m$ for some $m \in \mathbb{R}$ or $f_0 = e^{\varphi_0} \in \mathrm{SLC}_0$. We let $X_{(1)} < \cdots < X_{(n)}$ denote the order statistics of the $X_i$'s, and write $|X|_{(1)} < \cdots < |X|_{(n)}$ for the order statistics of $|X_1|, \ldots, |X_n|$. We let $\mathbb{P}_n = n^{-1}\sum_{i=1}^n \delta_{X_i}$ denote the empirical measure, let $\mathbb{F}_n(x) = n^{-1}\sum_{i=1}^n 1_{(-\infty,x]}(X_i)$ denote the empirical distribution function, and let $\mathbb{G}_n(x) = n^{-1}\sum_{i=1}^n 1_{[0,x]}(|X_i|)$ denote the empirical distribution function of $|X_1|, \ldots, |X_n|$.
We define the log-likelihood criterion function $\Psi_n : \mathcal{C} \to \mathbb{R}$ by
$$\Psi_n(\varphi) = \int \varphi\, d\mathbb{P}_n - \int_{\mathbb{R}} e^{\varphi(x)}\,dx,$$
where we have used the standard device of including the Lagrange term $\int_{\mathbb{R}} e^{\varphi(x)}\,dx$ in $\Psi_n$ to avoid the normalization constraints involved in the classes $\mathrm{LC}_m$ and $\mathrm{SLC}_0$. This is as in Silverman (1982), Dümbgen and Rufibach (2009), and other current literature. We will denote the unconstrained MLE's of $\varphi_0$, $f_0$, and $F_0$ by $\hat\varphi_n$, $\hat f_n$, and $\hat F_n$ respectively. The corresponding constrained estimators with mode $m$ and symmetric estimators with mode at 0 will be denoted by $\hat\varphi_n^0$, $\hat f_n^0$, $\hat F_n^0$, and $\hat\psi_n^0$, $\hat g_n^0$, $\hat G_n^0$ respectively. Thus $\hat\varphi_n \equiv \mathrm{argmax}_{\varphi \in \mathcal{C}}\, \Psi_n(\varphi)$, $\hat\varphi_n^0 \equiv \mathrm{argmax}_{\varphi \in \mathcal{C}_m}\, \Psi_n(\varphi)$, and $\hat\psi_n^0 \equiv \mathrm{argmax}_{\psi \in \mathcal{SC}_0}\, \Psi_n(\psi)$.
Before proceeding to results concerning existence and uniqueness of the constrained estimators $\hat\varphi_n^0$ and $\hat\psi_n^0$, we first explain some undesirable properties of "naive" constrained estimators based on the unconstrained MLE's $\hat f_n$ and $\hat\varphi_n$.
In summary, naive plug-in estimation for the mode and symmetry constraints does not work. The poor performance of these and other "naive" or "plug-in" estimators motivates study of the constrained MLE's, which we now pursue.

The unconstrained and the constrained MLE's
To develop theory for the mode-constrained estimators $\hat\varphi_n^0$, $\hat f_n^0$, and $\hat F_n^0$ it will be helpful to consider mode-augmented data $Z_1, \ldots, Z_N$ with $N = n$ or $n + 1$, as follows: if $m$ equals one of the $X_i$'s then $(Z_1, \ldots, Z_N) = (X_{(1)}, \ldots, X_{(n)})$ and $N = n$; otherwise $(Z_1, \ldots, Z_N)$ is the sorted vector of $(m, X_{(1)}, \ldots, X_{(n)}) \in \mathbb{R}^{n+1}$ and $N = n + 1$.
Theorem 2.1.
A. (Pal, Woodroofe and Meyer (2007), Rufibach (2006)) For $n \ge 2$ the (unconstrained) nonparametric MLE $\hat\varphi_n$ exists and is unique.
B. (Doss (2013b)) For $N \ge 2$ the mode-constrained MLE $\hat\varphi_n^0$ exists and is unique. It is piecewise linear with knots at the $Z_i$'s and domain $[Z_1, Z_N]$.
C. The symmetry-constrained MLE $\hat\psi_n^0 \in \mathcal{SC}_0$ exists for $n \ge 1$ and is unique. It is piecewise linear with knots contained in the set of $2n + 1$ points $-|X|_{(n)}, \ldots, -|X|_{(1)}, 0, |X|_{(1)}, \ldots, |X|_{(n)}$, and is $-\infty$ for $x \notin [-|X|_{(n)}, |X|_{(n)}]$. Furthermore, $(\hat\psi_n^0)'(0\pm) = 0$.
The previous result shows that the MLE's exist. Unfortunately, there is no closed form expression for the MLE's. However, since they are solutions to optimization problems, they satisfy certain optimality conditions. Thus, the next two theorems provide systems of inequalities and equalities that characterize the MLE's.
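To make the augmentation step concrete, here is a minimal base-R sketch of constructing $Z_1, \ldots, Z_N$ (our illustration of the construction described above; not code from the logcondens.mode package):

    ## Mode-augmented data: insert the candidate mode m into the sorted
    ## sample unless m is already an observation.
    x <- sort(rnorm(10))          # ordered sample X_(1) < ... < X_(n)
    m <- 0.5                      # candidate mode
    Z <- if (m %in% x) x else sort(c(x, m))
    N <- length(Z)                # N = n if m is a data point, else n + 1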

Theorem 2.2.
A. (Rufibach (2006), Dümbgen and Rufibach (2009)) Let $\hat\varphi_n$ be a concave function such that $\{x : \hat\varphi_n(x) > -\infty\} = [X_{(1)}, X_{(n)}]$. Then $\hat f_n = e^{\hat\varphi_n} \in \mathrm{LC}$ is the unconstrained MLE if and only if
$$\int \Delta(x)\, d\mathbb{F}_n(x) \;\le\; \int \Delta(x)\, d\hat F_n(x)$$
for any function $\Delta : \mathbb{R} \to \mathbb{R}$ such that $\hat\varphi_n + \lambda\Delta$ is concave for some $\lambda > 0$.
B. (Doss (2013b)) Suppose that $\hat f_n^0 = e^{\hat\varphi_n^0} \in \mathrm{LC}_m$. Then $\hat f_n^0$ is the MLE over $\mathrm{LC}_m$ if and only if the analogous inequality holds with $\hat F_n$ replaced by $\hat F_n^0$, for all $\Delta$ such that $\hat\varphi_n^0 + t\Delta \in \mathcal{C}_m$ for some $t > 0$.
C. Suppose that $\hat g_n^0 = e^{\hat\psi_n^0} \in \mathrm{SLC}_0$ and $\hat G_n^0(x) \equiv \int_{-\infty}^x \hat g_n^0(y)\,dy$. Then $\hat g_n^0$ is the MLE over $\mathrm{SLC}_0$ if and only if the analogous inequality holds with $\hat G_n^0$ in place of $\hat F_n$, for all $\Delta$ such that $\hat\psi_n^0 + t\Delta \in \mathcal{SC}_0$ for some $t > 0$.
To state the second characterization theorem for the MLE's, we first introduce some further notation and definitions. For a continuous and piecewise linear function $h : [A, B] \to \mathbb{R}$ we define its knots to be the points at which $h$ changes slope, and we write $S_n(h)$ for the set of knots of $h$. Note that $\hat\varphi_n$, $\hat\varphi_n^0$, and $\hat\psi_n^0$ are all continuous and piecewise linear functions (with $A = X_{(1)}$, $B = X_{(n)}$ in the case of $\hat\varphi_n$ and $\hat\varphi_n^0$, and with $A = -|X|_{(n)}$, $B = |X|_{(n)}$ in the case of $\hat\psi_n^0$). Now suppose that $\hat\varphi_n^0$ is piecewise linear with knots at the (mode-augmented) data, let $m \in \mathbb{R}$, and adopt the notation of (2.7).
Definition 2.3. With $m$ considered as a possible knot of $\hat\varphi_n^0$, we say that $m$ is a left knot (or LK) if $(\hat\varphi_n^0)'(m-) > 0$, and that $m$ is a right knot (or RK) if $(\hat\varphi_n^0)'(m+) < 0$. We say that $m$ is not a knot (or NK) if $(\hat\varphi_n^0)'(m) = 0$. All other knots are considered to be left knots (LKs) or right knots (RKs) according as they are strictly smaller or strictly larger than $m$.

Theorem 2.4.
A. (Rufibach (2006), Dümbgen and Rufibach (2009)) The unconstrained MLE $\hat f_n$ is characterized by an analogous system of integrated inequalities, with equality at the knots of $\hat\varphi_n$.
B. (Doss, 2013b) With the notation in (2.7), $\hat f_n^0$ is characterized by the inequalities (2.8) and (2.9), with equality at the knots described there.
Remark 2.5. The conditions (2.8) and (2.9) involve data from only one side of $m$ each, and hence are separate characterizations in a sense. But they are coupled by way of the (global) constraint $\hat F_n^0(X_{(n)}) = 1$ (or, equivalently, $\hat\varphi_n^0 \in \mathcal{C}_m$), which involves the data on both sides of $m$.
Remark 2.6. The "C parts" of Theorems 2.1, 2.2, and 2.4 will be proved here in detail via methods similar to those introduced briefly in Balabdaoui and Doss (2018) in the course of a study of two-component mixture models based on symmetric log-concave components.
These characterization theorems have two important corollaries. (Recall that $\mathbb{G}_n$ denotes the empirical distribution function of the $|X_i|$'s.)
Corollary 2.7 (MLE's related to $\mathbb{F}_n$ at knot points). Each of the following holds almost surely at the knots of the respective estimators, on $[X_{(1)}, X_{(n)}]$ and on $[0, |X|_{(n)}]$ respectively.
Now for any distribution function $F$ on $\mathbb{R}$ let $\mu(F) \equiv \int x\,dF(x)$ and $\mathrm{Var}(F) \equiv \int (x - \mu(F))^2\,dF(x)$.

Corollary 2.8 (Mean and variance inequalities).
Because $\Delta_\pm(x) = \pm x$ does not have mode $m$, and because $-(x - \mu)^2$ has mode $m$ only if $\mu = m$, we cannot make comparisons between the means and variances of $\mathbb{F}_n$ and $\hat F_n^0$.

Hellinger consistency and rates
Pal, Woodroofe and Meyer (2007) proved Hellinger consistency of the MLE defined over a sub-class $\mathcal{S} \subseteq \mathrm{LC}$ containing $f_0$. This nicely includes the sub-class $\mathcal{S} = \mathrm{LC}_m$ when $f_0 \in \mathrm{LC}_m$, i.e., when the mode $m$ has been correctly specified. Further consistency results are due to Rufibach (2006) and Dümbgen and Rufibach (2009). To the best of our knowledge, this is the first treatment of the consistency and global rate properties of the constrained estimators.
Theorem 3.1 (Hellinger consistency and rates of convergence).
A. (Doss and Wellner (2016)) $H(\hat f_n, f_0) = O_p(n^{-2/5})$ when $f_0 \in \mathrm{LC}$.
B. $H(\hat f_n^0, f_0) = O_p(n^{-2/5})$ when $f_0 \in \mathrm{LC}_m$.
C. (Balabdaoui and Doss (2018)) $H(\hat g_n^0, g_0) = O_p(n^{-2/5})$ when $g_0 \in \mathrm{SLC}_0$.
Kim and Samworth (2016) extend Part A of Theorem 3.1 by upper bounding the maximal risk of $\hat f_n$: their Theorem 5 implies that $\sup_{f_0 \in \mathrm{LC}} E_{f_0} H^2(\hat f_n, f_0) = O(n^{-4/5})$ (considering squared Hellinger rather than Hellinger distance). They also provide a matching lower bound: their Theorem 1 implies $\inf_{\tilde f_n} \sup_{f_0 \in \mathrm{LC}} E_{f_0} H^2(\tilde f_n, f_0) \ge c\, n^{-4/5}$ for some $c > 0$, where the infimum is over all (measurable) estimators $\tilde f_n$ of $f_0$. Neither upper nor lower bounds for the (Hellinger) minimax risk are known for either of the constrained density classes we consider in the present paper, although we conjecture that $n^{-4/5}$ is the minimax rate of convergence (in squared Hellinger distance) in both cases.
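For concreteness, we record the Hellinger convention we have in mind when discussing these rates (the $1/2$ normalization is one common choice; the paper's convention may differ by a constant factor):
$$H^2(f, g) \;=\; \frac{1}{2}\int_{\mathbb{R}} \bigl(\sqrt{f(x)} - \sqrt{g(x)}\bigr)^2\,dx,$$
so a rate of $n^{-2/5}$ for $H$ corresponds to the $n^{-4/5}$ squared-Hellinger risk appearing in the Kim and Samworth (2016) bounds quoted above.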
Remark 3.3. When $f_0 \in \mathrm{LC} \setminus \mathrm{LC}_m$, we can show that $\hat f_n^0$ converges to a Kullback-Leibler-type projection of $f_0$ onto $\mathrm{LC}_m$, so that $H^2(\hat f_n^0, f_0)$ does not converge to zero; similarly when $f_0 \in \mathrm{LC} \setminus \mathrm{SLC}_0$ for $H^2(\hat g_n^0, f_0)$. But we will not pursue this here, since our goal in this paper is to understand the null hypothesis (or correctly specified) behavior of the constrained estimators $\hat\varphi_n^0$, $\hat f_n^0$, and $\hat F_n^0$. See Doss and Wellner (2019) for some initial steps concerning the power of a likelihood ratio test based on $2\log\lambda_n$ when $f_0 \in \mathrm{LC} \setminus \mathrm{LC}_m$.
In addition to considering Hellinger distance, one can consider the sup norm (on compact sets) as a metric for global convergence. It turns out that the proofs in Doss and Wellner (2019) rely crucially on knowing the rate of sup-norm convergence for $\hat\varphi_n^0$ (as well as for $\hat\varphi_n$). Thus we study the sup-norm rate of convergence for $\hat\varphi_n^0$ in that paper. In Theorem 4.1 of that paper we find, when the true log-density satisfies a Hölder condition of order 2 and $\varphi_0^{(2)}(m) < 0$, that the rate of convergence is $(\log n/n)^{2/5}$ on compact sets interior to the support of $f_0$.

Local limit processes and limiting distributions at fixed points
Our goal in this section is to describe the limiting distributions of our estimators, both unconstrained and constrained, at fixed points $x_0$ (and $m$ and $0$) at which the true density $f_0$ satisfies a curvature condition. We also want to compare and contrast the behavior of the three different estimators.

The limit processes, unconstrained and constrained
We first need to introduce the local limit processes which are needed to treat the local (at a single point or in a neighborhood of a point) limiting distributions of the estimators, unconstrained and constrained. For all of our estimators (including the unconstrained and the two different mode-constrained estimators), the limit distributions are not Gaussian. Rather, they are defined in terms of so-called invelope processes of integrated Brownian motion. We first recall the invelope process related to the limit distribution for the unconstrained estimators; this process was first presented and studied in Groeneboom, Jongbloed and Wellner (2001a) (and shown to yield the limit distribution in several convex function estimation problems in Groeneboom, Jongbloed and Wellner (2001b)). Let $W$ be a two-sided standard Brownian motion starting at 0, and for any $t \in \mathbb{R}$ let
$$X(t) := W(t) - 4t^3, \qquad Y(t) := \int_0^t W(s)\,ds - t^4. \qquad (4.1)$$
Theorem 4.1 (Groeneboom, Jongbloed and Wellner (2001a)). Let $W$, $X$, and $Y$ be as in (4.1). Then there exists an almost surely uniquely defined random continuous function $H$ satisfying the following conditions: (i) the function $H$ is everywhere below $Y$: $H(t) \le Y(t)$ for all $t \in \mathbb{R}$; (ii) $H$ has a concave second derivative; (iii) $\int_{\mathbb{R}} (Y - H)\, dH^{(3)} = 0$.
The random variables $H^{(2)}(0)$ and $H^{(3)}(0)$ give the universal component of the limit distributions of $\hat f_n(x_0)$ and $(\hat f_n)'(x_0)$; see Theorem 4.5 below.
Theorem 4.1 concerns a process $H$ related to the unconstrained concave estimation problem. In the mode-constrained estimation problem, $f_0 \in \mathrm{LC}_m$, instead of having one process we have two: one for the left-hand side of 0 (negative axis) and one for the right-hand side of 0 (positive axis). (Here, 0 corresponds to the mode $m$, by a translation.) The definitions of the left- and right-hand processes depend on a random starting point for the corresponding integrals involved, which we will eventually denote $\tau_L$ and $\tau_R$ (this is made clear in (4.3)-(4.5) below). To define $\tau_L$ and $\tau_R$, we must define rigorously the possible 'bend points' of $\hat\varphi^0$. To describe the situation exactly, we also will define 'bend points' $\tau^0_+$ and $\tau^0_-$, satisfying $\tau^0_+ \le \tau_R$ and $\tau^0_- \ge \tau_L$, where the inequalities may or may not be strict; these bend points arise in (4.10) below. For a concave function $g$, we let $g'(\cdot-)$ and $g'(\cdot+)$ be the left and right derivatives, respectively (which are always well defined).
Theorem 4.2. Let $W$ be a standard two-sided Brownian motion with $W(0) = 0$, and for $t \in \mathbb{R}$ let the processes $Y_L$, $Y_R$ and the candidate invelopes $H_L$, $H_R$ be defined as in (4.3)-(4.5). With these definitions, we assume that conditions (4.7)-(4.10) hold. Then $H_L$ and $H_R$ are unique, as are $\tau_L$ and $\tau_R$.
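Summarizing the relations among the random bend points just introduced (a compact restatement of facts stated in the surrounding text, not a new display from the paper):
$$\tau_L \;\le\; \tau^0_- \;\le\; 0 \;\le\; \tau^0_+ \;\le\; \tau_R,$$
with equality possible in each inequality; for instance, $\tau^0_+ = \tau_R$ when 0 is not a right-knot, while $\tau^0_+ = 0 < \tau_R$ when 0 is a right-knot, as discussed below.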
Theorem 4.2 shows that processes with the given properties are unique; that they exist follows from the proofs of Theorem 4.8 and Theorem 4.9, which show that $H_L$ and $H_R$ exist since they are limit versions of certain finite sample processes $(H^{\varphi}_{n,L}, H^{\varphi}_{n,R})$. If $\hat\varphi^0$ is, in fact, piecewise linear, then $\tau^0_-$ is just the last knot point $\tau$ of $\hat\varphi^0$ with $(\hat\varphi^0)'(\tau-) > 0$. By Theorem 23.1 of Rockafellar (1970), a finite concave function on $\mathbb{R}$ such as $\hat\varphi^0$ has well-defined right- and left-derivatives on all of $\mathbb{R}$; the specification of left- and right-derivatives in the definitions of $\tau^0_-$ and $\tau^0_+$ is for concreteness but not necessary, since we consider all $\varepsilon > 0$. The distinction between $\tau_R$ and $\tau^0_+$ depends only on the behavior of $\hat\varphi^0$ at 0, and can be understood by considering the case where the infima in (4.4) and (4.5) are actually minima (the infima are attained). In that case, we see that $\tau^0_+$ can be thought of as the smallest "right-knot", in the sense that $\hat\varphi^0$ has a strictly negative slope to the right of $\tau^0_+$. And $\tau_R$ can be thought of as the smallest positive knot. Note that (by concavity) all positive knots are right-knots, so that $\tau_R \ge \tau^0_+$. Note that the infimum defining $\tau_R$ in (4.5) is taken over knots that are strictly larger than 0, so that (when the infima are attained) we have $\tau_R > 0$. On the other hand, if 0 is a right-knot then $\tau^0_+ = 0$, so that $\tau_R$ and $\tau^0_+$ are then distinct. If 0 is not a right-knot, then we will have $\tau^0_+ = \tau_R$. These statements are slightly complicated by the fact that $\tau^0_+$ and $\tau_R$ are defined as infima rather than minima, but the intuitive differences are captured by the previous description. Corresponding statements hold for $\tau_L$ and $\tau^0_-$. The distinction between the two knot pairs is important because many of our arguments depend on constructing "perturbations" of $\hat\varphi^0$, and we can use different types of perturbations at each pair. This means that the different knot pairs have different properties: if we replace $\tau_L, \tau_R$ by $\tau^0_-, \tau^0_+$ in (4.7), then that display may not hold, and similarly, if we replace $\tau^0_-, \tau^0_+$ by $\tau_L, \tau_R$ in (4.10), then that display may not hold. The following lemma holds for $\tau^0_-, \tau^0_+$ but not necessarily for $\tau_L, \tau_R$.
Now we introduce the appropriate limit processes for the symmetric, mode-at-0-constrained estimators. The characterization is similar to that for the mode-constrained (but not symmetric) processes, but since the process is defined only on $[0, \infty)$ the processes are not the same.
Theorem 4.4. Assume $H_+$ is a random process on $[0, \infty)$, and assume that it satisfies the analogues on $[0, \infty)$ of the conditions in Theorem 4.2. Then $H_+$ is unique.

Unconstrained and constrained pointwise limit theory at $x_0 \ne m$
The two main limit theorems below will concern the limiting distributions of our estimators and their derivatives. Recall that we assume $X_1, \ldots, X_n \overset{\text{iid}}{\sim} f_0 = e^{\varphi_0}$, where $f_0$ is a non-degenerate density on $\mathbb{R}$. The three sets of estimators of $f_0$, $\varphi_0$, $f_0'$, and $\varphi_0'$ to be considered are:
A. $\hat f_n$, $\hat\varphi_n$, $(\hat f_n)'$, and $(\hat\varphi_n)'$.
B. $\hat f_n^0$, $\hat\varphi_n^0$, $(\hat f_n^0)'$, and $(\hat\varphi_n^0)'$.
C. $\hat g_n^0$, $\hat\psi_n^0$, $(\hat g_n^0)'$, and $(\hat\psi_n^0)'$.
The corresponding curvature assumptions (Curvature Assumptions 1, 2a, and 2b) require, roughly, twice-differentiability of $\varphi_0$ with $\varphi_0'' < 0$ at the relevant point. Note that Hall (1984) shows that Curvature Assumption 2b holds for the class of symmetric $\alpha$-stable densities on $\mathbb{R}$ for all $0 < \alpha < 2$. Assumption 1 will be used for the estimators in A, whereas for the estimators in B and C we will use Assumptions 2a and 2b. In all three cases we assume $X_1, \ldots, X_n \overset{\text{iid}}{\sim} f_0$. To state our theorem we first define some constants depending on $f_0$ and $\varphi_0''$ at $x_0$.
Theorem 4.5 (Limiting distributions at a fixed point $x_0 \ne m$).
A. (Balabdaoui, Rufibach and Wellner (2009)) Suppose that $f_0 \in \mathrm{LC}$ and that Curvature Assumption 1 holds at $x_0$. Then the limits hold with $H$ as in Theorem 4.1 and $\hat\varphi \equiv H^{(2)}$.
Remark 4.6. (i) Comparing the MLE's for $\mathrm{LC}$ and $\mathrm{LC}_m$: note that the limiting distributions in A and B at a point $x_0 \ne m$ are the same. At a fixed point $x_0 \ne m$, the constraint that the mode is known does not help in estimating the function at $x_0$. As we will see below, this picture changes when $x_0 = m$.
(ii) Note that the rate of convergence of $\hat f_n^0$ and $\hat g_n^0$ at $x_0 = m$ (or $x_0 = 0$ in the case of $\hat g_n^0$) is $n^{-2/5}$, in contrast to the $n^{-1/5}$ rate achieved by the naive estimators discussed in Subsection 2.2.
(iii) Comparing the MLE's for $\mathrm{LC}$ and $\mathrm{LC}_m$ with the MLE for $\mathrm{SLC}_0$: the limiting distributions for the symmetric log-concave class $\mathrm{SLC}_0$ in C are smaller than the limiting distributions of the MLE's for the possibly asymmetric log-concave classes $\mathrm{LC}$ and $\mathrm{LC}_m$ by a factor of $2^{-2/5} \approx 0.757858\ldots$ for the functions themselves and by a factor of $2^{-1/5} \approx 0.870551\ldots$ for the derivatives of the functions. Thus the symmetry constraint substantially reduces the variability of the estimators (see also Figure 3).
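Numerically, and consistent with the heuristic that symmetrization doubles the effective local sample size at the center (an interpretation we offer here, not a claim from the paper):
$$2^{-2/5} = e^{-\frac{2}{5}\log 2} \approx 0.757858, \qquad 2^{-1/5} = e^{-\frac{1}{5}\log 2} \approx 0.870551,$$
matching the $n^{-2/5}$ and $n^{-1/5}$ rates of the function and derivative estimates, respectively.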

Mode-constrained and symmetry-constrained pointwise limit theory at $x_0 = m$
The limit distribution of the mode-constrained estimators at a point $x_0$ depends on whether $x_0 \ne m$ or $x_0 = m$. In the former case the asymptotics are the same as for the unconstrained estimator, but in the latter case they depend on the mode-constrained limit process.
Theorem 4.7 (Limiting distributions at $m$).
B. Suppose that $f_0 \in \mathrm{LC}_m$ and that the corresponding curvature assumption holds at $m$. Then the limits in (4.16) hold, where $\hat\varphi^0$ is given by Theorem 4.2.
C. Suppose that $f_0 \in \mathrm{SLC}_0$ and that Curvature Assumption 2b holds at $m = 0$. Then the analogous limits hold, where $\hat\psi^0$ is given by Theorem 4.4.
We label the parts of Theorem 4.7 as "B" and "C" to parallel the labeling in Theorem 4.5. Notice that for the symmetric MLE, $(\hat g_n^0)'(0)$ and $(\hat\psi_n^0)'(0)$ are always both equal to 0, the values of $f_0'(0)$ and of $\varphi_0'(0)$, so we do not state a limit theorem for these estimators. Theorems 4.5 and 4.7 follow from more general theorems about the estimators, not just at $x_0$ but in local $n^{-1/5}$ neighborhoods of $x_0$, stated below as Theorems 4.8 and 4.9. The "local neighborhood" Theorem 4.8 is the version from which we can derive the limit distribution of the mode likelihood ratio statistic $2\log\lambda_n$ studied in Doss and Wellner (2019). Monte Carlo estimates of the distribution functions of $\hat\varphi(0)$ and of $\hat\varphi^0(0)$ are presented in Figure 1 (left plot). Note that $\hat\varphi^0(0)$ is stochastically smaller than $\hat\varphi(0)$.

Local process limit theory, mode- and symmetry-constrained
Here we state the local process limit theorems behind Theorem 4.7 B, where $x_0 = m$. In this subsection we will formulate a more general version of that theorem which applies to our estimators in $n^{-1/5}$ neighborhoods of $m$.
Recall the definition of $Y$ in (4.1). Now, for positive numbers $a$ and $\sigma$, let $Y_{a,\sigma}$ denote the correspondingly scaled process, and let $H_{a,\sigma}$, $H_{L,a,\sigma}$, and $H_{R,a,\sigma}$ denote the unconstrained and mode-constrained left- and right-processes for $Y_{a,\sigma}$. Then the scaling relationships in (4.21) hold, and identical scaling relationships hold for $H_{L,a,\sigma}$, $H_{R,a,\sigma}$, and the corresponding derivatives, including $\hat\varphi^0_{a,\sigma}$ (4.21).
Theorem 4.8. Let Curvature Assumption 2b hold. Let $H$ be as in Theorem 4.1, let $\hat\varphi \equiv H^{(2)}$, and let $\hat\varphi^0$ be as in Theorem 4.2. Let $\sigma \equiv 1/f_0(m)$ and let $a$ be determined by $|\varphi_0''(m)|$ as in the scaling relations above.

Here convergence is with respect to the topology of uniform convergence on compact sets, and $D_\infty$ is the set of right-continuous functions with limits from the left ("cadlag") on $\mathbb{R}$.
A similar useful result for the symmetry-constrained problem, Theorem 4.9, generalizes and extends Theorem 4.7 part C.
Remark 4.10. To this point we have focused on the case in which the point of symmetry $m$ is known (and equal to 0). If $m$ is unknown (and possibly different from 0) and $f_0 \in \mathrm{SLC}_m$, then it is well known that $m$ is also the mean and median of $f_0$, and hence it can be estimated in several different ways by estimators $\widehat m$ satisfying $\sqrt{n}(\widehat m - m) = O_p(1)$. For example, we could take $\widehat m = \bar X_n$ or $\widehat m = \mathbb{F}_n^{-1}(1/2)$, the sample median. Then we can proceed by assuming that $f_0 \in \mathrm{SLC}_{\widehat m}$ and carrying out the estimation as described above with the $X_i$'s shifted by $\widehat m$. Denote the resulting estimators of $g$ and $\psi$ by $\tilde g_n^0$ and $\tilde\psi_n^0$. Then, since $n^{-1/2} = o(n^{-2/5})$, it is easily seen that Theorems 4.5 C, 4.7 C, and 4.9 continue to hold with $\hat g_n^0$ replaced by $\tilde g_n^0$ and $\hat\psi_n^0$ replaced by $\tilde\psi_n^0$.
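A minimal R sketch of the two-step procedure in Remark 4.10; the symmetric-MLE fitting function named here is hypothetical (the paper's software, logcondens.mode, documents the mode-constrained fit, and we do not assume an interface for the symmetric one):

    ## Two-step location estimation sketch (Remark 4.10):
    ## 1. estimate the center at sqrt(n) rate, 2. shift,
    ## 3. fit the 0-symmetric log-concave MLE to the shifted sample.
    x <- rnorm(500, mean = 3)    # data with unknown center m = 3
    m.hat <- median(x)           # sqrt(n)-consistent; mean(x) also works
    x.shifted <- x - m.hat
    ## fit <- symmetricLogConMLE(x.shifted)  # hypothetical fitter for SLC_0

Since $n^{-1/2} = o(n^{-2/5})$, the plug-in center does not affect the first-order asymptotics, as the remark explains.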

Asymptotics for the maximum
We now consider the asymptotic distribution of estimators of the maximum, $f_0(m)$ (and of $\varphi_0(m)$). For the constrained estimators the maximum is estimated at the known mode (by $\hat f_n^0(m)$ and $\hat g_n^0(0)$), and thus the asymptotic distribution in those two cases is given by Theorem 4.7. We present the asymptotic distribution in the unconstrained case here. In fact, in the unconstrained case we can present a somewhat stronger result, where we allow the possibility of increasingly flat modal regions.
Theorem 4.11. Let $W$ denote two-sided Brownian motion starting at 0, and define $Y_k(t) = \int_0^t W(s)\,ds - t^{k+2}$ for $k \ge 2$ an even integer. Let $H_k$ be the lower invelope of $Y_k$, as defined in Theorem 2.1 of Balabdaoui, Rufibach and Wellner (2009).
We compared, by Monte Carlo, the densities of $N(\hat\varphi)$ (note $\hat\varphi \equiv \hat\varphi_2$) and of $N(\hat\varphi^0)$; see the estimates in Figure 1 (right plot). Those estimates are log-concave MLE's (based on Monte Carlo simulations, as described in the caption). Additionally, simulations not presented here indicate a corresponding stochastic ordering. The estimated density of $N(\hat\varphi)$ in Figure 1 should be compared with the estimated density of $\hat\varphi(0)$ given in Figure 1 of Azadbakhsh, Jankowski and Gao (2014), noting that their $C(0)$ is our $\hat\varphi(0)$.
Remark 4.12. As is well known, one can see from Theorem 4.11 that the rate of convergence of the mode decreases as $k$ increases, while the rate of convergence of the maximum increases (and gets closer to $n^{1/2}$). One can also check the corresponding scaling relations, where $k$ is an even integer.
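One consistent reading of Remark 4.12 (our extrapolation from the $k = 2$ case in Theorems 4.5 and 4.7, not a display from the paper) is:
$$n^{k/(2k+1)}\bigl(\hat f_n(\widehat m_n) - f_0(m)\bigr) = O_p(1), \qquad n^{1/(2k+1)}\bigl(\widehat m_n - m\bigr) = O_p(1),$$
so that the exponent $k/(2k+1) \uparrow 1/2$ for the maximum while $1/(2k+1) \downarrow 0$ for the mode as the even integer $k$ grows; for $k = 2$ these reduce to the familiar $n^{2/5}$ and $n^{1/5}$ rates.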

Simulation results
Software to compute the mode-constrained estimator, and also to implement the likelihood ratio test and corresponding confidence intervals studied in Doss and Wellner (2019), is available in the package logcondens.mode (Doss, 2013a) in R (R Core Team, 2016). Here we illustrate the existence and characterization results on simulated data in Figure 2. There are two columns of four plots. The left column includes the mode-constrained log-concave MLE; the right column includes the 0-symmetric log-concave MLE. The data points are represented by vertical hash lines along the bottom of each plot. The density, log-density, and distribution function are plotted in the top three rows, with the unconstrained log-concave MLE in red and the true (unknown) function in black. On the left the mode-constrained MLE is in blue, and on the right the 0-symmetric MLE is in blue. The empirical df $\mathbb{F}_n$ is plotted in green in the third row. In the last row, we plot $Y_{n,L} - H^0_{n,L}$ (blue) and $Y_{n,R} - H^0_{n,R}$ (purple) to illustrate Theorem 2.4 B (left plot), the corresponding symmetry-constrained process (blue) to illustrate Theorem 2.4 C (right plot), and $Y_n - H_n$ in red (both plots) to illustrate Theorem 2.4 A. In all the plots, dashed vertical red lines give $S_n(\hat\varphi_n)$ and dashed vertical blue lines give the knots of the constrained estimator (which frequently overlap). The solid blue line is the specified mode value for the mode-constrained MLE. Figure 3 gives plots comparing the distribution function estimators. The left and right plots are each one simulation, with sample sizes $n = 200$ and $n = 2000$ respectively, from a $N(0,1)$ distribution. The plots show improvements by $\hat G_n^0$ and $\hat F_n^0$ over $\hat F_n$. The mode-constrained and unconstrained lines are indistinguishable when $n = 2000$, since one needs to plot locally near the mode 0 to see differences between $\hat F_n^0$ and $\hat F_n$ when $n$ is large.
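A sketch of the computation behind a figure of this kind; we assume the entry points activeSetLogCon.mode() (from logcondens.mode; Doss, 2013a) and activeSetLogCon() (from logcondens), and any argument or field names beyond the data vector are assumptions to be checked against the package documentation:

    ## Mode-constrained vs. unconstrained log-concave MLE on simulated data.
    ## install.packages("logcondens.mode")   # also installs logcondens
    library(logcondens.mode)

    set.seed(1)
    x <- rnorm(200)                               # n = 200 draws from N(0,1)
    fit.mc <- activeSetLogCon.mode(x, mode = 0)   # MLE constrained to mode 0
    fit.uc <- logcondens::activeSetLogCon(sort(x))  # unconstrained MLE

    ## If, as in logcondens, the fit stores the log-density at the knots in
    ## $phi, then exp(fit.mc$phi) can be plotted against the truth dnorm().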

Outlook and further problems
Motivated by likelihood ratio test considerations as well as potential uses in several semiparametric settings, we have introduced estimators for a log-concave density known to satisfy a further constraint: either having a known mode or being symmetric (about 0). Our estimators are based on the maximum likelihood principle. The constrained MLE's that we develop are more challenging to compute and to study theoretically than certain naive estimators discussed in Subsection 2.2, but have much better theoretical behavior. We developed a fast algorithm for computation of the estimators, which is made available in the logcondens.mode package for the R programming language. We found that the constrained MLE's are consistent, and indeed we presented the $n^{-2/5}$ rate of convergence, globally and locally (with some proofs given in Doss (2013b)). We also found the pointwise asymptotic distributions of the MLE's; this necessitated studying and characterizing certain limit processes that govern the limit distributions of the MLE's. Studying the limit processes in the constrained cases seems to be somewhat more challenging than in the unconstrained case (i.e., than in the case given in Theorem 4.1), because the definitions and characterizing conditions for the constrained limit processes depend on certain knots (of the limit process) in complicated ways. Nonetheless, our proofs of Theorem 4.2 and Theorem 4.4 are different from and shorter than the proof of Theorem 4.1; for the latter, Groeneboom, Jongbloed and Wellner (2001a) initially characterized the process on an interval $[-c, c]$ and then, through further tightness-type arguments, showed that one can let $c \to \infty$. In the proofs of Theorem 4.2 and Theorem 4.4, we argue directly about a process on $(-\infty, \infty)$, skipping the step of considering the process on $[-c, c]$ and allowing for more direct proofs of the results.
The following are interesting questions beyond those already posed in the introduction that are motivated by the present work.
(a) One motivation for the study of $\hat f_n^0$ given here has been the likelihood ratio tests and confidence intervals for the mode introduced in Doss and Wellner (2019). But the constrained estimators may also be of interest for the study of semiparametric two- and $k$-sample problems with (constrained) log-concave errors, for example errors with log-concave density $f$ with mode at 0. Other variants of this problem might involve constraining $f$ to be log-concave with mean or median at 0 rather than mode at 0. Constraining $f$ to be symmetric about its mode of 0 and log-concave, as in Balabdaoui and Doss (2018), is also of interest.
(b) In Balabdaoui and Doss (2018), a mixture density $g_0$ with $k = 2$ symmetric log-concave components is considered; restriction to the case $k = 2$ is made for identifiability reasons. Can the asymptotic distribution theory we developed in the present paper be extended to the MLE of $f_0$ (and of $g_0$) in the semiparametric mixture setting? Extensions to the case $k > 2$ could also be possible and would certainly be interesting.

Proof sketches and outlines
In this section we give some outlines of the proofs of the results in Section 4. Full proofs are given in Section 8. Here we outline the material in each subsection of Section 8.

Subsection 8.2.1:
The main goal of Subsection 8.2.1 is to show the following proposition.
Proposition 7.1 gives local rates of convergence for $\hat\varphi_n^0$ and its derivative on shrinking intervals $I_n$ around $m$, where $(\hat\varphi_n^0)'$ denotes either the right or left derivative. We may replace the interval $I_n$ by $[\xi_n - Cn^{-1/5}, \xi_n + Cn^{-1/5}]$ for any $\xi_n \to m$; then the random variables implied by the $O_p$ upper bounds depend on $C$ but not on $\xi_n$.
This proposition is of crucial importance for showing the results in Subsection 4.3 (Theorems 4.7, 4.8, and 4.9). The proof of Proposition 7.1 depends on the following two propositions.
Proposition 7.3. Suppose that Curvature Assumption 2b holds, so that $\varphi_0''(m) < 0$. Let $\tau^0_+(\xi_n)$ denote the smallest knot of $\hat\varphi_n^0$ strictly greater than $\xi_n$, and let $\tau^0_-(\xi_n)$ denote the largest knot of $\hat\varphi_n^0$ strictly smaller than $\xi_n$. Then for all $\varepsilon > 0$ there exists $C_0 > 0$ such that, for any random variables $\xi_n \to_p m$, the knot gap $\tau^0_+(\xi_n) - \tau^0_-(\xi_n)$ is of order $n^{-1/5}$ with probability at least $1 - \varepsilon$ (see the display below).
Proposition 7.2 is proved in Doss (2013b). It is needed to show Proposition 7.3, which is needed to prove Proposition 7.1. The proof of Proposition 7.1 depends on Proposition 7.3, on finding points of closeness of $\hat\varphi_n$ and $\hat\varphi_n^0$, and on properties of convex functions. The full proof of Proposition 7.1 is given in Doss (2013b) (see Corollary 4.2.7 there). Thus the main goal of Subsection 8.2.1 is to prove Proposition 7.3 about the "gap problem" (a term coined in Balabdaoui and Wellner (2007)) for the constrained MLE near $m$. The proof depends on constructing certain (somewhat complicated) classes of perturbation functions which can be related to $\tau^0_+(\xi_n) - \tau^0_-(\xi_n)$, and then applying an argument pioneered by Kim and Pollard (1990) to these perturbations (see Lemma 8.2).
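In the form such gap bounds take in Balabdaoui and Wellner (2007) (our rendering of the conclusion, under the stated curvature assumption):
$$P\bigl(\tau^0_+(\xi_n) - \tau^0_-(\xi_n) > C_0\, n^{-1/5}\bigr) < \varepsilon \qquad \text{for all sufficiently large } n.$$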

Subsections 8.2.2 and 8.2.3:
Subsection 8.2.2 is devoted to the proof of Theorem 4.2. The proof proceeds by defining an "objective function" $\phi_{a,b}$ for $a < 0 < b$. Now, $\hat\varphi^0$ is not a minimizer of this objective function (which has finite bounds of integration), but we can show that $\hat\varphi^0$ behaves in a sense as if it were a minimizer of $\phi_{a,b}$ over the space of concave functions with mode at 0. We show that, for certain values of $a$ and $b$, the directional derivatives of $\phi_{a,b}$ at $\hat\varphi^0$, in the direction of a function $\Delta$ (assumed to be concave with mode at 0), are either nonnegative or in some cases zero (see Propositions 8.7 and 8.8 for exact statements). This allows us to argue as follows. We want to show the uniqueness of any $\varphi$ satisfying the characterizing conditions of Theorem 4.2. We assume there exist $\varphi_1$ and $\varphi_2$ both satisfying the characterizing conditions. We examine $\phi_{a_2,b_2}(\varphi_1) - \phi_{a_2,b_2}(\varphi_2)$ and $\phi_{a_1,b_1}(\varphi_2) - \phi_{a_1,b_1}(\varphi_1)$, where the $a_i, b_i$ are certain knot points for $\varphi_i$. By Propositions 8.7 and 8.8, we are able to show that both of these differences are no smaller than a positive multiple of $\int_{-n}^{n} (\varphi_1(t) - \varphi_2(t))^2\,dt$, for $n > 0$ related to the knot points $a_i, b_i$. On the other hand, after deriving results about the knots $a_i, b_i$ and relating the processes $\varphi_i$ to the "observed" process $Y$ (see Lemma 8.9), we are able to show that $\phi_{a_2,b_2}(\varphi_1) - \phi_{a_2,b_2}(\varphi_2)$ and $\phi_{a_1,b_1}(\varphi_2) - \phi_{a_1,b_1}(\varphi_1)$ are also nonpositive, by using properties of Brownian motion. Thus $\int_{-n}^{n} (\varphi_1(t) - \varphi_2(t))^2\,dt = 0$. By letting $n \to \infty$, we see $\varphi_1 = \varphi_2$, so the proof is complete. The proof is somewhat complicated by the fact that the "knots" of the concave function $\hat\varphi^0$ are not separated but rather could form a complicated "Cantor-type" set, as described in Sinai (1992), and so their behavior requires careful analysis. The proof of Theorem 4.4 in Subsection 8.2.3 follows a similar argument.
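The objective function $\phi_{a,b}$ itself is not reproduced above; by analogy with the least-squares criteria used in Groeneboom, Jongbloed and Wellner (2001a), a natural candidate (an assumption on our part, not the paper's display) is
$$\phi_{a,b}(g) \;=\; \frac{1}{2}\int_a^b g(t)^2\,dt \;-\; \int_a^b g(t)\,dX(t), \qquad a < 0 < b,$$
whose directional derivative at $\hat\varphi^0$ in the direction $\Delta$ is $\int_a^b \Delta(t)\,\bigl(\hat\varphi^0(t)\,dt - dX(t)\bigr)$, the quantity controlled in Propositions 8.7 and 8.8 below.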

Subsection 8.2.4:
Theorem 4.5 A about the unconstrained estimator is Theorem 2.1 of Balabdaoui, Rufibach and Wellner (2009). Part B of the theorem is then proved in a fashion identical to that theorem, because for $n$ large enough, in an $n^{-1/5}$ neighborhood of $x_0 \ne m$, the constrained and unconstrained estimators satisfy the same characterization. Part C then follows from Part B. The main focus of Subsection 8.2.4 is to show Theorem 4.8; Theorem 4.9 follows in a similar fashion. Theorem 4.8 implies Theorem 4.7 B, which we show now. A similar argument shows that Theorem 4.9 implies Theorem 4.7 C.
Proof of Theorem 4.7 B. With $\sigma = 1/f_0(m)$ and $a$ determined by $|\varphi_0''(m)|$, the scaling relations give the constant in the limit distribution for $\hat\varphi_n^0(m)$. For $(\hat\varphi_n^0)'(m)$, we see from (7.4) that the corresponding constant is $1/(\gamma_1\gamma_2^3) = D(m, \varphi_0)$. Thus Theorem 4.7 B follows from Theorem 4.8.
Next we briefly outline the proof of Theorem 4.8. We define two sets of localized processes. We need to define left-side and right-side processes; for ease of exposition, here we only discuss right-side processes. We let $t_{n,b} \equiv m + bn^{-1/5}$. We let $s_{n,R}$ be any knot (sequence) of $\hat\varphi_n^0$ strictly larger than $m$ satisfying $n^{1/5}(s_{n,R} - m) = O_p(1)$. The first set of localized processes is the "f-processes" (written in terms of the densities and the empirical distribution), where $A_{n,R} = n^{3/5}\bigl(\mathbb{F}_{n,R}(s_{n,R}) - \hat F^0_{n,R}(s_{n,R})\bigr)$, and the term $A_{n,R}\, n^{1/5}(t_{n,b} - s_{n,R})$ is shown to be asymptotically negligible. These processes are needed because we can show that they converge, where $\nu_{n,R} = n^{1/5}(s_{n,R} - m)$ and $W$ is a standard Brownian motion with $W(0) = 0$ (see Lemma 8.17). Furthermore, the characterization from Theorem 2.4 B applies to $Y^f_{n,R}$ and $H^f_{n,R}$, so $Y^f_{n,R}(b) - H^f_{n,R}(b) \ge 0$ for $b > 0$, with equality at certain points (see Lemma 8.18).
On the other hand, the tightness proposition from above, Proposition 7.1, applies to the log-densities. Thus we define the second set of processes, the "$\varphi$-processes", written in terms of log-densities, where $R^0_n$ is a remainder term. These are related to the f-processes by a Taylor expansion (the delta method). One can translate the characterizing inequalities from the f-processes to the $\varphi$-processes, to see that $Y^\varphi_{n,R}(b) - H^\varphi_{n,R}(b) \ge 0$ for $b \ge 0$, with equality at certain points. By (7.10), $Y^\varphi_{n,R}$ can be shown to converge to a limiting Gaussian process. Furthermore, we can apply Proposition 7.1 and the Arzelà-Ascoli theorem (after analyzing various remainder terms) to see that $H^\varphi_{n,R}$ is tight (Lemma 8.24). Finally, we make a subsequence argument using tightness. By tightness, Prohorov's theorem, and the Skorokhod construction, for any subsequence we can find a further sub-subsequence that converges almost surely to a limit process. By using the characterization (Lemma 8.21, and by analysis of various remainder terms), we show that the limit process must satisfy the unique characterizing conditions given by Theorem 4.2. This shows the limit is the same along all subsequences, and so the unique process $\hat\varphi^0$ given in Theorem 4.2 is the limit, which completes the proof. The argument showing that the characterizing conditions pass from the finite sample processes to the limit process is somewhat complicated by the fact that the integrands in question are defined to begin at random knot points.
Another issue of note is that $(\hat\varphi_n^0)'$ is a discontinuous function. We must choose or find an appropriate metric space in which to study its convergence; the metric we choose is the so-called $M_1$ Skorokhod metric, which differs from the (perhaps more standard) $J_1$ Skorokhod metric (referred to as "the" Skorokhod metric in Chapter 12 of Billingsley (1999)). The $J_1$ metric unfortunately does not allow multiple jumps to approximate a single jump, whereas the $M_1$ metric does. Since we do not have a proof that multiple jumps of $(\hat\varphi_n^0)'$ do not approximate a single jump in the limit, we must use the $M_1$ metric. See Lemma 8.23 and the preceding text for further discussion.
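The standard example illustrating the difference (included here for the reader's convenience; it is not from the paper) is the sequence of step functions
$$x_n \;=\; \tfrac12\,\mathbf{1}_{[1/2 - 1/n,\,\infty)} \;+\; \tfrac12\,\mathbf{1}_{[1/2,\,\infty)} \;\longrightarrow\; \mathbf{1}_{[1/2,\,\infty)},$$
which converges in the $M_1$ topology but not in $J_1$: the two jumps of size $1/2$ merge into a single jump of size 1, which $M_1$ permits and $J_1$ does not.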

Proofs for Section 2
Proofs of Theorems 2.1 A, 2.2 A, 2.4 A and Corollaries 2.7 A and 2.8 A may be found in Pal, Woodroofe and Meyer (2007), Rufibach (2006), and Dümbgen and Rufibach (2009). In the following proofs, we let $\mathcal{C}_{n,m}$ denote the (random) class of concave functions with knots at the $Z_i$'s and support on $[X_{(1)}, X_{(n)}]$, and let $\mathcal{K}_{n,m}$ denote the class of concave functions $\varphi$ with knots at the $Z_i$'s such that $e^\varphi$ is a density with support $[X_{(1)}, X_{(n)}]$.
Proof of Theorem 2.1. Proof of Theorem 2.1 B: As in the unconstrained case (Theorem 2.1 of Dümbgen and Rufibach (2009)), it is easy to argue that the solution is piecewise linear with knots at the $Z_i$'s, and that $\hat\varphi_n^0$ is flat either directly to the left of the mode or directly to the right of the mode, as long as the mode is not a data point. If the mode equals one of the $X_i$'s then the proof given by Rufibach (2006) for existence of the unconstrained MLE applies directly. Thus assume the mode is not one of the $X_i$'s. Consider a sequence $\{v_j\}$ which has limit coordinates $\gamma = (\gamma_1, \ldots, \gamma_N)$, which may be $\pm\infty$. Let $\varphi_\gamma$ be the piecewise linear function given by linearly interpolating $\gamma$. Then $\varphi_\gamma$ has a flat modal region on $[Z_i, Z_j]$ for some $i < j$. Since $\int e^{\varphi_\gamma(x)}\,dx = 1$, we have $\varphi_\gamma(m) \le \log(1/(Z_j - Z_i))$. Since $m$ is the mode of $e^{\varphi_\gamma}$, no coordinate of $\gamma$ is $+\infty$; if one of the coordinates is $-\infty$ then $\Psi_n(\varphi_\gamma) = -\infty$. This shows we can consider the continuous function $\Psi_n$ on a compact set, so it achieves a maximum. The proof that if $\psi_n$ maximizes $\Psi_n$ over $\mathcal{C}_m$ then $\int e^{\psi_n(x)}\,dx = 1$, as well as the proof that $\psi_n$ is unique, are as in the unconstrained case (Rufibach (2006)).
For Theorem 2.1 C, note that the symmetric problem reduces to a one-sided problem, where the last argmax is over log-concave functions with mode at 0 which integrate to $1/2$ on $[0, \infty)$.
The proof of Theorem 2.2 B is standard; see Doss (2013b). For the proof of Theorem 2.4 B we need to introduce a certain cone $C \subseteq \mathbb{R}^d$ (defined, e.g., on page 13 of Rockafellar (1970)); we say the cone $C$ is (finitely) generated by a set of vectors if every element of $C$ is a nonnegative linear combination of them, and we write $(x)_+ = \max(x, 0)$.
Proposition 8.1. $\mathcal{C}_{n,m}$ is a convex cone with a finite generating set.
Proof. It is clear that $\mathcal{C}_{n,m}$ is a cone, because concavity and the mode are preserved under positive scaling. For $\varphi \in \mathcal{C}_{n,m}$, the representation in terms of the generators follows by decomposing $\varphi$ along its kinks at the $Z_i$'s.
Proof of Theorem 2.4. Proof of Part B: First we assume $\hat\varphi_n^0$ is the MLE and use (2.5) to show that (2.8) and (2.9) hold. Using the generating functions described in Proposition 8.1 as our $\Delta$ yields equations (2.8) and (2.9) via integration by parts. That is, for $t \le m$, we choose $\Delta(x) = (x - t)_-$ (which is concave with $m$ as a mode); then integration by parts, together with the fact (already shown) that $\hat F_n^0(X_{(n)}) = 1$, yields (2.9). We get equality at some knot points also: set $\Delta(x) = (x - b)_+$ where $b \ge m$ is any RK. Then, by the definition of an RK, $\Delta$ is an allowable perturbation: $\hat\varphi_n^0$ has a strict concave kink at $b$, so for $\delta$ small enough $\hat\varphi_n^0 + \delta\Delta$ is still concave with mode at $m$. Using this $\Delta$ we get the reverse inequality, and thus for any $b \ge m$ that is an RK we have the inequality both ways. We have thus shown that (2.8) and (2.9) hold with the appropriate equalities.
Now we show the reverse implication. We assume (2.8) and (2.9) hold and consider $\Delta$ with support $[X_{(1)}, X_{(n)}]$ and piecewise linear with knots at the $Z_i$; these are all the $\Delta$'s we need to consider, since the rest were ruled out previously by elementary considerations. We also need $\hat\varphi_n^0 + \varepsilon\Delta$ to be concave with mode $m$. Now, we do not know whether $m$ will be an NK, an LK, or an RK, so we argue by cases. If $m$ is a knot for $\hat\varphi_n^0$ in one direction, without loss of generality we can say that $m$ is an RK, and we have $\int_m^c \mathbb{F}_n(x)\,dx = \int_m^c \hat F_n^0(x)\,dx$ for any $c > m$ that is also a knot. Recall that we have defined the indices $1 = j_1, \ldots, j_{l_0} = N$ so that the $Z_{j_i}$ are the knots. We write $\Delta$ as a nonnegative combination of the generators, with coefficients $\beta_j \ge 0$ and $C_i \ge 0$. Since $m$ is an RK, $m$ is not also an LK (otherwise $m$ is simply a knot, $\hat\varphi_n^0$ coincides with the unconstrained MLE, and the characterization of the unconstrained MLE in Dümbgen and Rufibach (2009) implies we are done). This forces $C_p = 0$ (which refers to the interval $(Z_{j_{p-1}}, m = Z_{j_p}]$). We thus obtain the desired inequality, which follows because $\beta_j \ge 0$ and by the assumed inequalities; we also have equality for all $i$ except $i = p$, by the equality-at-knots assumption, while for $i = p$ we have $C_i = 0$. An analogous argument holds for the case where $m$ is an LK and for the case where $m$ is neither an LK nor an RK. Part C follows as in the proof of Theorem 2.1 C: $\hat g_n^+$ is the MLE over $\mathrm{LC}_0$ of $|X_1|, \ldots, |X_n|$, so we apply the result of Part B. Note that $0 \in S_n(\hat\psi_n^0) \cap [0, |X|_{(n)}]$ only if $(\hat\psi_n^0)'(0+) < 0$ (so that 0 is a right knot), which is only possible if 0 is an observed data point.
Proofs of Corollary 2.7 B and C follow as in the unconstrained case (Corollary 2.5 of Dümbgen and Rufibach (2009)).

Proofs for local rates of convergence
This section is devoted to showing Proposition 7.1, stated above, which is needed for proving Theorems 4.8 and 4.9. See Section 7 for a discussion of the proof of Proposition 7.1, the full details of which are given in Doss (2013b) (Corollary 4.2.7). Our main goal here is to prove Proposition 7.3.
Proof of Proposition 7.3. We first consider the case $\xi_n = m$. For ease of notation, and without loss of generality, we assume $m = 0$ and abbreviate $\tau^0_{n,+}(0)$ by $\tau^0_{n,+}$ and $\tau^0_{n,-}(0)$ by $\tau^0_{n,-}$. We will argue via a family of perturbations which can be separated into subfamilies, depending on whether 0 is a left knot (LK), a right knot (RK), or not a knot (NK). If 0 is a one-sided knot (LK or RK), we have different perturbation subfamilies depending on whether $\tau^0_{n,+} > -\tau^0_{n,-}$ or not. We will start with the case in which 0 is an LK, and we construct $\Delta$ with the two required properties. Whereas in the unconstrained case construction of such an acceptable perturbation function was straightforward (Balabdaoui, Rufibach and Wellner, 2009), in the constrained case construction of a $\Delta$ that is an acceptable perturbation (i.e., keeps the mode fixed) is much less straightforward. We consider several cases separately.

Lemma 8.2. We continue with the setup of Proposition 7.3; that is, we define the perturbation classes as above and $\Delta_1$ as in (8.11). Then for all $\varepsilon > 0$ the stated bound holds, where $K > 0$ is from Lemma 8.3 and does not depend on $f_0$.
Proof. We examine $\int \Delta_1(x)\,(\hat f_n^0 - f_0)(x)\,dx$ by repeated Taylor expansions, where we let $\Delta_1$ be $\Delta_{LK,1}$ or $\Delta_{NK,1}$, and we will expand at $m$, which we again take to be 0, without loss of generality.
Note that $\mathcal{F}$ is a VC class with VC index 4. Thus Theorem 2.6.7 on page 141 of van der Vaart and Wellner (1996) shows that the entropy bound condition in Lemma A.1 on page 2560 of Balabdaoui and Wellner (2007) holds for $\mathcal{F}$. Then the function $F_{b,R}(x)$, defined to be constant and equal to $(7/4)R$ on $[b, b+R]$ and 0 otherwise, is an envelope for $\mathcal{F}$. That $F_{b,R}$ is an envelope is immediate for $\Delta \in \mathcal{F}_{NK,b,R}$ and for the setting where $\tau^0_{n,+} < -\tau^0_{n,-}$ and $\Delta \in \mathcal{F}_{LK,b,R}$ (and analogously when $\tau^0_{n,+} > -\tau^0_{n,-}$ and $\Delta \in \mathcal{F}_{RK,b,R}$). (For $\Delta \in \mathcal{F}_{NK,b,R}$, the longer interval has slope $\pm 1$ and the other interval has slope of the opposite sign. For the case $\tau^0_{n,+} < -\tau^0_{n,-}$ and $\Delta \in \mathcal{F}_{LK,b,R}$, the interval $[\tau^0_{n,-}, \tau^0_{n,-}/2]$ has slope 1 and the slope on the rest has the opposite sign, and analogously for $\mathcal{F}_{RK,b,R}$.) For the case $\tau^0_{n,+} \ge -\tau^0_{n,-}$ and $\Delta \in \mathcal{F}_{LK,b,R}$, an elementary inequality gives the same bound. Next, we compute the integral of the envelope squared, where $\|f_0\|_\infty$ is the supremum over $\mathbb{R}$ of the (log-concave) density $f_0$, and is thus universal across $b$ and $R$. Thus we can conclude from Lemma A.1 on page 2560 of Balabdaoui and Wellner (2007), with $s = 2$ and $d = 2$, that for $\varepsilon > 0$ the stated bound holds for some $K > 0$.

Proofs for mode-constrained limit process
This entire section is devoted to the proof of Theorem 4.2, and thus throughout this section we take the definitions and assumptions as given in that theorem.
Proof. By Theorem 4.2, displays (4.8) and (4.9), $H^0 - Y^0 \le 0$, which allows us to conclude (8.21); the first line follows since $H_L - Y_L$ is differentiable on $(-\infty, 0)$, and a differentiable function has derivative 0 at a local maximum (see, e.g., Dieudonné (1969), page 153, Problem 3, part (a)). The same argument applies to the second line of (8.21). Now, the following argument holds with probability 1 and for any fixed $c > 0$. On $[0, c]$, $H_R$ has a bounded second derivative, so that there exists a constant $a > 0$ such that the corresponding shifted process $\tilde H_R$ is convex. We also have $\tilde H_R + A \le \tilde Y_R + A = \tilde Y$ by (4.9), so that, letting $M_{\tilde Y}$ be the greatest convex minorant of $\tilde Y$ on $[0, c]$, we have $\tilde H_R + A \le M_{\tilde Y} \le \tilde Y$, since $\tilde H_R + A$ is convex and below $\tilde Y$. Let $T = \{x \in [0, c] : M_{\tilde Y}(x) = \tilde Y(x)\}$. By the proof of Corollary 2.1 of Groeneboom, Jongbloed and Wellner (2001a) (see also Definition 1 and Theorem 1 of Sinai (1992)), $T$ is a (Cantor-type) set which has Lebesgue measure 0; the set in (8.22) is contained in $T$, and thus (by (8.21) and) by letting $c \to \infty$, we see that $S_R$ is contained in a set which has Lebesgue measure 0. Finally, $S_R$ is closed because $H_R - Y_R$ and $(H_R - Y_R)'$ are both continuous functions. By an analogous argument, we can conclude that $S_L$ is closed and has Lebesgue measure 0, and thus $S^0$ is also closed and has Lebesgue measure 0. By (4.10), (8.23) holds (regardless of whether one of $\tau^0_+$ or $\tau^0_-$ is 0 or not). Thus we now conclude that $S^0$ is contained in the augmented knot set as follows. If $\tau$ is an element of $S^0$ then for any $\varepsilon > 0$ the signed measure corresponding to $(\hat\varphi^0)'$ assigns strictly negative mass to $(\tau - \varepsilon, \tau + \varepsilon)$. This is by the definition of derivative: since the relevant difference quotient does not converge to 0 as $\delta \downarrow 0$, for all $\varepsilon > 0$ there is $0 < \delta < \varepsilon$ such that the quantity in (8.24) is bounded away from 0, and so the integral on the right-hand side of (8.23) is strictly less than 0. Thus if $\tau \ge \tau^0_+$, then $(H_R - Y_R)(\tau) = 0$. For $\tau > 0$, this implies that $\tau \in S_R$ by (8.21). Similarly, if $\tau \le \tau^0_-$ and $\tau < 0$, then $(H_L - Y_L)(\tau) = 0$ and $\tau \in S_L$. We have $(\tau^0_-, \tau^0_+) \cap S^0 = \emptyset$ by Lemma 4.3, so we have shown that if $\tau \in S^0$, then $\tau$ is either 0 (if one of $\tau^0_-$ or $\tau^0_+$ is 0) or is in $S_L$ or $S_R$. Now, by the proof of Theorem 1 of Sinai (1992), any fixed point $t \ge 0$ belongs to $T \supseteq S_R$ with probability zero. An analogous statement holds for $t \le 0$ and $S_L$. Thus, if $t \ne 0$, then $t \notin S^0$, and so $(H_L)^{(2)}$ is concave and affine in a neighborhood of $t$, so $(H_L)^{(3)}(t)$ is well defined.

By Lemma 8.4, $\tau_L \in S_L \cup \{0\}$. This is because, by its definition, either $\tau_L < 0$, in which case $\tau_L \in S_L$, or there is a sequence $\tau^0_{L,n} \subseteq S_L$ with $\tau^0_{L,n} < 0$ for all $n$; in this latter case, since $S_L$ is closed, the limit lies in $S_L \cup \{0\}$. This suggests the following definitions. It is then true by definition that the corresponding display holds, and it is furthermore true that $A_L$ is an affine function; we just verified in (8.26) that $A_L(t) \equiv 0$. An analogous statement holds for $H_R$. Remark 8.5. Note that since the augmented knot set is closed, $\tau_+$ and $\tau_-$ are both elements of it; this is why we add the point 0 to it. Lemma 8.4 suggests that we can think of $\hat\varphi^0$ as being piecewise affine (with a potentially uncountable number of knot points), because with probability 1 the union of the open intervals on which $\hat\varphi^0$ is affine has full Lebesgue measure on the real line (meaning its complement has Lebesgue measure 0). For $t \in \mathbb{R}$, we let $\tau^0_+(t)$ be the first knot larger than $t$, and analogously for $\tau^0_-(t)$ (8.32).
Lemma 8.6. We again assume the full setup of Theorem 4.2. Then, for any (fixed or random) $T \ge 0$, with probability 1 there are 'knot points' $\tau^0_+(T)$ and $\tau^0_-(T)$.
Proof. We fix $T \ge 0$, and we will show that there exists $\tau_+(T)$ with $\tau_+(T) > T \ge 0$. We assume for contradiction that $\hat\varphi^0$ has no knots on $(T, \infty)$, and thus is linear there. Thus $H_R$ is cubic on $[T, \infty)$, so can be written as a cubic with new random coefficients $B_i$ (where only for $i = 0, 1$ are the $B_i$ not equal to $A_i$). Now, let $\phi(t) = \frac{2}{3}t^3 \log\log t$. Then by page 1714 of Lachal (1997) (or from page 238 of Watanabe (1970)), we know the almost sure $\limsup$ behavior of the integrated Brownian motion relative to $\phi$, which gets larger than 0 for $t$ large enough, as it is almost surely bounded below by a quadratic polynomial (with positive leading coefficient) minus 1. This contradicts the fact that $Y_R(t) - H_R(t) \le 0$ for all $t$, so we are done. Our argument applies with probability 1 to any fixed $T \ge 0$, and thus to the entire sample space of any random $T \ge 0$. An identical argument works for showing there exists a knot less than $-T$.
We will not speak of $\hat\varphi^0$ as a minimizer of an objective function; instead we will show, for acceptable perturbations $\Delta$, that $\int \Delta(t)\,\bigl(\hat\varphi^0(t)\,dt - dX(t)\bigr) \ge 0$ (8.33), i.e., $\hat\varphi^0$ behaves as we would expect a minimizer to behave.

Proposition 8.7. We assume the full setup of Theorem 4.2, and let $\Delta$ be an allowable perturbation as above. Then (8.33) holds over $[-a, a]$ for knots $a$, and thus, by Lemma 8.6, $\limsup_{a \to \infty} \int_{-a}^{a} \Delta(t)\,\bigl(\hat\varphi^0(t)\,dt - dX(t)\bigr) \ge 0$.
Proof. We use the notation $g(a, b] = g(b) - g(a)$.
By Lemma 8.4, $a \in S_L$, $b \in S_R$, and since neither $a$ nor $b$ is 0, we have $(F_R - X_R)(b) = 0 = (F_L - X_L)(a)$, and we recall (4.10). Also recalling that $(H_L - Y_L)' = -(F_L - X_L)$, we see that the above display equals the final expression, where the inequality follows because each of the three lines in the final expression is $\ge 0$, as follows. The first line is equal to 0 by (4.7); the third line is $\ge 0$ by (4.8) and (4.9) and the fact that $\Delta$ is concave, so $\Delta'$ is monotonically nonincreasing, so $d\Delta'$ is a nonpositive measure; similarly, the second line is $\ge 0$ because $\Delta$ has its maximum at 0, so that $(H_R - Y_R)(0)$, $\Delta'(0+)$, $(H_L - Y_L)(0)$, and $-\Delta'(0-)$ are all nonpositive.
The above proof can be extended to $\Delta$ such that $\hat\varphi^0(t) + \varepsilon\Delta(t) \in \mathcal{G}_0$, where $\mathcal{G}_0$ is the set of concave functions with maximum at 0, but we will not need this per se. Rather, in the next result we will express the same idea by showing, for knots $a < 0 < b$, that $\int_a^b \hat\varphi^0(t)\,\bigl(\hat\varphi^0(t)\,dt - dX(t)\bigr) = 0$, and re-express this via integration by parts formulae.
Proposition 8.8. We again assume the full setup of Theorem 4.2, and assume that $a, b$ are knots with $a < 0 < b$. Then $\int_a^b \hat\varphi^0(t)\,\bigl(\hat\varphi^0(t)\,dt - dX(t)\bigr) = 0$.
Proof. The last equality in the first display is by (4.7). Since $\hat\varphi^0$ is constant on $(\tau^0_-, \tau^0_+)$, and using the notation $g[c, d] = g(d) - g(c-)$, we can rewrite the last expression above, using the integration by parts formula (see Lemma 9.1) for the first equality, since $H_L - Y_L$ and $H_R - Y_R$ are continuous, and using (4.10) for the third equality. The second equality follows from Lemma 8.4, since the knot $a$ and the limit of knots $\tau_-$ are elements of $S_L$, and similarly $b$ and $\tau_+$ are elements of $S_R$.
Next, we show a tightness-type result for the bend points. Recall the definition (8.32) of $\tau^0_-(t)$ and $\tau^0_+(t)$. Lemma 8.10. Let the assumptions of Theorem 4.2 hold. Then for all $\varepsilon > 0$ there exists $M_\varepsilon$ such that for all $t > 0$ the bounds (8.38) and (8.39) hold, where $M_\varepsilon$ does not depend on $t$.
Proof. We will show that for all $t, \varepsilon > 0$ there exists $M = M_\varepsilon$ such that $P(\tau^0_+(t) > t + M) < \varepsilon$. The statement for $\tau^0_-(-t)$ is analogous. By Lemma 8.6, for any $t$ we can find a knot $\tau_2$, where $\tau_2 < \infty$ is taken to be $\tau^0_+(t)$; similarly, we can take $t \equiv t_\varepsilon$ large enough such that with probability $1 - \varepsilon$ there exists a knot $0 < \tau_1 < t$. To match notation with Lemma 8.9, we define $\Delta g$ and $\bar g$, for any function $g$, as in that lemma. Since $\hat\varphi^0$ is affine on $[\tau_1, \tau_2]$, Lemma 8.9 allows us to conclude (8.42). The "if and only if" follows because $Y_R(t) = Y(t) + A(t)$, where $A(t)$ is a random affine function, and the relevant inequality is invariant under adding an affine function $A$ (with $\bar A$ defined accordingly). Since $\Delta X$ trivially equals $\Delta X_R$, we have shown (8.42). Let $M_\varepsilon > 0$ and let $B_t$ be the event $\{0 < \tau_1 < t,\ \tau_2 > t + M_\varepsilon\}$. We then obtain (8.43), where we now show that the last inequality follows from page 1633 in the proof of Lemma 2.4 in Groeneboom, Jongbloed and Wellner (2001a). We have already noted that $Y_R(\tau) \le \bar Y_R - \tfrac{1}{8}\Delta X_R\,\Delta\tau$ if and only if $Y(\tau) \le \bar Y - \tfrac{1}{8}\Delta X\,\Delta\tau$. Then Groeneboom, Jongbloed and Wellner (2001a) show algebraically that this inequality can be rewritten in the form they bound.
The probability we consider in (8.43) is on the event $B_t$. The only cost of this is that we need to double our $M_\varepsilon$ for it to correspond with the probability in (8.45). Thus (8.43) holds, but we do not yet have independence from $t$ because of the $t_\varepsilon$ in the expression. We easily circumvent this by replacing $M_\varepsilon$ by $t_\varepsilon + M_\varepsilon$. Now we have shown that (8.38) holds with $M_\varepsilon$ independent of $t$. Now we show (8.39). Note that we can write an analogous version of (8.43) for $t > M_\varepsilon$, because, by the argument we just went through, the probability in the third line is again bounded by (8.46). Note that we have $t$ in place of $t_\varepsilon$ in (8.47), so the above statement is already independent of $t$ as long as $t > M_\varepsilon$. Thus we have shown (8.39); the remaining case can be handled analogously.
The next result relates the unconstrained and constrained limit estimators in the Gaussian setting.

Corollary 8.11. Let the assumptions of Theorem 4.2 hold. Then for all $\varepsilon > 0$, there exists $M_\varepsilon$, not depending on $t$, such that, with probability at least $1 - \varepsilon$, there exist points $s$ with $|s - t| \le M_\varepsilon$ at which $\hat\varphi_0(s) = \hat\varphi(s)$.

Proof. Define a right-side sequence of knots to be a sequence of points as in the display, where the $\nu_i$ are knots for $\hat\varphi$ and the $\nu^0_i$ are knots for $\hat\varphi_0$; similarly, define a left-side sequence of knots. We then argue by the Intermediate Value Theorem and the Mean Value Theorem. First, assume we are given such a sequence; without loss of generality take it to be a right-side sequence (on the probability-one event on which Theorem 4.1 holds). Then we can say, by our hypotheses, that the displayed relations hold. By the Intermediate Value Theorem we can pick points $x_1 < x_2 < x_3$ at which, since the remaining term is a (random) affine function, we can conclude for $i = 1, 2, 3$ that the displayed equalities hold. We apply the Mean Value Theorem and get $t_i \in (x_i, x_{i+1})$ for $i = 1, 2$ at which the corresponding derivatives agree, and, applying the Mean Value Theorem again, we get $s \in (t_1, t_2)$ as required. It remains to construct right-side or left-side sequences of knots; this follows from Lemma 8.10 and the analogous lemma for the unconstrained case, Lemma 2.7, page 1638, of Groeneboom, Jongbloed and Wellner (2001a).

Lemma 8.12. Let the assumptions of Theorem 4.2 hold. Then, for all $\varepsilon > 0$ there exists $M_\varepsilon$, not depending on $t$, such that
$$P\bigl\{|\hat\varphi_0(t) - \hat\varphi(t)| > M_\varepsilon\bigr\} < \varepsilon \quad\text{and}\quad P\bigl\{|(\hat\varphi_0)'(t) - \hat\varphi'(t)| > M_\varepsilon\bigr\} < \varepsilon, \qquad (8.54)$$
where the derivatives can be taken to be right or left derivatives.
Proof. This follows from Lemma 8.10 and an argument similar to the finite-sample tightness results. By Corollary 8.11, with probability $1 - \varepsilon$ and for $M$ appropriately large, we can pick $t - 2M < s_{-2} < s_{-1} < t < s_1 < s_2 < t + 2M$ such that $\hat\varphi_0(s_i) = \hat\varphi(s_i)$ for $i = -2, -1, 1, 2$. Then the displayed bounds hold, where $\hat\varphi'$ and $(\hat\varphi_0)'$ can be either the left or the right derivatives; thus (8.55) holds. Let $h_0(t) = -12t^2$. With high probability, the right side of (8.55) is bounded by (8.56), which is less than $M + M + 24 \cdot 2M$ with probability $1 - \varepsilon$, independently of $t$, by (2.36) or (2.37) of Lemma 2.7 on page 1638 of Groeneboom, Jongbloed and Wellner (2001a). Thus we have shown the second statement in (8.54), which we now use to show the first statement in (8.54). We first apply Lemma 9.2 to the difference $|\hat\varphi_0(t) - \hat\varphi(t)|$ by applying (9.1) both to $\hat\varphi_0 - \hat\varphi$ and to $\hat\varphi - \hat\varphi_0$, using the points $s_{-1}$ and $s_1$ as $a$ and $b$, respectively. Then, by (9.1), we can bound both of these differences if we can bound the two derivative terms in (8.58), since all the other terms are $0$ by the definition of the $s_i$; here we can take the derivatives to be either left or right derivatives. As in (8.56), we can bound the relevant quantity: the first and last terms are bounded by the second statement in (8.54), and the middle term is bounded by (8.56); the middle term also bounds (8.58). All of this holds with probability $1 - \varepsilon$ and uniformly in $t$, so we are done.
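The role of the crossing points in controlling derivatives can be seen from the elementary fact that, for a concave function $\psi$ and $s_{-1} < t < s_1$,
$$\frac{\psi(s_1) - \psi(t)}{s_1 - t} \ \le\ \psi'(t\pm) \ \le\ \frac{\psi(t) - \psi(s_{-1})}{t - s_{-1}}$$
(with either one-sided derivative). Applying this to both $\hat\varphi$ and $\hat\varphi_0$, which agree at $s_{-1}$ and $s_1$, bounds the difference of their derivatives at $t$ by differences of divided differences computed from values agreeing at the endpoints; this is a sketch of the mechanism only, not the exact display (8.55).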
For the next lemma, let $h_0(t) = -12t^2$ be the "true" concave function.
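For orientation: this choice matches the canonical limit problem, in which (as in Groeneboom, Jongbloed and Wellner (2001a), with the sign flipped for the concave setting, as we assume here) the drift of the limit process is $-t^4$, so that
$$h_0(t) = \frac{d^2}{dt^2}\bigl(-t^4\bigr) = -12t^2.$$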
Lemma 8.13. Let the definitions and assumptions of Theorem 4.2 hold. Then, for all $\varepsilon > 0$ there exists $M_\varepsilon$, independent of $t$, such that
$$P\bigl\{|\varphi_i(t) - h_0(t)| > M_\varepsilon\bigr\} < \varepsilon \quad\text{and}\quad P\bigl\{|\varphi_i'(t) - h_0'(t)| > M_\varepsilon\bigr\} < \varepsilon \quad\text{for } i = 1, 2,$$
where the derivatives can be right or left derivatives.
Recalling that $h_0(t) = -12t^2$, we see that the previous display equals the right-hand side of (8.63). Thus we can conclude that (8.63) holds. The proof will now proceed as follows. We first show that the right-hand side of (8.63) is finite. For a function $g: \mathbb{R} \to \mathbb{R}$, we let $\|g\|_a^b = \sup_{t \in [a,b]} |g(t)|$. Once $\int_{-\infty}^{\infty} (\varphi_1 - \varphi_2)^2\,d\lambda$ is known to be finite, we will conclude that $\|\varphi_1 - \varphi_2\|_\infty^n \to 0$ as $n \to \infty$. We will then revisit our earlier argument, which showed that the right-hand side of (8.63) is finite, and use this new fact to show that (8.63) is, in fact, $0$. This will finish the proof.
Thus, our next step is to show that the right-hand side of (8.63) is finite. Note that we only need to control the $\liminf_n$ of the right-hand side of (8.63), since $\int_{-n}^{n}(\varphi_1 - \varphi_2)^2\,d\lambda$ is nonnegative and nondecreasing in $n$. We first establish the displayed bound, in which we can take $\varphi_i'$ to be the right derivative, though this choice is inconsequential because of the almost sure continuity of $W$. Thus, by (8.69) and (8.70), for all $n$, with probability $1 - \varepsilon$, we can conclude that the displayed bound holds. Lemma 8.14 shows, for $i = 1, 2$, that the corresponding terms are bounded in probability. Thus, by (8.68), and since this argument is perfectly symmetric and applies to the interval $[a^1_n, a^2_n]$, we have now shown that the right-hand side of (8.63) is $O_p(1)$ and thus finite almost surely, as desired. Now that we have shown that $\int_{-\infty}^{\infty}(\varphi_1 - \varphi_2)^2\,d\lambda < \infty$ almost surely, we can conclude that $\|\varphi_1 - \varphi_2\|_\infty^n \to 0$ almost surely as $n \to \infty$; using (8.65) together with arguments similar to those above, we will now show that $\int_{-\infty}^{\infty}(\varphi_1 - \varphi_2)^2\,d\lambda = 0$ almost surely. By (8.65), Lemma 8.14 below allows us to conclude that almost surely $\int_{b^1_n}^{b^2_n} |\varphi_1 - h_0|\,d\lambda \to 0$. Thus we can re-examine (8.64) and see that its right side is bounded above by the displayed quantity, where we may choose $n$ large enough to make the inequality hold with probability $1 - \varepsilon$ for any positive $\varepsilon$; this yields (8.66). Next we show that the other term in (8.63), $\int_{A_n} \{(\varphi_1 - h_0)^2 - (\varphi_2 - h_0)^2\}\,d\lambda/2$, is small. By Lemma 8.14, for any $\varepsilon > 0$ we may pick $M_\varepsilon$ such that both $|\int_{b^1_n}^{b^2_n}(\varphi_1 - h_0)\,d\lambda|$ and $b^2_n - b^1_n$ are bounded by $M_\varepsilon$ with probability $1 - \varepsilon$. Thus, defining $\varepsilon_2 = \varepsilon/M_\varepsilon$, we take $n$ large enough that with probability $1 - \varepsilon$ we have $\|\varphi_1 - \varphi_2\|_\infty^n < \varepsilon_2$. Then, letting $\delta(t) = \varphi_1(t) - \varphi_2(t)$, we conclude that the resulting expression is bounded above by $\varepsilon$ with probability $1 - 2\varepsilon$ for $n$ large enough, and similarly for the companion term, with probability $1 - 2\varepsilon$. Thus we have shown that, with probability approaching $1$, both terms in (8.63) are bounded by $\varepsilon$ as $n \to \infty$. Since $\int_{-n}^{n}(\varphi_1 - \varphi_2)^2\,d\lambda$ is nondecreasing in $n$, it follows that $\int_{-n}^{n}(\varphi_1 - \varphi_2)^2\,d\lambda < \varepsilon$ with probability $1 - \varepsilon$, and thus it must be $0$ almost surely.
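The algebraic step behind the bound involving $\delta$ is the difference-of-squares factorization; with $\delta = \varphi_1 - \varphi_2$ (the display is schematic, with $A_n$ the region of integration above):
$$(\varphi_1 - h_0)^2 - (\varphi_2 - h_0)^2 = \delta\,(\varphi_1 + \varphi_2 - 2h_0), \quad\text{so}\quad \Bigl|\int_{A_n} \bigl\{(\varphi_1 - h_0)^2 - (\varphi_2 - h_0)^2\bigr\}\,d\lambda\Bigr| \le \sup_{A_n}|\delta| \int_{A_n} |\varphi_1 + \varphi_2 - 2h_0|\,d\lambda.$$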
The following lemma translates Lemma 8.13 into a more direct tightness result.
Lemma 8.14. Let the definitions and assumptions of Theorem 4.2 hold and let $h_0(t) = -12t^2$. Let $\varphi_i$, $i = 1, 2$, be as in (8.62), and let $a^i_n$ and $b^i_n$ be as defined on page 2439. We then have (8.68). Furthermore, for $i = 1, 2$ and any $\varepsilon > 0$ and $k > 0$, there exist $K_\varepsilon, K_{\varepsilon,k} > 0$, not depending on $n$, such that with probability greater than $1 - \varepsilon$ the bounds (8.69) and (8.70) hold (in which we take $\varphi_i'$ to be either the right or the left derivative), and thus (8.71) holds as well. Further, if almost surely $\|\varphi_1 - \varphi_2\|_\infty^n \to 0$ as $n \to \infty$, then we can conclude that, almost surely as $n \to \infty$,
$$\int_{b^1_n}^{b^2_n} |\varphi_i - h_0|\,d\lambda \to 0 \quad\text{and}\quad \int_{b^1_n}^{b^2_n} |\varphi_i' - h_0'|\,d\lambda \to 0, \qquad i = 1, 2.$$
The statements also hold if we replace $b^1_n$ by $a^2_n$ and $b^2_n$ by $a^1_n$.

Proof. (8.68) follows immediately from Lemma 8.10.
Next we will show (8.69) and (8.70). Let $g_1$ and $g_0$ be monotone functions. Then for any $t \in [a, b]$ we have $g_1(t) - g_0(t) \le \{g_1(b) - g_1(a)\} + \{g_1(a) - g_0(a)\}$ and, similarly, $g_0(t) - g_1(t) \le \{g_0(b) - g_0(a)\} + \{g_1(a) - g_0(a)\}$. Thus the supremum of $|g_1 - g_0|$ over $[a, b]$ is controlled by these endpoint differences. By monotonicity and Lemma 8.13, we can then conclude the displayed bound, where $\varphi_i'$ refers to either the left or the right derivative; this is independent of $n$ thanks to the linearity of $h_0'$. Thus we have shown (8.70). Now we establish (8.69). Fix $i \in \{1, 2\}$. We will apply Lemma 9.2 twice, first with $\varphi_i$ as $g_1$ and $h_0$ as $g_0$, and then with the reverse assignments. We let $[a, b] = [n, n + M]$. Regardless of which function is $g_1$ and which is $g_0$, we can bound the first two terms in (9.1), the weighted differences $\lambda\{g_1(n + M) - g_0(n + M)\} + (1 - \lambda)\{g_1(n) - g_0(n)\}$, by $2M_\varepsilon$ with probability $1 - \varepsilon$, independently of $n$, by Lemma 8.13. If $h_0$ is $g_0$, then for the third term of (9.1) we have to bound $h_0'(n+) - h_0'((n + M)-) = 24M$, which is independent of $n$. If $\varphi_i$ is $g_0$, then for the third term of (9.1) we have that $|\varphi_i'((n + M)-) - \varphi_i'(n+)|$ is bounded above by a quantity which we can again bound independently of $n$, with probability $1 - \varepsilon$, by the linearity of $h_0'$ and Lemma 8.13. Since $[b^1_n, b^2_n] \subset [n, n + M]$ with probability $1 - \varepsilon$ for appropriately large $M$, and since $(n + M - t)(t - n)/M \le (M/2)^2/M$, which is independent of $n$ and $t$, the resulting bound is independent of $n$ and $t$. Thus we have shown (8.69). Then (8.71) follows immediately from (8.69) and (8.68). For the final statement, suppose toward a contradiction that the relevant difference at $a$ or at $b$ is larger than $\varepsilon/2$ in absolute value. Since we can take $n$ large enough that $|\varphi^\omega_i - h_0|$ is less than $\varepsilon/2$ at any $a, b > n$, this cannot happen. Since the sets $\{t \in (b^1_n, b^2_n) : (\varphi^\omega_1)'(t) \le (\varphi^\omega_2)'(t)\}$ and $\{t \in (b^1_n, b^2_n) : (\varphi^\omega_1)'(t) \ge (\varphi^\omega_2)'(t)\}$ are both intervals, by monotonicity of $(\varphi^\omega_1)'$ and linearity of $(\varphi^\omega_2)'$ on $[b^1_n, b^2_n]$, we can conclude that $\int_a^b |(\varphi^\omega_i)' - h_0'|\,d\lambda < \varepsilon$, as desired.
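The monotone-function bound used at the start of the proof is elementary. For instance, if $g_1$ and $g_0$ are nondecreasing on $[a, b]$ (the nonincreasing case is symmetric), then for any $t \in [a, b]$,
$$g_1(t) - g_0(t) \le g_1(b) - g_0(a) = \{g_1(b) - g_1(a)\} + \{g_1(a) - g_0(a)\},$$
so a uniform bound over $[a, b]$ follows from the endpoint increments and a single endpoint difference.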

Proof for symmetric limit process
Proof of Theorem 4.4. The proof follows that of Theorem 4.2 in the previous section, where we replace $H_R$ by $H_+$, $\tau^0_+$ by $\tau^+_+$, and $\tau^0_R$ by $\tau^+_R$, we take $H_L$ to be $0$, and we replace $\tau_L$ and $\tau^0_-$ by $0$. In particular, analogs of the characterizing equalities and inequalities hold, and an analog of Lemma 4.3 holds; the latter can immediately be seen to hold since, by definition, $(\hat\psi_0)^{(2)}(t) = 0$ for $t \in (0, \tau^+_+)$. Proofs similar to those of Proposition 8.8 and Proposition 8.7 show that the following characterizing proposition holds.
And if $\Delta = \hat\psi_0$ then the inequality is an equality. Recalling the definitions of $F_{n,L}$ and $F_{n,R}$ from (2.7), we additionally define the quantities in the next display. The terms for the constrained processes that would correspond to $B_n$ turn out to be $0$. Also, $A_{n,L}$ and $A_{n,R}$ appear to be off by a sign change when compared with $A_n$; this is because of the definitions of our left- and right-side processes. Note that in Balabdaoui, Rufibach and Wellner (2009), $Y^f_n$ is denoted by $Y^{loc}_n$, and similarly for $H^f_n$. The proof proceeds as follows. We derive the limit distribution for the empirical-process-type $Y$ and $X$ terms. We show that the estimator-type $H$ terms (and appropriate derivatives) are tight, and that they satisfy characterizations analogous to those given in Theorem 4.1 and Theorem 4.2. We then argue, by a continuous mapping argument, that a characterization must hold in the limit (along subsequences, using tightness of the $H$ processes), and finally apply Theorem 4.1 and Theorem 4.2 to conclude that the limit is as desired.
For $0 < c \le \infty$, define the spaces $C_c$ and $D_c$ as in the display, where "cadlag" means right-continuous functions which have limits from the left. If $c = \infty$, we interpret the definition of $C_\infty$ to mean continuous functions $h$ defined on $(-\infty, \infty)$. We let $\|f\|$ denote the supremum of $|f|$ over its domain, and the corresponding uniform distance is the one we use on $C_c$ when $c < \infty$. When $c = \infty$ we use the topology of convergence on all compacta (see Whitt (1970)). For $D_c$ the uniform norm is too strong, so generally one uses a Skorokhod topology (Skorokhod (1956); see also Billingsley (1999)). We endow, for the moment, $D_c$ with the $J_1$ Skorokhod topology (referred to as "the" Skorokhod topology in Chapter 12 of Billingsley (1999)). When we come to proving tightness of our $H$-type processes we will discuss further topological details. Now we focus on the difference $A_{n,1}(b) - G_{n,1}(b)$, for which, for all $|b| \le c$, $0 \le M_n(b) \le M = O(1)$ almost surely. Next we use that $B_n(t) = W_n(t) - tW_n(1)$, where $W_n(t) = B_n(t) + tN$ is a Brownian motion and $N$ is a standard normal random variable. Thus (8.77) equals the first displayed expression, which in turn equals the second. This shows that $\sup_{b \in [-c,c]} |A_{n,1}(b) - G_{n,1}(b)| = o_p(1)$. Using this, we see that the process $n^{4/5}\int_m^{t_{n,b}}\int_m^v\,dD_n\,dv$ defined on this probability space equals the corresponding expression driven by $W_n$. This completes the proof for two of the terms; the other four are analogous.
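The Brownian bridge decomposition used above can be checked directly. If $B_n$ is a Brownian bridge and $N$ is a standard normal random variable independent of $B_n$ (independence is needed, and is what the construction assumes), then $W_n(t) = B_n(t) + tN$ satisfies, for $s, t \in [0, 1]$,
$$\operatorname{Cov}\{W_n(s), W_n(t)\} = (s \wedge t - st) + st = s \wedge t,$$
so $W_n$ is a Brownian motion, and $W_n(1) = N$, whence $B_n(t) = W_n(t) - tW_n(1)$.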
Lemma 8.17. Let $P_n \equiv (P_{n,1}, \ldots, P_{n,6})$ be a vector of drift terms, defined as in the display below. Then the vector of processes in the second display can be defined on a common probability space with a sequence of Brownian motion processes $W \equiv W_n$ such that, for $0 < c < \infty$, the supremum over $b \in [-c, c]$ of the difference from the corresponding Gaussian terms is $o_p(1)$, where $G_n$ is as in Lemma 8.16.
Proof. We will show that the statement holds for $X^f_{n,R}$; the proofs for the other $X$ terms and for all the $Y$ terms are similar. Now, $n^{-3/5} X^f_{n,R}(b)$ equals the displayed expression. (We also write $\|f\|$ for the supremum of a function $f$ over its domain; it will be clear from context which usage is intended.) For a function $x$ and $\delta > 0$, let $w_s(x, \delta) = \sup \|x(t_2) - [x(t_1), x(t_3)]\|$, where the supremum is taken over $t_1$, $t_2$, and $t_3$ such that $-c \vee (t_2 - \delta) \le t_1 < t_2 < t_3 \le c \wedge (t_2 + \delta)$, and where $\|x(t_2) - [x(t_1), x(t_3)]\|$ denotes the distance from $x(t_2)$ to the segment $[x(t_1), x(t_3)]$. Note that since sequences that converge in the $J_1$ topology also converge in the $M_1$ topology (Whitt (2002)), the weak convergences proved for the empirical processes in Lemma 8.22 still hold when we use the $M_1$ topology. By Whitt (1980) (see Theorem B.0.2 of Doss (2013b)), $F_{c,M}$ is a complete, separable metric space. Furthermore, we have the following precompactness property (Proposition 8.23); this is the fundamental property we need for the tightness arguments, to which we now proceed.

Proof. We will discuss tightness for the left-side processes; the argument for the right-side processes is analogous. Proposition 7.1 shows that for any $\varepsilon$ we can take $M > 0$ large enough that $(\hat H^\varphi_{n,L})^{(2)}$ lies in $F_{c,M}$ with probability $1 - \varepsilon$. Since $F_{c,M}$ is precompact in $D_c$ by Proposition 8.23, $(\hat H^\varphi_{n,L})^{(2)}$ is tight. Then $(\hat H^\varphi_{n,L})'$ is uniformly bounded by Proposition 7.1, and since its derivative is uniformly bounded, and since the set of functions whose values, together with the values of their derivatives, are uniformly bounded by $M$ is compact in $C_c$ (via the Arzelà-Ascoli theorem; see, e.g., Royden (1988)), we can conclude that $(\hat H^\varphi_{n,L})'$ is tight. Similarly, since integrals over bounded intervals of uniformly bounded functions are also uniformly bounded, and by Lemma 8.25 below, together with the fact that $n^{1/5}(s_{n,L} - b)$ is $O_p(1)$ by assumption, we see that $\hat H^\varphi_{n,L}$ and its derivative are uniformly bounded, so we can again conclude that they are tight. An identical argument works for the right-side processes.
We will want to consider our processes in $C_\infty$ and in $D_\infty$. For the continuous processes in $C_\infty$, Corollary 5 of Whitt (1970) says that processes that are tight in $C_c$ for all $0 < c < \infty$ are then tight in $C_\infty$. By Theorem 12.9.3 of Whitt (2002) (combined with Prohorov's theorem; see, e.g., van der Vaart and Wellner (1996), page 21), processes that are tight in $D_c$ for all $0 < c < \infty$ are tight in $D_\infty$. For the next lemma, recall the definitions of $A_{n,L}$ and $A_{n,R}$ in (8.74) and (8.75).

Lemma 8.25. As $n \to \infty$, $|A_{n,L}| \to 0$ and $|A_{n,R}| \to 0$ almost surely. (8.96)

Proof. Because $s_{n,L}$ is strictly less than $m$, we can apply Corollary 2.7B, so that $|A_{n,L}| = n^{3/5}\,\bigl|F_{n,L}(s_{n,L}) - F^0_{n,L}(s_{n,L})\bigr| \le n^{-2/5} \to 0$ almost surely. Similarly, since $F_{n,L}(X_{(n)}) = 1 = F_n(X_{(n)})$, by the same corollary $|A_{n,R}| \to 0$ almost surely.
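The rate bookkeeping in the proof of Lemma 8.25 is simply: if $|F_{n,L}(s) - F^0_{n,L}(s)| \le n^{-1}$ (a discretization-type error of order $1/n$, which is what the cited corollary is assumed to provide), then
$$n^{3/5}\,\bigl|F_{n,L}(s) - F^0_{n,L}(s)\bigr| \le n^{3/5} \cdot n^{-1} = n^{-2/5} \to 0.$$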
With Lemmas 8.21, 8.22, and 8.24 in hand, we can now finish the proof of the theorem. Fix a subsequence $n'$. The processes are tight in $E_c \equiv C_c \times D_c \times C_c$ with $0 < c < \infty$; this means they are also tight in $E_\infty$, by the discussion after Lemma 8.24. Thus there exists a further subsequence $n''$ such that $Z_{n'',R}$ and $Z_{n'',L}$ converge weakly. By the Skorokhod construction (see, e.g., Chapter 14 of Shorack (2000)), we may assume that the convergence is almost sure (a.s.). Let $(Z_{0,L}, Z_{0,R})$ be the limit, with components defined as in (4.2). There must be a sequence of knots $\tau_{n'',R} \in S_{n''}(\hat\varphi^0_{n''}) \cap (m, \infty)$ such that $(n'')^{1/5}(\tau_{n'',R} - m) \to \tau_R$ a.s. To see that we can take $\tau_{n'',R}$ strictly greater than $m$: by (8.97), the only way $\tau_R = 0$ is if there exists a sequence of points of $S^0(H^{(2)}_R)$ strictly greater than $0$ and converging to $0$. Similarly, there is a sequence $\tau_{n'',L} \in S_{n''}(\hat\varphi^0_{n''}) \cap (-\infty, m)$ such that $(n'')^{1/5}(\tau_{n'',L} - m) \to \tau_L$ a.s. In our definitions of the ($f$- and $\varphi$-) processes, $s_{n,R}$ was any knot strictly greater than $m$ satisfying $n^{1/5}(s_{n,R} - m) = O_p(1)$, and analogously for $s_{n,L}$. Take $s_{n'',R} = \tau_{n'',R}$ and $s_{n'',L} = \tau_{n'',L}$. Then let $s_{n''} = (n'')^{1/5}\bigl(\tau_{n'',L} - m,\ \tau_{n'',R} - m,\ \tau^0_{n'',-} - m,\ \tau^0_{n'',+} - m\bigr)$, so that $s_{n''} \to (\tau_L, \tau_R, \tau_-, \tau_+)$, and let $Z_0 = (Z_{0,L}, Z_{0,R}, \tau_L, \tau_R, \tau_-, \tau_+)$, where $\tau_-$ and $\tau_+$ are the limits of the corresponding terms, again by tightness from Proposition 7.3. By Lemma 8.22, if we let $Y \equiv Y_{a,\sigma}$ be as defined in (4.18) with $a = |\varphi^{(2)}_0(m)|/4!$ and $\sigma = 1/\sqrt{f_0(m)}$, then the limit processes are driven by $dY$.

Technical lemmas
Here is a statement of the general integration by parts formulae for functions of bounded variation, used in our proof of Proposition 8.8; see, e.g., page 102 of Folland (1999) for the definition of bounded variation.
Lemma 9.1 (Folland (1999)). Assume that $F$ and $G$ are of bounded variation on an interval $[a, b]$, where $-\infty < a < b < \infty$.

A. If at least one of $F$ and $G$ is continuous, then
$$\int_{(a,b]} F\,dG + \int_{(a,b]} G\,dF = F(b)G(b) - F(a)G(a).$$

B. If there are no points in $[a, b]$ at which $F$ and $G$ are both discontinuous, then
$$\int_{[a,b]} F\,dG + \int_{[a,b]} G\,dF = F(b)G(b) - F(a-)G(a-).$$

The next lemma is proved in Doss (2013b), page 143, for convex rather than concave functions.

Lemma 9.2 (Doss (2013b)). Let $g_1$ and $g_0$ be concave functions on $[a, b]$, let $t \in [a, b]$, and set $\lambda = (t - a)/(b - a)$. Then
$$g_1(t) - g_0(t) \ge \lambda\{g_1(b) - g_0(b)\} + (1 - \lambda)\{g_1(a) - g_0(a)\} - \{g_0'(a+) - g_0'(b-)\}\,\frac{(b - t)(t - a)}{b - a}. \qquad (9.1)$$
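As a quick sanity check of formula A, take $F$ continuous and $G = \mathbf{1}_{[c,\infty)}$ a unit step at some $c \in (a, b]$. Then
$$\int_{(a,b]} F\,dG = F(c), \qquad \int_{(a,b]} G\,dF = F(b) - F(c),$$
and the sum is $F(b) = F(b)G(b) - F(a)G(a)$, since $G(b) = 1$ and $G(a) = 0$.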