Optimal experimental design that minimizes the width of simultaneous confidence bands

We propose an optimal experimental design for a curvilinear regression model that minimizes the band-width of simultaneous confidence bands. Simultaneous confidence bands for curvilinear regression are constructed by evaluating the volume of a tube about a curve that is defined as a trajectory of a regression basis vector (Naiman, 1986). The proposed criterion is based on the volume of this tube, and the corresponding optimal design that minimizes the volume of the tube is referred to as the tube-volume optimal (TV-optimal) design. For Fourier and weighted polynomial regressions, the problem is formalized as a minimization over the cone of Hankel positive definite matrices, and the criterion to be minimized is expressed as an elliptic integral. We show that the Möbius group keeps our problem invariant, and hence that the minimization can be conducted over cross-sections of orbits. We demonstrate that for the weighted polynomial regression and the Fourier regression with three bases, the tube-volume optimal design forms an orbit of the Möbius group containing D-optimal designs as representative elements.


Introduction
Suppose that we observe pairs of explanatory variables $x_i \in \mathcal{X}$ and response variables $y_i \in \mathbb{R}$, $i = 1, \ldots, N$. Here, $\mathcal{X} \subset \mathbb{R}$ is the domain of the explanatory variables, typically a segment of $\mathbb{R}$. For such data, we consider the regression model
$$y_i = b^\top f(x_i) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2(x_i)) \ \text{independently},$$
where $b = (b_1, \ldots, b_n)^\top$ is an unknown coefficient vector, and $f(x) = (f_1(x), \ldots, f_n(x))^\top$, $x \in \mathcal{X}$, is a piecewise smooth regression basis vector. For the problem considered in this paper, we assume that the variance function $\sigma^2(x) > 0$ is known. When $\sigma^2(x)$ is not constant, the regression model is called weighted.
In the case of experimental data, the explanatory variables $x_i$ can be chosen arbitrarily within their domain $\mathcal{X} \subset \mathbb{R}$. The allocation of $\{x_1, \ldots, x_N\} \subset \mathcal{X}$ to optimize some target function is called optimal experimental design. For example, for D-optimality, we take the function $\det(\Sigma)$ with $\Sigma = \mathrm{Var}(\hat b)$, where $\hat b$ is the ordinary least squares (OLS) estimator of $b$. Here, $\Sigma^{-1}$ is the information matrix. Following Kiefer and Wolfowitz (1959), in an optimal design, the allocation $\{x_1, \ldots, x_N\}$ is regarded as a probability measure over $\mathcal{X}$ with mass $p_i = 1/N$ at each point $x_i$. We write this discrete probability measure as
$$\begin{Bmatrix} x_i \\ p_i \end{Bmatrix}_{1 \le i \le N}.$$
Viewed in this way, the problem is formalized as an optimization with respect to a probability measure over $\mathcal{X}$. Most criteria in the literature, including the criterion mentioned above, are convex or concave functionals of the probability measure, and can be treated in the framework of convex analysis (Wynn, 1985; Pukelsheim, 2006).
In this paper, we propose a new non-convex criterion based on simultaneous confidence bands. The pointwise confidence band is based on the confidence region for the regression function value $b^\top f(x)$ at a fixed point $x$. On the other hand, the simultaneous confidence band is the confidence region for the full regression curve $\{(x, b^\top f(x)) \mid x \in \mathcal{X}\} \subset \mathbb{R}^2$. The standard form of the simultaneous confidence band of hyperbolic type is
$$b^\top f(x) \in \hat b^\top f(x) \pm c_\alpha \sqrt{f(x)^\top \Sigma f(x)}, \tag{1.1}$$
where $u \pm v$ stands for the region $(u - v, u + v)$. The threshold $c_\alpha$ is determined so that the event (1.1) holds for all $x \in \mathcal{X}$ with given probability $1 - \alpha$ (Working and Hotelling, 1929; Scheffé, 1959; Wynn and Bloomfield, 1971; Liu, 2010). Simultaneous confidence bands are useful when $x$ cannot be determined in advance. (See van Dyk (2014) for an application in experimental particle physics.) As shown in the next section, $c_\alpha$ is also a functional of the allocation $\{x_1, \ldots, x_N\}$, and hence, we can consider an optimal design that in some way minimizes both the threshold $c_\alpha$ and $f(x)^\top \Sigma f(x)$. In fact, from the general equivalence theorem of Kiefer and Wolfowitz (1959), the design measure that minimizes $\max_{x \in \mathcal{X}} f(x)^\top \Sigma f(x)$ coincides with the D-optimal design. Therefore, we propose the use of $c_\alpha$ as a criterion of optimal design, and refer to the corresponding optimal design as the tube-volume optimal (TV-optimal) design. If a design is optimal under both the tube-volume criterion and the D-criterion, it is a universal optimal design that minimizes the width of the confidence bands (1.1). Indeed, $c_\alpha$ depends substantially on the design. In Table 4.3 of Section 4.3, we show how the difference in design affects the width of the simultaneous confidence band.
From its definition, c α is a complicated function of Σ. However, when α is small, c α tends to a simpler function. This approximation is due to the volume-of-tube method used to construct simultaneous confidence bands in curvilinear regression curves (Naiman, 1986;Johansen and Johnstone, 1990;Sun and Loader, 1994;Lu and Kuriki, 2017). The volume-of-tube method is a methodology to approximate the probability of the maximum of a Gaussian random field (Sun, 1993;Takemura, 2001, 2009;Takemura and Kuriki, 2002;Adler and Taylor, 2007). As shown later, c α corresponds to the upper tail probability of the maximum of a Gaussian field, and hence, the volume-of-tube method works well.
As concrete regression models, we mainly cover weighted polynomial and Fourier regressions. In these models, we will see that there is a group, referred to as the Möbius group, that keeps the tube-volume optimal design problem invariant. In general, a group action simplifies problems. (See Section 13 of Pukelsheim (2006) for invariant optimal experimental design.) The use of such group invariance is another subject of this paper.
The outline of this paper is as follows. Section 2 summarizes the volume-of-tube formula used to construct approximate simultaneous confidence bands, and formalizes the tube-volume criterion and the corresponding optimal design. Section 3 analyzes the tube-volume optimal designs for Fourier and weighted polynomial regressions. The Möbius group is proved to keep the optimization problem invariant, and hence can be used to reduce the dimension of the problem. Using this consideration, Section 4 identifies the tube-volume optimal design in the weighted polynomial regression and the Fourier regression when n = 3. Some proofs are given in the Appendix.
2 Tube-volume optimal design
2.1 Volume-of-tube formula for simultaneous confidence bands
In this subsection, we briefly summarize the volume-of-tube method. This is a general methodology used to approximate the tail probability of the maximum of a smooth Gaussian random process or random field. Here, we describe how this method is used to determine the threshold $c_\alpha$.
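As mentioned in Section 1, the threshold $c_\alpha$ should be determined as a solution $c = c_\alpha$ of
$$\Pr\Bigl( \bigl| \hat b^\top f(x) - b^\top f(x) \bigr| < c \sqrt{f(x)^\top \Sigma f(x)} \ \text{for all } x \in \mathcal{X} \Bigr) = 1 - \alpha.$$
We define the normalized basis vector and its trajectory as
$$\psi_\Sigma(x) = \frac{\Sigma^{1/2} f(x)}{\|\Sigma^{1/2} f(x)\|}, \qquad \gamma_\Sigma = \{ \pm\psi_\Sigma(x) \mid x \in \mathcal{X} \}, \tag{2.3}$$
respectively. From this definition, the trajectory is a subset of the $(n-1)$-dimensional unit sphere:
$$\gamma_\Sigma \subset S^{n-1} = \{ u \in \mathbb{R}^n \mid \|u\| = 1 \}.$$
In particular, when $\mathcal{X}$ is a segment, $\gamma_\Sigma$ is a curve on the unit sphere. Let $\mathrm{Vol}_1(\cdot)$ denote the one-dimensional volume, that is, the length. Then, when $c$ is large, the volume-of-tube method provides an approximation to the upper tail probability of the maximum in (2.2). Further, let $\chi^2_\nu$ denote the chi-square random variable with $\nu$ degrees of freedom.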
If we admit approximation (2.4), an approximate threshold c α can be determined from the equation This means that the smaller the value of Vol 1 (γ Σ ), the smaller is c α . The statement (ii) above is due to Naiman (1986). Alternative proofs of the inequality can be found in Johnstone and Siegmund (1989) and Takemura and Kuriki (2002). See Lu and Kuriki (2017) for a generalization of Naiman's inequality. By equating (2.5) to be α, we have a conservative threshold for the simultaneous confidence band.
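As a rough illustration of how the threshold depends on the curve length, the following Python sketch solves for $c$ under the leading-term tube approximation. It assumes a closed trajectory without end points and a known variance, so that the tail probability is approximated by $\mathrm{Vol}_1(\gamma_\Sigma)/(2\pi) \cdot P(\chi^2_2 > c^2)$ with $P(\chi^2_2 > c^2) = e^{-c^2/2}$. This specific form and the function name `approx_threshold` are assumptions of the sketch, not a transcription of (2.4) or (2.5).

```python
import math

def approx_threshold(vol, alpha):
    """Solve (vol / (2*pi)) * exp(-c^2 / 2) = alpha for c.

    Assumption of this sketch: leading-term tube formula for a closed
    curve (no end-point correction) with known variance.
    """
    ratio = vol / (2.0 * math.pi * alpha)
    if ratio <= 1.0:
        raise ValueError("the approximation needs vol / (2*pi*alpha) > 1")
    return math.sqrt(2.0 * math.log(ratio))

# Curve length 4*pi*sqrt(2/3), the value quoted in the text for n = 3.
vol_opt = 4.0 * math.pi * math.sqrt(2.0 / 3.0)
for alpha in (0.1, 0.05, 0.01):
    print(alpha, approx_threshold(vol_opt, alpha))
```

Under this assumption the threshold is $c_\alpha = \sqrt{2 \log(\mathrm{Vol}_1(\gamma_\Sigma)/(2\pi\alpha))}$, which increases with the length, in line with the statement that a smaller $\mathrm{Vol}_1(\gamma_\Sigma)$ gives a smaller $c_\alpha$.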

Tube-volume criterion
From (2.5), we find that the smaller the value of Vol 1 (γ Σ ), the narrower is the width of the confidence band. In this subsection, we formalize the experimental design optimization problem of the allocation of explanatory variables to minimize Vol 1 (γ Σ ).
Here, we give our assumptions on f (x).
Assumption 2.2. $f : \mathcal{X} \to \mathbb{R}^n$ is a continuous and piecewise $C^1$ function. The image $f(\mathcal{X})$ spans $\mathbb{R}^n$.
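From elementary geometry, the volume of $\gamma_\Sigma$ in (2.3) is given by
$$\mathrm{Vol}_1(\gamma_\Sigma) = 2 \int_{\mathcal{X}} \frac{\sqrt{(f^\top \Sigma f)(g^\top \Sigma g) - (f^\top \Sigma g)^2}}{f^\top \Sigma f}\, dx, \qquad g(x) = \frac{df(x)}{dx}.$$
We see that $\mathrm{Vol}_1(\gamma_{M^{-1}(c)})$ is not convex in $c$.

3 Tube-volume optimal design for polynomial and Fourier regressions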
The polynomial regression is a regression model with basis vector
$$f_P(x) = (1, x, \ldots, x^{n-1})^\top. \tag{3.2}$$
Here, we set the domain $\mathcal{X}$ to be the whole real line $\mathbb{R}$. For the polynomial regression, we assume a variance function of the form $\sigma^2(x) = Q(x)^{n-1}$, where $Q(x)$ is an arbitrary positive quadratic function. As a canonical form of this class of variance functions, we use
$$\sigma_P^2(x) = (1 + x^2)^{n-1}. \tag{3.3}$$
Later, we introduce a parameterization for $Q(x)$ (see (3.20)).
In this subsection, we see that under the tube-volume criterion, the optimization problem for the Fourier regression is equivalent to that for the weighted polynomial regression. That is, the optimization problem in the Fourier regression can be translated to one in the weighted polynomial regression, and vice versa.
The model we discuss is a special case of the model proposed by Dette et al. (1999), Section 2.2, who study D-optimality. Further, Dette and Melas (2003) make use of the connection between the weighted polynomial and Fourier regressions. We will return briefly to the question of D-optimality in Section 3.5 below.
From the lemma below, the transformation x = tan(πt) connects the two regression models.
To prove that B is non-singular, consider the integral Here, we used (3.5). The left-hand side is the identity matrix I n by standard orthogonality. Hence, it is enough to check that the integrals in the parentheses of the right-hand side exist. The matrix in the parentheses of the right-hand side of (3.6) is ( which exists for i, j ≤ n. Hence, B is non-singular.
The set of information matrices for the polynomial regression is referred to as the moment cone (Karlin and Studden, 1966). The set of information matrices for the Fourier regression is given by The following lemma gives the equivalence of the Fourier regression and the polynomial regression as the optimization problem for the tube-volume criterion.
Theorem 3.2. Let Vol F (γ M −1 ) and Vol P (γ M −1 ) be the length of γ M −1 given in (2.6) with f (x) being f F (x) in (3.1), and f P (x) in (3.2), respectively. Then, it holds that Proof. The derivatives of f F (t) and f P (x) are denoted by g F (t) = df F (t)/dt and g P (x) = df P (x)/dx, respectively. Then, Theorem 3.2 and (3.9) imply that That is, the optimization problems for the polynomial regression and the Fourier regression are mathematically equivalent. For example, the information matrix for the Fourier regression, and the information matrix for the polynomial regression give the same volume. This equivalence is stated in terms of design measure as follows.
Theorem 3.3. The design $\begin{Bmatrix} t_i \\ p_i \end{Bmatrix}_{1 \le i \le N}$ for the Fourier regression, and the design $\begin{Bmatrix} x_i \\ p_i \end{Bmatrix}_{1 \le i \le N}$, $x_i = \tan(\pi t_i)$, for the weighted polynomial regression with variance function $\sigma_P^2(x)$ give the same volume. If the former is tube-volume optimal in the Fourier regression, then so is the latter in the polynomial regression with variance $\sigma_P^2(x)$, and vice versa.
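The correspondence between the two designs can be checked numerically for n = 3. The sketch below assumes the Fourier basis $(1, \sqrt{2}\sin 2\pi t, \sqrt{2}\cos 2\pi t)$ with unit variance and the polynomial basis $(1, x, x^2)$ with variance $(1 + x^2)^2$. The matrix `B` is worked out by hand from the double-angle formulas for this illustration and is not quoted from (3.4) or (3.5); all function names are hypothetical.

```python
import numpy as np

def f_fourier(t):
    """Fourier basis for n = 3: (1, sqrt(2) sin(2 pi t), sqrt(2) cos(2 pi t))."""
    return np.array([1.0,
                     np.sqrt(2.0) * np.sin(2.0 * np.pi * t),
                     np.sqrt(2.0) * np.cos(2.0 * np.pi * t)])

def f_poly(x):
    """Polynomial basis for n = 3: (1, x, x^2)."""
    return np.array([1.0, x, x * x])

# Illustrative matrix (assumption of this sketch, derived for n = 3 only):
# f_fourier(t) = B @ f_poly(x) / (1 + x^2) with x = tan(pi t).
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0 * np.sqrt(2.0), 0.0],
              [np.sqrt(2.0), 0.0, -np.sqrt(2.0)]])

def info_fourier(ts, ps):
    """Information matrix of a Fourier design with unit variance."""
    return sum(p * np.outer(f_fourier(t), f_fourier(t)) for t, p in zip(ts, ps))

def info_poly(xs, ps):
    """Information matrix of a polynomial design with variance (1 + x^2)^2."""
    return sum(p * np.outer(f_poly(x), f_poly(x)) / (1.0 + x * x) ** 2
               for x, p in zip(xs, ps))

ts = np.array([-1.0 / 3.0, 0.0, 1.0 / 3.0])   # three-point uniform Fourier design
ps = np.full(3, 1.0 / 3.0)
xs = np.tan(np.pi * ts)                        # tangent transform of the design points
M_F = info_fourier(ts, ps)
M_P = info_poly(xs, ps)
print(np.allclose(M_F, B @ M_P @ B.T))         # the two information matrices are congruent
```

Because the two information matrices differ only by the fixed congruence with `B`, any criterion that is invariant under a linear change of basis, such as the length of the normalized trajectory, takes the same value for both designs.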
In this paper, the (discrete) uniform designs in the Fourier regression and their counterparts in the polynomial regression play important roles. It is known that, in the Fourier regression, the uniform design in which x i are allocated as equally spaced with equal weights is D-optimal (Guest, 1958). Because of the symmetry, it is conjectured that the uniform design is the tube-volume optimal design as well. In Section 4, we prove that this is true for n = 3, and conjecture that it is true for all n.
The n-point discrete uniform design for the Fourier regression symmetric about the origin is t 0 (3.10) For later use, we provide the concrete forms of the information matrix M for the weighted polynomial designs with σ 2 (x) = σ 2 P (x) in (3.3), Lemma 3.4. The information matrix M = (M i,j ) of the weighted polynomial design (3.11) scaled such that M 1,1 = 1 is given by For the proof, see Appendix A.1. When n = 3 and 4, respectively.
The key transform connecting the Fourier and polynomial regressions was the tangent transform $x = \tan(\pi t)$. For the same purpose, the generalized transforms $x = q \tan(\pi(t - \theta)) + r$, $q \neq 0$, can be used. Such a transform is the composite of the tangent transform and the Möbius transform discussed below.

The Möbius group action on the moment cone
In this subsection, we introduce the Möbius group (transformation) acting on the set of design measures and the set of information matrices in polynomial regression. We will show that the Möbius group action reduces the dimension of the minimization problem for the tube-volume criterion. For a recent paper in which the Möbius transformation acts on polynomials, see Mackey, et al (2015).
The real Möbius transformation is defined on the extended real numbers $\bar{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$ as follows:
$$\varphi(x; a, b, c, d) = \frac{ax + b}{cx + d}.$$
Here, we assume that $ad - bc \neq 0$. This forms a group with product
$$\varphi(\cdot; a_1, b_1, c_1, d_1) \circ \varphi(\cdot; a_2, b_2, c_2, d_2) = \varphi(\cdot; a_1 a_2 + b_1 c_2,\ a_1 b_2 + b_1 d_2,\ c_1 a_2 + d_1 c_2,\ c_1 b_2 + d_1 d_2).$$
The identity element is $e = \varphi(\cdot; a, 0, 0, a)$, $a \neq 0$. This is a subgroup of the complex Möbius group, referred to as the projective general linear group $PGL(2, \mathbb{C})$. Now, let $f_P(x) = (1, x, \ldots, x^{n-1})^\top$ be the polynomial basis. We define an $n \times n$ matrix (3.14) We write the factor $\lambda$ as $\lambda(x; a, b, c, d)$ instead of $\lambda(x; c, d)$ to clarify that this is an invariant function under the group action (3.13) in the sense that The proof is straightforward and omitted. When $n = 3$ and $n = 4$, is a representation of the general linear group $GL(2, \mathbb{R})$ and hence forms a group (Gross and Holman, 1980). The proof of the proposition below is straightforward and omitted.
Proof. We have the following relations: The results in the proposition follow by letting
The sets of transformations $\{\varphi(\cdot; \pm s, \mp t, t, s) \mid s^2 + t^2 = 1\}$ and $\{\varphi(\cdot; q, r, 0, 1) \mid q \neq 0\}$ form subgroups of the Möbius group, which are isomorphic to the orthogonal group $O(2, \mathbb{R})$ and the affine group acting on $\mathbb{R}$, respectively.
Theorem 3.9. Let A ∈ A and M ∈ M P be n × n matrices. Then, AMA ⊤ ∈ M P . Moreover, That is, group A acts on the moment cone M P .
The Möbius group action on the polynomial basis $f(x)$ has been introduced by (3.14). Similarly, we define the Möbius group action on the variance function $\sigma^2(x) = Q(x)^{n-1}$. This provides a parameterization for the variance function. We use
$$\sigma_P^2(x; a, b, c, d) = \{(ax + b)^2 + (cx + d)^2\}^{n-1}. \tag{3.20}$$
Note that $\sigma_P^2(x) = \sigma_P^2(x; 1, 0, 0, 1)$. This is always positive because $ad - bc \neq 0$. For as well as (3.15), we have The parameterization (3.20) with $(a, b, c, d)$ is redundant, since $Q(x)$ has only three parameters. The lemma below shows that the stabilizer keeping the variance $\sigma_P^2(\cdot; a, b, c, d)$ invariant is the orthogonal subgroup with dimension one.
This means that the Cauchy distribution family is closed under the Möbius transform (McCullagh, 1996). See also Kato and McCullagh (2014) for Cauchy families in directional statistics.

Canonical parameterizations for information matrices
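As we have shown in Section 2.1, the optimal design problem is an optimization with respect to the matrix $M$ over the set of information matrices $\mathcal{M}$. Here, the correspondence between the information matrix $M = \int_{\mathcal{X}} f(x) f(x)^\top \frac{1}{\sigma^2(x)}\, d\rho(x) \in \mathcal{M}$ and the design measure $\rho \in \mathcal{P}$ is one-to-many. For the sake of optimization, we need to parameterize the set $\mathcal{M}$.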
We first consider M P in (3.8) for the polynomial regression, and then interpret the results in terms of M F in (3.9) for the Fourier regression.
The structure of the moment cone M P is well-studied in the context of the classical moment problem. One canonical parameterization for M P is given in Chapter II, Section 3 of Karlin and Studden (1966). The statement is summarized in Proposition 3.1 of Kato and Kuriki (2013).
The counterpart for the moment cone (3.9) for trigonometric functions is obtained using Lemma 3.1.
Theorem 3.14. Let $t_0 \in (-\frac{1}{2}, \frac{1}{2}]$ be fixed arbitrarily. $M \in \mathcal{M}_F$ is uniquely represented with $2n - 1$ parameters $(w_0, \ldots, w_{n-1}, t_1, \ldots, t_{n-1})$ as
A square matrix $M = (m_{i,j})$ is said to be Hankel if $m_{i,j} = m_{k,l}$ whenever $i + j = k + l$. For example, the matrices $M$ in (3.12) and (4.1) are Hankel. Obviously, each $M \in \mathcal{M}_P$ should be an $n \times n$ positive definite Hankel matrix. It is known that the converse is also true.
Proposition 3.15. The moment cone M P in (3.8) is characterized as For the proof, see (9.1) of Karlin and Studden (1966), p. 199. This also gives a unique representation of M P with 2n − 1 parameters (m 0 , m 1 , . . . , m 2n−2 ). Theorem 3.9 combined with Proposition 3.15 implies that group A acts on the cone of (positive definite) Hankel matrices. For the Möbius group action on Hankel matrices, see also Rost (1989, 2010).
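The action on Hankel matrices can be illustrated with a short numerical check. The sketch below assumes the transformation $\varphi(x; a, b, c, d) = (ax + b)/(cx + d)$ and a matrix $A$ whose $k$-th row holds the coefficients of $(ax + b)^k (cx + d)^{n-1-k}$, so that $(cx + d)^{n-1} f_P(\varphi(x)) = A f_P(x)$. This normalization of $A$ is an assumption of the sketch and may differ from (3.14) by a scalar factor; the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def mobius(x, a, b, c, d):
    """Real Moebius transformation (a x + b) / (c x + d)."""
    return (a * x + b) / (c * x + d)

def f_poly(x, n):
    """Polynomial basis (1, x, ..., x^(n-1))."""
    return np.array([x ** k for k in range(n)])

def mobius_matrix(n, a, b, c, d):
    """Matrix A with (c x + d)^(n-1) * f_poly(mobius(x)) = A @ f_poly(x)."""
    A = np.zeros((n, n))
    for k in range(n):
        # coefficients (ascending powers) of (a x + b)^k (c x + d)^(n-1-k)
        coef = P.polymul(P.polypow([b, a], k), P.polypow([d, c], n - 1 - k))
        A[k, :len(coef)] = coef        # polymul may trim trailing zeros
    return A

def moment_matrix(xs, ws, n):
    """Sum of w * f f^T over the support points: a Hankel moment matrix."""
    return sum(w * np.outer(f_poly(x, n), f_poly(x, n)) for x, w in zip(xs, ws))

def is_hankel(M, tol=1e-10):
    """True if the entries of M are constant along each anti-diagonal."""
    n = M.shape[0]
    scale = max(1.0, float(np.abs(M).max()))
    for s in range(2 * n - 1):
        vals = [M[i, s - i] for i in range(n) if 0 <= s - i < n]
        if max(vals) - min(vals) > tol * scale:
            return False
    return True

n = 4
M = moment_matrix([-1.0, 0.2, 0.7, 2.0], [0.1, 0.4, 0.3, 0.2], n)
A = mobius_matrix(n, 2.0, 1.0, 1.0, 3.0)       # ad - bc = 5
print(is_hankel(M), is_hankel(A @ M @ A.T))
```

In this normalization the map from $(a, b, c, d)$ to $A$ is multiplicative, so composing two transformations corresponds to multiplying their matrices.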

Invariance under the Möbius group
In this subsection, we consider the polynomial regression. We formalized our optimal experimental design problem to find the minimizer M ∈ M P of Vol 1 (γ M −1 ) in (2.6).
Theorem 3.16. For M ∈ M P and A ∈ A, Theorem 3.16 and Theorem 3.9 imply that the minimizer of Vol 1 (γ M −1 ) with respect to M ∈ M P forms an orbit (or a union of orbits) on M P . ; a, b, c, d). Taking derivatives with respect to x, Therefore, By combining this with and dy =φ(x)dx, we have Theorem 3.17. The volumes of the weighted polynomial design x i p i 1≤i≤N with variance σ 2 P (x; a 0 , b 0 , c 0 , d 0 ), and the design Proof. In (3.18) of the proof of Theorem 3.9, let w i = p i /σ 2 P (x i ; a 0 , b 0 , c 0 , d 0 ). Then, by (3.21), .
This means that information matrices M 1 and M 2 of the two designs satisfy M 1 = AM 2 A ⊤ and hence have the same volume by Theorem 3.16.
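The invariance of the volume can also be observed numerically. The sketch below approximates the length of the normalized trajectory by a sum of chord lengths, using the angle $\theta$ with $x = \tan\theta$ so that the whole real line, including $x = \pm\infty$, is covered. It assumes that the criterion counts both branches $\pm$ of the trajectory (hence the factor 2) and uses, for n = 3, a matrix $A$ whose rows are the coefficients of $(cx + d)^2$, $(ax + b)(cx + d)$, and $(ax + b)^2$; both conventions and all names are assumptions of this illustration.

```python
import numpy as np

def tube_volume(M, num=40001):
    """Chord-sum approximation of the trajectory length for the polynomial basis."""
    n = M.shape[0]
    theta = np.linspace(-np.pi / 2.0, np.pi / 2.0, num)
    # Homogeneous coordinates cos^(n-1)(theta) * f(tan(theta)).
    F = np.stack([np.cos(theta) ** (n - 1 - k) * np.sin(theta) ** k for k in range(n)])
    L = np.linalg.cholesky(M)                  # M = L L^T
    U = np.linalg.solve(L, F)                  # squared column norms equal f^T M^{-1} f
    U = U / np.linalg.norm(U, axis=0)          # points on the unit sphere
    chords = np.linalg.norm(np.diff(U, axis=1), axis=0)
    return 2.0 * chords.sum()                  # factor 2: both branches +/- of the curve

def mobius_matrix3(a, b, c, d):
    """Rows: coefficients of (cx+d)^2, (ax+b)(cx+d), (ax+b)^2 in (1, x, x^2)."""
    return np.array([[d * d, 2.0 * c * d, c * c],
                     [b * d, a * d + b * c, a * c],
                     [b * b, 2.0 * a * b, a * a]])

M = np.array([[1.0, 0.0, 0.2],
              [0.0, 0.2, 0.0],
              [0.2, 0.0, 1.0]])                # a positive definite Hankel matrix
A = mobius_matrix3(2.0, 1.0, 1.0, 3.0)         # ad - bc = 5
print(tube_volume(M), tube_volume(A @ M @ A.T))

M2 = np.array([[2.0, 0.3], [0.3, 1.0]])        # n = 2: the length should be close to 2*pi
print(tube_volume(M2), 2.0 * np.pi)
```

The chord sum is only a discretization, so the two printed volumes agree up to the discretization error, which shrinks as `num` grows.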

D-optimal design for weighted polynomial regression
We characterize the D-optimal design for the weighted polynomial regression as an orbit of the Möbius group action. We start from the fact that in the Fourier regression, the uniform design is D-optimal.
Proposition 3.18 (Guest (1958)). In the Fourier regression with the basis (3.1), among the n-point discrete design, only the uniform design is D-optimal, where θ ∈ − 1 2n , 1 2n is an arbitrary constant. The information matrix at the optimal point is the identity I n .
Let $M_F$ be an information matrix of a Fourier design $\begin{Bmatrix} t_i \\ p_i \end{Bmatrix}_{1 \le i \le n}$. By making a change of variables $y_i = \tan(\pi t_i)$ and $y_i = \varphi(x_i; a, b, c, d)$, we have from (3.4), (3.14), and (3.19) that is the information matrix of the design $\begin{Bmatrix} x_i \\ p_i \end{Bmatrix}_{1 \le i \le n}$ for the weighted polynomial regression with variance function $\sigma_P^2(x; a, b, c, d)$. Because $\det(M_F) = \det(AB)^2 \det(M_P)$, the D-optimal design problem of searching for optimal $t_i$ and $p_i$ in the Fourier regression is equivalent to that of searching for optimal $x_i$ and $p_i$ in the weighted polynomial regression. Hence, Proposition 3.18 is translated into the weighted polynomial regression as follows.
Theorem 3.19. In the weighted polynomial regression of degree n − 1 with variance σ 2 P (x; a 0 , b 0 , c 0 , d 0 ), among the n-point discrete design, only the design where t 0 i is given in (3.23), and s, t are arbitrary numbers such that s 2 + t 2 = 1. The information matrix at the Doptimal point is A Proof. Note that where s = cos(πθ), t = sin(πθ).
4 Tube-volume optimal design for n = 3
In the previous section, we discussed the Fourier regression and the polynomial regression with the bases (3.1) and (3.2), respectively, for a general dimension $n$. In this section, we treat the case $n = 3$. This is the simplest non-trivial case, because when $n = 2$, $\mathrm{Vol}_1(\gamma_{M^{-1}}) = 2\pi$ irrespective of $M$.
When n = 3, the problem is reduced to the minimization of The volume becomes an elliptic integral, which does not have an explicit expression in general. Moreover, the number of parameters to be optimized is four. (Note that the integrand h 1 (x)/h 0 (x) is a homogeneous function in m 0 , . . . , m 4 ). We will solve this minimization problem using the Möbius invariance.

Orbital decomposition
The Möbius group action defines an equivalence relation on the moment cone $\mathcal{M}_P$. We define The orbit passing through $M$ is denoted by The goal of this subsection is to prove the orbital decomposition of the polynomial moment cone $\mathcal{M}_P$. Let Theorem 4.1.
where ⊔ is the disjoint union.
Proof. This is a consequence of Lemmas 4.2 and 4.3 below.
Lemma 4.2. For any $M \in \mathcal{M}_P$ and 0 The proofs of Lemmas 4.2 and 4.3 are given in Appendix A.2 and A.3, respectively. Note that the map $v \mapsto (1 - v)/(1 + 3v)$ defines a one-to-one correspondence between $(0, 1/3)$ and $(1/3, 1)$, and $v = 1/3$ is the fixed point of this map.
The stabilizer of A at M ∈ M P is defined as This is a subgroup of A.
Theorem 4.4. When v = 1/3, and when v = 1/3, In particular, 3 ). Proof. The proof follows from the proof of Lemma 4.3. The details are omitted.

Minimization over cross-section
From the orbital decomposition (Theorem 4.1) and the invariance of the volume on an orbit (Theorem 3.16), the optimization problem is reduced to the minimization of $\mathrm{Vol}_1(\gamma_{M_v^{-1}})$ with respect to $v$. The range of $v$ is taken to be $(0, \frac{1}{3}]$ or $[\frac{1}{3}, 1)$. We write $\mathrm{Vol}_1(\gamma_{M_v^{-1}}) = \mathrm{len}(v)$ for short. From the definition (2.6),

Note that len(v) is an elliptic integral. The following is the main theorem of this section.
Theorem 4.6. The minimum of $\mathrm{Vol}_1(\gamma_{M^{-1}})$ in (2.6) over $M \in \mathcal{M}_P$ is attained if and only if $M$ is in the orbit passing through $M_{1/3}$. The minimum volume is $4\pi\sqrt{2/3}$.
Proof. Because of Theorem 4.1, it is enough to take the range v ∈ (0, 1 3 ]. We use the inequality The equality holds iff z = 0. Noting that Therefore, len(v) is bounded below by This integral can be evaluated by counting the residues. When v < 1/3, the poles are and ±ix 0 = ±i. Denote the residues for +ix 1 , +ix 2 , and +ix 0 by Res(+ix 1 ), Res(+ix 2 ), and Res(+ix 0 ), respectively. Then, the integral is evaluated as The derivative is (the equality holds iff v = 1/3), the numerator of (4.4) is bounded below by 5 − 38v + 14v 2 + 18v 3 + 9v 4 + 48v which is positive for 0 < v < 1/3. Therefore, d dv len(v) < 0 for 0 < v < 1/3, and len(v) has the unique minimum at v = 1/3. Since s(x; v) ≥ s(x; v), Point v = 1/3 is the unique minimizer, because this is the unique minimizer of len(v). Figure 4.1 depicts the objective function len(v) and its lower bound len(v) for v ≤ 1/3. As shown in (3.12), the information matrix M 1/3 is the counterpart of the information matrix for the uniform design in the Fourier regression.
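The behaviour of len(v) can be explored numerically. The sketch below evaluates the length integral with the substitution $x = \tan\theta$, which turns it into an integral of a smooth periodic function over one period. It assumes that the cross-section is $M_v = \begin{pmatrix} 1 & 0 & v \\ 0 & v & 0 \\ v & 0 & 1 \end{pmatrix}$, $0 < v < 1$ (consistent with $M_{1/3}$ being the counterpart of the Fourier uniform design), and that both branches $\pm$ of the trajectory are counted; the function name `len_v` is hypothetical.

```python
import numpy as np

def len_v(v, num=4000):
    """Numerical value of len(v) for n = 3 under the assumptions stated above."""
    M = np.array([[1.0, 0.0, v], [0.0, v, 0.0], [v, 0.0, 1.0]])
    S = np.linalg.inv(M)
    theta = np.linspace(-np.pi / 2.0, np.pi / 2.0, num, endpoint=False)
    c, s = np.cos(theta), np.sin(theta)
    # f = cos^2(theta) * (1, x, x^2) with x = tan(theta); g = df/dtheta.
    f = np.stack([c * c, c * s, s * s])
    g = np.stack([-2.0 * c * s, c * c - s * s, 2.0 * c * s])
    ff = np.einsum("it,ij,jt->t", f, S, f)
    gg = np.einsum("it,ij,jt->t", g, S, g)
    fg = np.einsum("it,ij,jt->t", f, S, g)
    speed = np.sqrt(np.maximum(ff * gg - fg ** 2, 0.0)) / ff
    # Equal-weight rule over one period (length pi), times 2 for the two branches.
    return 2.0 * np.pi * speed.mean()

for v in (0.1, 0.2, 1.0 / 3.0, 0.5, 0.8):
    print(round(v, 4), len_v(v))
print("4*pi*sqrt(2/3) =", 4.0 * np.pi * np.sqrt(2.0 / 3.0))
```

The printed values also reflect the symmetry of the cross-section: under the stated assumptions, $v$ and $(1 - v)/(1 + 3v)$ lie on the same orbit and therefore give the same length.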
Recall the decomposition of A(a, b, c, d) in Proposition 3.8. We already know from Theorem 4.4 that, for s 2 + t 2 = 1,
Theorem 4.6 can be written in the following form.
Theorem 4.7. The minimizer of $\mathrm{Vol}_1(\gamma_{M^{-1}})$ in (2.6) over $M \in \mathcal{M}_P$ is given when and only when $M$ is of the form:
Remark 4.8. The minimum tube-volume over $M \in \mathcal{M}_P$ is attained when and only when the curve forms a circle. Moreover, in that case, the circle length is $2\pi\sqrt{2/3}$.
Finally, we characterize the tube-volume optimal design as a three-point design. The polynomial design corresponding to the Fourier uniform design is given in (3.12). The tube-volume optimal design is obtained as an orbit of the transformation passing through the design in (3.12). In the following, let (4.5) a three-point uniform design in the Fourier regression.
Theorem 4.9. In the weighted polynomial regression with variance function $\sigma_P^2(x; a_0, b_0, c_0, d_0)$, the three-point tube-volume optimal design is , where . (4.6) Here, $t_i^0$ is defined in (4.5), $a, b, c, d$ are arbitrarily given so that $ad - bc \neq 0$, and $k > 0$ is a constant so that $\sum_i p_i = 1$. The tube-volume optimal design includes the D-optimal designs as special cases where holds for some $s^2 + t^2 = 1$.
Theorem 4.10. In the Fourier regression, the three-point tube-volume optimal design is given as where $k$ is a normalizing constant so that $\sum_i p_i = 1$, $q \neq 0$, and $r, \theta$ are arbitrarily given. In particular, the uniform design (D-optimal design) is a tube-volume optimal design.

Numerical comparisons
Here we conduct a small numerical experiment to see how the width of the simultaneous confidence band differs between optimal and non-optimal designs. The model we use is the Fourier regression $f(x) = (1, \sqrt{2}\sin(2\pi x), \sqrt{2}\cos(2\pi x))^\top$, $x \in \mathcal{X} = (-1/2, 1/2]$, with the variance function $\sigma^2(x) = 1$. Three designs, $D_1$, $D_2$, and $D_3$, are compared. Design $D_1$ is the D-optimal and tube-volume (TV-) optimal design with $\det(\Sigma) = 1$ and $\mathrm{Vol}_1(\gamma_\Sigma) = 4\pi\sqrt{2/3} = 10.260$. Design $D_2$ is TV-optimal, but not D-optimal, with $\det(\Sigma) = 27/8 = 3.375$. Design $D_3$ is chosen so that its value of $\det(\Sigma)$ equals that of $D_2$, and is neither D-optimal nor TV-optimal. The values of the D-criterion $\det(\Sigma)$, the tube-volume criterion $\mathrm{Vol}_1(\gamma_\Sigma)$, and the width $w_\alpha$ ($\alpha = 0.1, 0.05, 0.01$) of the $100(1 - \alpha)\%$ simultaneous confidence bands are listed in Table 4.3. Here, $w_\alpha$ is estimated as the upper $\alpha$ quantile of the random variable $\max_{x \in \mathcal{X}} |\hat b^\top f(x) - b^\top f(x)|$ by simulation with 10,000 replications. As the theorems state, design $D_1$ has the narrowest simultaneous confidence band. By comparing designs $D_2$ and $D_3$, which have the same value of $\det(\Sigma)$, we see how the design affects the width of the simultaneous confidence bands.
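A simulation of this kind can be sketched in a few lines of Python. The function below draws $\hat b - b \sim N(0, \Sigma)$, evaluates the maximum of $|(\hat b - b)^\top f(x)|$ over a grid on $\mathcal{X}$, and returns empirical upper quantiles. The grid size, the seed, and the function name `simulate_width` are choices made for this illustration, and the example call uses $\Sigma = I$ only because a design with $\det(\Sigma) = 1$ and identity information matrix is mentioned in the text; the sketch is not meant to reproduce the values of Table 4.3.

```python
import numpy as np

def simulate_width(Sigma, alphas=(0.1, 0.05, 0.01), reps=10_000, grid=512, seed=0):
    """Estimate w_alpha, the upper-alpha quantile of max_x |(b_hat - b)^T f(x)|."""
    rng = np.random.default_rng(seed)
    t = np.linspace(-0.5, 0.5, grid)
    F = np.stack([np.ones_like(t),
                  np.sqrt(2.0) * np.sin(2.0 * np.pi * t),
                  np.sqrt(2.0) * np.cos(2.0 * np.pi * t)])
    L = np.linalg.cholesky(Sigma)                # Sigma = L L^T
    err = L @ rng.standard_normal((3, reps))     # draws of b_hat - b ~ N(0, Sigma)
    sup = np.abs(err.T @ F).max(axis=1)          # maximum over the grid, per draw
    return {a: float(np.quantile(sup, 1.0 - a)) for a in alphas}

widths = simulate_width(np.eye(3))               # Sigma = I, so det(Sigma) = 1
for a, w in widths.items():
    print(a, round(w, 3))
```

For $\Sigma = I$ the maximum has the closed form $|\xi_1| + \sqrt{2}\,(\xi_2^2 + \xi_3^2)^{1/2}$ for a standard normal vector $\xi$, which gives a convenient check on the grid approximation.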

Summary and remaining problems
In this paper, we have proposed the tube-volume (TV) criterion $\mathrm{Vol}_1(\gamma_\Sigma)$ in (2.6) for experimental design. If a design is tube-volume optimal and, simultaneously, D-optimal, that is, it minimizes $\max_{x \in \mathcal{X}} f(x)^\top \Sigma f(x)$, then it attains the minimum width of the simultaneous confidence bands.
The proposed criterion was then applied to the Fourier regression model, which is a standard model in linear optimal design theory, and to the weighted polynomial regression model, which is mathematically equivalent to the Fourier regression model. The Möbius group keeps the tube-volume criterion invariant, whereas the subgroup $O(2, \mathbb{R})$ of the Möbius group keeps the D-criterion invariant.
Using the Möbius invariance, when n = 3, we found that the tube-volume optimal designs in the Fourier regression and the weighted polynomial regression form an orbit of the Möbius group. The tube-volume optimal designs contain D-optimal designs as special cases. This means that in the Fourier regression, the uniform design is a universal optimal design minimizing both tube-volume criterion and D-criterion.
We conjecture that for all n, the tube-volume optimal design is characterized as an orbit of the Möbius group containing D-optimal designs. One supporting observation is that for small n (n ≤ 6), tube-volume local optimality at the D-optimal designs can be proved by direct calculations. That is, the Hessian matrix of the tube-volume criterion evaluated at the D-optimal design is positive semi-definite, and the null space of the Hessian matrix corresponds to the tangent space of the orbit of Möbius group action. However, the proof for general n remains outstanding.
Throughout the paper, we dealt only with the case where the explanatory variable is one-dimensional. However, the volume-of-tube method also works for the construction of simultaneous confidence bands for regression with multidimensional explanatory variables, except for the conservativeness (ii) of Proposition 2.1, and the volume-optimality is well-defined. For example, we can discuss the volume-optimality of the $p$-variate polynomial regression model with the basis vector By the same argument as in the univariate case, we can prove that the multivariate Möbius transform $\varphi : \mathbb{R}^p \to \mathbb{R}^p$ defined by (Kato and McCullagh (2014)) retains the invariance (volume-preserving property) $\mathrm{Vol}_p(\gamma_{M^{-1}}) = \mathrm{Vol}_p(\gamma_{(AMA^\top)^{-1}})$ of Theorem 3.16. However, the treatment of the multidimensional case (see, e.g., Lasserre (2009) for the moment cone) remains a topic for future research.

A Appendix: Proofs
A.1 Proof of Lemma 3.4
Proof. Let $t_k = k/n - (n + 1)/(2n)$ and $x_k = \tan(\pi t_k)$. The $(i, j)$ element of $M$ is Then, apply Lemma A.1 below.

A.2 Proof of Lemma 4.2
The proof of Lemma 4.2 is divided into three parts (lemmas).
Lemma A.2. For M ∈ M P , there exist u, w > 1 such that Proof of Lemma A.2. We start from a canonical form in (3.22): We confirm that equation AMA ⊤ = M has a solution (a, b, c, d) such that ad − bc = 0. It is enough to show that under the assumption a, d = 0, a solution (a, b, c, d) satisfies (For resultant, see, e.g., Prasolov (2004). Applications in statistics can be found in Drton, et al. (2009).) As shown in Lemma A.5 later, in the region u, v > 1, h(u, n) = 0 iff u = 2. Therefore, when u = 2, we have established that x = x * satisfying (A.5) exists, and hence, a solution ( Proof. Fix u and consider h(u, v) as a function of v. Note that h(u, 1) = (u − 1) 3 > 0.