Intrinsic and extrinsic deep learning on manifolds

We propose extrinsic and intrinsic deep neural network architectures as general frameworks for deep learning on manifolds. Specifically, extrinsic deep neural networks (eDNNs) preserve geometric features on manifolds by utilizing an equivariant embedding from the manifold to its image in the Euclidean space. Intrinsic deep neural networks (iDNNs) incorporate the underlying intrinsic geometry of manifolds via exponential and log maps with respect to a Riemannian structure. We prove that the excess risk of the empirical risk minimizers (ERMs) of eDNNs and iDNNs converges at optimal rates. Overall, the eDNN framework is simple and easy to compute, while the iDNN framework is accurate and fast converging. To demonstrate the utility of our framework, we present various simulation studies and real data analyses with eDNNs and iDNNs.


Introduction
The last two decades have witnessed an explosive development in deep learning approaches. These approaches have achieved breakthrough performance in a broad range of learning problems from a variety of application fields such as image recognition [29], speech recognition [15], natural language processing [2] and other areas of computer vision [41]. Deep learning has also served as the main impetus for the advancement of recent artificial intelligence (AI) technologies. This unprecedented success has been made possible by increasing computational power, the availability of large data sets, and the development of efficient computational algorithms for training deep neural networks. There have been increasing efforts to understand the theoretical foundations of deep neural networks, including in the statistics community [37,34,25,3,38,27,10].
Most of these efforts, from model and algorithmic development to theoretical understanding, have however been largely focused on Euclidean domains. In a wide range of problems arising in computer and machine vision, medical imaging, network science, recommender systems, computer graphics, and so on, one often encounters learning problems concerned with non-Euclidean data, particularly manifold-valued data. For example, in neuroscience, data collected in diffusion tensor imaging (DTI), now a powerful tool in neuroimaging for clinical trials, are represented by diffusion matrices, which are $3 \times 3$ positive definite matrices [1]. In engineering and machine learning, pictures or images are often preprocessed or reduced to a collection of subspaces, with each data point (an image) in the sample represented by a subspace [16,39]. In machine vision, a digital image can also be represented by a set of $k$ landmarks, the collection of which forms a landmark-based shape space [24]. One may also encounter data that are stored as orthonormal frames [8], surfaces, curves, and networks [28]. The underlying spaces where these general objects belong fall into the general category of manifolds, whose geometry is generally well characterized and should be utilized and incorporated for learning and inference. Thus, there is a natural need and motivation for developing deep neural network models over manifolds. This work aims to develop general deep neural network architectures on manifolds and take some steps toward understanding their theoretical foundations. The key challenge lies in incorporating the underlying geometry and structure of manifolds in designing deep neural networks. Although some recent works propose deep neural networks for specific manifolds [42,14,21,22], there is a lack of general frameworks or paradigms that work for arbitrary manifolds. In addition, the theoretical understanding of deep neural networks on manifolds remains largely unexplored. To fill in these gaps,
in this work, we make the following contributions: (1) we develop extrinsic deep neural networks (eDNNs) on manifolds to generalize the popular feedforward networks in the Euclidean space to manifolds via equivariant embeddings. The extrinsic framework is conceptually simple and computationally easy, and it works for general manifolds where nice embeddings such as equivariant embeddings are available; (2) we develop intrinsic deep neural networks (iDNNs) on manifolds by employing a Riemannian structure of the manifold; (3) we study theoretical properties such as the approximation properties and estimation error of both eDNNs and iDNNs; and (4) we implement various DNNs over a large class of manifolds on simulated and real datasets, including eDNNs, iDNNs and tangential deep neural networks (tDNNs), which are a special case of iDNNs with only one tangent space.
The rest of the paper is organized as follows. In Section 2, we introduce the eDNNs on manifolds and study their theoretical properties. In Section 3, we propose the iDNNs on manifolds that take into account the intrinsic geometry of the manifold. The simulation study and the real data analysis are carried out in Section 4. Our work ends with a discussion.

eDNNs and equivariant embeddings
Let $M$ be a $d$-dimensional manifold. Let $(x_i, y_i)$, $i = 1, \ldots, n$, be a sample of data from some regression model with input $x_i \in \mathcal{X} = M$ and output $y_i \in \mathcal{Y} = \mathbb{R}$; we propose deep neural networks for learning the underlying function $f : M \to \mathbb{R}$. The output space can be $\mathcal{Y} = \{1, \ldots, k\}$ for a classification problem. In this work, we develop two general deep neural network architectures on manifolds, based on an extrinsic and an intrinsic framework, respectively. The extrinsic framework, which is the focus of this section, employs an equivariant embedding of a manifold into the Euclidean space and builds a deep neural network on its image after embedding, while the intrinsic framework utilizes the Riemannian or intrinsic geometry of the manifold for designing the deep neural networks (Section 3). Our initial focus will be on proposing appropriate analogs of feed-forward neural networks on manifolds, since these are popular DNNs in the Euclidean space and suitable objects for theoretical analysis. The theoretical properties of the proposed geometric DNNs will be studied.
Before describing our proposed frameworks, we introduce our mathematical definition of DNNs and related classes. A DNN $\tilde f$ with depth $L$ and a width vector $p = (p_0, \ldots, p_{L+1}) \in \mathbb{N}^{L+2}$ is a function of the form
$$\tilde f(x) = A_{L+1} \circ \sigma_L \circ A_L \circ \cdots \circ \sigma_1 \circ A_1(x),$$
where $A_l : \mathbb{R}^{p_{l-1}} \to \mathbb{R}^{p_l}$ is an affine linear map defined by $A_l(x) = W_l x + b_l$ for a $p_l \times p_{l-1}$ weight matrix $W_l$ and a $p_l$-dimensional bias vector $b_l$, and $\sigma_l : \mathbb{R}^{p_l} \to \mathbb{R}^{p_l}$ is an element-wise nonlinear activation map, with the ReLU activation function $\sigma(z) = \max\{0, z\}$ as a popular choice. We refer to the maximum value $\max_{j=1,\ldots,L} p_j$ of the width vector as the width of the DNN. We denote by $\theta := ((W_l, b_l))_{l=1,\ldots,L+1}$ the collection of all weight matrices and bias vectors, that is, the parameters of the DNN. Moreover, we denote by $\|\theta\|_0$ the number of non-zero parameter values (i.e., the sparsity) and by $\|\theta\|_\infty$ the maximum absolute value of the parameters. We denote by $\mathcal{F}(L, (p_0 \sim P \sim p_{L+1}), S, B)$ the class of DNNs with depth $L$, input dimension $p_0$, width $P$, output dimension $p_{L+1}$, sparsity $S$ and maximum parameter magnitude $B$. For simplicity, if the input and output dimensions are clear from the context, we write $\mathcal{F}(L, P, S, B) = \mathcal{F}(L, (p_0 \sim P \sim p_{L+1}), S, B)$.
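As an illustration of the definition above, the following minimal NumPy sketch evaluates a ReLU network of the form $A_{L+1} \circ \sigma_L \circ \cdots \circ \sigma_1 \circ A_1$ and computes the two quantities that index the class $\mathcal{F}(L, P, S, B)$, namely the sparsity $\|\theta\|_0$ and the maximum parameter magnitude $\|\theta\|_\infty$. The helper names (`init_params`, `dnn_forward`, `sparsity`, `max_norm`) and the random initialization are assumptions made for this sketch only; they are not taken from the paper.

```python
import numpy as np

def init_params(widths, rng):
    """Random (W_l, b_l), l = 1..L+1, for a width vector (p_0, ..., p_{L+1})."""
    return [(rng.standard_normal((p_out, p_in)) / np.sqrt(p_in), np.zeros(p_out))
            for p_in, p_out in zip(widths[:-1], widths[1:])]

def dnn_forward(params, x):
    """Evaluate A_{L+1} o sigma_L o A_L o ... o sigma_1 o A_1 (x) with ReLU sigma."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, W @ h + b)   # affine map followed by element-wise ReLU
    W, b = params[-1]
    return W @ h + b                     # last affine map, no activation

def sparsity(params):
    """||theta||_0: number of non-zero weights and biases."""
    return sum(int(np.count_nonzero(W)) + int(np.count_nonzero(b)) for W, b in params)

def max_norm(params):
    """||theta||_inf: largest absolute parameter value."""
    return max(max(np.abs(W).max(), np.abs(b).max()) for W, b in params)

rng = np.random.default_rng(0)
widths = [3, 5, 5, 1]                    # p_0 = 3, depth L = 2 with width 5, p_{L+1} = 1
params = init_params(widths, rng)
y = dnn_forward(params, np.array([0.1, -0.2, 0.3]))
```

A network belongs to $\mathcal{F}(L, P, S, B)$ in this sketch when `sparsity(params) <= S` and `max_norm(params) <= B` hold for the given depth and width.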
Let $J : M \to \mathbb{R}^D$ be an embedding of $M$ into some higher-dimensional Euclidean space $\mathbb{R}^D$ ($D \ge d$) and denote the image of the embedding as $\tilde M = J(M)$. By definition of an embedding, $J$ is a smooth map such that its differential $dJ : T_x M \to T_{J(x)} \mathbb{R}^D$ at each point $x \in M$ is an injective map from the tangent space $T_x M$ to $T_{J(x)} \mathbb{R}^D$, and $J$ is a homeomorphism between $M$ and its image $\tilde M$. Our idea of building an extrinsic DNN on a manifold relies on building a DNN on the image of the manifold after the embedding. The geometry of the manifold $M$ can be well preserved with a good choice of embedding, such as an equivariant embedding, which is defined rigorously in Remark 2.2 below. The extrinsic framework has been adopted for the estimation of Fréchet means [5], regression on manifolds [31], and the construction of Gaussian processes on manifolds [30], where it has enjoyed notable features such as ease of computation and accurate estimation.
The key idea of proposing an extrinsic feedforward neural network on a manifold $M$ is to build a DNN on its image under the embedding, which is in one-to-one correspondence with $M$. More specifically, we say that $f$ is an extrinsic deep neural network (eDNN) if $f$ is of the form
$$f(x) = \tilde f(J(x))$$
with a DNN $\tilde f$. We denote the eDNN class induced by $\mathcal{F}(L, P, S, B)$ as
$$\mathcal{F}_{eDNN}(L, P, S, B) := \{ f = \tilde f \circ J : \tilde f \in \mathcal{F}(L, P, S, B) \}.$$
The extrinsic framework is very general and works for any manifold where a good embedding, such as an equivariant embedding, is available. Under this framework, training algorithms in the Euclidean space, such as stochastic gradient descent (SGD) with backpropagation, can be applied directly to the data $(J(x_i), y_i)$, $i = 1, \ldots, n$, with the only additional computational burden potentially induced by working in a higher-dimensional ambient space. In the experiments of Section 4, the extrinsic DNN yields better accuracy than the naive Bayes classifier, kernel SVM, logistic regression classifier, and random forest classifier for the planar shape datasets. Due to its simplicity and generality, there is potential for applying eDNNs in medical imaging and machine vision for broader scientific impact.
Remark 2.1. In [36] and [6], a feedforward neural network was used for nonparametric regression on a lower-dimensional submanifold embedded in some higher-dimensional ambient space. These works showed that, under appropriate conditions on the neural network structure, the convergence rates of the ERM depend on the dimension $d$ of the submanifold instead of the dimension $D$ of the ambient space. In their framework, the geometry of the submanifold is assumed to be unknown. From a conceptual point of view, our extrinsic framework can be viewed as a special case of theirs by ignoring the underlying geometry. In this case, the image of the manifold $\tilde M = J(M)$ can be viewed as a submanifold in $\mathbb{R}^D$, so their results follow. On the other hand, our embedding framework allows us to work with very complicated manifolds, such as quotient manifolds for which no natural ambient coordinates are available. An example is the planar shape space, which is the quotient of a typically high-dimensional sphere and consists of orbits (equivalence classes), with the submanifold structure arising only after the embedding. Such an embedding is typically not isometric.
In [6], the charts were constructed by intersecting small balls in $\mathbb{R}^D$ with the submanifold. In our case, we provide explicit charts of the submanifold based on the knowledge of the geometry of the original manifold $M$ and the embedding map $J$ that works with the ambient coordinates in $\mathbb{R}^D$.
Remark 2.2. One of the essential steps in employing an eDNN is the choice of the embedding $J$, which is generally not unique. It is desirable to have an embedding that preserves as much geometry as possible. An equivariant embedding is one type of embedding that preserves a substantial amount of geometry. Figure 1 provides a visual illustration of an equivariant embedding. Suppose $M$ admits an action of a (usually 'large') Lie group $H$. Then we say that $J$ is an equivariant embedding if we can find a Lie group homomorphism $\phi : H \to GL(D, \mathbb{R})$ from $H$ to the general linear group $GL(D, \mathbb{R})$ of degree $D$ acting on $\tilde M$ such that
$$J(hp) = \phi(h) J(p)$$
for any $h \in H$ and $p \in M$. The definition seems technical at first sight, but the intuition is clear. If a large group $H$ acts on the manifold, for example by rotation, before the embedding, such an action is carried over via $\phi$ to the image $\tilde M$, thus potentially preserving many of the geometric features of $M$, such as its symmetries. The embedding is geometry-preserving in this sense. One example is the planar shape space, a collection of shapes consisting of $k$ landmarks modulo Euclidean motions such as rotation, scaling, and translation. It is a quotient manifold of the sphere $S^{2k-3}$, and the embedding can be given by the Veronese-Whitney embedding, which is equivariant under the special unitary group. Another, less abstract example is the manifold of symmetric positive definite matrices, whose embedding can be given by the log map (the matrix log function) into the space of symmetric matrices; this embedding is equivariant with respect to the action of the general linear group by conjugation. See Section 4 for some concrete examples of equivariant embeddings for well-known manifolds, such as the sphere, symmetric positive definite matrices, and planar shapes.
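The equivariance condition $J(hp) = \phi(h)J(p)$ can be checked numerically for the matrix-log embedding of symmetric positive definite matrices. The sketch below is a sanity check under one stated assumption: the group element $H$ is restricted to an orthogonal matrix (so that $H^\top = H^{-1}$), for which the conjugation identity $\log(HPH^\top) = H \log(P) H^{-1}$ is straightforward to verify. The helper name `spd_log` and the random test matrices are hypothetical choices for this illustration.

```python
import numpy as np

def spd_log(P):
    """Matrix logarithm of an SPD matrix via its spectral decomposition."""
    w, U = np.linalg.eigh(P)
    return (U * np.log(w)) @ U.T          # U diag(log w) U^T

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P = A @ A.T + 4 * np.eye(4)               # a random SPD matrix
H, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # a random orthogonal matrix

lhs = spd_log(H @ P @ H.T)                # act on the manifold, then embed
rhs = H @ spd_log(P) @ np.linalg.inv(H)   # embed, then act on the image
equivariance_gap = np.abs(lhs - rhs).max()
```

The value of `equivariance_gap` should be at the level of floating-point round-off, which is what it means for the group action to commute with the embedding.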

Approximation analysis for eDNNs
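In this section, we study the ability of the eDNN class to approximate an appropriate smooth class of functions on manifolds. First, we define the ball of β-Hölder functions on a set $U \subset \mathbb{R}^D$ with radius $K$ as
$$C^\beta_D(U, K) = \{ f : U \to \mathbb{R} : \|f\|_{C^\beta_D(U)} \le K \},$$
where $\|f\|_{C^\beta_D(U)}$ denotes the β-Hölder norm defined as
$$\|f\|_{C^\beta_D(U)} = \sum_{m \in \mathbb{N}_0^D : \|m\|_1 \le \lfloor \beta \rfloor} \|\partial^m f\|_\infty + \sum_{m \in \mathbb{N}_0^D : \|m\|_1 = \lfloor \beta \rfloor} \sup_{x_1, x_2 \in U,\, x_1 \ne x_2} \frac{|\partial^m f(x_1) - \partial^m f(x_2)|}{\|x_1 - x_2\|_\infty^{\beta - \lfloor \beta \rfloor}}.$$
Here, $\partial^m f$ denotes the partial derivative of $f$ of order $m$ and $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. To facilitate smooth function approximation on manifolds, following [36], we impose an additional smoothness assumption on the local coordinates, which project inputs in an ambient space to a lower-dimensional space.
The next theorem reveals the approximation ability of the eDNN class. As a measure of approximation, we consider the sup norm defined as $\|f\|_{L^\infty(M)} := \sup_{x \in M} |f(x)|$.
Proof. Since $J(M)$ has smooth local coordinates, we can apply Theorem 2 in [36]: there exists a network $\tilde f \in \mathcal{F}(L, \ldots)$ … Therefore, we get the desired result.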

Statistical risk analysis for eDNNs
In this section, we study the statistical risk of the empirical risk minimizer (ERM) based on the eDNN class. We assume the nonparametric regression model (3) and consider the ERM $\hat f_{eDNN}$ over the eDNN class defined in (4). A natural question to ask is whether ERM-type estimators such as $\hat f_{eDNN}$ achieve minimax optimal estimation of β-Hölder smooth functions on manifolds in terms of the excess risk $\mathbb{E}(\hat f(x) - f_0(x))^2$, where the expectation is taken over the random variable $x \sim P_x$.
Theorem 2. Assume the model (3) with a $d$-dimensional compact manifold $M \subset \mathbb{R}^D$ and an embedding map $J : M \to \mathbb{R}^D$. Moreover, assume that $J(M)$ has smooth local coordinates. Then the ERM estimator $\hat f_{eDNN}$ over the eDNN class $\mathcal{F}_{eDNN}(L, P, S, B = 1)$ in (4) with $L \asymp \log n$, $n \gtrsim P \gtrsim n^{d/(2\beta+d)}$ and $S \asymp n^{d/(2\beta+d)} \log n$ satisfies …
Proof. For any $\tilde f_1, \tilde f_2 \in \mathcal{F}(L, P, S, B = 1)$, we have
$$\|\tilde f_1 \circ J - \tilde f_2 \circ J\|_{L^\infty(M)} = \|\tilde f_1 - \tilde f_2\|_{L^\infty(J(M))}.$$
Hence the entropy of the eDNN class $\mathcal{F}_{eDNN}(L, P, S, B = 1)$ is bounded by that of $\mathcal{F}(L, P, S, B = 1)$. Thus, by Lemmas 4 and 5 of [37], we have … Therefore, by Theorem 1, if we take $L$, $P$ and $S$ as in the theorem, we get the desired result.

The iDNN architectures on a Riemannian manifold
Despite the generality and computational advantage enjoyed by the eDNNs on manifolds proposed in the previous section, one potential drawback is that an embedding is not always available on complex manifolds, such as some spatial domains with intrinsic structure. In this section, we propose a class of intrinsic DNNs on manifolds (iDNNs) that employ the intrinsic geometry of a manifold through its exponential and log maps with respect to a Riemannian structure. Some existing works construct a DNN on the manifold by mapping the points on the manifold to a single tangent space (e.g., with respect to some central point of the data) or propose DNNs on specific manifolds, in particular matrix manifolds [19,14]. A DNN on a single tangent space generally cannot provide a good approximation of a function on the whole manifold. Below we provide a rigorous framework for local approximation of a function on a Riemannian manifold via Riemannian exponential and logarithm maps and thoroughly investigate its theoretical properties.
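The key ideas here are to first cover the manifold with the images of subsets $U_1, \ldots, U_K$ of tangent spaces under the exponential map, then approximate a local function over each tangent space using DNNs, and finally patch these together via the transition maps and a partition of unity on the Riemannian manifold. Specifically, let $\{x_1, \ldots, x_K\} \subset M$ be a finite set of points such that, for open subsets $U_k \subset T_{x_k} M$ with $k = 1, \ldots, K$, one has
$$\bigcup_{k=1}^K \exp_{x_k}(U_k) = M.$$
Namely, one has $\{(\exp_{x_k}(U_k), \exp_{x_k}^{-1}), k = 1, \ldots, K\}$ as the charts of the manifold $M$.
For each $k = 1, \ldots, K$ one has an orthonormal basis $v_{k1}, \ldots, v_{kd} \in T_{x_k} M$ and, respectively, the normal coordinates of $x \in \exp_{x_k}(U_k)$,
$$v_k^j(x) = \langle \log_{x_k} x, v_{kj} \rangle, \quad j = 1, \ldots, d.$$
The normal coordinates allow one to perform element-wise non-linear activation on tangent vectors easily. For any two overlapping charts indexed by $k$ and $l$, one has the transition map $\log_{x_l} \circ \exp_{x_k}$ on the part of $U_k$ whose image lies in $\exp_{x_l}(U_l)$. A compact manifold $M$ always admits a finite partition of unity $\{\tau_k\}_{k=1}^K$, $\tau_k : M \to \mathbb{R}_+$, subordinate to this cover, such that $\sum_{k=1}^K \tau_k(x) = 1$ for all $x \in M$, and for every $x \in M$ there is a neighbourhood of $x$ where all but a finite number of functions are 0 (e.g., Proposition 13.9 of [40]). Therefore, for each function $f : M \to \mathbb{R}$, we can write
$$f(x) = \sum_{k=1}^K \tau_k(x) f(x) = \sum_{k=1}^K \tau_k(x) \, (f \circ \exp_{x_k})(\log_{x_k} x).$$
As a result, one can model the compositions $f \circ \exp_{x_k}$, for which we propose to use DNNs. This idea gives rise to our iDNN architecture
$$f(x) = \sum_{k=1}^K \tau_k(x) \tilde f_k(\log_{x_k} x).$$
Figure 2 illustrates the core ideas of the iDNN architecture. Given a set of points $\{x_1, \ldots, x_K\} \subset M$, we define the iDNN class with depth $L$, width $P$, sparsity $S$ and maximum parameter magnitude $B$ as
$$\mathcal{F}_{iDNN}(L, P, S, B) = \Big\{ \sum_{k=1}^K \tau_k(x) \tilde f_k(\log_{x_k} x) : \tilde f_k \in \mathcal{F}(L, P, S, B) \Big\}.$$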
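To make the chart-and-patch construction concrete, the following minimal sketch evaluates a function of the form $\sum_k \tau_k(x) \tilde f_k(\log_{x_k} x)$ on the sphere $S^2$ with two base points (the two poles along the first axis). Several ingredients are assumptions of this sketch rather than the paper's implementation: the bump-function partition of unity with chart radius $0.75\pi$, the small random ReLU networks, and all helper names (`sphere_log`, `bump`, `relu_net`, `idnn_forward`, `random_net`).

```python
import numpy as np

def sphere_log(p, x):
    """Riemannian log map on the unit sphere: tangent vector at p pointing to x."""
    c = np.clip(p @ x, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta / np.sin(theta) * (x - c * p)

def bump(t):
    """Smooth bump function supported on |t| < 1."""
    return float(np.exp(-1.0 / (1.0 - t * t))) if abs(t) < 1.0 else 0.0

def relu_net(params, v):
    """Scalar-valued ReLU network acting on normal coordinates."""
    h = v
    for W, b in params[:-1]:
        h = np.maximum(0.0, W @ h + b)
    W, b = params[-1]
    return float((W @ h + b)[0])

def idnn_forward(x, base_points, frames, nets, radius):
    """Evaluate sum_k tau_k(x) * f_k(normal coordinates of log_{x_k}(x))."""
    dists = [np.arccos(np.clip(p @ x, -1.0, 1.0)) for p in base_points]
    psi = np.array([bump(d / radius) for d in dists])
    tau = psi / psi.sum()                       # partition of unity weights
    out = 0.0
    for p, E, net, t in zip(base_points, frames, nets, tau):
        if t > 0.0:                             # x lies inside the k-th chart
            coords = E.T @ sphere_log(p, x)     # normal coordinates in R^d
            out += t * relu_net(net, coords)
    return out, tau

def random_net(widths, rng):
    return [(rng.standard_normal((q, p)), rng.standard_normal(q))
            for p, q in zip(widths[:-1], widths[1:])]

rng = np.random.default_rng(0)
north, south = np.array([1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0])
frame = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # orthonormal tangent basis at both poles
nets = [random_net([2, 8, 1], rng) for _ in range(2)]
x = np.array([0.0, 0.6, 0.8])                            # a point on the equator
value, tau = idnn_forward(x, [north, south], [frame, frame], nets, radius=0.75 * np.pi)
tau_pole = idnn_forward(north, [north, south], [frame, frame], nets, radius=0.75 * np.pi)[1]
```

Because the chart radius exceeds $\pi/2$, every point of the sphere lies in at least one chart, so the weights are always well defined; a chart is only evaluated where its weight is non-zero, which avoids calling the log map at the antipode of a base point.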

Approximation analysis for iDNNs
In this section, we investigate the approximation theory of the iDNN for smooth functions on manifolds.
Theorem 3. Let $M \subset \mathbb{R}^D$ be a $d$-dimensional compact manifold. Assume that $\exp_{x_k} \in C^\gamma_D(U_k)$ with $\gamma > \beta$ for every $k = 1, \ldots, K$. Then there exist positive constants $c_1$, $c_2$ and $c_3$ depending only on $D$, $d$, $\beta$, $K$ and the surface area of $M$ such that for any $\eta \in (0, 1)$, …
Proof. We construct a DNN approximating $f_{0k} = f_0 \circ \exp_{x_k}$ for each $k = 1, \ldots, K$. Note that $f_{0k}$ is β-Hölder smooth by assumption. Therefore, by Theorem 1 in [36], there exist DNNs …
Figure 2: Given the base points $x_1, \ldots, x_K$ on the manifold $M$, the input data $X$ is mapped to the $k$th chart $U_k$ by the log map $\log_{x_k}(\cdot)$. Afterward, the transformed data is fed into the DNN $\tilde f_k$ on each chart $k$. The final prediction $Y$ is obtained by combining the chart-wise outputs with the partition of unity $\tau(\cdot)$.
Remark 3.1. [36] and [6] propose feedforward neural networks on a manifold that is embedded in a higher-dimensional Euclidean space. In their approximation theory, they utilize local charts and partitions of unity, but because the geometry of the manifold is unknown, they need to use DNNs to approximate the local charts $\psi_j$, the partition of unity functions, and the mappings $f \circ \psi_j^{-1}$. Under our iDNN framework, we utilize the Riemannian geometry of the manifold and the log map. Further, the partition of unity functions can be constructed explicitly, so there is no need to approximate them with DNNs.

Statistical risk analysis for iDNNs
In this section, we study the statistical risk of the ERM over the iDNN class given by
$$\hat f_{iDNN} = \operatorname*{argmin}_{f \in \mathcal{F}_{iDNN}(L, P, S, B)} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2 \quad (7)$$
for the nonparametric regression model (3), where the true function $f_0$ is β-Hölder smooth on a manifold. The following theorem shows that the iDNN estimator attains the optimal rate. We omit the proof since it is almost the same as the proof of Theorem 2, except that it uses the approximation result for the iDNN class given in Theorem 3.
… and $S \asymp n^{d/(2\beta+d)} \log n$ satisfies …

Therefore, the entropy of $\mathcal{F}_{iDNN}(L, P, S, B)$ is bounded by $K$ times the entropy of the class $\mathcal{F}(L, P, S, B)$. So, in the same way as in the proof of Theorem 2, we get the desired result.

Simulations study and real data analysis
This section illustrates the practical impact and utility of our methods on simulated data sets and some important real data sets, including the AFEW database, the HDM05 database, the ADHD-200 dataset, an HIV study, and others. The proposed eDNNs, tDNNs, and iDNNs are applied to learning problems such as regression and classification on various manifolds, including the sphere, the planar shapes, and the manifold of symmetric positive definite matrices, which are the most popular classes of manifolds encountered in medical diagnostics using medical imaging and in image classification in digital imaging analysis. For the eDNN models, we list explicit embeddings below together with the corresponding Lie groups that act on them equivariantly. For the iDNN models, we elaborate on the exponential map and the inverse-exponential (log) map on those manifolds. As mentioned before, the tDNN model is the special case of the iDNN model with $K = 1$, and it utilizes the exponential map and inverse-exponential map as well.

Sphere
One of the simplest manifolds of interest is the sphere, which arises in particular in directional statistics and spatial statistics [12,32,11,23,18]. Statistical analysis of data from the two-dimensional sphere $S^2$, often called directional statistics, has a fairly long history [12,32,11]. Modeling on the sphere has also received recent attention due to applications in spatial statistics, for example, global models for climate or satellite data [23,18].
To build the eDNN on the sphere, first note that $S^d$ is a submanifold of $\mathbb{R}^{d+1}$, so the inclusion map $J$ serves as a natural embedding of $S^d$ into $\mathbb{R}^{d+1}$. It is easy to check that $J$ is an equivariant embedding with respect to the Lie group $H = SO(d+1)$, the group of $(d+1) \times (d+1)$ special orthogonal matrices. Intuitively speaking, this embedding preserves many of the symmetries of the sphere. On the other hand, one can use the geodesics (in this case, the great circles on the sphere), for which the exponential map and inverse-exponential map are available in closed form, to construct the iDNN model. Furthermore, given the base points, a partition of unity can be constructed by utilizing bump functions on the sphere.
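The closed-form exponential and inverse-exponential maps along great circles mentioned above can be sketched as follows for the unit sphere. This is a minimal illustration, not the authors' code; the function names `sphere_exp` and `sphere_log` and the example vectors are chosen for this sketch.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map: travel from p along the great circle with initial velocity v."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return p.copy()
    return np.cos(n) * p + np.sin(n) * (v / n)

def sphere_log(p, x):
    """Inverse-exponential (log) map: tangent vector at p whose geodesic reaches x."""
    c = np.clip(p @ x, -1.0, 1.0)
    theta = np.arccos(c)                  # geodesic distance between p and x
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta / np.sin(theta) * (x - c * p)

p = np.array([0.0, 0.0, 1.0])             # base point
v = np.array([0.3, -0.4, 0.0])            # tangent vector at p, since <p, v> = 0
x = sphere_exp(p, v)                      # point on the sphere at distance ||v|| = 0.5
v_back = sphere_log(p, x)                 # recovers v
```

The log map is undefined at the antipode of the base point, which is one reason a single tangent space does not give a chart of the whole sphere and several base points are used in the iDNN construction.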
In this simulation study, we consider a classification problem based on the von Mises-Fisher (MF) distribution on the sphere, whose density is, up to a normalizing constant depending only on $\kappa$ and the dimension,
$$f_{MF}(x; \mu, \kappa) \propto \exp(\kappa \mu^\top x),$$
where $\kappa$ is a concentration parameter and $\mu$ is a location parameter. We simulate the data from $K$ different classes on the sphere $S^d$ via a mixture of MF distributions:
$$u_{i1}, \ldots, u_{i10} \sim MF(\mu_i, \kappa_1), \quad m_{ij} \sim \mathrm{Uniform}\{u_{i1}, \ldots, u_{i10}\}, \quad x_{ij} \sim MF(m_{ij}, \kappa_2).$$
Here $x_{ij}$ is the $j$th sample from the $i$th class and $\mu_i$ is the mean for the $i$th class; the concentration parameters are shared by all classes. We first generate 10 means $u_{i1}, \ldots, u_{i10}$ from the MF distribution for the $i$th class. Then, for each class, we generate $N$ observations as follows: for each observation $x_{ij}$, we randomly pick $m_{ij}$ from $u_{i1}, \ldots, u_{i10}$ with probability 1/10 each, and then generate $x_{ij}$ from $MF(m_{ij}, \kappa_2)$, thus leading to a mixture of MF distributions. Here $\kappa_1$ controls the dispersion of the intermediate variable $m_{ij}$, while $\kappa_2$ controls the dispersion of the observations $x_{ij}$. Figure 3 shows observations from the mixture model on the sphere under different dispersions.
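The two-stage sampling scheme described above can be sketched on $S^2$ as follows. This is a hedged illustration rather than the simulation code used in the paper: the sampler uses the closed-form inverse CDF that is specific to $S^2$, the class centres in `centers` are hypothetical (the text does not state how $\mu_i$ is chosen), and the names `sample_vmf_s2` and `sample_class` are introduced here for illustration.

```python
import numpy as np

def sample_vmf_s2(mu, kappa, size, rng):
    """Draw `size` points from the von Mises-Fisher distribution on S^2."""
    u = rng.random(size)
    # Inverse CDF of w = <x, pole>, valid for the two-dimensional sphere only.
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    w = np.clip(w, -1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * np.pi, size)
    s = np.sqrt(1.0 - w ** 2)
    x = np.stack([s * np.cos(phi), s * np.sin(phi), w], axis=1)  # samples around e3
    v = np.array([0.0, 0.0, 1.0]) - mu
    if np.linalg.norm(v) < 1e-12:
        return x
    v = v / np.linalg.norm(v)
    return x - 2.0 * np.outer(x @ v, v)      # Householder reflection sending e3 to mu

def sample_class(center, kappa1, kappa2, n_obs, rng, n_means=10):
    """Mixture of MF: intermediate means around `center`, then observations."""
    means = sample_vmf_s2(center, kappa1, n_means, rng)   # u_{i1}, ..., u_{i10}
    picks = rng.integers(0, n_means, n_obs)               # m_ij chosen uniformly
    return np.stack([sample_vmf_s2(means[k], kappa2, 1, rng)[0] for k in picks])

rng = np.random.default_rng(0)
centers = [np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])]  # hypothetical class centres
X = np.concatenate([sample_class(c, 4.0, 20.0, 200, rng) for c in centers])
y = np.repeat([0, 1], 200)

# Sanity check sample: a tightly concentrated MF cloud around one direction.
mu_check = np.array([0.0, 1.0, 0.0])
Z = sample_vmf_s2(mu_check, 50.0, 1000, np.random.default_rng(1))
```

The reflection is used instead of a rotation because the MF density depends on $x$ only through $\mu^\top x$, so any orthogonal map sending the pole to $\mu$ yields the correct distribution. Sampling in higher dimensions (e.g. $S^{10}$ or $S^{50}$) requires a different scheme for $w$ and is not covered by this sketch.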
In the following simulation, we follow the mixture model on the hyperspheres $S^2$, $S^{10}$, $S^{50}$ and $S^{100}$ with $K = 2$, $N = 2000$, $\kappa_1 = 4$, $\kappa_2 = 20$, and divide the data into a 75 percent training set and a 25 percent test set. We repeat this split 50 times.
We then compare the eDNN, tDNN and iDNN models to other competing estimators via the classification accuracy on the test set in Table 1.
For competing estimators, we consider the k-nearest neighbors (kNN), the random forest (RF), the logistic regression (LR), and the support vector machine (SVM) with the radial basis function (RBF) kernel. The tuning parameters in each method are selected by evaluation on a validation data set whose size is 25% of the training set.
For all DNN models, we apply a network architecture of 5 hidden layers with widths (100, 100, 100, 100, 100). The DNN model is the same as the eDNN model in this case, since the embedding map from the sphere to the higher-dimensional Euclidean space is the inclusion (identity) map. In the tDNN model, we take the Fréchet mean of the training set as the base point and transform all data in the batch to tangent vectors before feeding them to the neural network. In the iDNN model, we take the north and south poles $(\pm 1, 0, \ldots, 0)$ as base points and use a neural network with the same structure for all tangent spaces. All models are trained with the Adam optimizer [26]. As shown in Table 1, our tDNN and iDNN models outperform the other competing estimators. Specifically, our tDNN models achieve the best accuracies of 94.88 ± 0.53 and 97.13 ± 0.39 in the low-dimensional cases, and our iDNN models obtain the best results of 80.72 ± 0.94 and 68.43 ± 1.20 in the high-dimensional cases.
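The tDNN preprocessing step described above (computing the Fréchet mean and mapping the data to tangent vectors at that base point) can be sketched on the unit sphere as follows. The fixed-point iteration $m \leftarrow \exp_m(\frac{1}{n}\sum_i \log_m x_i)$ is a standard way to compute the Fréchet mean and is an assumption of this sketch; the paper does not state which algorithm was used. The helper names and the synthetic data are likewise illustrative.

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit sphere."""
    n = np.linalg.norm(v)
    return p if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def log_map(p, X):
    """Row-wise log map on the unit sphere at the base point p."""
    c = np.clip(X @ p, -1.0, 1.0)
    theta = np.arccos(c)
    scale = np.ones_like(theta)
    far = theta > 1e-8
    scale[far] = theta[far] / np.sin(theta[far])
    return scale[:, None] * (X - c[:, None] * p)

def frechet_mean(X, n_iter=100, tol=1e-12):
    """Fixed-point iteration m <- exp_m(mean_i log_m(x_i))."""
    m = X.mean(axis=0)
    m = m / np.linalg.norm(m)              # start from the projected Euclidean mean
    for _ in range(n_iter):
        g = log_map(m, X).mean(axis=0)     # Riemannian gradient direction
        if np.linalg.norm(g) < tol:
            break
        m = exp_map(m, g)
        m = m / np.linalg.norm(m)          # guard against round-off drift
    return m

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) * 0.3 + np.array([0.0, 0.0, 1.0])
X /= np.linalg.norm(X, axis=1, keepdims=True)   # synthetic data on S^2
m = frechet_mean(X)
V = log_map(m, X)                                # tangent vectors fed to the network
```

At the Fréchet mean the tangent vectors average to zero, which is a convenient convergence check; the rows of `V` are orthogonal to `m` and can be passed to an ordinary feedforward network.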
Planar shape
Let $z = (z_1, \ldots, z_k)$, with $z_1, \ldots, z_k \in \mathbb{R}^2$, be a set of $k$ landmarks. The planar shape space $\Sigma_2^k$ is the collection of $z$'s modulo the Euclidean motions, including translation, scaling, and rotation. One has $\Sigma_2^k = S^{2k-3}/SO(2)$, the quotient of the sphere by the action of $SO(2)$ (rotation), the group of $2 \times 2$ special orthogonal matrices. A point in $\Sigma_2^k$ can be identified with the orbit of some $u \in S^{2k-3}$, which we denote as $\sigma(z)$. Viewing $z$ as a vector of complex numbers, one can embed $\Sigma_2^k$ into $S(k, \mathbb{C})$, the space of $k \times k$ complex Hermitian matrices, via the Veronese-Whitney embedding (see, e.g., [4]):
$$J(\sigma(z)) = u u^* = ((u_i \bar u_j))_{1 \le i, j \le k}.$$
One can verify that $J$ is equivariant (see [24]) with respect to the special unitary group $SU(k)$, with its action on $\Sigma_2^k$ induced by left multiplication.
We consider a planar shape data set, which involves measurements of a group of typically developing children and a group of children suffering from ADHD (attention deficit hyperactivity disorder). ADHD is one of the most common psychiatric disorders in children and can continue through adolescence and adulthood. Symptoms include difficulty staying focused and paying attention, difficulty controlling behavior, and hyperactivity (over-activity). In general, ADHD has three subtypes: (1) ADHD hyperactive-impulsive, (2) ADHD-inattentive, and (3) combined hyperactive-impulsive and inattentive (ADHD-combined). The ADHD-200 Dataset (http://fcon_1000.projects.nitrc.org/indi/adhd200/) records both anatomical and resting-state functional MRI data of 776 labeled subjects across 8 independent imaging sites, 491 of which were obtained from typically developing individuals and 285 from children and adolescents with ADHD (ages: 7-21 years old). The planar Corpus Callosum (CC) shape data are extracted, with 50 landmarks on the contour of the Corpus Callosum of each subject (see [17]). See Figure 4 for a plot of the raw landmarks of a typically developing child and an ADHD child. After quality control, 647 CC shape data out of 776 subjects were obtained, which included 404 ($n_1$) typically developing children, 150 ($n_2$) diagnosed with ADHD-Combined, 8 ($n_3$) diagnosed with ADHD-Hyperactive-Impulsive, and 85 ($n_4$) diagnosed with ADHD-Inattentive. The data therefore lie in the space $\Sigma_2^{50}$, which has a high dimension of $2 \times 50 - 4 = 96$.
The tuning parameters in each method are selected by evaluation on a validation data set whose size is 25% of the training set. For all DNN models, we utilize the same network architecture of 5 hidden layers with widths (100, 100, 100, 100, 100). The DNN model is applied to the raw data, while the eDNN model is applied to the data embedded by the Veronese-Whitney embedding. The preshape data (normalized raw data) lying in the hypersphere $S^{100}$ are used for the tDNN and iDNN models. In the iDNN model, we choose the north and south poles $(\pm 1, 0, \ldots, 0)$ as base points and utilize the geometry of the hypersphere as before. In the tDNN model, we pick the Fréchet mean of the training set as the base point and transform all data in a batch to tangent vectors before feeding them to the neural network. All models are trained with the Adam optimizer. The comparison results are reported in Table 3.
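The Veronese-Whitney embedding of a landmark configuration can be sketched as follows, viewing each landmark as a complex number. The sketch removes translation by centering and scale by normalizing, and then forms the Hermitian matrix $u u^*$; working with centered coordinates in $\mathbb{C}^k$ (rather than a reduced coordinate system) and the function names `preshape` and `veronese_whitney` are assumptions of this illustration.

```python
import numpy as np

def preshape(z):
    """Remove translation and scale from k complex landmarks."""
    z = np.asarray(z, dtype=complex)
    z = z - z.mean()                     # translation invariance
    return z / np.linalg.norm(z)         # scale invariance: unit-norm preshape u

def veronese_whitney(z):
    """Embed the shape of z as the k x k Hermitian matrix u u^*."""
    u = preshape(z)
    return np.outer(u, u.conj())         # invariant to rotations u -> e^{i theta} u

rng = np.random.default_rng(0)
z = rng.standard_normal(50) + 1j * rng.standard_normal(50)   # 50 synthetic landmarks
z_moved = 2.5 * np.exp(0.7j) * z + (1.0 - 2.0j)              # rotate, scale, translate
J1 = veronese_whitney(z)
J2 = veronese_whitney(z_moved)
embedding_gap = np.abs(J1 - J2).max()
```

Two configurations that differ only by a Euclidean motion and scaling map to the same matrix, so `embedding_gap` is at round-off level. To feed the embedded point to a real-valued network, one option is to stack the real and imaginary parts of the matrix entries into a vector.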
Our tDNN model achieves the best accuracy of 65.84 ± 3.10 over 50 splits in the 2-class case, and our iDNN model shows the best result of 63.55 ± 3.80 in the 4-class case.

Symmetric positive definite matrices
Covariance matrices are ubiquitous and attractive in machine learning applications due to their capacity to capture the structure inside the data. The main challenge is to take the particular geometry of the Riemannian manifold of symmetric positive definite (SPD) matrices into consideration. The space $SPD(d)$ of all $d \times d$ positive definite matrices belongs to an important class of manifolds that possess particular geometric structures, which should be taken into account when building DNNs. [13] investigates its Riemannian structure and provides somewhat concrete forms of all its geometric quantities. [9] studies different notions of means and averages in $SPD(3)$ with respect to different distance metrics and considers applications to DTI data and covariance matrices.
Under the Riemannian framework of tensor computing [35], several metrics play an important role in machine learning on SPD matrices. Generally, the Riemannian distance $d(P_1, P_2)$ between two points $P_1$ and $P_2$ on the manifold is defined as the length of the geodesic $\gamma_{P_1 \to P_2}$, i.e., the shortest parameterized curve connecting them. On the SPD manifold, the distance under the affine metric can be computed in closed form [35].
Other important natural mappings to and from the manifold and its tangent bundle are the logarithmic mapping $\mathrm{Log}_{P_0}$ and the exponential mapping $\mathrm{Exp}_{P_0}$ at the point $P_0$. Under the affine metric, these two mappings are known in closed form:
$$\mathrm{Log}_{P_0}(P) = P_0^{1/2} \log(P_0^{-1/2} P P_0^{-1/2}) P_0^{1/2} \in T_{P_0}, \qquad \mathrm{Exp}_{P_0}(S) = P_0^{1/2} \exp(P_0^{-1/2} S P_0^{-1/2}) P_0^{1/2},$$
where $T_{P_0}$ denotes the tangent space at $P_0$. Furthermore, we consider the matrix log map as the embedding $J$, mapping $SPD(d)$ to $Sym(d)$, the space of symmetric matrices. For example, for $P \in SPD(d)$ with spectral decomposition $P = U \Sigma U^\top$, the log map of $P$ is $\log(P) = U \log(\Sigma) U^\top$, where $\log(\Sigma)$ denotes the diagonal matrix whose diagonal entries are the logarithms of the diagonal entries of $\Sigma$. Moreover, the embedding $J$ is a diffeomorphism, equivariant with respect to the action of $GL(d, \mathbb{R})$, the $d \times d$ general linear group. That is, for $H \in GL(d, \mathbb{R})$, we have $\log(H P H^\top) = H \log(P) H^{-1}$.
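The closed-form logarithmic and exponential mappings under the affine metric can be sketched with spectral decompositions as follows. The distance function in this sketch is the standard affine-invariant form $\|\log(P_1^{-1/2} P_2 P_1^{-1/2})\|_F$; any normalizing constant used in [35] is not reproduced here and should be treated as an open detail. All function names are introduced for this illustration.

```python
import numpy as np

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    S = (S + S.T) / 2                          # symmetrize against round-off
    w, U = np.linalg.eigh(S)
    return (U * fun(w)) @ U.T

def log_map(P0, P):
    """Log_{P0}(P) = P0^{1/2} log(P0^{-1/2} P P0^{-1/2}) P0^{1/2}."""
    r = sym_fun(P0, np.sqrt)
    ri = sym_fun(P0, lambda w: 1.0 / np.sqrt(w))
    return r @ sym_fun(ri @ P @ ri, np.log) @ r

def exp_map(P0, S):
    """Exp_{P0}(S) = P0^{1/2} exp(P0^{-1/2} S P0^{-1/2}) P0^{1/2}."""
    r = sym_fun(P0, np.sqrt)
    ri = sym_fun(P0, lambda w: 1.0 / np.sqrt(w))
    return r @ sym_fun(ri @ S @ ri, np.exp) @ r

def affine_distance(P1, P2):
    """Affine-invariant distance, up to the normalizing constant of the chosen convention."""
    ri = sym_fun(P1, lambda w: 1.0 / np.sqrt(w))
    return np.linalg.norm(sym_fun(ri @ P2 @ ri, np.log))   # Frobenius norm

def random_spd(d, rng):
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

rng = np.random.default_rng(0)
P0, P = random_spd(3, rng), random_spd(3, rng)
P_back = exp_map(P0, log_map(P0, P))                       # round trip recovers P
A = np.eye(3) + 0.3 * rng.standard_normal((3, 3))          # an invertible matrix
d_before = affine_distance(P0, P)
d_after = affine_distance(A @ P0 @ A.T, A @ P @ A.T)       # invariance under congruence
```

The agreement of `d_before` and `d_after` illustrates why the metric is called affine-invariant: the distance depends only on the eigenvalues of $P_1^{-1} P_2$, which are unchanged by the congruence $P \mapsto A P A^\top$.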
In the context of deep networks on SPD matrices, we build our models on SPDNet, introduced by [20], which mimics classical neural networks with a first stage that computes an invariant representation of the input data points and a second stage devoted to performing the final classification. SPDNet exploits the geometry through three types of layers: BiMap, ReEig, and LogEig.
The BiMap (bilinear transformation) layer is analogous to the usual dense layer; the induced dimension reduction eases the computational burden often found in learning algorithms on SPD data: $X^{(l)} = W^{(l)\top} P^{(l-1)} W^{(l)}$ with $W^{(l)}$ semi-orthogonal.
The ReEig (rectified eigenvalues activation) layer is analogous to the ReLU activation and can also be seen as an eigen-regularization, protecting the matrices from degeneracy: $X^{(l)} = U^{(l)} \max(\Sigma^{(l)}, I_n) U^{(l)\top}$, with $P^{(l)} = U^{(l)} \Sigma^{(l)} U^{(l)\top}$.
Under our framework, SPDNet is both an eDNN and a tDNN model. The LogEig layer applies the matrix logarithm, $\mathrm{vec}(U^{(l)} \log(\Sigma^{(l)}) U^{(l)\top})$, which is the log embedding $J$ described above, so SPDNet is an eDNN. The same transformation is identical to the logarithmic mapping at the identity, $\log_I(P)$; thus, SPDNet can also be viewed as a tDNN model. In our experiments, we only consider tDNN models, since one tangent space at the base point is sufficient to cover the entire manifold. Our eDNN models on $SPD(p)$ consist of 3 BiMap layers, 3 ReEig layers, one LogEig layer (for the embedding), and a 5-layer DNN with 100 hidden nodes per layer. In the tDNN models, we replace the LogEig layer with the intrinsic logarithmic mapping under different metrics.
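The three SPDNet layer types discussed in this section can be sketched as plain NumPy functions. This is a forward-pass illustration only, not the implementation of [20]: training on the manifold of semi-orthogonal weights is omitted, the rectification level is exposed as a hypothetical `threshold` argument whose value below is illustrative, and the weight matrix is drawn at random via a QR factorization.

```python
import numpy as np

def bimap(P, W):
    """BiMap layer: X = W^T P W with W semi-orthogonal (W^T W = I)."""
    return W.T @ P @ W

def reeig(P, threshold):
    """ReEig layer: clip eigenvalues from below to keep the matrix away from degeneracy."""
    w, U = np.linalg.eigh(P)
    return (U * np.maximum(w, threshold)) @ U.T

def logeig(P):
    """LogEig layer: vectorized matrix logarithm, vec(U log(Sigma) U^T)."""
    w, U = np.linalg.eigh(P)
    return ((U * np.log(w)) @ U.T).reshape(-1)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
P = A @ A.T + 1e-3 * np.eye(6)                       # a 6 x 6 SPD input
W, _ = np.linalg.qr(rng.standard_normal((6, 4)))     # 6 x 4 matrix with orthonormal columns
X = reeig(bimap(P, W), threshold=1e-4)               # 4 x 4 SPD matrix
features = logeig(X)                                 # Euclidean feature vector of length 16
```

The vector `features` is what an ordinary feedforward network receives after the LogEig layer; replacing `logeig` with a logarithmic mapping at a different base point gives the tDNN variant described above.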
In our experiments, we evaluate the performance of the tDNN and eDNN models on the AFEW and HDM05 datasets using the same setup and protocol as in [20]. The AFEW dataset [7] includes 600 video clips with per-frame annotations of valence and arousal levels and 68 facial landmarks, depicting 7 classes of emotions. The HDM05 dataset [33] contains over three hours of motion capture data in C3D and ASF/AMC formats, covering more than 70 motion classes across multiple actors. We divide the data into a 75-25 percent training-test split, with 10 repetitions, and use a validation set (25 percent of the training data) to tune hyperparameters. We implement tDNN models under both the affine metric and the log-Euclidean metric, using the Fréchet mean of the batch as the base point. As shown in Table 4, our tDNN model under the log-Euclidean metric achieves the best results on both datasets, with an accuracy of 35.85 ± 1.49 on the AFEW dataset and 62.59 ± 1.35 on the HDM05 dataset.
Table 4: The accuracy on the test set is reported. We follow the setup and protocols in [20].

Discussion
In this work, we develop intrinsic and extrinsic deep neural network architectures on manifolds and characterize their theoretical properties in terms of the approximation error and statistical error of the ERM-based estimator. The neural networks exploit the underlying geometry of the manifolds for learning and inference. Future work will focus on developing convolutional neural networks on manifolds for the classification of manifold-valued images, which have abundant applications in medical imaging and computer vision.

Figure 1: A simple illustration of equivariant embeddings.
Theorem 1. Let $M$ be a $d$-dimensional compact manifold and $J : M \to \mathbb{R}^D$ be an embedding map. Assume that $J(M)$ has smooth local coordinates. Then there exist positive constants $c_1$, $c_2$ and $c_3$ depending only on $D$, $d$, $\beta$, $K$ and the surface area of $M$ such that for any $\eta \in (0, 1)$,

Theorem 4. Assume the model (3) with a $d$-dimensional compact manifold $M$ isometrically embedded in $\mathbb{R}^D$. Then the ERM estimator $\hat f_{iDNN}$ over the iDNN class $\mathcal{F}_{iDNN}(L, P, S, B = 1)$ in (7) with $L \asymp \log n$, $n \gtrsim P \gtrsim n^{d/(2\beta+d)}$

Figure 3: Observations for $K = 2$ classes from the mixture MF distribution, $N = 100$. The nonlinear boundary between the two classes becomes hard to see with the naked eye as the variance of the data grows with decreasing $\kappa_1$, $\kappa_2$, which makes the classification problem harder.
With these 4 different classes, we also divided the dataset into a 75 percent training set and a 25 percent test set and evaluated the classification accuracy on the test set in comparison with other learning methods. Since the sample sizes are unbalanced, some classes contain too few subjects, e.g., the ADHD-Hyperactive-Impulsive class. We therefore also considered the classification problem with two classes, obtained by combining the ADHD samples into one class, as shown in the right panel of Figure 4.
(a) Mean shapes of different classes

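Figure 4: CC shapes.
$$y_i = f_0(x_i) + \epsilon_i, \quad i = 1, \ldots, n, \quad (3)$$
where $x_1, \ldots, x_n \in M$ are i.i.d. inputs following a distribution $P_x$ on the manifold and $\epsilon_1, \ldots, \epsilon_n$ are i.i.d. sub-Gaussian errors. We consider the ERM over the eDNN class such that
$$\hat f_{eDNN} = \operatorname*{argmin}_{f \in \mathcal{F}_{eDNN}(L, P, S, B)} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2. \quad (4)$$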

Table 1: The test accuracy is calculated over 50 random splits. The 5-layer network (with 100 hidden nodes in each layer) is used for our DNN models in all experiments. Our tDNN model achieves the best result when the dimension is low ($S^2$, $S^{10}$), while our iDNN model is the best in the high-dimensional cases ($S^{50}$, $S^{100}$). Moreover, our tDNN and iDNN models show better accuracy than the classical DNN, especially in the high-dimensional cases.

Table 2: Demographic information about the processed ADHD-200 CC shape dataset, including disease status, age, and gender.
As shown in Table 2, the classification problem involves 4 different classes.

Table 3: The average accuracy on the test dataset is calculated over 50 random splits. The 5-layer network (with 100 hidden nodes in each layer) is used for our DNN models in all experiments. Our tDNN model obtains the best accuracy in the 2-class case, while our iDNN model achieves the best accuracy in the 4-class case. Furthermore, all of our eDNN, tDNN and iDNN models outperform the classical DNN model, indicating the advantages of our frameworks.
Our tDNN models outperform the eDNN (SPDNet) under both the log-Euclidean and affine metrics.