Flexible Bayesian Dynamic Modeling of Correlation and Covariance Matrices

Modeling correlation (and covariance) matrices can be challenging due to the positive-definiteness constraint and potential high-dimensionality. Our approach is to decompose the covariance matrix into the correlation and variance matrices and propose a novel Bayesian framework based on modeling the correlations as products of unit vectors. By specifying a wide range of distributions on a sphere (e.g. the squared-Dirichlet distribution), the proposed approach induces flexible prior distributions for covariance matrices (that go beyond the commonly used inverse-Wishart prior). For modeling real-life spatio-temporal processes with complex dependence structures, we extend our method to dynamic cases and introduce unit-vector Gaussian process priors in order to capture the evolution of correlation among components of a multivariate time series. To handle the intractability of the resulting posterior, we introduce the adaptive $\Delta$-Spherical Hamiltonian Monte Carlo. We demonstrate the validity and flexibility of our proposed framework in a simulation study of periodic processes and an analysis of rat's local field potential activity in a complex sequence memory task.


INTRODUCTION
Modeling covariance matrices (or, more broadly, positive definite (PD) matrices) is one of the most fundamental problems in statistics. In general, the task is difficult because the number of parameters grows quadratically with the dimension of the matrices. The complexity of the challenge increases substantially if we allow dependencies to vary over time (or space) in order to account for the dynamic (non-stationary) nature of the underlying probability model. In this paper, we propose a novel solution to the problem by developing a flexible yet computationally efficient inferential framework for both fixed and dynamic covariance matrices.
While our proposed method in this paper is also based on the separation strategy (Barnard et al., 2000) and the Cholesky decomposition, unlike existing methods it represents each entry of the correlation matrix as a product of unit vectors. This in turn provides a flexible framework for modeling covariance matrices without sacrificing interpretability. Additionally, this framework can be easily extended to dynamic settings in order to model real-life spatio-temporal processes with complex dependence structures (e.g., brain signals during cognitive tasks). The proposed model thus has potential impact for modeling dynamic brain connectivity. A number of new methods have been developed for brain connectivity analyses (Cribben et al., 2012; Fiecas & Ombao, 2016; Lindquist et al., 2014; Ting et al., 2015; Prado, 2013). One advantage of our model is that it provides both a nonparametric Bayesian model and an efficient inferential method for the complex dynamic dependence among multiple stochastic processes that is common in the study of brain activity.
To address the constraints on correlation processes (a positive definite matrix at each time, with unit diagonals and off-diagonal entries of magnitude no greater than 1), we introduce unit-vector Gaussian process priors. There are related works, e.g. the generalized Wishart process (Wilson & Ghahramani, 2011) and the latent factor process (Fox & Dunson, 2015), that explore products of vector Gaussian processes. In general, they do not grant full flexibility in simultaneously modeling the mean, variance and correlation processes. For example, latent factor based models link the mean and covariance processes through a loading matrix, which is restrictive and undesirable if the linear link is not appropriate, and thus are outperformed by our proposed flexible framework (see Section 4•2 for more details). Other approaches to modeling non-stationary processes use a representation in terms of a basis such as wavelets (Nason et al., 2000; Park et al., 2014; Cho & Fryzlewicz, 2015) and the SLEX (Ombao et al., 2005), which are inspired by the Fourier representations in the Dahlhaus locally stationary processes (Dahlhaus, 2000; Priestley, 1965). These approaches are frequentist and do not easily provide a framework for inference (e.g., obtaining confidence intervals).
The main contributions of this paper are: (a) a sphere-product representation of correlation/covariance matrices is introduced to induce flexible priors for correlation/covariance matrices and processes; (b) a general and flexible framework is proposed for modeling mean, variance, and correlation processes separately; (c) an efficient algorithm is introduced to infer correlation matrices and processes; (d) posterior contraction phenomena are studied for both mean and covariance (correlation) processes.
The rest of the paper is organized as follows. In the next section, we present a geometric view of covariance matrices and extend this view to allow covariance matrices to change over time. In Section 3, we use this geometric perspective to develop an effective and computationally efficient inferential method for modeling static and dynamic covariance matrices. Using simulated data, we evaluate our method in Section 4. In Section 5, we apply our proposed method to local field potential (LFP) data recorded from the hippocampus of rats performing a complex sequence memory task. In the final section, we conclude with a discussion of future work.

STRUCTURED BAYESIAN MODELING OF COVARIANCE (CORRELATION)
To derive flexible models for covariance and correlation matrices, we start with the Cholesky decomposition of the covariance, form a sphere-product representation, and finally obtain the separation decomposition of Barnard et al. (2000) with correlations represented as products of unit vectors. The sphere-product representation is amenable to an inferential algorithm that handles the resulting intractability, and hence lays the foundation for full flexibility in choosing priors.
Any covariance matrix Σ = [σ_ij] > 0 is symmetric positive definite, and hence has a unique Cholesky decomposition Σ = LL^T, where the Cholesky factor L = [l_ij] is a lower triangular matrix such that σ_ij = Σ_{k=1}^{min(i,j)} l_ik l_jk. Then each variance σ_i^2 := σ_ii can be written in terms of the corresponding row l_i of L: σ_i^2 = Σ_{k=1}^{i} l_ik^2 = ||l_i||^2. For Σ > 0, it is equivalent to require all the leading principal minors {M_i} to be positive, i.e. M_i = ∏_{k=1}^{i} l_kk^2 > 0, which holds if and only if l_ii ≠ 0 for all i. Based on this observation, for i ∈ {1, ..., D}, l_i can be viewed as a point on a sphere with radius σ_i excluding the equator, denoted S_0^{i-1}(σ_i) := {l_i ∈ R^i : ||l_i|| = σ_i, l_ii ≠ 0}. Therefore the space of the Cholesky factor, in terms of its rows, can be written as a product of spheres:

L ∈ ∏_{i=1}^{D} S_0^{i-1}(σ_i).    (1)

Note (1) is the sufficient and necessary condition for Σ = LL^T to be a covariance matrix. We present probabilistic models involving covariance matrices in the following generic form:

y | Σ ∼ ℓ(y; Σ),  Σ = LL^T,  vech^T(L) ∼ π,    (2)

where the half-vectorization in row order, vech^T, transforms the lower triangular matrix L into the vector (l_11, l_21, l_22, ..., l_DD) of length D(D+1)/2.

Alternatively, we can make use of the separation strategy (Barnard et al., 2000) to decompose covariances in terms of standard deviations and correlations: Σ = diag(σ) P diag(σ). Then we have a unique Cholesky decomposition for the correlation matrix P = [ρ_ij] = L*(L*)^T, where the Cholesky factor L* = diag(σ^{-1}) L can be obtained by normalizing each row of L. The magnitude requirements for correlations are immediately satisfied by the Cauchy-Schwarz inequality, |ρ_ij| = |⟨l*_i, l*_j⟩| ≤ ||l*_i|| ||l*_j|| = 1, and each row lies on a unit sphere:

L* ∈ ∏_{i=1}^{D} S_0^{i-1},  where S_0^{i-1} := S_0^{i-1}(1).    (3)

Similarly, (3) is the sufficient and necessary condition for P = L*(L*)^T to be a correlation matrix. Then we have the following alternative structured model:

y | σ, L* ∼ ℓ(y; σ, L*),  P = L*(L*)^T,  σ ∼ π_σ,  vech^T(L*) ∼ π_{L*}.    (4)

In what follows, we will show that the above framework includes the inverse-Wishart prior as a special case, but it can be easily generalized to a broader range of priors for additional flexibility. Such flexibility enables us to better express prior knowledge, control the model complexity and speed up computation in modeling real-life phenomena.
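As a quick numerical illustration of the separation strategy and the sphere-product representation (illustrative code, not part of the paper's implementation):

```python
import numpy as np

# Sketch: rows of the Cholesky factor L lie on spheres of radius sigma_i, and
# normalizing them gives the correlation factor with Sigma = diag(sigma) P diag(sigma).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)              # a positive definite covariance matrix
L = np.linalg.cholesky(Sigma)                # Sigma = L L^T, L lower triangular
sigma = np.linalg.norm(L, axis=1)            # row norms ||l_i|| equal the std deviations
assert np.allclose(sigma, np.sqrt(np.diag(Sigma)))
L_star = L / sigma[:, None]                  # unit rows: each row lies on S_0^{i-1}
P = L_star @ L_star.T                        # correlation matrix with unit diagonal
assert np.allclose(np.diag(P), 1.0)
assert np.allclose(np.diag(sigma) @ P @ np.diag(sigma), Sigma)
```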

2•1. Connection to the Inverse-Wishart Prior
There are some interesting connections between the spherical product representations (1) and (3) and the early development of the Wishart distribution (Wishart, 1928). The original Wishart distribution was derived by orthogonalizing multivariate Gaussian random variables (see Sverdrup, 1947; Anderson, 2003 for more details).
Suppose Σ is a random sample from the inverse-Wishart distribution W^{-1}_D(Ψ, ν) with scale matrix Ψ > 0 and degrees of freedom ν ≥ D. Then Σ^{-1} ∼ W_D(Ψ^{-1}, ν). Denote by C the Cholesky factor of Ψ^{-1}, i.e. Ψ^{-1} = CC^T. Then Σ^{-1} has the following Bartlett decomposition (Anderson, 2003; Smith & Hocking, 1972):

Σ^{-1} = TT^T,    (5)

where the Bartlett factor T is a lower triangular matrix. Now taking the inverse in (5) yields the reversed Cholesky decomposition Σ = UU^T, where U := T^{-T} is an upper triangular matrix. The following theorem describes the density of U, which enables us to treat the inverse-Wishart distribution as a special instance of strategy (2) or (4).
Footnote 1: For each i ∈ {1, ..., D}, given σ_i, there are only (i − 1) free parameters on S_0^{i-1}(σ_i), so there are D(D − 1)/2 free parameters in L in total, besides σ.

Proof. See Section A in the supplementary file.
If we normalize each row of U = diag(σ)U*, writing u*_ij = u_ij/σ_i with σ_i = ||u_i||, then the resulting joint prior of (σ, U*) is inseparable in general. However, we can conditionally model the variance and the correlation factor as p(σ | U*) and p(U* | σ) respectively, similarly to our proposed scheme (2) or (4). Figure 1 verifies the validity of our proposed method (4) by showing that marginal densities of entries of Σ estimated from MCMC samples obtained by such a conditional construction closely match the corresponding results from direct sampling in a normal-inverse-Wishart example (Section 4•1). This representation facilitates the construction of more flexible prior distributions for covariance matrices, detailed below.
2•2. More Flexible Priors

Since σ and L* have independent priors in (4), in what follows we mainly focus on the scheme (4), and for simplicity we denote the normalized Cholesky factor as L. Also, following Barnard et al. (2000), we assume a log-normal prior on σ: log(σ) ∼ N(ξ, Λ).
The spherical product representation in (4) enables us to develop a broader class of priors on correlations. If two variables y_i and y_j (assuming i < j) are known to be uncorrelated a priori, i.e. 0 = ρ_ij = ⟨l_i, l_j⟩, then we can choose a prior that encourages l_i ⊥ l_j, e.g. l_jk ≈ 0 for k ≤ i. In contrast, if we believe a priori that there is a strong correlation between the two variables, we can specify that l_i and l_j be linearly dependent, e.g. by setting [l_jk]_{k≤i} ≈ ±l_i. Putting more prior probability on the diagonal elements of L renders fewer non-zero off-diagonal elements, which in turn leads to a larger number of perpendicular rows; that is, such a prior favors zeros in the correlation matrix P. More generally, one can map a probability distribution defined on the simplex onto the sphere and consider the following squared-Dirichlet distribution.

Fig. 2: Symmetric squared-Dirichlet distributions Dir^2(α1) defined on the 2-sphere with different settings of the concentration parameter α. The uniform distribution on the simplex, Dir(1), becomes non-uniform on the sphere due to the stretch of geometry (left); the symmetric Dirichlet distribution Dir((1/2)1) becomes uniform on the sphere (middle); as α approaches 0, the induced distribution becomes more concentrated on the polar points (right).
Definition 1 (Squared-Dirichlet distribution). A random vector l_i ∈ S^{i-1} is said to have a squared-Dirichlet distribution with parameter α_i = (α_i1, ..., α_ii), denoted l_i ∼ Dir^2(α_i), if l_i^2 := (l_i1^2, ..., l_ii^2) ∼ Dir(α_i). Then l_i has the following density:

p(l_i) ∝ ∏_{k=1}^{i} |l_ik|^{2α_ik − 1}.    (7)

To induce a prior distribution for the correlation matrix P = LL^T, one can specify priors on the row vectors of L, l_i ∼ Dir^2(α_i) for i = 2, ..., D. To encourage small correlations, we choose α_i so that the probability density concentrates around the (two) poles of S_0^{i-1}, e.g. 0 < α_ik ≪ α_ii for k < i. Figure 2 illustrates density heat maps of some symmetric squared-Dirichlet distributions Dir^2(α1) on S^2. Interestingly, the squared-Dirichlet distribution induces the two important uniform prior distributions over correlation matrices proposed by Barnard et al. (2000) in an effort to provide flexible priors for covariance matrices, as stated in the following theorem.
THEOREM 2 (UNIFORM DISTRIBUTIONS). Let P = LL^T. Suppose l_i ∼ Dir^2(α_i), for i = 2, ..., D, are independent, where l_i is the i-th row of L. Then, for suitable choices of the concentration parameters α_i (specified in the proof), each ρ_ij follows a marginally uniform distribution on (−1, 1); for another choice, P follows a jointly uniform distribution, that is, p(P) ∝ 1.
Proof.See Section A in the supplementary file.
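Sampling from the squared-Dirichlet distribution is straightforward: draw from the Dirichlet distribution on the simplex and attach random signs to the componentwise square roots. The sketch below (with hypothetical parameter values, not the paper's code) illustrates the pole-concentrating choice of α:

```python
import numpy as np

# Sampling Dir^2(alpha) on the sphere: x ~ Dir(alpha), then l = (+/-) sqrt(x).
rng = np.random.default_rng(2)
alpha = np.array([0.1, 0.1, 10.0])            # small alpha_1, alpha_2: favors the poles
x = rng.dirichlet(alpha, size=5000)           # points on the simplex
signs = rng.choice([-1.0, 1.0], size=x.shape)
l = signs * np.sqrt(x)                        # points on the 2-sphere S^2
assert np.allclose(np.linalg.norm(l, axis=1), 1.0)
# mass concentrates near the polar (last) coordinate when alpha_3 >> alpha_1, alpha_2,
# which corresponds to a prior favoring small correlations
print(np.mean(np.abs(l[:, 2])))
```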
Another natural spherical prior can be obtained by constraining a multivariate Gaussian random vector to have unit norm. We write l_i ∼ N_S(µ, Σ) for the unit-vector Gaussian random variable obtained by conditioning l_i ∼ N(µ, Σ) on ||l_i||_2 = 1. This is later generalized to a vector Gaussian process constrained to a sphere, which serves as a suitable prior for modeling correlation processes. This conditional Gaussian essentially defines the Fisher-Bingham distribution (a.k.a. the generalized Kent distribution; Kent, 1982; Mardia & Jupp, 2009), which reduces to the von Mises-Fisher distribution (Fisher, 1953) if Σ = I, and to the uniform distribution on S_0^{i-1} if in addition µ = 0. See Section F•1 of the supplementary file for more details.
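In the special case µ = 0 and Σ = I, the unit-vector Gaussian reduces to the uniform distribution on the sphere, which can be sampled exactly by normalizing a standard Gaussian draw; the general Fisher-Bingham case requires a dedicated sampler, and this sketch covers only the simple case:

```python
import numpy as np

# Uniform distribution on S^2 (the mu = 0, Sigma = I case of the unit-vector
# Gaussian): normalize a standard Gaussian vector.
rng = np.random.default_rng(3)
z = rng.standard_normal((10000, 3))
l = z / np.linalg.norm(z, axis=1, keepdims=True)   # uniform on the sphere
assert np.allclose(np.linalg.norm(l, axis=1), 1.0)
# by symmetry the sample mean is close to the zero vector
assert np.all(np.abs(l.mean(axis=0)) < 0.05)
```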

2•3. Dynamically Modeling the Covariance
We can generalize the proposed framework for modeling covariance/correlation matrices to the dynamic setting by adding a subscript t to the variables in models (2) and (4), yielding the dynamic covariance and dynamic correlation models respectively. We focus on the latter in this section. One can model the components of σ_t as independent dynamic processes using, e.g., ARMA, GARCH, or log-Gaussian processes. For L_t, we use vector processes. Since each row of L_t has to be on a sphere of a certain dimension, we require the unit-norm constraint to hold over time. We refer to any multivariate process l_i(x) satisfying ||l_i(x)|| ≡ 1 for all x ∈ X as a unit-vector process (uvP). A unit-vector process can be obtained by constraining an existing multivariate process, e.g. the vector Gaussian process (vGP), as defined below.
Definition 2 (Vector Gaussian process). A D-dimensional vector Gaussian process Z(x) ∼ vGP(µ, C, V) is a collection of D-dimensional random vectors, indexed by x ∈ X, such that for any finite set of indices {x_1, ..., x_N}, the random matrix Z_{N×D} := (Z(x_1), ..., Z(x_N))^T has the matrix normal distribution

Z_{N×D} ∼ MN_{N×D}(M, K, V),

where M := (µ(x_1), ..., µ(x_N))^T is the mean, V is the cross covariance among the D components, and K is the kernel matrix with elements K_ij = C(x_i, x_j).
Note for each k = 1, ..., D, we have the marginal GP Z_k(x) ∼ GP(µ_k, C). In the above definition, we require a common kernel C for all the marginal GPs, whose dependence is characterized by the cross covariance V_{D×D}. For simplicity, we often consider µ ≡ 0. Restricting the vGP Z(·) to the sphere yields a unit-vector Gaussian process (uvGP). Figure 3 shows a realization of a vector GP Z_t, a unit-vector GP (forming the rows of) L_t, and the induced correlation process P_t.
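A finite-dimensional draw from a vGP can be generated directly from its matrix normal distribution. The sketch below is illustrative only: the squared-exponential kernel is an assumption, and the final projection onto the sphere is a crude stand-in for the conditioning that defines the exact uvGP:

```python
import numpy as np

# Draw Z ~ MN(0, K, V) via Z = chol(K) X chol(V)^T with X iid standard normal.
rng = np.random.default_rng(4)
N, D = 100, 3
t = np.linspace(0, 1, N)[:, None]
K = np.exp(-0.5 * (t - t.T) ** 2 / 0.1 ** 2) + 1e-8 * np.eye(N)  # kernel matrix
V = np.eye(D)                                  # cross covariance between components
X = rng.standard_normal((N, D))
Z = np.linalg.cholesky(K) @ X @ np.linalg.cholesky(V).T          # Z ~ MN(0, K, V)
# projecting each Z(t) onto the sphere gives a smooth unit-norm path, a rough
# visual proxy for the conditioned unit-vector GP
Lrow = Z / np.linalg.norm(Z, axis=1, keepdims=True)
assert np.allclose(np.linalg.norm(Lrow, axis=1), 1.0)
```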
In what follows, we focus on multivariate time series; therefore, we use the one-dimensional time index t ∈ X = R_+. The overall dynamic correlation model can be summarized as follows:

y(t) | µ_t, σ_t, L_t ∼ N(µ_t, Σ_t),  Σ_t = diag(σ_t) L_t L_t^T diag(σ_t),
µ_t ∼ GP,  log σ_t ∼ GP,  l_i(t) ∼ uvGP,  i = 2, ..., D,    (8)

where a constant mean function n_i = (0, ..., 0, 1) is used in the uvGP prior for l_i(t). This model (8) captures the spatial dependence in the matrix Σ_t, which evolves over time, while the temporal correlation is characterized by the various GPs. The induced covariance process Σ_t is not a generalized Wishart process (Wilson & Ghahramani, 2011), which models only the Cholesky factor of the covariance using GPs. Although a GP-based dynamic covariance model may work similarly to the dynamic correlation model (8), the latter provides extra flexibility in modeling the evolution of the variances and the correlations separately. In general, such flexibility could be useful in handling constraints for processes, e.g. modeling the dynamic probability of a binary time series.

Fig. 3: A realization of a vector GP Z_t (left), a unit-vector GP (forming the rows of) L_t (middle) and the induced correlation process P_t (right).

2•4. Posterior Contraction Theorem
We now provide a theorem on the posterior contraction of the dynamic covariance model before we conclude this section. Because the posterior contraction for mean regression using Gaussian processes has been thoroughly studied in the literature (van der Vaart & van Zanten, 2008a, 2009, 2011; Yang & Dunson, 2016), we only investigate the covariance regression and set µ_t ≡ 0. We leave the posterior contraction of the dynamic correlation model (8) for future work. Note that the Cholesky decomposition of a covariance matrix Σ = LL^T is unique if all the diagonal entries of L are positive. Therefore, in the remainder of this section, we identify Cholesky factors up to column-wise signs, i.e. L ∼ L diag(s) with s_j = −1 for j ∈ J and s_j = 1 otherwise, for any J ⊂ {1, ..., D}.
In most cases, the Gaussian process L_t can be viewed as a tight Borel measurable map in a Banach space, e.g. a space of continuous functions or an L^p space. It is well known that the support of a centered GP equals the closure of the reproducing kernel Hilbert space (RKHS) H associated with the process (Lemma 5.1 of van der Vaart & van Zanten, 2008b). Because the posterior distribution necessarily puts all its mass on the support of the prior, posterior consistency requires the true parameter L_0 governing the distribution of the data to fall in this support (van der Vaart & van Zanten, 2008a). Following van der Vaart & van Zanten (2008a,b, 2011), we express the rate of posterior contraction in terms of the concentration function

φ_{L_0}(ε) = inf_{h ∈ H: ||h − L_0|| < ε} ||h||_H^2 − log Π(||L|| < ε),    (9)

where ||·|| is the norm of the Banach space in which the GP L takes values, Π is the GP prior and H is the associated RKHS with norm ||·||_H. Under certain regularity conditions, the posterior contracts, as the data size n increases, at the rate ε_n → 0 satisfying

φ_{L_0}(ε_n) ≤ n ε_n^2,    (10)

with h(p_1, p_2) denoting the Hellinger distance between p_1 and p_2. Now we state the main theorem of posterior contraction.

THEOREM 3 (POSTERIOR CONTRACTION). Let L − I be a Borel measurable, zero-mean tight Gaussian random element in ℓ^∞(X)^{D(D+1)/2}. Suppose that L_0 is contained in the support of L and let φ_{L_0} be the function in (9) with the uniform norm ||·||_∞. Then the posterior distribution contracts around L_0, that is, the posterior probability of {L : ||L − L_0||_∞ ≥ M ε_n} converges to 0 for any sufficiently large constant M and ε_n given by (10).
Proof.See Section B in the supplementary file.

POSTERIOR INFERENCE
Now we obtain the posterior probabilities of the mean µ_t, the variance σ_t, the Cholesky factor of the correlation L_t, and the hyper-parameters γ := (γ_µ, γ_σ, γ_L) and ρ := (ρ_µ, ρ_σ, ρ_L) in the model (8). Denote the realizations of the processes µ_t, σ_t, L_t at discrete time points {t_n}, n = 1, ..., N, as µ_{N×D}, σ_{N×D}, L_{N×D×D} respectively. We transform the parameters τ := log(σ), η := log(ρ) for convenience of calculation. We use a Metropolis-within-Gibbs algorithm and alternate updating the model parameters µ, τ, L, γ, η. Note that sampling from the posterior of L is the most challenging part, due to the unit-norm constraint on each l_i. In the following, we use an MCMC algorithm defined on spheres to handle such constraints. Refer to Section C•1 of the supplement for the details of the other parameters.

3•1. Spherical HMC

Spherical Hamiltonian Monte Carlo (SphHMC; Lan et al., 2014; Lan & Shahbaba, 2016) is a Hamiltonian Monte Carlo (HMC; Duane et al., 1987; Neal, 2011) algorithm on spheres proposed to handle norm constraints in sampling. It can be used to sample each row of the Cholesky factor of a correlation matrix subject to the unit 2-norm constraint. In the following, the notation q is instantiated as l_i.
Assume a probability distribution with density function f(q) is defined on the (D − 1)-dimensional sphere with radius r, S^{D−1}(r). Due to the norm constraint, there are (D − 1) free parameters q_{−D} := (q_1, ..., q_{D−1}), which can be viewed as the Cartesian coordinates of the manifold S_+^{D−1}(r). To induce Hamiltonian dynamics on the sphere, we define the potential energy for the position q as U(q) := − log f(q). Endowing the Riemannian manifold S^{D−1}(r) with the canonical spherical metric G(q), we introduce the auxiliary velocity vector v | q ∼ N(0, G(q)^{-1}) and define the associated kinetic energy as K(v; q) := (1/2) v^T G(q) v (Girolami & Calderhead, 2011). The resulting Lagrangian dynamics can be numerically approximated by a splitting technique, which yields a geometric integrator (Lan et al., 2014; Lan & Shahbaba, 2016) alternating a half step of the velocity, v ← v − (h/2) P(q) g(q), with the exact geodesic flow of (q, v) on the sphere over a step of size h,    (11)

where g(q) := ∇_q Ũ(q) and P(q) := I_D − r^{-2} q q^T is the projection onto the tangent space. (11) defines a mapping T_h : (q, v) → (q', v'). Denote ||u||_{P(q)}^2 := u^T P(q) u. After applying the integrator T times, a proposal (q', v') = T_h^T(q, v) is accepted with probability

a = 1 ∧ exp(−ΔE),    (12)

where ΔE denotes the change in total (potential plus kinetic) energy along the trajectory. We can prove the following limiting result (Beskos et al., 2011).
THEOREM 4 (ENERGY CONSERVATION). As h → 0, the energy difference ΔE in (12) vanishes, i.e. the integrator conserves the total energy in the limit.

Proof. See Section D in the supplementary file.
Inspired by the 'No-U-Turn' sampler (Hoffman & Gelman, 2014), we develop adaptive spherical Hamiltonian Monte Carlo (adp-SphHMC; Section D•3 in the supplement), which automatically tunes the number of leapfrog steps and the step size in sampling l_i ∈ S_0^{i-1}. To sample L (or L_t), we update each row vector l_i ∈ S_0^{i-1} according to (11) (in parallel), and accept/reject vech^T(L) (or vech^T(L_t)) simultaneously based on (12), in terms of the sum of the total energies of all components. We refer to the resulting algorithm as ∆-Spherical HMC (∆-SphHMC).
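A minimal sketch of the key building block, one leapfrog step combining a tangent-space velocity update with the exact geodesic flow on the unit sphere; the toy potential, step size and trajectory length here are assumptions for illustration, not the paper's target distribution or tuning:

```python
import numpy as np

def grad_U(q):
    # gradient of an assumed toy potential U(q) = 0.5 * sum(c_i * q_i^2)
    return q * np.arange(1.0, q.size + 1.0)

def geodesic_leapfrog(q, v, h):
    # half velocity step in the tangent space, exact geodesic flow on the
    # unit sphere (r = 1), then another half velocity step
    P = lambda x: np.eye(x.size) - np.outer(x, x)   # tangent-space projector
    v = v - 0.5 * h * P(q) @ grad_U(q)
    s = np.linalg.norm(v)
    q, v = (q * np.cos(s * h) + v / s * np.sin(s * h),
            v * np.cos(s * h) - s * q * np.sin(s * h))
    v = v - 0.5 * h * P(q) @ grad_U(q)
    return q, v

rng = np.random.default_rng(5)
q = np.array([0.0, 0.0, 1.0])
v = (np.eye(3) - np.outer(q, q)) @ rng.standard_normal(3)  # tangent velocity
for _ in range(100):
    q, v = geodesic_leapfrog(q, v, 0.01)
assert abs(np.linalg.norm(q) - 1.0) < 1e-9   # trajectory stays on the sphere
```

The cosine/sine update is the closed-form geodesic, so the unit-norm constraint is satisfied exactly at every step rather than enforced by projection after the fact.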
The computational complexity of the GP priors is O(N^3), and that of the likelihood evaluation is O(MD^2). MCMC updates of µ_{N×D}, σ_{N×D}, L_{N×D×D} have complexity O(ND), O(ND) and O(ND^2) respectively. To scale applications up to larger dimensions D, one could preliminarily classify the data into groups, and arrange the corresponding blocks of their covariance/correlation matrix in a 'band' along the main diagonal. More specifically, we can assume L_t is a w-band lower triangular matrix for each time t, i.e. l_ij = 0 for i < j or i − j ≥ w; then the resulting covariance/correlation matrix will be (2w − 1)-banded. In this way the complexity of the likelihood evaluation and of updating L is reduced to O(MwD) and O(NwD) respectively. Therefore the total computational cost scales linearly with the dimension D. This technique is explored in Section F•3 of the supplementary file.
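The banding argument can be checked directly: if L is w-band lower triangular with unit rows, the induced correlation matrix is (2w − 1)-banded (the dimensions below are illustrative):

```python
import numpy as np

# A w-band lower triangular factor with unit rows induces a (2w-1)-banded
# correlation matrix P = L L^T: rows i and j share no nonzero columns
# whenever |i - j| >= w.
D, w = 8, 3
rng = np.random.default_rng(6)
L = np.tril(rng.standard_normal((D, D)))
i, j = np.indices((D, D))
L[i - j >= w] = 0.0                              # enforce the w-band structure
L /= np.linalg.norm(L, axis=1, keepdims=True)    # unit rows
P = L @ L.T
banded = np.all(np.abs(P[np.abs(i - j) >= w]) < 1e-12)
assert banded and np.allclose(np.diag(P), 1.0)
```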

SIMULATION STUDIES
In this section, we use simulated examples to illustrate the advantages of our structured models for covariance. First, we consider a normal-inverse-Wishart problem, which was used to verify our structured models (2) and (4) in Section 2•1. We investigate the flexible priors of Section 2•2 using this example. Then we test the dynamic modeling method of Section 2•3 on a periodic process model. Our model manifests full flexibility compared to a state-of-the-art nonparametric covariance regression model based on a latent factor process (Fox & Dunson, 2015).

4•1. Normal-inverse-Wishart Problem
Consider the following example involving the inverse-Wishart prior:

y_n | Σ ∼ N(µ_0, Σ), n = 1, ..., N,  Σ ∼ W_D^{-1}(Ψ, ν).    (13)

It is known that the posterior of Σ | Y is still an inverse-Wishart distribution:

Σ | Y ∼ W_D^{-1}(Ψ + S, ν + N),  S = Σ_{n=1}^{N} (y_n − µ_0)(y_n − µ_0)^T.    (14)

We consider dimension D = 3 and generate Y with µ_0 = 0 and Σ = Σ_0 = (1/11)(I + 11^T) for N = 20 data points, so that the prior is not overwhelmed by the data. Now we examine the flexibility of alternative priors in providing prior information for correlations. Specify the following squared-Dirichlet prior (7) for L in the structured model (4):

l_i ∼ Dir^2(α_i),  α_i = (α 1_{i−1}, α_0),    (15)

where we consider three cases: i) α = 1, α_0 = 1; ii) α = 0.1, α_0 = 1; iii) α = 0.1, α_0 = 10. We generate 10^6 prior samples (according to (15)) and posterior samples (by ∆-SphHMC) for L respectively and convert them to P = LL^T. For each entry ρ_ij, the marginal prior/posterior densities (based on samples), together with the maximum likelihood estimates (MLEs), are plotted in Figure 4. As more and more weight (through α) is put around the poles (last component) of each factor sphere, the priors become so dominant that the posteriors (red dashed lines) almost fall on the priors (blue solid lines) when α = (0.1, 0.1, 10). In this extreme case, the squared-Dirichlet distribution induces priors in favor of trivial (zero) correlations. Similar results for the Bingham prior and the von Mises-Fisher prior are reported in Section F•1 of the supplementary file.
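The conjugate normal-inverse-Wishart update that serves as ground truth here can be sketched as follows, using scipy's inverse-Wishart distribution (the seed and sample sizes are illustrative):

```python
import numpy as np
from scipy.stats import invwishart

# With y_n ~ N(mu0, Sigma) and Sigma ~ inv-Wishart(Psi, nu), the posterior is
# inv-Wishart with scale Psi + S and degrees of freedom nu + N, where S is the
# scatter matrix around the known mean mu0.
rng = np.random.default_rng(7)
D, N, nu = 3, 20, 5
mu0, Psi = np.zeros(D), np.eye(D)
Sigma0 = (np.eye(D) + np.ones((D, D))) / 11.0
Y = rng.multivariate_normal(mu0, Sigma0, size=N)
S = (Y - mu0).T @ (Y - mu0)
post = invwishart(df=nu + N, scale=Psi + S)      # closed-form posterior
draws = post.rvs(size=2000, random_state=rng)
# the sample mean matches the analytic mean (Psi + S)/(nu + N - D - 1)
assert np.allclose(draws.mean(axis=0), (Psi + S) / (nu + N - D - 1), atol=0.05)
```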

4•2. Simulated Periodic Processes
In this section, we investigate the performance of our dynamic model (8) on a simulated periodic process example, (16), in which the mean, variance and correlation functions vary periodically over time. To fit the data using the model (8), we set s = 2, a = (1, 1, 1), b = (0.1, 10^{-3}, 0.2), m = (0, 0, 0) for all settings, with V = (1, 0.5, 1) for N = 20 and V = (1, 1, 0.3) for N = 200. We also add a nugget of 10^{-5} I_n to all the covariance kernels of the GPs to ensure non-degeneracy. We run MCMC for 1.5 × 10^5 iterations, discard the first 5 × 10^4 as burn-in, and subsample 1 of every 10. Based on the resulting 10^4 posterior samples, we estimate the underlying mean and covariance functions and plot the estimates in Figure 5.

Full flexibility is granted by our method (8) because it models the mean, variance and correlation processes separately. This is particularly useful if they behave differently. It contrasts with latent factor based models that tie the mean and covariance processes together. One of the state-of-the-art models of this type is Bayesian nonparametric covariance regression (Fox & Dunson, 2015), denoted (17), in which the mean µ(x) = Λ(x)ψ(x) and the covariance Σ(x) = Λ(x)Λ(x)^T + Σ_0 share a common loading matrix Λ(x) applied to a latent factor process ψ(x). We tweak the simulated example (16) for D = 2 so that the mean and correlation processes have higher frequency than the variance processes, as shown by the dashed lines in Figure 6. We generate M = 10 trials of data over N = 200 evenly spaced points. In this case, the true mean processes µ(x) and the true covariance processes Σ(x) behave differently but are modeled with the common loading matrix Λ(x) in model (17). This makes it difficult for (17) to find a latent factor process ψ(x) that can properly accommodate the heterogeneity in the mean and covariance processes.
Figure 6 shows that, for this reason, the latent factor based model (17) (upper row) fails to generate a satisfactory fit for the mean, variance and correlation processes. Our fully flexible model (8) (bottom row), on the contrary, successfully produces a more accurate characterization of all of them. See more discussion in Section 6 and more details in Appendices F•2 and F•3.
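For intuition, a process of this kind, with the mean and correlation oscillating faster than the variances, can be constructed directly; the specific frequencies and amplitudes below are illustrative assumptions, not the exact specification of (16):

```python
import numpy as np

# A 2-d process whose mean/correlation oscillate faster than the variances,
# the regime in which modeling the three components separately pays off.
t = np.linspace(0, 2, 200)
mu = np.stack([np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)], axis=1)   # fast mean
sigma = np.stack([1.0 + 0.5 * np.sin(np.pi * t),
                  1.0 + 0.5 * np.cos(np.pi * t)], axis=1)               # slow variance
rho = 0.9 * np.sin(4 * np.pi * t)                                       # fast correlation
Sigma_t = np.array([[[s[0] ** 2, r * s[0] * s[1]],
                     [r * s[0] * s[1], s[1] ** 2]]
                    for s, r in zip(sigma, rho)])
# every Sigma_t is positive definite since |rho| < 1 and sigma > 0
assert all(np.all(np.linalg.eigvalsh(S) > 0) for S in Sigma_t)
rng = np.random.default_rng(8)
Y = np.stack([rng.multivariate_normal(m, S) for m, S in zip(mu, Sigma_t)])  # one trial
```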
ANALYSIS OF LOCAL FIELD POTENTIAL ACTIVITY

Now we use the proposed model (8) to analyze a local field potential (LFP) activity dataset. The goal of this analysis is to elucidate how memory encoding, retrieval and decision-making arise from functional interactions among brain regions, by modeling how their dynamic connectivity varies during the performance of complex memory tasks. Here we focus on LFP activity data recorded from 24 electrodes spanning the dorsal CA1 subregion of the hippocampus as rats performed a sequence memory task (Allen et al., 2014, 2016; Ng et al., 2017; Holbrook et al., 2017). The task involves repeated presentations of a sequence of odors (e.g., ABCDE) at a single port and requires rats to correctly determine whether each odor is presented 'in sequence' (InSeq; e.g., ABCDE; by holding their nosepoke response until the signal at 1.2s) or 'out of sequence' (OutSeq; e.g., ABDDE; by withdrawing their nose before the signal). In previous work using the same dataset, Holbrook et al. (2016) used a direct MCMC algorithm to study the spectral density matrix of LFP from 4 selected channels. However, they did not examine how the correlations varied across time and recording site. These limitations are addressed in this paper.
We focus our analyses on the time window from 0ms to 750ms (with 0 corresponding to when the rat's nose enters the odor port).Critically, this includes a time period during which the behavior of the animal is held constant (0-500ms) so differences in LFP reflect the cognitive processes associated with task performance, and, to serve as a comparison, a time period near 750ms during which the behavioral state of the animal is known to be different (i.e., by 750ms the animal has already withdrawn from the port on the majority of OutSeq trials, but is still in the port on InSeq trials).We also focus our analyses on two sets of adjacent electrodes (electrodes 20 and 22, and electrodes 8 and 9), which allows for comparisons between probes that are near each other (<1mm; i.e., 20:22 and 8:9) or more distant from each other (>2mm; i.e., 20:8, 20:9, 22:8, and 22:9).See Section F•4 in the supplement for the figure showing M = 20 trials of these LFP signals from D = 4 channels under both InSeq and OutSeq conditions.Our main objective is to quantify how correlations among these LFP channels varied across trial types (InSeq vs OutSeq) and over time (within the first 750ms of trials).To do so, we discretize the time window of 0.75 seconds into N = 300 equally-spaced small intervals.Under each experiment condition (InSeq or OutSeq), we treat all the signals as a 4 dimensional time series and fit them using our proposed dynamic correlation model (8) in order to discover the evolution of their relationship.
We set s = 2, a = (1, 1, 1), b = (1, 0.1, 0.2), m = (0, 0, 0), V = (1, 1.2, 2); the general results are not very sensitive to the choice of these fine-tuning parameters. We also scale the discretized time points into (0, 1] and add a nugget of 10^{-5} I_n to the covariance kernels of the GPs. We collect 7.5 × 10^4 samples, discard the first 2.5 × 10^4 as burn-in, and subsample 1 of every 10. The resulting 10^4 samples yield estimates of the correlation processes, shown in Figure 7 for beta-filtered traces (20-40Hz); similar patterns were also observed for theta-filtered traces (4-12Hz; see the supplement). The bottom panel of Figure 7 shows the dissimilarity between the correlation processes under different conditions, measured by the Frobenius norm of their difference.
Our approach revealed several important patterns in the data. First, it showed that electrodes near each other (20:22 and 8:9) displayed remarkably high correlations in their LFP activity on InSeq and OutSeq trials, whereas correlations were considerably lower among more distant electrodes (20:8, 20:9, 22:8, and 22:9). Second, it revealed that the correlation matrices under the InSeq and OutSeq conditions evolved over the course of individual trials. These results are consistent with other analyses on learning (see, e.g., Fiecas & Ombao, 2016). As expected, InSeq and OutSeq activity was very similar at the beginning of the time window (e.g., before 350ms), which is before the animal has any information about the InSeq or OutSeq status of the presented odor, but maximally different at the end of the time window, which is after it has made its response on OutSeq trials. Most important, however, is the discovery of InSeq vs OutSeq differences before 500ms, which reveal changes in neural activity associated with the complex cognitive process of identifying whether events occurred in their expected order. These findings highlight the sensitivity of our novel approach, as such differences have not been detected with traditional analyses.

CONCLUSION
In this paper, we propose a novel Bayesian framework that grants full flexibility in modeling covariance and correlation matrices. It extends the separation strategy proposed by Barnard et al. (2000) and uses the Cholesky decomposition to maintain the positive definiteness of the correlation matrix. By defining distributions on spheres, a large class of flexible priors can be induced for covariance matrices that go beyond the commonly used but restrictive inverse-Wishart distribution. Adaptive ∆-Spherical HMC is introduced to handle the intractability of the resulting posterior. Furthermore, we extend this structured scheme to dynamic models to capture complex dependence among multiple stochastic processes, and demonstrate its effectiveness and efficiency for Bayesian modeling of covariance and correlation matrices on a normal-inverse-Wishart problem, a simulated periodic process, and an analysis of LFP data. In addition, we provide both a theoretical characterization and an empirical investigation of posterior contraction for dynamic covariance modeling, which, to the best of our knowledge, is a first attempt.
In future work, we will explore the low-rank structure of covariance and correlation matrices to further scale our method to problems of greater dimensionality. For example, we can adopt a factor analysis similar to that of Fox & Dunson (2015) and assume vech^T(L_t) ∈ (S^k)^D for some k ≪ D. While our research has generated interesting new findings regarding brain signals during memory tasks, one limitation of our current analysis of the LFP data is that it is conducted on a single rat. The proposed model can be generalized to account for variation among rats. In the future, we will apply this sensitive approach to other datasets, including simultaneous LFP recordings from multiple brain regions in rats, as well as BOLD fMRI data collected from human subjects performing the same task.

A. CONNECTION TO KNOWN PRIORS
The following lemma is essential in proving that our proposed methods (2) and (4) generalize existing methods of specifying priors, including the inverse-Wishart distribution as well as the two uniform distributions of Barnard et al. (2000).
Lemma A.1. Let Σ = UU^T be the reversed Cholesky decomposition of Σ. The Jacobian of the transformation U → Σ is |∂vech(Σ)/∂vech(U)| = 2^D ∏_{i=1}^D u_ii^i. Let P = LL^T be the Cholesky decomposition of P. The Jacobian of the transformation L → P is |∂vech(P)/∂vech(L)| = 2^D ∏_{i=1}^D l_ii^{D-i+1}. Proof. Note that dΣ = (dU)U^T + U(dU)^T. Taking vec on both sides, applying the identity vec(ABC) = (C^T ⊗ A)vec(B), and invoking Lemma 3.4 and Lemma 4.1 of Magnus & Neudecker (1980), we obtain the first Jacobian. By a similar argument, we obtain the second, which completes the proof.
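The Cholesky Jacobian can be checked numerically: since each entry of P = LL^T is quadratic in the entries of L, central finite differences recover the Jacobian essentially exactly, and its determinant can be compared with the closed form 2^D ∏ l_ii^{D-i+1}. A sketch (NumPy assumed; names are illustrative):

```python
import numpy as np

def vech(A):
    """Half-vectorization: stack the lower-triangular part column by column."""
    D = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(D)])

def chol_jacobian_det_numeric(L, h=1e-6):
    """Central finite-difference determinant of d vech(P) / d vech(L), P = L L^T."""
    D = L.shape[0]
    idx = [(i, j) for j in range(D) for i in range(j, D)]  # vech ordering
    m = len(idx)
    J = np.zeros((m, m))
    for k, (i, j) in enumerate(idx):
        Lp, Lm = L.copy(), L.copy()
        Lp[i, j] += h
        Lm[i, j] -= h
        J[:, k] = (vech(Lp @ Lp.T) - vech(Lm @ Lm.T)) / (2 * h)
    return np.linalg.det(J)

D = 3
rng = np.random.default_rng(0)
A = rng.standard_normal((D, D))
L = np.linalg.cholesky(A @ A.T + D * np.eye(D))
# Closed form 2^D * prod_i l_ii^{D-i+1} (1-indexed i), i.e. exponent D-i for 0-indexed i.
closed_form = 2**D * np.prod([L[i, i]**(D - i) for i in range(D)])
```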
Proof of Theorem 1. We know the density of Σ; by Lemma A.1, a change of variables to U yields the stated density, which completes the proof.
Proof of Theorem 2. To prove the first result, we use Lemma A.1. On the other hand, from Equation (8) in Barnard et al. (2000), we have the density of the marginally uniform distribution, where P_ii is the i-th principal minor of P. Similarly, by Lemma A.1 we can prove the second result. This finishes the proof.

B. POSTERIOR CONTRACTION
For the Gaussian likelihood models p_i ∼ N(µ_i(t), Σ_i(t)), i = 0, 1, we first bound the Hellinger distance, the Kullback-Leibler distance, and the variance distance V(p_0, p_1) = E_0(log(p_0/p_1))^2 by the uniform norm in the following lemma. The notation ≲ means "smaller than or equal to a universal constant times".

Lemma B.1. For any bounded measurable functions Σ_0, Σ_1 with Cholesky decompositions as above, the Hellinger, Kullback-Leibler and variance distances between p_0 and p_1 are bounded by the uniform norm of the difference of the Cholesky factors.
Proof. Taking the expectation of (B.1) with respect to p_0 yields the Kullback-Leibler divergence. Consider µ_i ≡ 0. By the non-negativity of the K-L divergence, the bound extends to the general case, so we can bound the K-L divergence as in (B.3). Now take the expectation of the square of (B.1) with respect to p_0 to get the variance distance; with µ_i ≡ 0, we can bound the variance distance by an argument similar to (B.3). Lastly, the squared Hellinger distance for multivariate Gaussians can be calculated in closed form. Consider µ_i ≡ 0; noticing that 1 − x ≤ −log x, and using (B.2), we can bound the squared Hellinger distance by an argument similar to that of (B.3). This completes the proof.
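For concreteness, the closed-form Kullback-Leibler divergence between two multivariate Gaussians used throughout the lemma can be sketched as follows. This is the standard formula, not code from the paper (NumPy assumed):

```python
import numpy as np

def kl_gaussians(mu0, Sigma0, mu1, Sigma1):
    """KL(p0 || p1) for p_i = N(mu_i, Sigma_i):
    0.5 * [tr(S1^{-1} S0) + (mu1-mu0)^T S1^{-1} (mu1-mu0) - D
           + log det S1 - log det S0]."""
    D = len(mu0)
    S1_inv = np.linalg.inv(Sigma1)
    diff = mu1 - mu0
    _, ld0 = np.linalg.slogdet(Sigma0)
    _, ld1 = np.linalg.slogdet(Sigma1)
    return 0.5 * (np.trace(S1_inv @ Sigma0) + diff @ S1_inv @ diff - D + ld1 - ld0)

# KL vanishes iff the two Gaussians coincide and is asymmetric in general,
# which is why the lemma controls it (and the variance distance) separately.
mu = np.zeros(2)
S0 = np.array([[1.0, 0.3],
               [0.3, 1.0]])
S1 = np.eye(2)
```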
Define the coordinate concentration function φ_{L_0} as in (9). Let N(ε, B, d) denote the minimum number of balls of radius ε needed to cover B in a metric space with metric d; this is called the ε-covering number of B. Now we prove a theorem similar to Theorem 2.1 of van der Vaart & van Zanten (2008a).
Theorem B.1. Let L be a Borel measurable, zero-mean Gaussian random element in a separable Banach space (B^{D(D+1)/2}, ‖·‖) with RKHS (H^{D(D+1)/2}, ‖·‖_H), and let L_0 be contained in the closure of H^{D(D+1)/2} in B^{D(D+1)/2}. For any numbers ε_n > 0 satisfying (10) for φ_{L_0} given by (9), and any C > 1 with e^{−Cnε_n^2} < 1/2, there exists a measurable set B_n satisfying conditions (B.5). Proof. We apply Theorem 2.1 of van der Vaart & van Zanten (2008a) to each coordinate; (B.5c) then follows by inverting the inequality. Now we are ready to prove the theorem of posterior concentration.
Proof of Theorem 3. The proof follows the same vein as Theorem 3.1 of van der Vaart & van Zanten (2008a), combining Theorem 2.1 of Ghosal et al. (2000) and Theorem B.1. First, apply Theorem B.1 to L − I to obtain B_n for L satisfying conditions (B.5).
We choose the set P_n of Ghosal et al. (2000) equal to {p_L : L ∈ B_n}, with B_n ⊂ ℓ^∞(X)^{D(D+1)/2} being the measurable set of Theorem B.1. By Lemma B.1, the 4ε_n-entropy of P_n relative to the Hellinger distance is bounded above by the 3ε_n^2-entropy of the set B_n relative to the uniform distance, which is bounded by 6Cnε_n^2 by Theorem B.1. This verifies (2.2) of Ghosal et al. (2000). The prior probability Π(P_n^c) outside the set P_n in (2.3) of Ghosal et al. (2000) is bounded by the probability of the event {L ∉ B_n}, which is bounded by e^{−Cnε_n^2} by Theorem B.1. Finally, by the second and third inequalities of Lemma B.1, the prior probability in (2.4) of Ghosal et al. (2000) is bounded below by the probability of the event {‖L − L_0‖ < 2ε_n}, which is bounded below by e^{−nε_n^2} by Theorem B.1.
Remark B.1. In principle, the smoothness of the GP should match the regularity of the true parameter to achieve the optimal rate of contraction (van der Vaart & van Zanten, 2008a, 2011). One can rescale the GP, e.g. using an inverse-Gamma bandwidth, to attain the optimal contraction rate for every regularity level, so that the resulting estimator is rate adaptive (van der Vaart & van Zanten, 2009, 2011). See Section 3.2 of van der Vaart & van Zanten (2011) for posterior contraction rates using the squared exponential kernel for the GP. We leave further investigation of contraction rates in the covariance regression setting to future work.
Remark B.2. Here the GP prior L defines a probability measure on the space of bounded functions. The true parameter function L_0 is required to be contained in the support of the prior, the closure of the RKHS of L. The contraction rate depends on the position of L_0 relative to the RKHS and on the small-ball probability Π(‖L‖ < ε).
and y*_mn := (y_mn − µ_n) ∘ e^{−τ_n}, where ∘ is the Hadamard (a.k.a. Schur) product, i.e. the entry-wise product. We use a Metropolis-within-Gibbs algorithm and alternate updating the model parameters µ, τ, L, γ, η. We now list the parameters and their respective updates one by one.
(γ). The prior for γ_* is conditionally conjugate given * = µ, τ, or L, where [condition] is 1 if the condition is satisfied and 0 otherwise.
(η). Given * = µ, τ, or L, we sample η_* using the slice sampler (Neal, 2003), which only requires the log-posterior density and works well for scalar parameters.
(µ). By the definition of vGP, we have µ|γ_µ, η_µ ∼ MN_{N×D}(0, K_µ, I_D); therefore vec(µ)|γ_µ, η_µ ∼ N_{ND}(0, I_D ⊗ K_µ). On the other hand, one can rewrite the likelihood in terms of vec(µ) (Tracy & Dwyer, 1969; Magnus & Neudecker, 1979). Therefore, the prior on vec(µ) is conditionally conjugate.
(τ). By a similar argument with the matrix normal prior for τ, we obtain the conditional distribution of vec(τ). We therefore use the elliptical slice sampler (ESS; Murray et al., 2010), which only requires the log-likelihood.
(L). We sample from its posterior distribution using the ∆-Spherical Hamiltonian Monte Carlo (∆-SphHMC) described below. The log-posterior density of L, the derivative of the log-likelihood with respect to L_n, and the derivative of the log-prior with respect to l_i can be calculated accordingly.
Derivation of the geometric integrator for SphHMC. The Lagrangian dynamics on the sphere S^{D−1}(r), written in the first (D−1) coordinates, is (D.1), where Γ(q_{−D}) = r^{−2} G(q_{−D}) ⊗ q_{−D} collects the Christoffel symbols of the second kind (see Lan & Shahbaba, 2016, for details with r = 1). The dynamics (D.1) can be split into the following two smaller dynamics: (D.2a), the geodesic equation on the sphere, which has an analytical solution; and (D.2b), which also has an analytical solution. Both define volume-preserving maps.
The mapping I : q_{−D} → q = (q_{−D}, q_D) can be viewed as an embedding of S^{D−1}_+ into R^D; denote its Jacobian as dI(q). Then Equation (D.2a) has an analytical solution in the full coordinates, and so does Equation (D.2b). So numerically updating (D.4) for h/2, updating (D.3) for h, and updating (D.4) for another h/2 yields the integrator (11).
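The splitting above can be illustrated numerically on the unit sphere (r = 1): the geodesic part rotates (q, v) along a great circle, and the remaining part updates the velocity in the tangent space. A minimal sketch of such a splitting integrator, assuming NumPy and an illustrative potential gradient `grad_U` (not the paper's exact expressions):

```python
import numpy as np

def geodesic_flow(q, v, h):
    """Exact geodesic flow on the unit sphere S^{D-1}: rotate (q, v)
    along the great circle determined by the tangent vector v."""
    vn = np.linalg.norm(v)
    if vn < 1e-12:
        return q, v
    q_new = q * np.cos(vn * h) + (v / vn) * np.sin(vn * h)
    v_new = -q * vn * np.sin(vn * h) + v * np.cos(vn * h)
    return q_new, v_new

def velocity_update(q, v, grad_U, h):
    """Half-step velocity update, projecting the force onto the tangent space."""
    g = grad_U(q)
    g_tan = g - (q @ g) * q          # remove the radial component
    return v - (h / 2) * g_tan

def leapfrog_on_sphere(q, v, grad_U, h):
    """One step of the splitting scheme: half velocity update,
    full geodesic flow, half velocity update."""
    v = velocity_update(q, v, grad_U, h)
    q, v = geodesic_flow(q, v, h)
    v = velocity_update(q, v, grad_U, h)
    return q, v

# Toy usage with a linear potential (illustrative only).
q0 = np.array([1.0, 0.0, 0.0])
v0 = np.array([0.0, 0.5, 0.0])
grad_U = lambda x: np.array([0.0, 0.0, 1.0])
q1, v1 = leapfrog_on_sphere(q0, v0, grad_U, 0.1)
```

The update preserves the constraint: q stays on the sphere and v stays in its tangent space, which is the point of integrating the geodesic part exactly.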

D•2. Reformulating Acceptance
At the end of the numerical simulation, a proposal (q_T, v_T) is accepted according to the following probability (D.5). This classic definition of the acceptance probability can be reformulated by replacing ∆E in (D.5). With (11) we can write the per-step energy differences; accumulating these terms over τ = 1, …, T yields the reformulated acceptance probability (12).
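The accept/reject step based on the accumulated per-step energy differences can be sketched generically as follows (illustrative; `delta_E_steps` stands in for the T per-step terms whose sum enters (12)):

```python
import numpy as np

def accept(delta_E_steps, rng):
    """Metropolis acceptance with probability min(1, exp(-sum(delta_E_steps))),
    where delta_E_steps are the per-leapfrog-step energy differences
    accumulated along the trajectory."""
    log_alpha = min(0.0, -float(np.sum(delta_E_steps)))
    return np.log(rng.uniform()) < log_alpha
```

Accumulating the energy change step by step, rather than evaluating the total Hamiltonian only at the endpoints, is what makes the reformulated acceptance probability convenient for the adaptive stopping rules below.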

D•3. Adaptive Spherical HMC
There are two tuning parameters in HMC and its variants: the step size h and the number of integration (leapfrog) steps T. Hand-tuning relies heavily on domain expertise and can be inefficient. Here, we adopt the 'No-U-Turn' idea from Hoffman & Gelman (2014) and introduce a novel adaptive algorithm that obviates manual tuning of these parameters.
First, for any given step size h, we set the number of leapfrog steps based on the same philosophy as 'No-U-Turn' (Hoffman & Gelman, 2014). The idea is to avoid wasted computation (e.g. when the sampler backtracks on its trajectory) without breaking the detailed balance condition for the MCMC transition kernel. S^{D−1}(r) is a compact manifold on which any two points q(0), q(t) ∈ S^{D−1}(r) have geodesic distance at most πr. We stop the leapfrog updates when the sampler exits the orthant of the initial state, that is, when the trajectory measured in geodesic distance is at least (π/2)r, which is equivalent to ⟨q(0), q(t)⟩ < 0. On the other hand, this condition may not be satisfied within a reasonable number of iterations because the geometric integrator (11) does not in general follow a geodesic (great circle); only the middle part does. Therefore, we set a threshold T_max on the number of tests and adopt the following 'Two-Orthants' rule (as the starting and end points occupy two orthants) for the number of leapfrog steps. Alternatively, one can stop the leapfrog steps stochastically based on the geodesic distance travelled. These stopping criteria are already time reversible, so the recursive binary tree of the 'No-U-Turn' algorithm (Hoffman & Gelman, 2014) is no longer needed. Lastly, we adopt the dual averaging scheme (Nesterov, 2009) for the adaptation of the step size h; see Hoffman & Gelman (2014) for more details. We summarize our Adaptive Spherical Hamiltonian Monte Carlo (adp-SphHMC) in Algorithm 1.
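The 'Two-Orthants' stopping rule can be sketched as follows (illustrative Python; `leapfrog` stands in for one step of the integrator (11), and the toy version below simply rotates q along a great circle):

```python
import numpy as np

def two_orthants_steps(q0, v0, leapfrog, h, T_max):
    """Run leapfrog steps until the trajectory leaves the orthant of the
    initial state, i.e. <q(0), q(t)> < 0 (geodesic distance >= (pi/2) r),
    capped at T_max steps since the integrator need not follow a geodesic."""
    q, v = q0, v0
    t = 0
    for t in range(1, T_max + 1):
        q, v = leapfrog(q, v, h)
        if np.dot(q0, q) < 0:   # exited the initial orthant: stop
            break
    return q, v, t

def toy_leapfrog(q, v, h):
    """Toy stand-in for the geometric integrator: exact great-circle rotation."""
    vn = np.linalg.norm(v)
    q_new = q * np.cos(vn * h) + (v / vn) * np.sin(vn * h)
    v_new = -q * vn * np.sin(vn * h) + v * np.cos(vn * h)
    return q_new, v_new
```

Because the stopping condition depends only on the current and initial positions, it is symmetric in time, which is why no recursive tree construction is needed.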
Proof of Theorem 4. From the second equation of the Lagrangian dynamics (D.1) we obtain the stated identity, and the first equality follows. Lastly, the second equality follows from the first equation of the Lagrangian dynamics (D.1). This completes the proof.

E. GRADIENT CALCULATION IN NORMAL-INVERSE-WISHART PROBLEM
We use the representation (4) and derive the log-posterior (log-likelihood and log-prior) and the corresponding gradients for (13) using matrix calculus.
E•1. Gradients of Log-Likelihood
Denote y*_n := (y_n − µ_0)/σ; the log-likelihood then follows. We calculate the gradient of the log-likelihood with respect to σ, and rewrite it under the transformation τ = log(σ). Taking the differential directly of g_n(U*) := −(1/2) y*_n^T (U*)^{−T}(U*)^{−1} y*_n, noting that the differential and trace operators are exchangeable, and converting from differential to derivative form in the numerator layout (Minka, 1997, revised 12/00), yields the gradient, where diag acting on a vector forms a diagonal matrix, while acting on a matrix it extracts the diagonal vector, and ∘ is the Hadamard (a.k.a. Schur) product, i.e. the entry-wise product.
Again, by the exchangeability of the differential and trace operators, we obtain the gradient with respect to U*. Lastly, the log-priors for (15) and their gradients after the transformation τ := log(σ) are calculated accordingly. The bottom row can be written compactly, where 1/L denotes a lower-triangular matrix with l_ij^{−1} as its (i, j) entry (i ≥ j).
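Matrix-calculus gradients of this kind are easy to sanity-check by finite differences. Below is a sketch for the single term g(U) = −½ y^T U^{−T} U^{−1} y, whose differential dg = tr(z z^T U^{−1} dU), z = U^{−1} y, gives ∂g/∂U = U^{−T} z z^T restricted to the upper triangle (NumPy assumed; this checks the calculus pattern, not the paper's full expressions):

```python
import numpy as np

def g(U, y):
    """g(U) = -0.5 * y^T U^{-T} U^{-1} y, with upper-triangular factor U."""
    z = np.linalg.solve(U, y)
    return -0.5 * z @ z

def grad_g(U, y):
    """Analytical gradient from the differential dg = tr(z z^T U^{-1} dU):
    dg/dU = U^{-T} z z^T, restricted to the upper triangle."""
    z = np.linalg.solve(U, y)
    return np.triu(np.linalg.solve(U.T, np.outer(z, z)))

rng = np.random.default_rng(0)
D = 4
U = np.triu(rng.standard_normal((D, D))) + 3 * np.eye(D)
y = rng.standard_normal(D)
G = grad_g(U, y)

# Central finite differences over the upper-triangular entries.
h = 1e-6
err = 0.0
for i in range(D):
    for j in range(i, D):
        Up, Um = U.copy(), U.copy()
        Up[i, j] += h
        Um[i, j] -= h
        fd = (g(Up, y) - g(Um, y)) / (2 * h)
        err = max(err, abs(fd - G[i, j]))
# err should be at the level of floating-point noise
```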

F. MORE NUMERICAL RESULTS
F•1. Flexibility of von Mises-Fisher Prior and Bingham Prior
Definition F.1 (Fisher-Bingham / Kent distribution). The probability density function of the Kent distribution for a random vector l_i ∈ S^{i−1} is given by the Kent density, where ∑_{k=2}^i β_k = 0, 0 ≤ 2|β_k| < κ, and the vectors {γ_k}_{k=1}^i are orthonormal.
Remark F.1. The parameters κ and γ_1 are called the concentration and the mean direction parameter, respectively. The greater the value of κ, the higher the concentration of the distribution around the mean direction γ_1. The choice of γ_1 can impact our priors when modeling correlations. The parameters {β_k}_{k=2}^i determine the ellipticity of the contours of equal probability, and the vectors {γ_k}_{k=2}^i determine the orientation of those contours on the sphere. When all β_k = 0, the Kent distribution reduces to the von Mises-Fisher distribution (Fisher, 1953; Mardia & Jupp, 2009), denoted vMF(κ, γ_1). If κ = 0, it defines an antipodally symmetric distribution, the Bingham distribution (Bingham, 1974), denoted Bing(A). As before, to induce smaller correlations, one can put higher prior probability for l_i on the poles of S^{i−1}; for example, l_i ∼ vMF(κ, n_i) or l_i ∼ Bing(ζ diag(n_i)), where n_i := (0, …, 0, 1)^T denotes the north pole. Now let us consider the following von Mises-Fisher prior (Fisher et al., 1987; Fisher, 1953; Mardia & Jupp, 2009) for l_i, the i-th row of the Cholesky factor L of the correlation matrix P in the structured model (4).
Definition F.2 (von Mises-Fisher distribution). The probability density function of the von Mises-Fisher distribution for a random vector l_i ∈ S^{i−1} is given by the expression, where κ ≥ 0, ‖µ‖ = 1, and the normalization constant C_i(κ) is given in terms of I_v, the modified Bessel function of the first kind of order v. Denote l_i ∼ vMF(κ, µ).
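The vMF normalization constant can be evaluated numerically via the Bessel function; for the sphere S^2 ⊂ R^3 it also has the elementary closed form C_3(κ) = κ / (4π sinh κ), which gives a quick cross-check. A sketch (SciPy assumed; the exponentially scaled `ive` avoids overflow for large κ):

```python
import numpy as np
from scipy.special import ive

def vmf_log_density(x, mu, kappa):
    """Log-density of vMF(kappa, mu) on S^{p-1} (a subset of R^p):
    f(x) = C_p(kappa) * exp(kappa * mu^T x), with
    C_p(kappa) = kappa^{p/2-1} / ((2*pi)^{p/2} * I_{p/2-1}(kappa)).
    ive(v, k) = I_v(k) * exp(-k) is used for numerical stability."""
    p = len(mu)
    v = p / 2 - 1
    log_C = (v * np.log(kappa) - (p / 2) * np.log(2 * np.pi)
             - (np.log(ive(v, kappa)) + kappa))
    return log_C + kappa * mu @ x

# e.g. evaluating the north-pole prior l_i ~ vMF(kappa, n_i) at a point on S^2
kappa = 2.5
north = np.array([0.0, 0.0, 1.0])
val_at_mean = vmf_log_density(north, north, kappa)
```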
Since we have no prior knowledge about the mean direction µ, we choose µ = n_i = (0_{i−1}, 1)^T, which favors the polar direction, i.e.
Definition F.3 (Bingham distribution). The probability density function of the Bingham distribution for a random vector l_i ∈ S^{i−1} is given by the expression, where M is an orthogonal orientation matrix, Z is a diagonal concentration matrix, and 1F1(·; ·; ·) is a confluent hypergeometric function of matrix argument. Denote l_i ∼ Bing(M, Z).
Note that, according to Bingham (1974), this distribution is defined for Z only up to an arbitrary scalar matrix ζ_0 I. Therefore, we consider M = I and Z = ζ diag(n_i), which favors the polar direction, i.e.
We generate M = 20 trials of data over N = 100 evenly spaced points in [0, 1]. The true mean, variance and correlation functions are modified from the example (16) using the Clausen functions (Clausen, 1832). As seen in Figure F.3, they behave more intricately, with higher heterogeneity among the processes. This imposes a further challenge for latent-factor-based models like (17) compared to D = 2. We repeat the experiments in Section 4•2 and compare our dynamic model (8) with the latent factor process model (17) of Fox & Dunson (2015). To aid visualization, we subtract the estimated processes from their true values and plot the error functions in Figure F.4. Even though we have tried our best to tune the parameters, e.g. L, the number of basis functions, and k, the size of the latent factors, the latent factor process model (Fox & Dunson, 2015) is outperformed by our flexible dynamic model (8) in reducing estimation errors.
F•3. Scalability to Higher Dimensions
We modify the example (16) for D = 100 dimensions to test the scalability of our dynamic model (8). However, instead of the full covariance, we only consider a diagonal covariance matrix plus 4 non-zero off-diagonal entries σ_{1,2} (σ_{2,1}) and σ_{99,100} (σ_{100,99}). We focus on the correlation process in this example and thus set µ_t ≡ 0 and σ_t ≡ 1 for t ∈ [0, 1]. More specifically, when generating data {y_t} with (16), if i ∉ {2, 100} we set the i-th rows L_i = S_i = e_i, with e_i being the i-th row of the identity matrix. To apply our dynamic model (8) in this setting, we let L_t have a 'w-band' structure with w = 2 at each time t. Setting s = 2, a = 1, b = 0.1, m = 0, V = 10^{−3}, N = 100 and M = 100, we run MCMC for 7.5 × 10^4 iterations, discard the first 2.5 × 10^4 as burn-in, and subsample 1 of every 10 to obtain 5 × 10^3 posterior samples. Based on those samples, we estimate the underlying correlation functions and plot only ρ_{1,2}, ρ_{49,50} and ρ_{99,100} in Figure F.5. With the 'w-band' structure, we have fewer entries in the covariance matrix and focus on the 'in-group' correlation. Our dynamic model (8) is sensitive enough to discern the informative non-zero components from the non-informative ones in these correlation functions. Unit-vector GP priors provide the flexibility for the model to capture the changing pattern of informative correlations. The left panel of Figure F.5 shows that the model (8) correctly identifies the non-zero components ρ_{1,2} and ρ_{99,100} and characterizes their evolution. The right panel shows that the 2-norm distance between the estimated and true correlation matrices, ‖P̂(t) − P(t)‖_2, is small, indicating that our dynamic model (8) performs well in higher dimensions in estimating complex dependence structure among multiple stochastic processes.
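The 'w-band' structure can be sketched as follows: keep only the main diagonal and the first w sub-diagonals of the lower-triangular factor, renormalize each row to unit length, and the induced correlation matrix is banded with bandwidth w. A minimal sketch (NumPy assumed; the projection used here is illustrative):

```python
import numpy as np

def w_band_correlation(L_full, w):
    """Project a lower-triangular factor onto a 'w-band' structure and
    renormalize rows to unit vectors, so that P = L L^T is a valid banded
    correlation matrix (P_ij = 0 whenever |i - j| > w)."""
    D = L_full.shape[0]
    band = np.abs(np.subtract.outer(np.arange(D), np.arange(D))) <= w
    L = np.tril(L_full) * band                          # keep the w-band only
    L = L / np.linalg.norm(L, axis=1, keepdims=True)    # rows back on spheres
    return L @ L.T

rng = np.random.default_rng(0)
D, w = 6, 2
A = rng.standard_normal((D, D))
L_full = np.linalg.cholesky(A @ A.T + D * np.eye(D))
P = w_band_correlation(L_full, w)
```

Bandedness follows because rows i and j of the banded factor have disjoint supports whenever |i − j| > w, so their inner product, and hence P_ij, vanishes.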

F•4. More Results on the Analysis of LFP data
In Section 5, we studied the LFP data collected from the hippocampus of rats performing a complex sequence memory task. Here we observe a similar dynamic pattern of correlation matrices under the two conditions, which diverge after 500ms, reflecting the neural activity associated with the cognitive process of identifying whether events occurred in their expected order.
We also studied the correlation evolution over the full 12 channels, which revealed the block structure of those channels and the same changing pattern under different experimental conditions as was found with the 4 selected channels in Section 5. A video demonstrating the 12-channel result is available at https://www.youtube.com/watch?v=NMUUic0IDsM.

Fig. 1: Marginal posterior densities of σ_ij in the normal-inverse-Wishart problem. Solid blue lines are estimates by ∆-SphHMC and dashed red lines are estimates by direct sampling.

Fig. 4: Marginal posterior, prior (induced from squared-Dirichlet distribution) densities of correlations and MLEs with different settings for concentration parameter α.

Fig. 5: Estimation of the underlying mean functions µ_t (left in each of the 4 subpanels) and covariance functions Σ_t (right in each of the 4 subpanels) of 2-dimensional periodic processes. M is the number of trials, and N is the number of discretization points. Dashed lines are true values, solid lines are estimates, and shaded regions are 95% credible bands.
Note in Figure 5 that both M and N affect the amount of data information and therefore the posterior contraction, but the contraction rate may depend on them differently. Both mean and covariance functions have narrower credible bands for more discretization points N (compare N = 20 in the first row with N = 200 in the second row). On the other hand, both posteriors contract further with more trials M (compare M = 10 in the first column against M = 100 in the second column). In general, the posterior of the mean function contracts to the truth faster than the posterior of the covariance function. With M = 100 trials and N = 200 discretization points, both mean and covariance functions are almost fully recovered by the model (8).

Fig. 6: Estimation of the underlying mean functions µ_t (left column), variance functions σ_t (middle column) and correlation function ρ_t (right column) of 2-dimensional periodic processes, using the latent factor process model (upper row) and our flexible model (lower row), based on M = 10 trials of data over N = 200 evenly spaced points. Dashed lines are true values, solid lines are estimates, and shaded regions are 95% credible bands.
dvecΣ = (U ⊗ I)dvecU + (I ⊗ U)dvecU^T. Applying the elimination matrix L_D on both sides,
dvechΣ = L_D[(U ⊗ I)K_D dvecU^T + (I ⊗ U)dvecU^T] = L_D(K_D + I)(I ⊗ U)dvecU^T = 2L_D N_D(I ⊗ U)L_D^T dvechU^T = 2L_D N_D L_D^T D_D^T(I ⊗ U)L_D^T dvechU^T,
where K_D is the commutation matrix such that K_D vecA = vecA^T for a matrix A ∈ R^{D×D}, N_D := (K_D + I)/2, and D_D is the duplication matrix, which is regarded as the inverse of the elimination matrix L_D. The last equality follows from D_D L_D N_D = N_D = N_D^T (Lemma 2.1 and Lemma 3.5 in Magnus & Neudecker, 1980).

Fig. F.1: Marginal posterior and prior (induced from the von Mises-Fisher distribution) densities of correlations and MLEs with different settings of the concentration parameter κ, estimated with 10^6 samples.

Fig. F.2: Marginal posterior and prior (induced from the Bingham distribution) densities of correlations and MLEs with different settings of the concentration parameter ζ, estimated with 10^6 samples.
We consider i) ζ = 1; ii) ζ = 10; iii) ζ = 100. The log-prior and its gradient are calculated as follows. We repeat the above experiment with the Bingham prior for l_i. The posteriors, priors and maximum likelihood estimates (MLEs) of correlations with different ζ's are plotted in Figure F.2. With a larger concentration parameter ζ, the posteriors are pulled more towards the induced priors and concentrate around 0.
F•2. More Comparison to the Latent Factor Process Model
The example of a simulated periodic process in Section 4•2 was considered with D = 2 for simplicity and convenience of visualization. Here we consider the higher dimension D = 10. The purpose is not to show the scalability, but rather to investigate the robustness of our dynamic model (8) in terms of full flexibility.

Fig. F.3: Simulated data y over the underlying mean functions µ_t (left), variance functions σ_t (middle), and correlation functions P_t (right) of 10-dimensional periodic processes.

Fig. F.4: Estimated error functions of the underlying mean µ_t (left column), variance σ_t (middle column) and correlation ρ_t (right column) of 10-dimensional periodic processes, using the latent factor process model (upper row) and our flexible model (lower row), based on M = 20 trials of data over N = 100 evenly spaced points. Solid lines are estimated errors and shaded regions are 95% credible bands.

Fig. F.5: Posterior estimation of the underlying correlation functions P_t (left) and their 2-norm distance to the truth (right) for 100-dimensional periodic processes with 2-band structure, based on M = 100 trials of data over N = 100 discretization points. Dashed lines are true values, solid lines are estimates, and shaded regions are 95% credible bands.
Figure F.6 shows the 12 locations in the CA1 subregion of the rat's hippocampus where LFP signals are recorded. Figure F.7 shows M = 20 trials of these LFP signals from D = 4 channels under both InSeq and OutSeq conditions.
Figure F.8 shows the theta-filtered traces (4-12Hz; left panel) and the estimated correlation processes under the different experimental conditions (InSeq vs OutSeq; right panel).
Fig. F.6: Locations of recorded LFP signals in CA1 subregion of the rat's hippocampus.