Bayesian Uncertainty Quantification for Low-Rank Matrix Completion

We consider the problem of uncertainty quantification for an unknown low-rank matrix $\mathbf{X}$, given a partial and noisy observation of its entries. This quantification of uncertainty is essential for many real-world problems, including image processing, satellite imaging, and seismology, providing a principled framework for validating scientific conclusions and guiding decision-making. However, existing literature has mainly focused on the completion (i.e., point estimation) of the matrix $\mathbf{X}$, with little work on investigating its uncertainty. To this end, we propose in this work a new Bayesian modeling framework, called BayeSMG, which parametrizes the unknown $\mathbf{X}$ via its underlying row and column subspaces. This Bayesian subspace parametrization enables efficient posterior inference on matrix subspaces, which represents interpretable phenomena in many applications. This can then be leveraged for improved matrix recovery. We demonstrate the effectiveness of BayeSMG over existing Bayesian matrix recovery methods in numerical experiments, image inpainting, and a seismic sensor network application.


Introduction
Low-rank matrices play a vital role in modeling many scientific and engineering problems, including (but not limited to) image processing, satellite imaging, and network analysis. In such applications, however, only a small portion of the desired matrix (which we denote as X ∈ R m1×m2 in this article) can be observed. The reasons for this are two-fold: (i) the cost of observing all matrix entries can be high, requiring expensive computational, experimental, or communication expenditure; (ii) there can be missing observations at individual entries due to sensor malfunction, experimental failure, or unreliable data transmission. The matrix completion problem aims to complete the missing entries of X from a partial (and often-times noisy) observation. Matrix completion has attracted much attention since the seminal works of Candès and Tao (2010), Candès and Recht (2009), and Recht (2011). The theory and methodology behind point estimation are now well-understood for matrix completion, under the assumption that X is low-rank, with various convex and non-convex optimization algorithms developed for performing this recovery.
However, much of the literature (a detailed review is in Section 1.1) has focused on the completion, i.e., point estimation, of X, with little work on exploring the uncertainty of such estimates. In many scientific and engineering applications, such estimates are much more useful when coupled with a measure of uncertainty. The principled characterization (and reduction) of this uncertainty is known as uncertainty quantification (UQ), see, e.g., Smith (2013). UQ is becoming increasingly important in various applications, providing a principled framework for validating scientific conclusions and guiding decision-making.
In this paper, we address the problem of UQ for the matrix completion problem from a Bayesian perspective. We propose a novel Bayesian modeling framework, called BayeSMG, which quantifies uncertainty in the desired matrix X via posterior sampling on its underlying subspaces. BayeSMG can be viewed as a hierarchical Bayesian extension of the singular matrix-variate Gaussian (SMG) distribution (see Gupta and Nagar, 1999;Mak and Xie, 2018), with hierarchical priors on matrix subspaces. A scalable posterior sampling algorithm is then derived for BayeSMG, which leverages the efficient subspace sampling algorithms proposed in Hoff (2007) and Hoff (2009). By integrating the subspace structure for posterior inference, we show that BayeSMG enjoys improved recovery performance and better interpretability compared with existing Bayesian models in extensive numerical experiments and a real-world seismic sensor network application.

Existing literature
Much of the existing literature on inferring X from partial observations falls under the topic of matrix completion -the completion (or point estimation) of X from observed entries. Early works in this area include the seminal works of Candès and Tao (2010), Candès and Recht (2009), and Recht (2011), which established conditions for exact completion via nuclear-norm minimization, under the assumption that observations are uniformly sampled without noise. This is then extended to the noisy matrix completion setting, where entries are observed with noise; important results include Candès and Plan (2010), Keshavan et al. (2010), Koltchinskii et al. (2011), and Negahban and Wainwright (2012), among others. There is now a rich body of work on matrix completion; recent overviews include Davenport and Romberg (2016) and Chi et al. (2019). However, completion focuses solely on the point estimation of matrix entries and does not provide uncertainty quantification on those unobserved. In scenarios where only a few entries are observed(see motivating applications), this uncertainty can be as valuable as point estimates in assessing the quality of the recovered matrix.
The current research literature has generally focused on point estimation of the unknown matrix X. The problem of quantifying uncertainties in X has been relatively unexplored, but it is nonetheless an important one given the motivating applications. One recent pioneering work on this is Chen et al. (2019), which proposed entrywise confidence intervals for both convex and non-convex estimators on X, via debiasing using low-rank factors of the matrix. The resulting debiased estimators admit nearly precise nonasymptotic distributional characterizations, which in turn enable optimal construction of confidence intervals for missing matrix entries and low-rank factors. Our approach has several distinctions from this work. First, the latter is a frequentist approach with appealing theoretical guarantees, whereas our approach is Bayesian and yields a richer quantification of uncertainty on X via a hierarchical Bayesian model. Second, to derive elegant theoretical results, the latter requires a sample size complexity condition on X, similar to the minimum sample size condition in standard matrix completion analysis (see, e.g., Candès and Recht, 2009). Our UQ approach, in contrast, is applicable for any sample size n on X, particularly for the "small-n" setting where observations are limited and uncertainty quantification is most needed.
Another approach for quantifying uncertainty is via Bayesian modeling. There is a growing literature on Bayesian matrix completion, of which the most popular approach is the Bayesian Probabilistic Matrix Factorization (BPMF) method in Salakhutdinov and Mnih (2008). BPMF adopts the following probabilistic model on X: is an upper bound on matrix rank. Each row of the factorized matrices M and N are then assigned i.i.d. Gaussian priors N (μ M , Σ M ) and N (μ N , Σ N ), respectively. Conjugate normal hyperpriors are then assigned on the row and column means μ M ∼ N (0, Σ M β), μ N ∼ N (0, Σ N β), with Inverse-Wishart hyperpriors on row and column covariance matrices The hyperparameters β and W are typically specified to provide weakly-or non-informative priors. This model allows for an efficient Gibbs sampler, which performs conjugate sampling on each row of M and each row of N , along with conjugate updates on the mean vectors (μ M , μ N ) and covariance matrices (Σ M , Σ N ). With this, the BPMF can be shown to tackle problems as large as the Netflix dataset, with millions of user-movie ratings. A similar Bayesian model was proposed in Mai and Alquier (2015), with priors on each entry of M and N . Many other existing Bayesian matrix completion methods (e.g., Lawrence and Urtasun, 2009;Zhou et al., 2010;Babacan et al., 2011;Alquier et al., 2014) can be viewed as variations or extensions of this BPMF framework.
From a modeling perspective, the key novelty in the BayeSMG model is that it requires orthonormality in the factorized matrices, whereas the BPMF does not. Such a factorization can be viewed as parametrizing X via its singular value decomposition (SVD). This yields several advantages for our method, which we demonstrate later. First, by explicitly parametrizing row and column subspaces as model parameters, BayeSMG can incorporate prior knowledge on subspaces within the prior specification of such parameters. This prior information is often available in many signal processing and image processing problems, e.g., known signal structure or image features. Second, BayeSMG allows for direct inference on subspaces of X via posterior sampling, which is of direct interest in many problems, e.g., in sensor network localization (Zhang et al., 2020; an application we tackle later on) and topology identification problems (Eriksson et al., 2012). For subspace inference, our approach avoids performing an additional SVD step for every posterior sample (compared to the BPMF), which significantly speeds up inference for high-dimensional problems. Finally and perhaps most importantly, BayeSMG can leverage this posterior learning on subspaces to provide improved inference on X. Compared to the BPMF, our approach can yield faster posterior contraction for unobserved entries when the underlying matrix has a low-rank structure, in both numerical simulations and applications. It enables a more accurate estimate and more precise uncertainty quantification of X over the BPMF.
The BayeSMG model also provides several novel theoretical insights. In Section 4, we show that the maximum a posteriori (MAP) estimator takes the form of a regularized matrix estimator, which provides a connection between the proposed method and existing matrix completion techniques. We also show that the BayeSMG model provides a probabilistic model on matrix coherence (Candès and Recht, 2009). Coherence has been widely used in the matrix completion literature as a theoretical condition for recovery, which measures the "recoverability" of a low-rank matrix. Through this, we then establish an error monotonicity result for BayeSMG, which provides a reassuring check on the UQ performance of the proposed model.
The paper is organized as follows. Section 2 introduces the BayeSMG model. Section 3 presents an efficient posterior sampling algorithm for X via manifold sampling on its subspaces. Section 4 reveals connections between the BayeSMG model and coherence, and its impact on error convergence. Section 5 investigates numerical experiments with synthetic and image data. Section 6 explores a real-world seismic sensor network application. Section 7 concludes with discussions.

The SMG model
We first describe the Singular Matrix-variate Gaussian (SMG) distribution, and how it can be utilized for modeling matrix subspaces.

Problem set-up
Let X ∈ R m1×m2 be the matrix of interest, and assume X is low-rank, i.e., R := rank(X) m 1 ∧ m 2 . Let [m] := {1, · · · , m}. Suppose X is sampled with noise at an index set Ω ⊆ [m 1 ] × [m 2 ] of size |Ω| = n, yielding observations: (2.1) Here, Y i,j is the observation at entry indexed by (i, j), corrupted by noise i,j . In this work, we assume i,j ∼ N(0, η 2 ), i.e., the noise on each entry follows an i.i.d. Gaussian distribution with zero mean and variance η 2 . Furthermore, let Y Ω := (Y i,j ) (i,j)∈Ω ∈ R n denote the vector of noisy observations, and let X Ω c be the vector of unobserved matrix entries, where Ω c := ([m 1 ] × [m 2 ]) \ Ω is the set of unobserved indices.
With this framework, the desired goal of uncertainty quantification (UQ) can be made more concrete. Given noisy observations Y Ω , we wish to not only estimate the unobserved matrix entries X Ω c , but also quantify a notion of uncertainty on both observed or unobserved entries (since observation noise is present).

SMG model
We adopt the following SMG model for the low-rank matrix X, which we assume to be normal with a zero mean.
Definition 2.1 (SMG model, Definition 2.4.1 of Gupta and Nagar, 1999). Let Z ∈ R m1×m2 be a random matrix with entries Z i,j The random matrix X has a singular matrix-variate Gaussian (SMG) distribution if We will denote this as X ∼ SMG(P U , P V , σ 2 , R).
In other words, a realization from the SMG distribution can be obtained by first (i) simulating a matrix Z from a Gaussian ensemble with variance σ 2 , i.e., a matrix with i.i.d. N (0, σ 2 ) entries, then (ii) performing a left and right projection of Z using the projection matrices P U and P V . Recall that the projection operator P U = U U T ∈ R m1×m1 maps a vector in R m1 to its orthogonal projection on the R-dimensional subspace U spanned by the columns of U . By performing this projection, the resulting matrix X = P U ZP V can be shown to be of rank R < m 1 ∧ m 2 , with its row and column spaces U and V corresponding to the subspaces for P U and P V . The matrix X also lies in the . With a small choice of R, this provides a flexible probabilistic model for the low-rank matrix X.
The SMG distribution provides several appealing properties for modeling low-rank matrices. First, it provides a prior modeling framework on the matrix X involving its row and column subspaces U and V. It is known from Chikuse (2012) that, for each projection operator P ∈ R m×m of rank R, there exists a unique R-dimensional hyperplane (or an R-plane) in R m containing the origin which corresponds to the image of such a projection. It connects the space of rank R projection matrices and the Grassmann manifold G R,m−R , the space of R-planes in R m . Viewed this way, the projection matrices parametrizing X ∼ SMG(P U , P V , σ 2 , R) encode useful information on the row and column spaces of X. Second, since the projection of a Gaussian random vector is still Gaussian, the left-right projection of the Gaussian ensemble Z results in each entry of X being Gaussian-distributed as well. It is useful for deriving a UQ property of the BayeSMG model.
We now show several distributional properties of the SMG model: (a) The density of X is given by where etr(·) := exp{tr(·)}.
(b) Consider the block decomposition of P V ⊗ P U : Conditional on the observed noisy entries Y Ω , the unobserved entries Here, γ 2 = η 2 /σ 2 , and where ⊗ is the Kronecker product, and (2.5) Remark. Lemma 2.1 reveals two key properties of the SMG model. First, prior to observing data, part (a) shows that the low-rank matrix X lies on the space T , and follows a degenerate multivariate Gaussian distribution with mean zero and covariance matrix σ 2 (P V ⊗ P U ). Second, after observing the noisy entries Y Ω , part (b) shows that the conditional distribution of X Ω c (the unobserved entries in X) given Y Ω is still multivariate Gaussian, with closed-form expressions for its mean vector X P Ω c and covariance matrix Σ P Ω c in (2.4).

Can we directly use the SMG model for UQ?
Lemma 2.1 provides a closed-form posterior distribution for the low-rank matrix X after observing the noisy observations Y Ω . It points to a potential way for computing confidence intervals on each entry in X, assuming the underlying row and column subspaces U and V are known. Of course, in practice, such subspaces are never known with certainty. One solution might be to plug in point estimates of U and V (estimated from data) within the predictive equations in Lemma 2.1, to directly estimate unobserved entries and their uncertainties. We investigate the efficacy of this plug-in approach via a simple numerical example.
The simulation set-up is as follows. Let m = m 1 = m 2 = 8 be the row and column dimensions of the matrix, and let R = 2 be its rank. We first simulate two random orthonormal matrices U and V of size m × R, via a truncated SVD on an m × m matrix with i.i.d. U[0, 1] entries. With P U = U U T and P V = V V T , the "true" lowrank matrix is then simulated from the SMG model X ∼ SMG(P U , P V , σ 2 = 1, R = 2). Finally, noisy observations are sampled via (2.1) with noise variance η 2 = 0.5 2 . In total, 36 entries are observed (56.25% of total entries), with such entries chosen uniformly at random. From this, we can obtain point estimates of the subspaces U and V, by first estimating X via nuclear norm minimization (Candès and Plan, 2010), a popular method for matrix completion, and then taking the row and column subspaces for this matrix estimate via SVD. These subspace estimates are then plugged into the expressions in Lemma 2.1 for UQ. This process is then replicated for 50 times. Figure 1(a) plots, for a representative simulation run, the point estimates and 95% plug-in confidence intervals (CIs) for each matrix entry using Lemma 2.1, with its corresponding true value marked in red. We see that these intervals provide poor coverage performance since many of the true matrix entries are not within these intervals. For this replication, the coverage ratio is only 43.8%, and across the 50 replications, the average coverage ratio is only 46.1%, meaning only around half of the confidence intervals cover the true entries. This poor coverage suggests that this CI approach (with plug-in subspace estimates) can significantly underestimate the underlying uncertainty of point estimates, which is unsurprising since uncertainty for subspace estimation is not incorporated when using Lemma 2.1. Figure 1(b) plots, for a representative simulation, the point estimates and 95% posterior predictive intervals using the proposed BayeSMG method, which accounts for subspace uncertainty by assigning hierarchical priors on subspaces U and V from the SMG model. We see that our approach yields much better coverage: the 95% intervals, which are now slightly wider, cover the true matrix entries well. For this replication, the coverage ratio is at 95.3%, and across the 50 replications, the average coverage ratio is 93.9%, which is much closer to the nominal coverage rate of 95% than the earlier plug-in approach. This shows the proposed method can indeed provide better uncertainty quantification of X via a fully-Bayesian model specification on matrix subspaces.

Model specification
We now present the hierarchical specification for the proposed Bayesian SMG model, or BayeSMG for short. We begin by first introducing the matrix von Mises-Fisher (vMF) distribution, which will serve as prior models for the row and column orthonormal frames U and V . We then present a Gibbs sampling algorithm that makes use of a reparametrization of the SMG model for efficient posterior sampling.
The matrix von Mises-Fisher distribution (Khatri and Mardia, 1977;Mardia and Jupp, 2009) provides a useful class of distributions on the row and column frames, which lie on a so-called Stiefel manifold. A Stiefel manifold (Chikuse, 2012) consists of all orthonormal subspaces of rank R in the space of R m ; this is denoted as V R,m hereafter. The matrix vMF distribution assumes the following probability density function of matrix W on V R,m : where 0 F 1 (; ·; ·) is the hypergeometric function, and F ∈ R m×R is the concentration matrix. We denote this distribution by W ∼ MF(m, R, F ). The matrix vMF distribution provides conditionally conjugate priors for a wide range of multivariate models, including cluster analysis (Gopal and Yang, 2014) and factor models (Hoff, 2013). One appeal of this class of distribution is that it can be efficiently sampled. Hoff (2009)  We will leverage this useful family of priors via the following reparametrization of the BayeSMG model.
The following proposition gives a nice reformulation of the SMG model under uniform subspace priors on U and V: k=1 not necessarily in decreasing order. Then: 1. The singular vectors U and V follow independent priors MF(m 1 , R, 0) and MF(m 2 , R, 0), respectively.
2. The singular values diag(D) = (d k ) R k=1 follow the repulsed normal distribution, with density: The proof of this proposition is provided in the supplementary material (Yuchi et al., 2022). The first part of the proposition shows that the use of uniform priors on the projection matrices P U and P V corresponds to independent MF(m 1 , R, 0) and MF(m 2 , R, 0) priors for the singular vectors U and V , which are uniform priors on the Stiefel manifolds V R,m1 and V R,m2 , respectively. The second part shows that the singular values in D follow the repulsed normal distribution, which is closely connected with the distribution of singular values for a Gaussian ensemble (Shen, 2001).
This proposition then motivates the following reparametrization of the BayeSMG model: is the repulsed normal distribution in (3.2), and the priors on U , V and D are independently specified. When little is known a priori on matrix subspaces, one can set the concentration matrices as F 1 = F 2 = 0, which provides non-informative priors on U and V . In problems where some prior information is available on matrix subspaces, one can elicit a good choice of prior parameters for the vMF priors via a moment matching approach (Wang and Zhou, 2009). We show in the next section that this reparametrization allows for a Gibbs sampling algorithm which makes use of conditionally conjugate priors for efficient posterior sampling.
Finally, we complete the Bayesian specification by assigning the following priors on the variance parameters σ 2 and η 2 : where IG(α, β) is the Inverse-Gamma distribution with shape and rate parameters α and β. Table 1 summarizes the full Bayesian model specification for BayeSMG.

Posterior sampling
Using the reparametrized model (3.3), we now present a subspace Gibbs sampler for posterior sampling on the BayeSMG model, specifically on the parameters Θ = {U , D, V , σ 2 } given partial and noisy observations Y Ω . We first introduce the sampler under complete observation of the noisy matrix Y , then describe a data imputation procedure for posterior sampling under partial observations Y Ω .
Consider first the setting where complete observations on Y are obtained. It can then be shown (see supplementary material for a full derivation) that the full conditional distributions of U , D, V and σ 2 take the form: (3.5) One can then perform the above full conditional updates cyclically for posterior sampling on [Θ|Y ] via Gibbs sampling. These full conditional sampling steps are related to the Gibbs sampler proposed in Hoff (2007) for probabilistic SVD. As mentioned previously, there are efficient sampling algorithms for the matrix vMF distribution (Hoff, 2009;Jauch et al., 2020), which enable efficient full conditional sampling on U and V . The full conditional distribution of D follows the aforementioned repulsed normal distribution with a location shift of μ (denoted as RN (μ, δ 2 )), with density: where d k > 0, k = 1, · · · R. We have found that this can be quite efficiently sampled via a Metropolis-Hastings sampler (Metropolis et al., 1953), with an "independent" proposal distribution (i.e., independent of the current state) set as a non-central, multivariate t-distribution with mean vector μ and scale parameter δ.
Consider now the setting where only partial noisy observations Y Ω are available. We describe a posterior sampling algorithm for [Θ|Y Ω ], which makes use of a modification on the above Gibbs sampler on [Θ|Y ]. The idea is to first sample from the joint distribution [Θ, Y Ω c |Y Ω ] of both the parameters Θ and unobserved noisy entries Y Ω c , then take only the marginal samples of parameters Θ. With an initialization of Θ = Θ , the joint distribution [Θ, Y Ω c |Y Ω ] can be sampled via the following Gibbs sampling steps: Since the missing entries Y Ω c is assumed to be conditionally independent of the observed entries Y Ω given X = U DV T , this is equivalent to sampling [Y Ω c |X], which amounts to simulating the observation noise in Y Ω c given ground truth X Ω c .

Gibbs sampling:
T -total samples Step (i) can be viewed as a data imputation step, which imputes missing entries in the noisy matrix Y .
Step (ii) performs the earlier posterior sampling steps for parameters Θ given the full noisy matrix Y .
It is worth noting that step (i) depends on an implicit assumption that the entries are either missing completely at random (MCAR) or missing at random (MAR); see Little and Rubin (2019) for further discussion on missing data modeling. When the entries are missing not at random (MNAR), the sampling of [Y Ω c |Y Ω , Θ ] can become much more complicated, since it would depend on the underlying MNAR mechanism for missing entries. One approach is to adopt a probabilistic model for the MNAR entries (see, e.g., Hernández-Lobato et al., 2014 for one such model), then sample [Y Ω c |Y Ω , Θ ] given this model. There are, however, several limitations to this approach: (i) the conditional distribution [Y Ω c |Y Ω , Θ ] may be computationally expensive to sample from in the MNAR setting, and (ii) in the case of misspecification for the MNAR model, the resulting recovery of the matrix X can be highly biased and inaccurate. In the absence of prior information on how matrix entries are missing (which is the case in many applications), it may be preferable to adopt Algorithm 1 for posterior inference. We will show later (in Section 5.2) that the BayeSMG is empirically robust to mild violations of this implicit MAR assumption for missing entries.
Algorithm 1 summarizes the above steps for the posterior sampling algorithm. The algorithm is first initialized with estimates U [0] , D [0] , and V [0] obtained from a nuclearnorm completion of X (Carson et al., 2012), and σ 2 [0] is randomly initialized from the prior (3.4). Next, the missing noisy entries Y Ω c are imputed using step (i), then a posterior draw is made using step (ii) via the Gibbs steps in (3.5). This is then iterated until a desired number of posterior samples is obtained. Using the posterior samples of (U [t] , D [t] , V [t] ) at each iteration t, we can obtain a sample X [t] = U [t] D [t] V T [t] from the desired posterior distribution [X|Y Ω ]. These posterior samples {X [t] } T t=1 can then be used for the target goal of uncertainty quantification: the mean of such samples provides a point estimateX for the recovered matrix, and its variability aroundX provides a measure of uncertainty for this recovery.
While the computational complexity of this algorithm is difficult to establish given the complex manifold sampling steps, we found this posterior sampler to be quite efficient and scalable in practice. For a relatively large 256 × 256 matrix, the sampler takes around 1 minute to generate T = 1000 samples on a standard laptop computer (Intel i7 CPU and 16GB RAM), which is quite efficient given the size of the matrix. We will report computation times for larger matrices in the numerical studies later.

Inference on matrix rank
The BayeSMG model as presented above assumes the rank of the matrix X is known, which is often not the case in practice. There has been some literature on this problem of rank estimation for matrix inference. Shapiro et al. (2018) investigates a lower bound of the matrix rank needed for the matrix completion problem to be stable. Hoff (2007) proposes a Bayesian dimension selection method that models the dimension of matrix subspaces via a singular value decomposition (SVD), thus allowing for a Gibbs sampler for sampling the matrix singular vectors, singular values, and rank. While one can conceptually adopt a similar fully Bayesian approach for rank R here, we have found such an approach to be too computationally expensive for the high-dimensional matrices in later numerical experiments, where m 1 and m 2 can be on the order of thousands. This is because Algorithm 1 needs to be performed for each choice of rank R, which can be expensive for large m 1 and m 2 . For such high-dimensional applications, we instead favor the following maximum a posteriori (MAP) approach for rank inference, which sacrifices a richer quantification of uncertainty for computational efficiency and scalability.
Consider the MAP estimate of the unknown matrix X, which can formulated as: (3.7) Here, [X|R] follows the BayeSMG prior specification (3.3) given matrix rank R, and [R] is a prior distribution assigned on matrix rank. Under uniform subspace priors and a flat prior on R over {1, · · · , m 1 ∧ m 2 }, it can be shown (see Section 4.1 for a full derivation) that the MAPX can be well-approximated by the nuclear-norm formulation: Here, X * is the nuclear norm of X (the sum of its singular values, see Candès and Tao, 2010), and λ is a regularization parameter. The optimization problem (3.8) can be efficiently solved via convex optimization algorithms (see Section 1.1 for further details).
In practice, λ can be estimated via cross-validation (Friedman et al., 2017) on the observed entries Y Ω . We first divide these entries into multiple folds. For each fold, we first use nuclear-norm minimization (3.8) to estimate the entries of the particular fold. Then we compute the cross-validation error for these estimates. We then select the optimal tuning parameter λ * such that it is the λ that minimizes the sum of these cross-validation errors for all folds.
With this estimate λ * , an (approximate) MAP estimateX can be obtained by solving (3.8) with λ = λ * . This in turn yields an approximate MAP estimate of R via the rank of the matrix estimateX. Finally, this rank estimate can be plugged into Algorithm 1 for uncertainty quantification on matrix X. For high-dimensional problems with either m 1 or m 2 large, this plug-in MAP approach for rank estimation can yield significant computational savings over a fully Bayesian treatment.

Theoretical insights
We now provide some theoretical insights on the BayeSMG model. We first discuss an interesting link between the maximum-a-posterior (MAP) estimator and regularized estimators in the literature, then present a connection between model uncertainty from the BayeSMG model and coherence, which is then used to prove an error monotonicity result on uncertainty quantification.

Connection to regularized estimators
The following lemma reveals a connection between the BayeSMG model and existing completion methods: Table 1, with F 1 = F 2 = 0, η 2 and σ 2 fixed, and a uniform prior on rank R. Conditional on Y Ω , the MAP estimator for X becomes

Lemma 4.1 (MAP estimator). Assume the BayeSMG model in
The MAP estimatorX in (4.1) connects the proposed model with existing deterministic matrix completion methods (see Davenport and Romberg, 2016 and references therein). Consider the following approximation to the MAP formulation (4.1). Treating log(2πσ 2 )rank 2 (X) as a Lagrange multiplier, one can view this as a constraint on rank 2 (X), or equivalently, on rank(X). Replacing rank(X) by its nuclear norm X * (its tightest convex relaxation, see Keshavan et al., 2010), and treating this new constraint as a Lagrange multiplier, the optimization in (4.1) becomes: for some choice of λ > 0 and α ∈ (0, 1). Using (4.2) to approximate (4.1), we can then view the MAP estimator as an analogue of the elastic net estimator (Zou and Hastie, 2005) from linear regression for noisy matrix completion.
To see the connection between the MAP estimatorX and existing matrix completion methods, set α = 1 in (4.2). The problem then reduces to the nuclear-norm formulation in (3.8), which is widely used for deterministic matrix completion (Candès and Recht, 2009;Candès and Tao, 2010;Recht, 2011). This provides an intuitive connection between the proposed Bayesian model and existing completion methods, which we leveraged earlier for efficient inference on matrix rank.

Uncertainty and coherence
Consider next the following definition of subspace coherence from Candès and Recht (2009), ignoring scaling factors: Definition 4.1 (Coherence, Definition 1.2 of Candès and Recht, 2009). Let U ∈ G R,m−R be an R-plane in R m , and let P U be the orthogonal projection onto U. The coherence of subspace U with respect to the i-th basis vector, e i , is defined as μ i (U) := P U e i 2 2 , and the coherence of U is defined as μ(U) = max i=1,...,m μ i (U).

In words, coherence measures how correlated a subspace U is with the basis vectors
suggests that U is highly correlated with the i-th basis vector e i , in that the projection of e i onto U preserves much of its original length; a small value of μ i (U) suggests that U is nearly orthogonal with e i , so a projection of e i onto U loses most of its length. Figure 2 visualizes these two cases using the projection of three basis vectors on a two-dimensional subspace U. Note that the projection of the red vector onto U retains nearly unit length, so U has near-maximal coherence for this basis. The projection of the black vector onto U results in a considerable length reduction, so U has near-minimal coherence for this basis. The overall coherence of U, μ(U), is largely due to the high coherence of the red basis vector.
In matrix completion literature, coherence is widely used to quantify the recoverability of a low-rank matrix X. Here, the same notion of coherence arises in a different context within the proposed model's uncertainty quantification. Lemma 2.1 provides the basis for this connection. Consider first the case where no matrix entries have been observed. From Lemma 2.1(a), vec(X) follows the degenerate Gaussian distribution N {0, σ 2 (P V ⊗ P U )}. The variance of the (i, j)-th entry in X can then be shown to be: Hence, before observing data, the model uncertainty for entry X i,j is proportional to the product of coherences for the row and column spaces U and V, corresponding to the i-th and the j-th basis vectors. Put another way, BayeSMG assigns greater variation to matrix entries with higher subspace coherence in either its row or column index. It is quite appealing given the original role of coherence in matrix completion, where larger row (or column) coherences imply greater "spikiness" for entries; our framework accounts for this by assigning greater model uncertainty to such entries.
Consider next the case where noisy entries Y Ω have been observed. Let us adopt a slightly generalized notion of coherence:

Definition 4.2 (Cross-coherence). The cross-coherence of subspace U with respect to the basis vectors e i and e i is defined as
The cross-coherence ν i,i (U) quantifies how correlated the basis vectors e i and e i are, after a projection onto U. For example, in Figure 2, the pair of red / blue projected basis vectors have negative cross-coherence for U, whereas the pair of blue / black projected vectors have positive cross-coherence. When i = i , this cross-coherence reduces to the original coherence in Definition 4.1.

Define now the cross-coherence vector
. From equation (2.4) in Lemma 2.1, the conditional variance of entry X i,j for an unobserved index (i, j) ∈ Ω c becomes: where ν i,j := ν i (U) • ν j (V), and • denotes the entry-wise (Hadamard) product. The expression in (4.4) yields a nice interpretation. From a UQ perspective, the first term in (4.4), μ i (U)μ j (V), is simply the unconditional uncertainty for entry X i,j , prior to observing data. The second term, ν T i,j [R N (Ω) + γ 2 I] −1 ν i,j , can be viewed as the reduction in uncertainty, after observing the noisy entries Y Ω . This uncertainty reduction is made possible by the correlation structure imposed on X, via the SMG model; (4.4) also yields valuable insight in terms of subspace correlation. The first term μ i (U)μ j (V) can be seen as the joint correlation between (i) row space U to row index i, and (ii) column space V to column index j, prior to any observations. The second term can be viewed as the portion of this correlation explained by observed indices Ω.

Error monotonicity
This link between coherence and uncertainty then sheds insight on expected error decay. This is based on the following proposition: Table 1, with F 1 = F 2 = 0 and fixed σ 2 and η 2 . Let Y Ω contain the noisy entries at Ω ⊆ [m 1 ] × [m 2 ], and let Y Ω∪(i,j) contain an additional noisy observation at (i, j) ∈ Ω c . For any index (k, l) ∈ [m 1 ] × [m 2 ], the expected variance of X k,l can be decomposed as

Proposition 4.1 (Variance reduction). Suppose X follows the BayeSMG model in
where Var(X k,l |Y Ω∪(i,j) ) is provided in (4.4), and Remark. Proposition 4.1 shows, given observed indices Ω, the reduction in uncertainty (as measured by variance) for an unobserved entry X k,l , after observing an additional entry at index (i, j). The last term in (4.5) quantifies this reduction, and can be interpreted as follows. For an unobserved index (k, l) / ∈ Ω ∪ (i, j), the amount of uncertainty reduction is related to the "signal-to-noise" ratio, where the signal is the conditional squared-covariance between the "unobserved" entry X k,l and the "to-be-observed" entry X i,j , and the noise is the conditional variance of the "to-be-observed" entry.
The insight of error monotonicity then follows: Then 2 N +1 (k, l) ≤ 2 N (k, l) for any (k, l) ∈ [m 1 ] × [m 2 ] and N = 1, 2, · · · . Remark. This corollary shows that, for any sampling sequence and any index (k, l), the expected squared-error in estimating X k,l with the conditional mean X P k,l is always monotonically decreasing as more samples are collected. This is intuitive since one expects to gain greater accuracy and precision on the unknown matrix X as more entries are observed. The fact that the proposed model quantifies this monotonicity property provides a reassuring check on our UQ approach.

Numerical experiments
We now investigate the performance of the proposed BayeSMG method in numerical experiments and compare it to the BPMF method (Salakhutdinov and Mnih, 2008), a popular Bayesian matrix completion method in the literature.

Synthetic data
For the first numerical study, we assume the true matrix X ∈ R 24×24 is generated from the SMG distribution, i.e., as X ∼ SMG(P U , P V , σ 2 = 1, R = 2), with uniformly sampled subspaces U and V. The entries are assumed to be missing-at-random and the observed entries are contaminated by noise with a variance η 2 = 0.05 2 , which we presume to be known. The prior specifications are as follows. For BayeSMG, we assign a weakly-informative prior σ 2 ∼ IG(0.01, 0.01) on the variance parameter σ 2 , with non-informative manifold hyperparameters F 1 = F 2 = 0. For BPMF, we assign the recommended weak Inverse-Wishart priors on covariance matrices Σ M ∼ IW(R = 2, I), Σ N ∼ IW(R = 2, I). We then ran 10,000 MCMC iterations for both methods, with the first 2,000 samples taken as burn-in. Standard MCMC convergence checks were performed via trace plot inspection (see Figure 3 (b)) and the Gelman-Rubin statistic (Gelman and Rubin, 1992).
We employ two metrics to compare the posterior contraction and UQ performance of these two methods. The first is the Mean Frobenius Error (MFE), defined as The MFE calculates the Frobenius norm of the difference between the posterior predictive samples {X [t] } T t=1 and the original matrix X. A smaller MFE suggests better recovery and faster posterior contraction for the desired matrix X. The second metric is the Mean Spectral Distance (MSD), defined as where U (or U ) is any frame in subspace U (or U ). The MSD calculates the spectral distance (Calderbank et al., 2015) between the posterior samples {U [t] } T t=1 for the row subspaces (equivalently, {V [t] } T t=1 for the column subspaces) and the true row subspace U (equivalently, the true column subspace V). A smaller MSD suggests better recovery and posterior contraction for matrix subspaces.
The first two plots in Figure 3(a) visualize the true matrix X and the observed Y Ω , with 20% of the entries observed uniformly-at-random. Here, the rank R is estimated via the approximate MAP approach in Section 3.3. The two subsequent plots visualize the posterior mean estimates for X using BayeSMG and BPMF. We see that the BayeSMG method provides visually better recovery of the matrix X, with a lower posterior MFE than the BPMF method. The first two plots in Figure 3(b) visualize the true and estimated row spaces using BayeSMG and BPMF. We again see that BayeSMG gives a visually better recovery of the row space of X (the same holds for its column space), with a lower posterior MSD than BPMF. The next two plots show the trace plots for the first-row coherence μ 1 and the first matrix entry X 1,1 , which is unobserved. We see that the posterior samples for BayeSMG concentrate tightly around the true coherence and matrix values, whereas the posterior samples for BPMF fluctuate much more around the truth. The above observations suggest that when the matrix is generated from the assumed prior model, BayeSMG yields much faster posterior contraction than BPMF, leading to more accurate and precise estimates of X and its subspaces. Next, we will show in the following image recovery and seismic sensor applications that the BayeSMG method provides similar improvements over BPMF via modeling and integrating subspace information.

Image inpainting
Image inpainting is a fundamental problem in image processing (Bertalmio et al., 2000;Cai et al., 2010), which aims to recover and reconstruct images with missing pixels and noise corruption. It appears in numerous applications where image data are susceptible to unreliable data transmission and scratches. Take, for example, the problem of solar imaging (Xie et al., 2012). When a satellite transmits an image of the sun back to the earth, many pixels will inevitably be lost or corrupted due to the instabilities in the transmission process. The missing pixels would become a problem when the image is scaled up. In this case, the quantification of image uncertainty can be as important as the recovery, since this UQ provides insight into the quality of recovered image features in different regions. There has been some work on applying deterministic matrix completion methods for image in-painting (e.g., Xue et al., 2017), but little has been done on uncertainty quantification. Our method addresses the latter goal.
We consider the aforementioned solar imaging problem, where the matrix X is a 256 × 256 image solar flare. The pixel intensity value is encoded from 0 to 255 and represents the use of pseudo-color in the images. We then normalize pixel intensities to have zero mean and unit variance. Half of the pixels in this image are observed uniformly at random, then corrupted by Gaussian noise η 2 = 0.05 2 . We note that, for this problem, the recovery and UQ of the row and column subspaces are of interest as well. This is because image features are often represented in the row and column spaces. Here, these subspaces may represent domain-specific, interpretable phenomena, such as different classes of solar flares, certain shapes, and sunspots. Furthermore, human eyes are typically not as sensitive to high-frequency image features; therefore, a few SVD components can often capture the vital features of an image, making its rank low. For BayeSMG and BPMF, we estimate the rank to be R = 18 following the approximate MAP approach in Section 3.3, and perform 1,000 iterations of MCMC, with a burnin period of 200. As before, MCMC convergence checks were performed via trace plot inspection and standard diagnostics. Figure 4 shows the original solar image, its partial observations, and the recovered image using BayeSMG and BPMF via its posterior predictive mean, as well as its corresponding uncertainties via its 95% highest posterior density (HPD) interval width (Hyndman, 1996). We see that the BayeSMG method provides a much better recovery, with a noticeably lower MFE of 31.0 compared to the BPMF method (350.8). Visually, we see that the BayeSMG recovery captures the key features of the image, e.g., different types of solar flares. The BPMF recovery, on the other hand, loses much of the smallerscale features and contains significant blocking defects. One plausible explanation is when a low-rank subspace structure is present in X (as is the case here), the proposed method can better learn and integrate this structure for improved recovery. Apart from that, an inspection of the HPD plots shows that the BayeSMG provides more accurate estimates of the recovered image, with narrow posterior HPD intervals across the whole matrix. In contrast, the BPMF is much more uncertain of its recovery: its entry-wise posterior density intervals are considerably larger, particularly for pixels with low intensities. Computation-wise, the posterior sampling for BayeSMG can be carried out within one minute on a standard laptop (Intel i7 processor with 16GB RAM), which is quite fast considering the relatively large image size.
Additionally, we study the effect of noise on BayeSMG performance. We consider the same solar image problem, where half of the normalized matrix entries are observed and corrupted with noise. We then tested Gaussian errors with various variances η 2 = 0.05 2 , 0.1 2 , 0.3 2 , and 0.5 2 . Figure 5 shows the recovered images and the posterior estimateη of the noise standard deviation in each case. The MFE for the four  cases are 31. 00, 35.39, 57.48 and 75.83, respectively. The quality of recovery improves as noise decreases, which is as expected. For small to moderate noise levels, we see that BayeSMG yields a good recovery of the solar flare image, suggesting that it is quite robust to noise. In all four cases, the posterior estimateη is slightly larger than the actual noise standard deviation η. One reason may be that the estimated noise level η captures both the true error, as well as small variations in estimating the low-rank matrix X from the few observed entries. This difference becomes smaller as η increases, which is unsurprising since the error variance would dominate the underlying low-rank matrix signal.
To demonstrate the scalability of BayeSMG, we consider next a much higher-dimensional image of the Georgia Tech campus. This image is converted to a gray-scale matrix of size 1911×3000 and standardized to zero mean and unit variance. As before, half of the pixels are observed uniformly at random, then corrupted by a Gaussian noise η 2 = 0.05 2 . To reduce computation time for posterior sampling, we fix the rank as R = 30 for both BayeSMG and BPMF, instead of estimating the rank using the procedure in Section 3.3. We run the MCMC sampler for 500 iterations after a burn-in period of 100. Figure 6 shows the true image, its partial observations, and the recovered image from BayeSMG as well as its corresponding uncertainty. The MFE of this recovery is Figure 7: Performance of BayeSMG on MNAR image pixels. In the first row, the first image is the original matrix, the second is the noisy matrix with entries sampled uniformly at random (MAR), and the third is its recovery estimate via the posterior mean of BayeSMG. In the second row, the first image is the noisy matrix with entries sampled MNAR, and the second image is its recovery estimate via BayeSMG.
1005.1, which is again noticeably smaller than that for the BPMF recovery (3004.8). We see that the recovered BayeSMG image captures the original image's main features, which shows that the proposed method can learn and integrate the subspace structure for recovery. As before, the BayeSMG is quite confident of this completion, with narrow posterior HPD intervals over all pixels. Despite this being a much larger image, we can still carry out BayeSMG on the same standard laptop, albeit with a time of close to two hours. It suggests that the proposed method can yield effective probabilistic matrix recovery in high-dimensional settings.
Recall from Section 3.2 that the proposed posterior sampler for BayeSMG implicitly assumes the matrix entries are missing at random. To see how robust BayeSMG is to slight deviations from this MAR assumption, we investigate the recovery performance of BayeSMG for a 256 × 256 lighthouse image, where the entries are missing in a notat-random setting. In particular, we consider the MNAR case where image pixels with a higher intensity value (i.e., darker) are more likely to be observed, and pixels with a lower intensity value (i.e., lighter) are more likely to be missing. Here, 40% of the entries with intensities higher than the population median are observed randomly, 25% of entries with intensities equal to the median are observed randomly, and 10% of the remaining entries are observed randomly. Overall, around 25.1% of image pixels are observed using this scheme, but the probability of missing for a single pixel depends on the true pixel intensity. Figure 7 shows the sampled image pixels for this MNAR setting with its corresponding image recovery via the posterior mean of the BayeSMG method. For comparison, we also show the sampled pixels under an MCAR setting (where every entry is observed independently with a probability of 25%), with its corresponding image recovery via BayeSMG. We estimate the ranks in both scenarios via the procedure in Section 3.3. For the MNAR case, the MFE is 154.35, compared to an MFE of 148.33 for the MCAR case. While the error is slightly higher for the MNAR case (around 4% larger), we see from Figure 7 that there is little discernible difference visually between the recovered images in both cases. It suggests that the proposed BayeSMG sampler appears to be quite robust to mild violations of the implicit missing-at-random assumption for Al- gorithm 1. However, if prior information on the MNAR nature of the missing entries is known, then we can integrate such information within BayeSMG, yielding further improvements in recovery performance (see Section 3.2).

Seismic sensor network recovery
Seismic imaging is applied widely for finding oil and natural gas beneath the surface of the earth. Ambient Noise Seismic Imaging (ANSI) (Bensen et al., 2007) is a relatively new technique for seismic imaging with great potential. It uses "ambient noises" instead of actively collected signals and is non-invasive to the environment (compared to the traditional active imaging techniques). ANSI has proved useful for imaging shallow earth structures; it utilizes the pairwise cross-correlation function between signals recorded by seismic sensors followed by time-frequency analysis. From these cross-correlations, we can determine the time delay between each pair of sensors. These pairwise time delays are then combined into a data matrix, which is useful for further seismic studies. In a recent study (Xu et al., 2019) on the Old Faithful geyser at Yellowstone National Park, 133 sensors were deployed in its vicinity to collect ambient noise signals for investigating geological structures. Figure 8(a) shows the locations of these sensors.
One shortcoming of ANSI, however, is that many pairwise cross-correlations do not contain identifiable signals. In other words, the peak in the cross-correlation is unobserved since ANSI works on weak ambient noises. This missing data then results in missing entries in the 133 × 133 data matrix. To determine whether a cross-correlation is "missing", we first identify which correlations have an unsatisfactory signal-to-noise ratio (SNR), by inspecting the standard deviation ξ outside of the main wave lobe relative to the magnitude of the wave peak g. The correlation is deemed missing if g/ξ < 20. We note that entries on this cross-correlation matrix X are observed with Figure 9: Performance comparison between BayeSMG and BPMF on the ambient noise cross-correlation time delay data matrix. The first plot (from the left) shows the observed entries in the delay. matrix, with missing entries in white. The second plot shows the completed matrix via the posterior mean from BayeSMG. The third and fourth plots visualize the widths of the entry-wise 95% HPD intervals from BayeSMG and BPMF. noise due to background vibrations caused by bubble collapse and boiling water. Here, the standard deviation of the noise is estimated to be η = 0.05 from an inspection of sensor readings during the period when only noise signals are present; this is then used to initialize η in the Gibbs sampler. Figure 9 shows the observed noisy matrix entries Y Ω .
To proceed with ANSI analysis, one would then need to estimate missing entries in the delay data matrix X. Bensen et al. (2007) shows that such a matrix is indeed lowrank. Here, uncertainty quantification is crucial for estimating geologic structure and identifying the source of activities. With this uncertainty, engineers can better interpret the wave tomography generated from time delay estimates, and identify parts where estimates are accurate and where they are not. This in turn impacts the accuracy of analysis downstream, which subsequently provides greater insight into reconstruction quality. Figure 9 visualizes the recovery and UQ performance from BayeSMG and BPMF, using an estimated rank of R = 15 via the approach in Section 3.3. We see that the BayeSMG yields much more precise estimates (i.e., narrower HPD interval widths) compared to the BPMF. In particular, when an entire row or column of X is missing, the uncertainties returned by BPMF can be very high, which reduces the usefulness of its recovered entries. On the contrary, the proposed BayeSMG method, by leveraging subspace information, can yield more precise inference on these missing rows and columns. One underlying reason is that the BayeSMG approach explicitly integrates subspace modeling for recovery and UQ. From the visualization of Y Ω in Figure 9, we see that there are clearly-seen bright stripes on the left and top edges of the plot, which strongly suggests the presence of low-rank subspaces in X. It is not a surprise since we know several sensors have highly correlated signals due to their proximity. The BayeSMG appears to exploit this subspace structure to provide more confident predictions. The BPMF yields much higher uncertainty in inference, particularly in rows and columns with little to no observations. While the ground truth for the entire matrix X is not known for this sensor network, we would expect from previous experiments that the BayeSMG yields improved recovery performance over the BPMF, particularly in rows and columns with few observations. With posterior samples on X in hand, we can then use its subspace information to locate (or match) a few sensors that contain highly correlated signals with each other. This sensor matching is helpful in seismology studies since we can use it to estimate the dimension and the capacity of the hydrothermal reservoir of the geyser (Wu et al., 2017). We first perform an SVD step on the posterior meanX, and find the singular vector with the largest singular value. We then inspect all the rows of the matrixX, and select the rows most aligned with this vector. We check these rows to locate the most significantly correlated sensors. Figure 8(b) shows the locations of the 12 most correlated sensors and their relative directions from each other. The identified sensors are among the closest to the Old Faithful geyser, and their related observations are dominated by the highly fractured and porous geological structure underground adjacent to the geyser. Using readings from these sensors, researchers can identify a different pattern of the waveform in tremor signals, which suggests a variety of geological structures underneath the geyser.

Conclusion
We proposed a new BayeSMG model for uncertainty quantification in low-rank matrix completion. A key novelty of the BayeSMG model is that it parametrizes the unknown matrix X via manifold prior distributions on its row and column subspaces. This Bayesian subspace parametrization allows for direct posterior inference on matrix subspaces, which we can use for improved matrix recovery. We introduced a Gibbs sampler for posterior inference, which provides efficient posterior sampling even for matrices with dimensions on the order of thousands. Additionally, we showed that BayeSMG provides a probabilistic interpretation for subspace coherence, which we can use to show an error monotonicity result for UQ. We then showed the effective recovery and UQ performance of BayeSMG on simulated data, image data, and an application for seismic sensor network recovery. Codes for the BayeSMG sampler with illustrative examples will be released in a package in MATLAB.
For future work, it would be interesting to design locations for observations to control the uncertainties, exploring the connection with experimental design literature, e.g., integrated mean-squared error designs (Sacks et al., 1989) or distance-based designs (Mak and Joseph, 2018). The exploration of this Bayesian uncertainty quantification for guiding sequential sampling of entries (see Mak et al., 2021) is also of interest. We would also like to investigate further the problem of rank estimation for matrix completion, including theoretical guarantees and an efficient fully Bayesian implementation, extending the work of Hoff (2007). Another interesting topic to explore is an extension of the i.i.d. Gaussian error assumption to account for skewed or spatially correlated errors.