Electronic Journal of Statistics

On estimation of the diagonal elements of a sparse precision matrix

Samuel Balmand and Arnak S. Dalalyan

Full-text: Open access

Abstract

In this paper, we present several estimators of the diagonal elements of the inverse covariance matrix, known as the precision matrix, based on a sample of independent and identically distributed random vectors. The main focus is on high-dimensional vectors having a sparse precision matrix. It is now well understood that when the underlying distribution is Gaussian, the columns of the precision matrix can be estimated independently from one another by solving linear regression problems under sparsity constraints. This approach leads to a computationally efficient strategy for estimating the precision matrix: it starts by estimating the regression vectors, then estimates the diagonal entries of the precision matrix and, in a final step, combines these estimators to obtain estimators of the off-diagonal entries. While the step of estimating the regression vectors has been intensively studied over the past decade, the problem of deriving statistically accurate estimators of the diagonal entries has received much less attention. The goal of the present paper is to fill this gap by presenting four estimators of the diagonal entries of the precision matrix, arguably the most natural ones, and by performing a comprehensive empirical evaluation of these estimators. The estimators under consideration are the residual variance, the relaxed maximum likelihood, the symmetry-enforced maximum likelihood and the penalized maximum likelihood. We show, both theoretically and empirically, that when the aforementioned regression vectors are estimated without error, the symmetry-enforced maximum likelihood estimator has the smallest estimation error. However, in the more realistic setting where the regression vectors are estimated by a sparsity-favoring, computationally efficient method, the four estimators perform comparably, with a slight advantage for the residual variance estimator.
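
To make the regression-based strategy concrete: for a Gaussian vector X with precision matrix Omega, the conditional distribution of the j-th coordinate given the remaining ones is normal with mean X_{-j}^T beta_j, where beta_j = -Omega_{-j,j}/Omega_{jj}, and variance 1/Omega_{jj}. The inverse of the residual variance of the j-th node-wise regression therefore estimates the j-th diagonal entry. The sketch below is purely illustrative and is not the authors' implementation: the function name residual_variance_diagonal, the use of scikit-learn's ordinary Lasso and the fixed regularization parameter alpha are assumptions made for this example, whereas the paper works with sparsity-favoring regression estimators in general.

    import numpy as np
    from sklearn.linear_model import Lasso  # any sparse regression solver could be substituted

    def residual_variance_diagonal(X, alpha=0.1):
        # Illustrative sketch (not the paper's code): node-wise regressions
        # followed by the residual variance estimator of the diagonal of Omega.
        # For Gaussian data, Var(X_j | X_{-j}) = 1 / Omega_{jj}, so the inverse
        # of the empirical residual variance estimates the j-th diagonal entry.
        n, p = X.shape
        Xc = X - X.mean(axis=0)                   # center the sample
        omega_diag = np.empty(p)
        for j in range(p):
            y = Xc[:, j]
            Z = np.delete(Xc, j, axis=1)          # all columns except the j-th
            beta_hat = Lasso(alpha=alpha, fit_intercept=False).fit(Z, y).coef_
            residuals = y - Z @ beta_hat
            omega_diag[j] = n / np.sum(residuals ** 2)  # 1 / empirical residual variance
        return omega_diag

On a sample X drawn from a Gaussian distribution with a sparse precision matrix Omega, residual_variance_diagonal(X) returns a length-p vector approximating the diagonal of Omega. The three likelihood-based estimators compared in the paper (relaxed, symmetry-enforced and penalized maximum likelihood) also take estimated regression vectors as input but differ in the second step; they are not reproduced here.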

Article information

Source
Electron. J. Statist. Volume 10, Number 1 (2016), 1551-1579.

Dates
Received: June 2015
First available in Project Euclid: 31 May 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1464710241

Digital Object Identifier
doi:10.1214/16-EJS1148

Mathematical Reviews number (MathSciNet)
MR3507373

Zentralblatt MATH identifier
1342.62088

Subjects
Primary: 62H12: Estimation

Keywords
Precision matrix; sparse recovery; penalized likelihood

Citation

Balmand, Samuel; Dalalyan, Arnak S. On estimation of the diagonal elements of a sparse precision matrix. Electron. J. Statist. 10 (2016), no. 1, 1551–1579. doi:10.1214/16-EJS1148. https://projecteuclid.org/euclid.ejs/1464710241.

