The Annals of Statistics

High-dimensional Ising model selection using 1-regularized logistic regression

Pradeep Ravikumar, Martin J. Wainwright, and John D. Lafferty
Source: Ann. Statist. Volume 38, Number 3 (2010), 1287-1319.

Abstract

We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on 1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an 1-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n=Ω(d3log p) with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n=Ω(d2log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.

First Page: Show Hide
Primary Subjects: 62F12
Secondary Subjects: 68T99
Full-text: Access denied (no subscription detected)
We're sorry, but we are unable to provide you with the full text of this article because we are not able to identify you as a subscriber.
If you have a personal subscription to this journal, then please login. If you are already logged in, then you may need to update your profile to register your subscription. Read more about accessing full-text
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1268056617
Digital Object Identifier: doi:10.1214/09-AOS691
Zentralblatt MATH identifier: 05712422
Mathematical Reviews number (MathSciNet): MR2662343

References

[1] Abbeel, P., Koller, D. and Ng, A. Y. (2006). Learning factor graphs in polynomial time and sample complexity. J. Mach. Learn. Res. 7 1743–1788.
Mathematical Reviews (MathSciNet): MR2274423
[2] Banerjee, O., Ghaoui, L. E. and d’Asprémont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9 485–516.
Mathematical Reviews (MathSciNet): MR2417243
[3] Bertsekas, D. (1995). Nonlinear Programming. Athena Scientific, Belmont, MA.
[4] Bresler, G., Mossel, E. and Sly, A. (2009). Reconstruction of Markov random fields from samples: Some easy observations and algorithms. Available at http://front.math.ucdavis.edu/0712.1402.
[5] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313–2351.
Mathematical Reviews (MathSciNet): MR2382644
Zentralblatt MATH: 1139.62019
Digital Object Identifier: doi:10.1214/009053606000001523
Project Euclid: euclid.aos/1201012958
[6] Chickering, D. (1995). Learning Bayesian networks is NP-complete. In Learning from Data: Artificial Intelligence and Statistics V (D. Fisher and H. Lenz, eds.). Lecture Notes in Statistics 112 121–130. Springer, New York.
Mathematical Reviews (MathSciNet): MR1473013
[7] Chow, C. and Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Trans. Inform. Theory 14 462–467.
[8] Cross, G. and Jain, A. (1983). Markov random field texture models. IEEE Trans. PAMI 5 25–39.
[9] Csiszár, I. and Talata, Z. (2006). Consistent estimation of the basic neighborhood structure of Markov random fields. Ann. Statist. 34 123–145.
[10] Dasgupta, S. (1999). Learning polytrees. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI-99). Morgan Kaufmann, San Francisco, CA.
[11] Davidson, K. R. and Szarek, S. J. (2001). Local operator theory, random matrices, and Banach spaces. In Handbook of the Geometry of Banach Spaces 1 317–336. Elsevier, Amsterdam.
Mathematical Reviews (MathSciNet): MR1863696
Zentralblatt MATH: 1067.46008
Digital Object Identifier: doi:10.1016/S1874-5849(01)80010-3
[12] Donoho, D. and Elad, M. (2003). Maximal sparsity representation via 1 minimization. Proc. Natl. Acad. Sci. USA 100 2197–2202.
Mathematical Reviews (MathSciNet): MR1963681
Zentralblatt MATH: 1064.94011
Digital Object Identifier: doi:10.1073/pnas.0437847100
[13] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI 6 721–741.
[14] Hassner, M. and Sklansky, J. (1980). The use of Markov random fields as models of texture. Comp. Graphics Image Proc. 12 357–370.
[15] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30.
Mathematical Reviews (MathSciNet): MR144363
Zentralblatt MATH: 0127.10602
Digital Object Identifier: doi:10.2307/2282952
[16] Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge Univ. Press, Cambridge.
Mathematical Reviews (MathSciNet): MR832183
[17] Ising, E. (1925). Beitrag zur theorie der ferromagnetismus. Zeitschrift für Physik 31 253–258.
[18] Kalisch, M. and Buhlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the pc-algorithm. J. Mach. Learn. Res. 8 613–636.
[19] Kim, Y., Kim, J. and Kim, Y. (2005). Blockwise sparse regression. Statist. Sinica 16 375–390.
Mathematical Reviews (MathSciNet): MR2267240
Zentralblatt MATH: 1096.62076
[20] Koh, K., Kim, S. J. and Boyd, S. (2007). An interior-point method for large-scale 1-regularized logistic regression. J. Mach. Learn. Res. 3 1519–1555.
[21] Manning, C. D. and Schutze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
Mathematical Reviews (MathSciNet): MR1722790
Zentralblatt MATH: 0951.68158
[22] Meier, L., van de Geer, S. and Bühlmann, P. (2007). The group lasso for logistic regression. Technical report, Mathematics Dept., Swiss Federal Institute of Technology Zürich.
[23] Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
[24] Ng, A. Y. (2004). Feature selection, 1 vs. 2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML-04). Morgan Kaufmann, San Francisco, CA.
[25] Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2008). Union support recovery in high-dimensional multivariate regression. Technical report, Dept. Statistics, Univ. California, Berkeley.
[26] Ripley, B. D. (1981). Spatial Statistics. Wiley, New York.
Mathematical Reviews (MathSciNet): MR624436
[27] Rockafellar, G. (1970). Convex Analysis. Princeton Univ. Press, Princeton.
Mathematical Reviews (MathSciNet): MR274683
[28] Rothman, A., Bickel, P., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
Mathematical Reviews (MathSciNet): MR2417391
Digital Object Identifier: doi:10.1214/08-EJS176
Project Euclid: euclid.ejs/1214491853
[29] Santhanam, N. P. and Wainwright, M. J. (2008). Information-theoretic limits of high-dimensional graphical model selection. In International Symposium on Information Theory. Toronto, Canada.
[30] Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction and Search. MIT Press, Cambridge, MA.
Mathematical Reviews (MathSciNet): MR1815675
[31] Srebro, N. (2003). Maximum likelihood bounded tree-width Markov networks. Artificial Intelligence 143 123–138.
Mathematical Reviews (MathSciNet): MR1963987
Zentralblatt MATH: 1011.68066
Digital Object Identifier: doi:10.1016/S0004-3702(02)00360-0
[32] Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals. IEEE Trans. Inform. Theory 51 1030–1051.
[33] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using 1-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
[34] Wainwright, M. J. and Jordan, M. I. (2003). Graphical models, exponential families, and variational inference. Technical Report 649, Dept. Statistics, Univ. California, Berkeley.
[35] Wainwright, M. J., Ravikumar, P. and Lafferty, J. D. (2007). High-dimensional graphical model selection using 1-regularized logistic regression. In Advances in Neural Information Processing Systems (B. Schölkopf, J. Platt and T. Hoffman, eds.) 19 1465–1472. MIT Press, Cambridge, MA.
Mathematical Reviews (MathSciNet): MR2441316
[36] Welsh, D. J. A. (1993). Complexity: Knots, Colourings, and Counting. Cambridge Univ. Press, Cambridge.
Mathematical Reviews (MathSciNet): MR1245272
Zentralblatt MATH: 0799.68008
[37] Woods, J. (1978). Markov image modeling. IEEE Trans. Automat. Control 23 846–850.
[38] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.
Mathematical Reviews (MathSciNet): MR2212574
Zentralblatt MATH: 1141.62030
Digital Object Identifier: doi:10.1111/j.1467-9868.2005.00532.x
[39] Zhao, P. and Yu, B. (2007). On model selection consistency of lasso. J. Mach. Learn. Res. 7 2541–2567.
Mathematical Reviews (MathSciNet): MR2274449

2013 © Institute of Mathematical Statistics

The Annals of Statistics

The Annals of Statistics

Turn MathJax Off
What is MathJax?