The Annals of Applied Statistics

Penalized estimation in high-dimensional hidden Markov models with state-specific graphical models

Nicolas Städler and Sach Mukherjee


We consider penalized estimation in hidden Markov models (HMMs) with multivariate Normal observations. In the moderate-to-large dimensional setting, estimation for HMMs remains challenging in practice, due to several concerns arising from the hidden nature of the states. We address these concerns by $\ell_{1}$-penalization of state-specific inverse covariance matrices. Penalized estimation leads to sparse inverse covariance matrices which can be interpreted as state-specific conditional independence graphs. Penalization is nontrivial in this latent variable setting; we propose a penalty that automatically adapts to the number of states $K$ and the state-specific sample sizes and can cope with scaling issues arising from the unknown states. The methodology is adaptive and very general, applying in particular to both low- and high-dimensional settings without requiring hand tuning. Furthermore, our approach facilitates exploration of the number of states $K$ by coupling estimation for successive candidate values of $K$. Empirical results on simulated examples demonstrate the effectiveness of the proposed approach. In a challenging real data example from genome biology, we demonstrate the ability of our approach to yield gains in predictive power and to deliver richer estimates than existing methods.
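
For orientation, the setup described in the abstract can be written in a generic form. This is only a sketch under assumed notation: the symbols $y_{t}$, $s_{t}$, $\mu_{k}$, $\Omega_{k}$, $\theta$ and $\lambda$ are introduced here for illustration, and the plain $\ell_{1}$ penalty shown is the standard unweighted one, not the adaptive penalty proposed in the paper, whose exact form is not given on this page. With observations $y_{1},\dots,y_{n}\in\mathbb{R}^{p}$, hidden states $s_{t}\in\{1,\dots,K\}$ and state-specific inverse covariance matrices $\Omega_{k}$:

$$
y_{t}\mid s_{t}=k \;\sim\; \mathcal{N}\bigl(\mu_{k},\,\Omega_{k}^{-1}\bigr),
\qquad
\hat{\theta}_{\lambda} \;=\; \arg\min_{\theta}\;\Bigl\{-\frac{1}{n}\,\ell(\theta;\,y_{1},\dots,y_{n}) \;+\; \lambda\sum_{k=1}^{K}\sum_{j\neq j'}\bigl|\Omega_{k;jj'}\bigr|\Bigr\}
$$

Here $\ell$ denotes the HMM log-likelihood and $\theta$ collects the transition probabilities together with all $\mu_{k}$ and $\Omega_{k}$. For Normal observations, an estimated zero $\hat{\Omega}_{k;jj'}=0$ corresponds to conditional independence of variables $j$ and $j'$ given the remaining variables in state $k$, which is what allows each sparse $\hat{\Omega}_{k}$ to be read as a state-specific graph.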

Article information

Ann. Appl. Stat., Volume 7, Number 4 (2013), 2157-2179.

First available in Project Euclid: 23 December 2013

Keywords: HMM; Graphical Lasso; universal regularization; model selection; MMDL; greedy backward pruning; genome biology; chromatin modeling


Städler, Nicolas; Mukherjee, Sach. Penalized estimation in high-dimensional hidden Markov models with state-specific graphical models. Ann. Appl. Stat. 7 (2013), no. 4, 2157–2179. doi:10.1214/13-AOAS662.


Supplemental materials

  • Supplementary material: Graphical Lasso with different penalty functions and supplementary figures. Optimization and performance of the Graphical Lasso with the penalty functions $\mathrm{Pen}_{\mathrm{invcov}}$, $\mathrm{Pen}_{\mathrm{parcor}}$ and $\mathrm{Pen}_{\mathrm{invcor}}$ introduced in Section 2.1. Additional Figures S2–S5 for Sections 3.1, 3.2 and 4.