## The Annals of Statistics

### Support consistency of direct sparse-change learning in Markov networks

#### Abstract

We study the problem of learning sparse structure changes between two Markov networks $P$ and $Q$. Rather than fitting two Markov networks separately to two sets of data and comparing them to find the differences, recent work proposed to learn the changes directly by estimating the ratio between the two Markov network models. In this paper, we give sufficient conditions for successful change detection with respect to the sample sizes $n_{p},n_{q}$, the data dimension $m$ and the number of changed edges $d$. When using an unbounded density ratio model, we prove that the true sparse changes can be consistently identified for $n_{p}=\Omega(d^{2}\log\frac{m^{2}+m}{2})$ and $n_{q}=\Omega({n_{p}^{2}})$, with an exponentially decaying upper bound on the learning error. This sample complexity can be improved to $\min(n_{p},n_{q})=\Omega(d^{2}\log\frac{m^{2}+m}{2})$ when the density ratio model is assumed to be bounded. Our theoretical guarantee applies to a wide range of discrete and continuous Markov networks.
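The direct approach described above can be illustrated with a minimal sketch: an $\ell_{1}$-regularized KLIEP-style fit of the density ratio between two Gaussian Markov networks that differ in a single edge. All sizes, parameter values and the proximal-gradient solver below are illustrative choices, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5

# Two Gaussian Markov networks whose precision matrices differ
# only in the single edge (0, 1).  (Illustrative values.)
prec_p = np.eye(m)
prec_q = np.eye(m)
prec_q[0, 1] = prec_q[1, 0] = 0.4

n_p = n_q = 4000
x_p = rng.multivariate_normal(np.zeros(m), np.linalg.inv(prec_p), n_p)
x_q = rng.multivariate_normal(np.zeros(m), np.linalg.inv(prec_q), n_q)

# Pairwise features f_uv(x) = x_u * x_v for u < v; the ratio model is
# r(x; theta) = exp(theta . f(x)) / N_q(theta), normalized over the Q sample.
pairs = [(u, v) for u in range(m) for v in range(u + 1, m)]

def feats(x):
    return np.stack([x[:, u] * x[:, v] for u, v in pairs], axis=1)

f_p, f_q = feats(x_p), feats(x_q)

# L1-regularized KLIEP objective:
#   maximize  mean_P[theta . f]  -  log mean_Q[exp(theta . f)]  -  lam * ||theta||_1
# solved by proximal gradient ascent; soft-thresholding zeroes unchanged edges.
theta = np.zeros(len(pairs))
lam, step = 0.1, 0.1
for _ in range(500):
    w = np.exp(f_q @ theta)
    w /= w.sum()                        # softmax weights over the Q sample
    grad = f_p.mean(axis=0) - w @ f_q   # gradient of the log-ratio fit
    theta += step * grad
    theta = np.sign(theta) * np.maximum(np.abs(theta) - step * lam, 0.0)

changed = [pairs[i] for i in np.flatnonzero(np.abs(theta) > 1e-8)]
print(changed)  # edges whose potentials differ between P and Q
```

With enough samples relative to the (here, single) changed edge, the sparse ratio fit recovers the changed support without estimating either network separately, which is the regime the consistency results in the paper characterize.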

#### Article information

Source
Ann. Statist. Volume 45, Number 3 (2017), 959–990.

Dates
Revised: April 2016
First available in Project Euclid: 13 June 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1497319685

Digital Object Identifier
doi:10.1214/16-AOS1470

Zentralblatt MATH identifier
1371.62022

Subjects
Primary: 62F12: Asymptotic properties of estimators
Secondary: 68T99: None of the above, but in this section

#### Citation

Liu, Song; Suzuki, Taiji; Relator, Raissa; Sese, Jun; Sugiyama, Masashi; Fukumizu, Kenji. Support consistency of direct sparse-change learning in Markov networks. Ann. Statist. 45 (2017), no. 3, 959--990. doi:10.1214/16-AOS1470. https://projecteuclid.org/euclid.aos/1497319685

#### References

• [1] Banerjee, O., El Ghaoui, L. and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9 485–516.
• [2] Danaher, P., Wang, P. and Witten, D. M. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 373–397.
• [3] Donoho, D. L. and Huo, X. (2001). Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47 2845–2862.
• [4] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
• [5] Hammersley, J. M. and Clifford, P. (1971). Markov Fields on Finite Graphs and Lattices. Available at http://www.statslab.cam.ac.uk/~grg/books/hammfest/hamm-cliff.pdf.
• [6] Kolar, M. and Xing, E. P. (2012). Estimating networks with jumps. Electron. J. Stat. 6 2069–2106.
• [7] Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge, MA.
• [8] Lee, S.-I., Ganapathi, V. and Koller, D. (2007). Efficient structure learning of Markov networks using $\ell_{1}$-regularization. In Advances in Neural Information Processing Systems (B. Schölkopf, J. Platt and T. Hoffman, eds.) 19 817–824. MIT Press, Cambridge, MA.
• [9] Liu, S., Quinn, J. A., Gutmann, M. U., Suzuki, T. and Sugiyama, M. (2014). Direct learning of sparse changes in Markov networks by density ratio estimation. Neural Comput. 26 1169–1197.
• [10] Liu, S., Suzuki, T., Relator, R., Sese, J., Sugiyama, M. and Fukumizu, K. (2016). Supplement to “Support consistency of direct sparse-change learning in Markov networks.” DOI:10.1214/16-AOS1470SUPP.
• [11] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• [12] Nagashima, T., Shimodaira, H., Ide, K., Nakakuki, T., Tani, Y., Takahashi, K., Yumoto, N. and Hatakeyama, M. (2007). Quantitative transcriptional control of ErbB receptor signaling undergoes graded to biphasic response for cell differentiation. J. Biol. Chem. 282 4045–4056.
• [13] Neal, R. M. (2003). Slice sampling. Ann. Statist. 31 705–767.
• [14] Raskutti, G., Yu, B., Wainwright, M. J. and Ravikumar, P. (2009). Model selection in Gaussian graphical models: High-dimensional consistency of $\ell_{1}$-regularized MLE. In Advances in Neural Information Processing Systems (D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, eds.) 21 1329–1336. Curran Associates, Red Hook, NY.
• [15] Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using $\ell_{1}$-regularized logistic regression. Ann. Statist. 38 1287–1319.
• [16] Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer, New York.
• [17] Sugiyama, M., Kanamori, T., Suzuki, T., du Plessis, M. C., Liu, S. and Takeuchi, I. (2013). Density-difference estimation. Neural Comput. 25 2734–2775.
• [18] Sugiyama, M., Nakajima, S., Kashima, H., von Bünau, P. and Kawanabe, M. (2008). Direct importance estimation with model selection and its application to covariate shift adaptation. In Advances in Neural Information Processing Systems 20 (J. C. Platt, D. Koller, Y. Singer and S. T. Roweis, eds.). Curran Associates, Red Hook, NY.
• [19] Sugiyama, M., Suzuki, T. and Kanamori, T. (2012). Density Ratio Estimation in Machine Learning. Cambridge Univ. Press, Cambridge.
• [20] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 91–108.
• [21] Tomioka, R. and Suzuki, T. (2014). Spectral norm of random tensors. Preprint. Available at arXiv:1407.1870.
• [22] Tsuboi, Y., Kashima, H., Hido, S., Bickel, S. and Sugiyama, M. (2009). Direct density ratio estimation for large-scale covariate shift adaptation. Journal of Information Processing 17 138–155.
• [23] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
• [24] Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer, New York.
• [25] Yamada, M., Suzuki, T., Kanamori, T., Hachiya, H. and Sugiyama, M. (2013). Relative density-ratio estimation for robust distribution comparison. Neural Comput. 25 1324–1370.
• [26] Yang, E., Allen, G., Liu, Z. and Ravikumar, P. (2012). Graphical models via generalized linear models. In Advances in Neural Information Processing Systems (F. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger, eds.) 25 1358–1366. Curran Associates, Red Hook, NY.
• [27] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 49–67.
• [28] Zhang, B. and Wang, Y. J. (2010). Learning structural changes of Gaussian graphical models in controlled experiments. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010) 701–708. AUAI Press, Corvallis, OR.
• [29] Zhao, S. D., Cai, T. T. and Li, H. (2014). Direct estimation of differential networks. Biometrika 101 253–268.

#### Supplemental materials

• Supplement to “Support consistency of direct sparse-change learning in Markov networks”. Owing to the page limit, the proofs of Lemmas 1–5 and of the corollaries and propositions are presented in this supplementary article, together with detailed experimental settings, extended simulation results and illustrations of several concepts from the paper.