## The Annals of Applied Statistics

### Sparse median graphs estimation in a high-dimensional semiparametric model

#### Abstract

We propose a unified framework for conducting inference on complex aggregated data in high-dimensional settings. We assume the data are a collection of multiple non-Gaussian realizations with underlying undirected graphical structures. Using the concept of median graphs in summarizing the commonality across these graphical structures, we provide a novel semiparametric approach to modeling such complex aggregated data, along with robust estimation of the median graph, which is assumed to be sparse. We prove the estimator is consistent in graph recovery and give an upper bound on the rate of convergence. We further provide thorough numerical analysis on both synthetic and real datasets to illustrate the empirical usefulness of the proposed models and methods.

#### Article information

Source
Ann. Appl. Stat., Volume 10, Number 3 (2016), 1397-1426.

Dates
Revised: April 2016
First available in Project Euclid: 28 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1475069612

Digital Object Identifier
doi:10.1214/16-AOAS940

Mathematical Reviews number (MathSciNet)
MR3553229

Zentralblatt MATH identifier
06775271

#### Citation

Han, Fang; Han, Xiaoyan; Liu, Han; Caffo, Brian. Sparse median graphs estimation in a high-dimensional semiparametric model. Ann. Appl. Stat. 10 (2016), no. 3, 1397–1426. doi:10.1214/16-AOAS940. https://projecteuclid.org/euclid.aoas/1475069612
