Open Access
June 2017
Integrative sparse $K$-means with overlapping group lasso in genomic applications for disease subtype discovery
Zhiguang Huo, George Tseng
Ann. Appl. Stat. 11(2): 1011-1039 (June 2017). DOI: 10.1214/17-AOAS1033
Abstract

Cancer subtype discovery is the first step toward delivering personalized medicine to cancer patients. With the accumulation of massive multi-level omics datasets and established biological knowledge databases, integrating omics data while incorporating rich existing biological knowledge is essential for deciphering the biological mechanisms behind complex diseases. In this manuscript, we propose an integrative sparse $K$-means (IS-$K$means) approach to discover disease subtypes with the guidance of prior biological knowledge via a sparse overlapping group lasso. An algorithm based on the alternating direction method of multipliers (ADMM) is applied for fast optimization. Simulations and three real applications in breast cancer and leukemia are used to compare IS-$K$means with existing methods and to demonstrate its superior clustering accuracy, feature selection, functional annotation of detected molecular features and computing efficiency.
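
To make the phrase "sparse overlapping group lasso" concrete, the following is a minimal sketch of how such a penalty on feature weights could be evaluated. It is not the authors' formulation: the function name, the mixing parameter `alpha`, the square-root group-size scaling and the toy groups are all assumptions chosen for illustration, and the penalty shown is the simple sum of group norms over overlapping groups plus an $L_1$ term. The exact penalty used by IS-$K$means is defined in the paper itself.

```python
import math

def sparse_overlapping_group_penalty(weights, groups, alpha=0.5):
    """Illustrative penalty (an assumption, not the paper's exact form):
    alpha * ||w||_1 + (1 - alpha) * sum_g sqrt(|g|) * ||w_g||_2,
    where groups may share feature indices (overlap)."""
    # Element-wise sparsity term: encourages individual weights to be zero.
    l1_term = sum(abs(w) for w in weights)
    # Group term: encourages whole groups (e.g. pathways) to be zero together.
    group_term = 0.0
    for group in groups:
        group_norm = math.sqrt(sum(weights[j] ** 2 for j in group))
        group_term += math.sqrt(len(group)) * group_norm
    return alpha * l1_term + (1 - alpha) * group_term

# Toy example: four features, three hypothetical groups that overlap
# (feature 1 belongs to the first two groups, feature 2 to the last two).
weights = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [1, 2], [2, 3]]
penalty = sparse_overlapping_group_penalty(weights, groups, alpha=0.5)
print(round(penalty, 4))
```

In this toy setting a feature shared by several groups contributes to each group's norm, so prior knowledge that places a gene in multiple pathways is penalized through every pathway it belongs to.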

Copyright © 2017 Institute of Mathematical Statistics
Zhiguang Huo and George Tseng "Integrative sparse $K$-means with overlapping group lasso in genomic applications for disease subtype discovery," The Annals of Applied Statistics 11(2), 1011-1039, (June 2017). https://doi.org/10.1214/17-AOAS1033
Received: 1 January 2016; Published: June 2017