The Annals of Statistics

Robust subspace clustering

Mahdi Soltanolkotabi, Ehsan Elhamifar, and Emmanuel J. Candès

Full-text: Open access


Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2009) 2790–2797] to cluster noisy data, and develops some novel theory demonstrating its correctness. In particular, the theory uses ideas from geometric functional analysis to show that the algorithm can accurately recover the underlying subspaces under minimal requirements on their orientation, and on the number of samples per subspace. Synthetic as well as real data experiments complement our theoretical study, illustrating our approach and demonstrating its effectiveness.

Article information

Ann. Statist., Volume 42, Number 2 (2014), 669–699.

First available in Project Euclid: 20 May 2014


Subjects: Primary 62-07: Data analysis

Keywords: subspace clustering; spectral clustering; LASSO; Dantzig selector; $\ell_{1}$ minimization; multiple hypothesis testing; true and false discoveries; geometric functional analysis; nonasymptotic random matrix theory


Soltanolkotabi, Mahdi; Elhamifar, Ehsan; Candès, Emmanuel J. Robust subspace clustering. Ann. Statist. 42 (2014), no. 2, 669--699. doi:10.1214/13-AOS1199.



  • [1] Agarwal, P. K. and Mustafa, N. H. (2004). $k$-means projective clustering. In Proceedings of the Twenty-third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems 155–165.
  • [2] Aldroubi, A. and Sekmen, A. (2012). Nearness to local subspace algorithm for subspace and motion segmentation. IEEE Signal Process. Lett. 19 704–707.
  • [3] Arias-Castro, E. (2011). Clustering based on pairwise distances when the data is of mixed dimensions. IEEE Trans. Inform. Theory 57 1692–1706.
  • [4] Arias-Castro, E., Chen, G. and Lerman, G. (2011). Spectral clustering based on local linear approximations. Electron. J. Stat. 5 1537–1587.
  • [5] Arora, S., Ge, R., Kannan, R. and Moitra, A. (2012). Computing a nonnegative matrix factorization–provably. In STOC’12—Proceedings of the 2012 ACM Symposium on Theory of Computing 145–161. ACM, New York.
  • [6] Bako, L. (2011). Identification of switched linear systems via sparse optimization. Automatica J. IFAC 47 668–677.
  • [7] Balcan, M.-F., Blum, A. and Gupta, A. (2009). Approximate clustering without the approximation. In Proceedings of the Twentieth Annual ACM–SIAM Symposium on Discrete Algorithms 1068–1077. SIAM, Philadelphia, PA.
  • [8] Bayati, M. and Montanari, A. (2011). The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inform. Theory 57 764–785.
  • [9] Bayati, M. and Montanari, A. (2012). The LASSO risk for Gaussian matrices. IEEE Trans. Inform. Theory 58 1997–2017.
  • [10] Becker, S. R., Candès, E. J. and Grant, M. C. (2011). Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3 165–218.
  • [11] Belloni, A., Chernozhukov, V. and Wang, L. (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98 791–806.
  • [12] Bittorf, V., Recht, B., Re, C. and Tropp, J. A. (2012). Factoring nonnegative matrices with linear programs. In Proceedings of Neural Information Processing Systems Foundation, NIPS.
  • [13] Boult, T. E. and Gottesfeld Brown, L. (1991). Factorization-based segmentation of motions. In Proceedings of the IEEE Workshop on Visual Motion 179–186.
  • [14] Bradley, P. S. and Mangasarian, O. L. (2000). $k$-plane clustering. J. Global Optim. 16 23–32.
  • [15] Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
  • [16] Chen, G. and Lerman, G. (2009). Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Found. Comput. Math. 9 517–558.
  • [17] Chen, G. and Lerman, G. (2009). Spectral curvature clustering (SCC). Int. J. Comput. Vis. 81 317–330.
  • [18] Chen, Y., Nasrabadi, N. M. and Tran, T. D. (2011). Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans. Geosci. Remote Sens. 99 1–13.
  • [19] Dalalyan, A. and Chen, Y. (2012). Fused sparsity and robust estimation for linear models with unknown variance. In Advances in Neural Information Processing Systems 25 1268–1276.
  • [20] Elhamifar, E. and Vidal, R. (2009). Sparse subspace clustering. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2790–2797.
  • [21] Elhamifar, E. and Vidal, R. (2010). Clustering disjoint subspaces via sparse representation. In IEEE International Conference on Acoustics Speech and Signal Processing, ICASSP 1926–1929. IEEE Press, New York.
  • [22] Elhamifar, E. and Vidal, R. (2013). Sparse subspace clustering: Algorithms, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 35 2765–2781.
  • [23] Eriksson, B., Balzano, L. and Nowak, R. (2011). High-rank matrix completion and subspace clustering with missing data. Preprint. Available at arXiv:1112.5629.
  • [24] Giraud, C., Huet, S. and Verzelen, N. (2012). High-dimensional regression with unknown variance. Statist. Sci. 27 500–518.
  • [25] Goh, A. and Vidal, R. (2007). Segmenting motions of different types by unsupervised manifold clustering. In IEEE International Conference on Computer Vision and Pattern Recognition, CVPR 1–6. IEEE Press, New York.
  • [26] Günnemann, S., Müller, E., Raubach, S. and Seidl, T. (2011). Flexible fault tolerant subspace clustering for data with missing values. In IEEE International Conference on Data Mining, ICDM 231–240.
  • [27] Kannan, R. and Vempala, S. (2008). Spectral algorithms. Found. Trends Theor. Comput. Sci. 4 157–288 (2009).
  • [28] Keller, F., Müller, E. and Böhm, K. (2012). HICS: High contrast subspaces for density-based outlier ranking. In IEEE International Conference on Data Engineering, ICDE 1037–1048.
  • [29] Panagakis, Y., Kotropoulos, C. and Arce, G. R. (2011). $\ell_1$-graph based music structure analysis. In International Society for Music Information Retrieval Conference, ISMIR.
  • [30] Lerman, G. and Zhang, T. (2011). Robust recovery of multiple subspaces by geometric $l_p$ minimization. Ann. Statist. 39 2686–2715.
  • [31] Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y. and Ma, Y. (2013). Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35 171–184.
  • [32] Loh, P.-L. and Wainwright, M. J. (2012). High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. Ann. Statist. 40 1637–1664.
  • [33] Lu, L. and Vidal, R. (2006). Combined central and subspace clustering for computer vision applications. In Proceedings of the 23rd International Conference on Machine Learning 593–600. ACM, New York.
  • [34] Ma, Y., Derksen, H., Hong, W. and Wright, J. (2007). Segmentation of multivariate mixed data via lossy data coding and compression. IEEE Trans. Pattern Anal. Mach. Intell. 29 1546–1562.
  • [35] Ma, Y. and Vidal, R. (2005). Identification of deterministic switched ARX systems via identification of algebraic varieties. In Hybrid Systems: Computation and Control 449–465.
  • [36] Ma, Y., Yang, A. Y., Derksen, H. and Fossum, R. (2008). Estimation of subspace arrangements with applications in modeling and segmenting mixed data. SIAM Rev. 50 413–458.
  • [37] McWilliams, B. and Montana, G. (2014). Subspace clustering of high-dimensional data: A predictive approach. Data Min. Knowl. Discov. 28 736–772.
  • [38] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • [39] Müller, E., Günnemann, S., Assent, I. and Seidl, T. (2009). Evaluating clustering in subspace projections of high dimensional data. Proc. VLDB Endow. 2 1270–1281.
  • [40] Ng, A. Y., Jordan, M. I. and Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. Adv. Neural Inf. Process. Syst. 2 849–856.
  • [41] Ozay, N., Sznaier, M. and Lagoa, C. (2010). Model (in)validation of switched ARX systems with unknown switches and its application to activity monitoring. In IEEE Conference on Decision and Control, CDC 7624–7630.
  • [42] Ozay, N., Sznaier, M., Lagoa, C. and Camps, O. (2010). GPCA with denoising: A moments-based convex approach. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 3209–3216. IEEE Press, New York.
  • [43] Parsons, L., Haque, E. and Liu, H. (2004). Subspace clustering for high dimensional data: A review. ACM SIGKDD Explor. Newsl. 6 90–105.
  • [44] Rosenbaum, M. and Tsybakov, A. B. (2013). Improved matrix uncertainty selector. In From Probability to Statistics and Back: High-Dimensional Models and Processes—A Festschrift in Honor of Jon A. Wellner 276–290. IMS, Beachwood, OH.
  • [45] Rosenbaum, M. and Tsybakov, A. B. (2010). Sparse recovery under matrix uncertainty. Ann. Statist. 38 2620–2651.
  • [46] Soltanolkotabi, M. and Candès, E. J. (2012). A geometric analysis of subspace clustering with outliers. Ann. Statist. 40 2195–2238.
  • [47] Soltanolkotabi, M., Elhamifar, E. and Candès, E. J. (2014). Supplement to “Robust subspace clustering.” DOI:10.1214/13-AOS1199SUPP.
  • [48] Städler, N., Bühlmann, P. and van de Geer, S. (2010). $\ell_1$-penalization for mixture regression models. TEST 19 209–256.
  • [49] Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879–898.
  • [50] Tipping, M. E. and Bishop, C. M. (1999). Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 611–622.
  • [51] Tomasi, C. and Kanade, T. (1992). Shape and motion from image streams under orthography: A factorization method. Int. J. Comput. Vis. 9 137–154.
  • [52] Tseng, P. (2000). Nearest $q$-flat to $m$ points. J. Optim. Theory Appl. 105 249–252.
  • [53] Vidal, R. (2011). Subspace clustering. IEEE Signal Process. Mag. 28 52–68.
  • [54] Vidal, R., Ma, Y. and Sastry, S. (2005). Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27 1945–1959.
  • [55] Yan, J. and Pollefeys, M. (2006). A general framework for motion segmentation: Independent, articulated, rigid, nonrigid, degenerate and nondegenerate. In ECCV 2006 94–106.
  • [56] Zhang, A., Fawaz, N., Ioannidis, S. and Montanari, A. (2012). Guess who rated this movie: Identifying users through subspace clustering. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence 944–953.
  • [57] Zhang, T., Szlam, A. and Lerman, G. (2009). Median $k$-flats for hybrid linear modeling with many outliers. In IEEE International Conference on Computer Vision Workshops, ICCV 234–241.
  • [58] Zhang, T., Szlam, A., Wang, Y. and Lerman, G. (2012). Hybrid linear modeling via local best-fit flats. Int. J. Comput. Vis. 100 217–240.
  • [59] Zhou, F., Torre, F. and Hodgins, J. K. (2008). Aligned cluster analysis for temporal segmentation of human motion. In IEEE International Conference on Automatic Face and Gesture Recognition, FG 1–7.

Supplemental materials