The Annals of Statistics

SPADES and mixture models

Florentina Bunea, Alexandre B. Tsybakov, Marten H. Wegkamp, and Adrian Barbu

Full-text: Open access


This paper studies sparse density estimation via ℓ1 penalization (SPADES). We focus on estimation in high-dimensional mixture models and nonparametric adaptive density estimation. We show, respectively, that SPADES can recover, with high probability, the unknown components of a mixture of probability densities, and that it yields minimax adaptive density estimates. These results are based on a general sparsity oracle inequality that the SPADES estimates satisfy. We offer a data-driven method for choosing the tuning parameter used in the construction of SPADES. The method uses the generalized bisection method first introduced in [10]. The suggested procedure bypasses the need for a grid search and offers substantial computational savings. We complement our theoretical results with a simulation study that employs this method for approximating one- and two-dimensional densities with mixtures. The numerical results strongly support our theoretical findings.
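The core idea can be illustrated in a few lines: SPADES minimizes an ℓ1-penalized empirical L2 criterion over the weights of a dictionary of candidate densities, and the penalty zeroes the weights of irrelevant dictionary elements. The sketch below is only a minimal illustration under stated assumptions (a unit-variance Gaussian dictionary on a grid, whose L2 inner products have a closed form, a fixed penalty level, and plain proximal-gradient iterations); it is not the paper's implementation, and in particular the paper chooses the penalty data-adaptively via the generalized bisection method rather than fixing it.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Gaussian density N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Illustrative data: a sample from a two-component Gaussian mixture.
n = 2000
X = np.concatenate([rng.normal(-2.0, 1.0, n // 2), rng.normal(2.0, 1.0, n // 2)])

# Dictionary: unit-variance Gaussian densities centred on a grid (an assumption).
mus = np.arange(-4.0, 4.5, 0.5)
sigma, M = 1.0, len(mus)

# Gram matrix G_jk = <f_j, f_k>_{L2}; for Gaussians this is itself a Gaussian
# density in mu_j - mu_k with variance 2*sigma^2.
G = gauss_pdf(mus[:, None] - mus[None, :], 0.0, sigma * np.sqrt(2.0))
# Empirical means fbar_j = (1/n) * sum_i f_j(X_i).
fbar = gauss_pdf(X[:, None], mus[None, :], sigma).mean(axis=0)

# Proximal gradient (ISTA) on the SPADES-type objective
#   lam' G lam - 2 fbar' lam + pen * ||lam||_1,
# i.e. the empirical L2 risk of sum_j lam_j f_j plus an l1 penalty.
pen = 0.2                                   # fixed here purely for illustration
lam = np.zeros(M)
step = 0.5 / np.linalg.eigvalsh(G).max()    # 1/L for the quadratic part
for _ in range(5000):
    z = lam - step * 2.0 * (G @ lam - fbar)                      # gradient step
    lam = np.sign(z) * np.maximum(np.abs(z) - step * pen, 0.0)   # soft-threshold

print("selected centres:", mus[np.abs(lam) > 1e-8])
```

The soft-thresholding step sets most weights exactly to zero, so the surviving dictionary centres point at the mixture components, which is the selection behaviour the paper establishes with high probability.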

Article information

Ann. Statist., Volume 38, Number 4 (2010), 2525–2558.

First available in Project Euclid: 11 July 2010


Primary: 62G08: Nonparametric regression
Secondary: 62C20: Minimax procedures; 62G05: Estimation; 62G20: Asymptotic properties

Adaptive estimation; aggregation; lasso; minimax risk; mixture models; consistent model selection; nonparametric density estimation; oracle inequalities; penalized least squares; sparsity; statistical learning


Bunea, Florentina; Tsybakov, Alexandre B.; Wegkamp, Marten H.; Barbu, Adrian. SPADES and mixture models. Ann. Statist. 38 (2010), no. 4, 2525--2558. doi:10.1214/09-AOS790.



  • [1] Abramovich, F., Benjamini, Y., Donoho, D. L. and Johnstone, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584–653.
  • [2] Biau, G. and Devroye, L. (2005). Density estimation by the penalized combinatorial method. J. Multivariate Anal. 94 196–208.
  • [3] Biau, G., Cadre, B., Devroye, L. and Györfi, L. (2008). Strongly consistent model selection for densities. TEST 17 531–545.
  • [4] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • [5] Birgé, L. (2008). Model selection for density estimation with L2 loss. Available at arXiv:0808.1416.
  • [6] Birgé, L. and Massart, P. (1997). From model selection to adaptive estimation. In Festschrift for Lucien LeCam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. Yang, eds.) 55–87. Springer, New York.
  • [7] Bunea, F. (2004). Consistent covariate selection and post model selection inference in semiparametric regression. Ann. Statist. 32 898–927.
  • [8] Bunea, F. (2008). Honest variable selection in linear and logistic models via ℓ1 and ℓ1+ℓ2 penalization. Electron. J. Stat. 2 1153–1194.
  • [9] Bunea, F. (2008). Consistent selection via the Lasso for high dimensional approximating regression models. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh (B. Clarke and S. Ghosal, eds.) 3 122–137. IMS, Beachwood, OH.
  • [10] Bunea, F. and Barbu, A. (2009). Dimension reduction and variable selection in case control studies via regularized likelihood optimization. Electron. J. Stat. 3 1257–1287.
  • [11] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
  • [12] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2006). Aggregation and sparsity via ℓ1-penalized least squares. In Proceedings of 19th Annual Conference on Learning Theory, COLT 2006. Lecture Notes in Artificial Intelligence 4005 379–391. Springer, Heidelberg.
  • [13] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
  • [14] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Sparse density estimation with ℓ1 penalties. In Learning Theory. Lecture Notes in Comput. Sci. 4539 530–544. Springer, Heidelberg.
  • [15] Burden, R. L. and Faires, J. D. (2001). Numerical Analysis, 7th ed. Brooks/Cole, Pacific Grove, CA.
  • [16] Chen, S., Donoho, D. and Saunders, M. (2001). Atomic decomposition by basis pursuit. SIAM Rev. 43 129–159.
  • [17] Devroye, L. and Lugosi, G. (2000). Combinatorial Methods in Density Estimation. Springer, New York.
  • [18] Donoho, D. L. (1995). Denoising via soft-thresholding. IEEE Trans. Inform. Theory 41 613–627.
  • [19] Donoho, D. L., Elad, M. and Temlyakov, V. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6–18.
  • [20] Friedman, J., Hastie, T., Hofling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Statist. 1 302–332.
  • [21] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 1.
  • [22] Golubev, G. K. (1992). Nonparametric estimation of smooth probability densities in L2. Probl. Inf. Transm. 28 44–54.
  • [23] Golubev, G. K. (2002). Reconstruction of sparse vectors in white Gaussian noise. Probl. Inf. Transm. 38 65–79.
  • [24] Greenshtein, E. and Ritov, Y. (2004). Persistency in high dimensional linear predictor-selection and the virtue of over-parametrization. Bernoulli 10 971–988.
  • [25] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approximation and Statistical Applications. Lecture Notes in Statistics 129. Springer, New York.
  • [26] James, L., Priebe, C. and Marchette, D. (2001). Consistent estimation of mixture complexity. Ann. Statist. 29 1281–1296.
  • [27] Kerkyacharian, G., Picard, D. and Tribouley, K. (1996). Lp adaptive density estimation. Bernoulli 2 229–247.
  • [28] Koltchinskii, V. (2005). Model selection and aggregation in sparse classification problems. Oberwolfach Reports 2 2663–2667.
  • [29] Koltchinskii, V. (2009). Sparsity in penalized empirical risk minimization. Ann. Inst. H. Poincaré Probab. Statist. 45 7–57.
  • [30] Loubes, J.-M. and van de Geer, S. A. (2002). Adaptive estimation in regression, using soft thresholding type penalties. Statist. Neerlandica 56 453–478.
  • [31] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2 90–102.
  • [32] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
  • [33] Nemirovski, A. (2000). Topics in non-parametric statistics. In Lectures on Probability Theory and Statistics (Saint–Flour, 1998). Lecture Notes in Math. 1738 85–277. Springer, Berlin.
  • [34] Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (2007). Numerical Recipes: The Art of Scientific Computing, 3rd ed. Cambridge Univ. Press, New York.
  • [35] Rigollet, P. (2006). Inégalités d’oracle, agrégation et adaptation. Ph.D. thesis, Univ. Paris 6.
  • [36] Rigollet, P. and Tsybakov, A. B. (2007). Linear and convex aggregation of density estimators. Math. Methods Statist. 16 260–280.
  • [37] Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scand. J. Statist. 9 65–78.
  • [38] Samarov, A. and Tsybakov, A. (2007). Aggregation of density estimators and dimension reduction. In Advances in Statistical Modeling and Inference. Essays in Honor of Kjell A. Doksum (V. Nair, ed.) 233–251. World Scientific, Singapore.
  • [39] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • [40] Tsybakov, A. B. (2003). Optimal rates of aggregation. In Proceedings of 16th Annual Conference on Learning Theory (COLT) and 7th Annual Workshop on Kernel Machines. Lecture Notes in Artificial Intelligence 2777. Springer, Heidelberg.
  • [41] Vapnik, V. N. (1999). The Nature of Statistical Learning Theory (Information Science and Statistics). Springer, New York.
  • [42] van de Geer, S. A. (2008). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614–645.
  • [43] Wasserman, L. A. (2004). All of Statistics. Springer, New York.
  • [44] Wegkamp, M. H. (1999). Quasi-universal bandwidth selection for kernel density estimators. Canad. J. Statist. 27 409–420.
  • [45] Wegkamp, M. H. (2003). Model selection in nonparametric regression. Ann. Statist. 31 252–273.
  • [46] Zhang, C. H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
  • [47] Zhao, P. and Yu, B. (2007). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2567.
  • [48] Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.