Bernoulli, Volume 26, Number 1 (2020), 286-322.

Prediction and estimation consistency of sparse multi-class penalized optimal scoring

Irina Gaynanova



Sparse linear discriminant analysis via penalized optimal scoring is a successful tool for classification in high-dimensional settings. While the variable selection consistency of sparse optimal scoring has been established, the corresponding prediction and estimation consistency results have been lacking. We bridge this gap by providing probabilistic bounds on the out-of-sample prediction error and the estimation error of multi-class penalized optimal scoring, allowing for a diverging number of classes.
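The penalized optimal scoring idea behind the paper can be illustrated with a minimal numerical sketch. This is not the paper's estimator: it uses a plain lasso penalty and a single discriminant direction, whereas the paper analyzes the multi-class, group-penalized case, and the function names here are purely illustrative. The sketch alternates a proximal-gradient lasso step for the coefficient vector with a closed-form update of the class scores, following the classical alternating scheme for optimal scoring.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_optimal_scoring(X, y, lam=0.1, n_iter=200):
    """One direction of lasso-penalized optimal scoring (illustrative sketch).

    Alternates (i) a proximal-gradient step on beta for
    min ||Y theta - X beta||^2 / (2n) + lam * ||beta||_1 with theta fixed,
    and (ii) the closed-form score update theta = (Y'Y)^{-1} Y'X beta,
    rescaled so that theta' (Y'Y / n) theta = 1.
    """
    n, p = X.shape
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # class-indicator matrix
    D = Y.T @ Y / n                                     # diagonal of class proportions
    rng = np.random.default_rng(0)
    theta = rng.standard_normal(len(classes))
    theta /= np.sqrt(theta @ D @ theta)                 # enforce the scoring constraint
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2                # 1/L for the smooth part
    for _ in range(n_iter):
        # lasso step for beta with the scored response Y theta held fixed
        grad = X.T @ (X @ beta - Y @ theta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
        # score step: least-squares fit of X beta on Y, then renormalize
        theta = np.linalg.solve(n * D, Y.T @ (X @ beta))
        nrm = np.sqrt(theta @ D @ theta)
        if nrm > 0:
            theta /= nrm
    return beta, theta
```

On well-separated data the lasso step zeroes out noise coordinates while the score update keeps the fitted direction on the normalization sphere, which is the mechanism the paper's prediction and estimation bounds control.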

Article information


Received: September 2018
Revised: March 2019
First available in Project Euclid: 26 November 2019

Digital Object Identifier: doi:10.3150/19-BEJ1126

Keywords: classification; high-dimensional regression; lasso; linear discriminant analysis


Gaynanova, Irina. Prediction and estimation consistency of sparse multi-class penalized optimal scoring. Bernoulli 26 (2020), no. 1, 286--322. doi:10.3150/19-BEJ1126.

