Electronic Journal of Statistics

Self-concordant analysis for logistic regression

Francis Bach
Source: Electron. J. Statist. Volume 4 (2010), 384-414.

Abstract

Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the ℓ2-norm and regularization by the ℓ1-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
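To make the link between the logistic loss and self-concordance concrete, the sketch below numerically checks a standard property of the logistic loss φ(t) = log(1 + exp(-t)): its third derivative is bounded in absolute value by its second derivative. This is an illustration under stated assumptions, not code from the paper; the function names are hypothetical, and the derivative formulas follow from elementary calculus with σ denoting the logistic function.

```python
import math


def sigmoid(t):
    """Logistic function, evaluated in a numerically stable way."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_loss(t):
    """phi(t) = log(1 + exp(-t)), stable for large |t|."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def second_derivative(t):
    """phi''(t) = sigma(t) * (1 - sigma(t))."""
    s = sigmoid(t)
    return s * (1.0 - s)


def third_derivative(t):
    """phi'''(t) = sigma(t) * (1 - sigma(t)) * (1 - 2 * sigma(t))."""
    s = sigmoid(t)
    return s * (1.0 - s) * (1.0 - 2.0 * s)


def satisfies_bound(points, tol=1e-15):
    """Check |phi'''(t)| <= phi''(t) at every point (hypothetical helper)."""
    return all(
        abs(third_derivative(t)) <= second_derivative(t) + tol for t in points
    )


# Evaluate the bound on a grid of margins in [-20, 20].
grid = [k / 10.0 for k in range(-200, 201)]
print(satisfies_bound(grid))
```

For comparison, the square loss has a constant second derivative and a zero third derivative, so a bound of this kind holds trivially there. A control of the third derivative by the second is the sort of property that lets arguments built on a quadratic expansion be carried over to a non-quadratic loss.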

Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ejs/1271941980
Digital Object Identifier: doi:10.1214/09-EJS521
Mathematical Reviews number (MathSciNet): MR2645490

2012 © Institute of Mathematical Statistics
