## Statistical Science

### Logistic Regression: From Art to Science

#### Abstract

A high quality logistic regression model possesses several desirable properties: predictive power, interpretability, significance, robustness to error in data, and sparsity, among others. To achieve these competing goals, modelers incorporate these properties iteratively as they home in on a final model. In the period 1991–2015, algorithmic advances in mixed-integer linear optimization (MILO), coupled with hardware improvements, have resulted in an astonishing 450 billion factor speedup in solving MILO problems. Motivated by this speedup, we propose modeling logistic regression problems algorithmically with a mixed-integer nonlinear optimization (MINLO) approach in order to explicitly incorporate these properties in a joint, rather than sequential, fashion. The resulting MINLO is flexible and can be adjusted to the needs of the modeler. Using both real and synthetic data, we demonstrate that the overall approach is generally applicable and provides high quality solutions in realistic timelines, together with a bound on their suboptimality. When the MINLO is infeasible, we obtain a certificate that jointly imposing the desired statistical properties is simply not feasible.
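The paper's MINLO formulation requires a modern mixed-integer solver, but the cardinality-constrained ("best subset") core of the approach can be illustrated without one. The sketch below, which is a brute-force stand-in and not the authors' method, enumerates every support of size at most `k` and fits each candidate by Newton's method, returning the support with the best in-sample log-likelihood. All function names and the synthetic data are illustrative; the enumeration is only feasible for small numbers of features.

```python
# Toy sketch of best-subset (sparse) logistic regression via explicit
# enumeration -- a brute-force stand-in for a MINLO formulation,
# feasible only for a small number of features p.
import itertools
import numpy as np

def fit_logistic(X, y, iters=50, ridge=1e-6):
    """Fit logistic regression by Newton's method; return (beta, log-likelihood).

    The small ridge term keeps the Hessian well conditioned near separation.
    """
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iters):
        z = np.clip(X @ beta, -30, 30)        # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        W = p * (1.0 - p)                     # Newton weights
        grad = X.T @ (y - p) - ridge * beta
        hess = (X * W[:, None]).T @ X + ridge * np.eye(d)
        beta += np.linalg.solve(hess, grad)
    z = np.clip(X @ beta, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))
    ll = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return beta, ll

def best_subset_logistic(X, y, k):
    """Exhaustively search supports of size <= k; return (best log-lik, support)."""
    best_ll, best_support = -np.inf, ()
    for size in range(1, k + 1):
        for S in itertools.combinations(range(X.shape[1]), size):
            _, ll = fit_logistic(X[:, list(S)], y)
            if ll > best_ll:
                best_ll, best_support = ll, S
    return best_ll, best_support

# Synthetic instance: only the first two of six features carry signal.
rng = np.random.default_rng(0)
n, p, k = 200, 6, 2
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -2.0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

ll, support = best_subset_logistic(X, y, k)
print(support)  # the chosen feature indices
```

The enumeration above visits O(p^k) subsets, which is exactly the exponential blow-up that motivates the MINLO formulation: a branch-and-bound solver explores the same search space implicitly, pruning with bounds, and can additionally report a provable optimality gap.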

#### Article information

• Source: Statist. Sci., Volume 32, Number 3 (2017), 367–384.
• Dates: First available in Project Euclid: 1 September 2017.
• Permanent link: https://projecteuclid.org/euclid.ss/1504253122
• Digital Object Identifier: doi:10.1214/16-STS602
• Mathematical Reviews number (MathSciNet): MR3696001
• Zentralblatt MATH identifier: 06870251

#### Citation

Bertsimas, Dimitris; King, Angela. Logistic Regression: From Art to Science. Statist. Sci. 32 (2017), no. 3, 367--384. doi:10.1214/16-STS602. https://projecteuclid.org/euclid.ss/1504253122

#### References

• [1] Bach, F. R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179–1225.
• [2] Bache, K. and Lichman, M. (2014). UCI machine learning repository. Available at http://archive.ics.uci.edu/ml. Accessed: 2014-08-20.
• [3] Ben-Tal, A., El Ghaoui, L. and Nemirovski, A. (2009). Robust Optimization. Princeton Univ. Press, Princeton, NJ.
• [4] Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41 802–837.
• [5] Bertsimas, D., Brown, D. B. and Caramanis, C. (2011). Theory and applications of robust optimization. SIAM Rev. 53 464–501.
• [6] Bertsimas, D., Dunn, J., Pawlowski, C. and Zhuo, Y. D. (2017). Robust classification. J. Mach. Learn. Res. To appear.
• [7] Bertsimas, D. and King, A. (2017). Supplement to “Logistic Regression: From Art to Science.” DOI:10.1214/16-STS602SUPP.
• [8] Bertsimas, D., King, A. and Mazumder, R. (2016). Best subset selection via a modern optimization lens. Ann. Statist. 44 813–852.
• [9] Bezanson, J., Karpinski, S., Shah, V. B. and Edelman, A. (2012). Julia: A fast dynamic language for technical computing. Preprint. Available at https://arxiv.org/abs/1209.5145.
• [10] Bianco, A. M. and Yohai, V. J. (1996). Robust estimation in the logistic regression model. In Robust Statistics, Data Analysis, and Computer Intensive Methods (Schloss Thurnau, 1994). Lect. Notes Stat. 109 17–34. Springer, New York.
• [11] Bonami, P., Kilinç, M. and Linderoth, J. (2012). Algorithms and software for convex mixed integer nonlinear programs. In Mixed Integer Nonlinear Programming 1–39. Springer, Berlin.
• [12] Box, G. E. P. and Tidwell, P. W. (1962). Transformation of the independent variables. Technometrics 4 531–550.
• [13] Bussieck, M. R. and Vigerske, S. (2010). MINLP solver software. In Wiley Encyclopedia of Operations Research and Management Science. Wiley Online Library.
• [14] Carroll, R. J. and Pederson, S. (1993). On robustness in the logistic regression model. J. R. Stat. Soc. Ser. B. Stat. Methodol. 55 693–706.
• [15] Chatterjee, S., Hadi, A. S. and Price, B. (2012). Regression Analysis by Example, 5th ed. Wiley, New York.
• [16] Cramer, J. S. (2002). The origins of logistic regression. Technical report, Tinbergen Institute.
• [17] Croux, C. and Haesbroeck, G. (2003). Implementing the Bianco and Yohai estimator for logistic regression. Comput. Statist. Data Anal. 44 273–295. Special issue in honour of Stan Azen: a birthday celebration.
• [18] Czyzyk, J., Mesnier, M. P. and Moré, J. J. (1998). The NEOS server. J. Comput. Sci. Eng. 5 68–75.
• [19] Dobson, A. J. and Barnett, A. G. (2008). An Introduction to Generalized Linear Models, 3rd ed. CRC Press, Boca Raton, FL.
• [20] Dolan, E. D. (2001). NEOS Server 4.0 administrative guide. Preprint. Available at arXiv:cs/0107034.
• [21] Duran, M. A. and Grossmann, I. E. (1986). An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program. 36 307–339.
• [22] Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1–26.
• [23] Eldar, Y. C. and Kutyniok, G. (2012). Compressed Sensing: Theory and Applications. Cambridge Univ. Press, London.
• [24] Figueiredo, M. A. T. (2003). Adaptive sparseness for supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 25 1150–1159.
• [25] Fithian, W., Sun, D. and Taylor, J. (2014). Optimal inference after model selection. Preprint. Available at arXiv:1410.2597.
• [26] Free Software Foundation (2015). GNU linear programming kit. Available at http://www.gnu.org/software/glpk/glpk.html. Accessed: 2015-03-06.
• [27] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1–22.
• [28] Furnival, G. M. and Wilson, R. W. (1974). Regressions by leaps and bounds. Technometrics 16 499–511.
• [29] Gropp, W. and Moré, J. (1997). Optimization environments and the NEOS server. In Approximation Theory and Optimization 167–182. Cambridge Univ. Press, Cambridge, UK.
• [30] Hilbe, J. M. (2011). Logistic Regression Models. CRC Press, Boca Raton, FL.
• [31] Hosmer, D. W., Jovanovic, B. and Lemeshow, S. (1989). Best subsets logistic regression. Biometrics 45 1265–1270.
• [32] Hosmer Jr., D. W. and Lemeshow, S. (2013). Applied Logistic Regression. Wiley, Hoboken, NJ.
• [33] IBM ILOG CPLEX Optimization Studio (2015). CPLEX optimizer. Available at http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/index.html. Accessed: 2015-03-06.
• [34] Gurobi Inc. (2014). Gurobi optimizer reference manual. Available at http://www.gurobi.com. Accessed: 2014-08-20.
• [35] Kim, Y., Kim, J. and Kim, Y. (2006). Blockwise sparse regression. Statist. Sinica 16 375–390.
• [36] Koh, K., Kim, S.-J. and Boyd, S. P. (2007). An interior-point method for large-scale $l_{1}$-regularized logistic regression. J. Mach. Learn. Res. 8 1519–1555.
• [37] Krishnapuram, B., Carin, L., Figueiredo, M. A. T. and Hartemink, A. J. (2005). Sparse multinomial logistic regression: Fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27 957–968.
• [38] Krishnapuram, B., Harternink, A. J., Carin, L. and Figueiredo, M. A. T. (2004). A Bayesian approach to joint feature selection and classifier design. IEEE Trans. Pattern Anal. Mach. Intell. 26 1105–1111.
• [39] Lee, S.-I., Lee, H., Abbeel, P. and Ng, A. Y. (2006). Efficient $\ell_{1}$ regularized logistic regression. In Proceedings of the National Conference on Artificial Intelligence 21 401. AAAI Press, Menlo Park, CA.
• [40] Lockhart, R., Taylor, J., Tibshirani, R. J. and Tibshirani, R. (2014). A significance test for the Lasso. Ann. Statist. 42 413–468.
• [41] Lubin, M. and Dunning, I. (2015). Computing in operations research using Julia. INFORMS J. Comput. 27 238–248.
• [42] Ma, S., Song, X. and Huang, J. (2007). Supervised group lasso with applications to microarray data analysis. BMC Bioinformatics 8 60.
• [43] Maronna, R., Martin, R. D. and Yohai, V. (2006). Robust Statistics. Wiley, Chichester.
• [44] Meier, L., Van De Geer, S. and Bühlmann, P. (2008). The group lasso for logistic regression. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 53–71.
• [45] Menard, S. (2002). Applied Logistic Regression Analysis 106. Sage, Thousand Oaks, CA.
• [46] Pregibon, D. (1981). Logistic regression diagnostics. Ann. Statist. 9 705–724.
• [47] Ryan, T. P. (2009). Modern Regression Methods, 2nd ed. Wiley, Hoboken, NJ.
• [48] Sato, T., Takano, Y., Miyashiro, R. and Yoshise, A. (2016). Feature subset selection for logistic regression via mixed integer optimization. Comput. Optim. Appl. 64 865–880.
• [49] Shafieezadeh-Abadeh, S., Mohajerin Esfahani, P. and Kuhn, D. (2015). Distributionally robust logistic regression. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS’15), Montreal, Canada, December 7–12, 2015 (C. Cortes, D. D. Lee, M. Sugiyama and R. Garnett, eds.) 1576–1584. MIT Press, Cambridge, MA.
• [50] Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2013). A sparse-group Lasso. J. Comput. Graph. Statist. 22 231–245.
• [51] Tabachnick, B. G., Fidell, L. S. et al. (2001). Using Multivariate Statistics. Allyn and Bacon, Boston, MA.
• [52] Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1 211–244.
• [53] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 49–67.
• [54] Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468–3497.
• [55] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.

#### Supplemental materials

• Supplement to “Logistic Regression: From Art to Science”.