### Variable selection using MM algorithms

David R. Hunter and Runze Li
Source: Ann. Statist. Volume 33, Number 4 (2005), 1617-1642.

#### Abstract

Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function is often challenging because it may be nondifferentiable and/or nonconcave. This article proposes a new class of algorithms for finding a maximizer of the penalized likelihood for a broad class of penalty functions. These algorithms operate by perturbing the penalty function slightly to render it differentiable, then optimizing this differentiable function using a minorize–maximize (MM) algorithm. MM algorithms are useful extensions of the well-known class of EM algorithms, a fact that allows us to analyze the local and global convergence of the proposed algorithm using some of the techniques employed for EM algorithms. In particular, we prove that when our MM algorithms converge, they must converge to a desirable point; we also discuss conditions under which this convergence may be guaranteed. We exploit the Newton–Raphson-like aspect of these algorithms to propose a sandwich estimator for the standard errors of the estimators. Our method performs well in numerical tests.
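The perturb-then-MM idea in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not stated in the abstract: a linear model with squared-error loss, an L1 penalty, and the smooth surrogate penalty `|b| - eps*log(1 + |b|/eps)` as the perturbation. The function names (`perturbed_l1`, `objective`, `mm_perturbed_l1`) and parameters (`lam`, `eps`) are hypothetical. The sketch minimizes the negative penalized log-likelihood, so each step is a majorize-minimize step, which is equivalent to minorize-maximize on the original scale. Each iteration replaces the perturbed penalty with a quadratic upper bound at the current iterate, which reduces the update to a ridge-type linear solve.

```python
import numpy as np

def perturbed_l1(beta, eps):
    """Smooth surrogate for |beta| (an assumed perturbation, differentiable at 0)."""
    a = np.abs(beta)
    return a - eps * np.log1p(a / eps)

def objective(X, y, beta, lam, eps):
    """Penalized least-squares criterion: 0.5*RSS + n*lam*sum(perturbed penalty)."""
    n = X.shape[0]
    resid = y - X @ beta
    return 0.5 * resid @ resid + n * lam * perturbed_l1(beta, eps).sum()

def mm_perturbed_l1(X, y, lam, eps=1e-6, max_iter=500, tol=1e-10):
    """MM iterations for the perturbed L1-penalized least-squares problem.

    At the current iterate b_k, the perturbed penalty of each coefficient is
    bounded above by a quadratic with curvature 1 / (eps + |b_k|), so the
    surrogate is minimized by solving a weighted ridge system.
    """
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    # Start from a lightly ridged least-squares fit (an arbitrary choice).
    beta = np.linalg.solve(XtX + 1e-8 * np.eye(p), Xty)
    history = [objective(X, y, beta, lam, eps)]
    for _ in range(max_iter):
        weights = n * lam / (eps + np.abs(beta))
        beta = np.linalg.solve(XtX + np.diag(weights), Xty)
        history.append(objective(X, y, beta, lam, eps))
        # Stop once the descent in the objective becomes negligible.
        if history[-2] - history[-1] < tol:
            break
    return beta, history

# Usage on simulated data (all values here are illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
true_beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.standard_normal(200)
beta_hat, history = mm_perturbed_l1(X, y, lam=0.5)
print(np.round(beta_hat, 3))
```

Because the quadratic bound touches the perturbed penalty at the current iterate and lies above it elsewhere, the recorded objective values should never increase, which is the descent property that MM algorithms share with EM. Coefficients are driven toward zero rather than set exactly to zero, so in practice a small threshold would be needed to declare a variable removed.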

Primary Subjects: 62J12, 65C20
Full-text: Open access

Permanent link to this document: http://projecteuclid.org/euclid.aos/1123250224
Digital Object Identifier: doi:10.1214/009053605000000200
Mathematical Reviews number (MathSciNet): MR2166557
Zentralblatt MATH identifier: 1078.62028
