The Annals of Statistics

Nonconcave penalized likelihood with a diverging number of parameters

Abstract

A class of variable selection procedures for parametric models via nonconcave penalized likelihood was proposed by Fan and Li (2001) to simultaneously estimate parameters and select important variables. They demonstrated that this class of procedures has an oracle property when the number of parameters is finite. However, in most model selection problems the number of parameters should be large and should grow with the sample size. In this paper some asymptotic properties of the nonconcave penalized likelihood are established for situations in which the number of parameters tends to ∞ as the sample size increases. Under regularity conditions we establish an oracle property and the asymptotic normality of the penalized likelihood estimators. Furthermore, the consistency of the sandwich formula for the covariance matrix is demonstrated. Nonconcave penalized likelihood ratio statistics are discussed, and their asymptotic distributions under the null hypothesis are obtained by imposing some mild conditions on the penalty functions. The asymptotic results are augmented by a simulation study, and the newly developed methodology is illustrated by an analysis of a court case concerning sex discrimination in salaries.
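As a point of orientation, the following is a minimal sketch of one well-known nonconcave penalty, the SCAD penalty of Fan and Li (2001). The abstract does not say which penalty functions are used, so the choice of SCAD here is an assumption made for illustration, and the function names `scad_penalty` and `scad_derivative` are hypothetical. The default a = 3.7 is the value suggested by Fan and Li (2001).

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lam(theta), assuming lam > 0 and a > 2.

    Linear (L1-like) near zero, quadratic in the middle, and constant
    for large |theta|, so large coefficients are not shrunk further.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the SCAD penalty with respect to |theta|.

    At theta = 0 the penalty is not differentiable; this returns the
    right-hand limit lam, which is what produces sparse solutions.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)
```

The derivative vanishes once |theta| exceeds a·lam, so large coefficients are left unpenalized at the margin. That is the feature behind the nearly unbiased estimation associated with the oracle property.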

Article information

Source
Ann. Statist. Volume 32, Number 3 (2004), 928-961.

Dates
First available in Project Euclid: 24 May 2004

Permanent link to this document
http://projecteuclid.org/euclid.aos/1085408491

Digital Object Identifier
doi:10.1214/009053604000000256

Mathematical Reviews number (MathSciNet)
MR2065194

Zentralblatt MATH identifier
1092.62031

Citation

Fan, Jianqing; Peng, Heng. Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 (2004), no. 3, 928--961. doi:10.1214/009053604000000256. http://projecteuclid.org/euclid.aos/1085408491.

References

• Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proc. 2nd International Symposium on Information Theory (B. N. Petrov and F. Csáki, eds.) 267--281. Akadémiai Kiadó, Budapest.
• Albright, S. C., Winston, W. L. and Zappe, C. J. (1999). Data Analysis and Decision Making with Microsoft Excel. Duxbury, Pacific Grove, CA.
• Antoniadis, A. (1997). Wavelets in statistics: A review (with discussion). J. Italian Statist. Soc. 6 97--144.
• Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations (with discussion). J. Amer. Statist. Assoc. 96 939--967.
• Bai, Z. D., Rao, C. R. and Wu, Y. (1999). Model selection with data-oriented penalty. J. Statist. Plann. Inference 77 103--117.
• Blake, A. (1989). Comparison of the efficiency of deterministic and stochastic algorithms for visual reconstruction. IEEE Trans. Pattern Anal. Machine Intell. 11 2--12.
• Blake, A. and Zisserman, A. (1987). Visual Reconstruction. MIT Press, Cambridge, MA.
• Breiman, L. (1996). Heuristics of instability and stabilization in model selection. Ann. Statist. 24 2350--2383.
• Cox, D. D. and O'Sullivan, F. (1990). Asymptotic analysis of penalized likelihood and related estimators. Ann. Statist. 18 1676--1695.
• De Boor, C. (1978). A Practical Guide to Splines. Springer, New York.
• Donoho, D. L. (2000). High-dimensional data analysis: The curses and blessings of dimensionality. Aide-Memoire of a Lecture at AMS Conference on Math Challenges of the 21st Century.
• Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425--455.
• Fan, J. (1997). Comments on "Wavelets in statistics: A review," by A. Antoniadis. J. Italian Statist. Soc. 6 131--138.
• Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348--1360.
• Fan, J. and Li, R. (2002). Variable selection for Cox's proportional hazards model and frailty model. Ann. Statist. 30 74--99.
• Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometric regression tools (with discussion). Technometrics 35 109--148.
• Fu, W. J. (1998). Penalized regression: The bridge versus the LASSO. J. Comput. Graph. Statist. 7 397--416.
• Gilks, W. R., Richardson, S. and Spiegelhalter, D. J., eds. (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC Press, London.
• Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6 721--741.
• Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, London.
• Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1 799--821.
• Kauermann, G. and Carroll, R. J. (2001). A note on the efficiency of sandwich covariance matrix estimation. J. Amer. Statist. Assoc. 96 1387--1396.
• Knight, K. and Fu, W. J. (2000). Asymptotics for Lasso-type estimators. Ann. Statist. 28 1356--1378.
• Lehmann, E. L. (1983). Theory of Point Estimation. Wiley, New York.
• Mallows, C. L. (1973). Some comments on $C_p$. Technometrics 15 661--675.
• McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.
• Murphy, S. A. (1993). Testing for a time dependent coefficient in Cox's regression model. Scand. J. Statist. 20 35--50.
• Neyman, J. and Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica 16 1--32.
• Nikolova, M., Idier, J. and Mohammad-Djafari, A. (1998). Inversion of large-support ill-posed linear operators using a piecewise Gaussian MRF. IEEE Trans. Image Process. 7 571--585.
• Portnoy, S. (1988). Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Statist. 16 356--366.
• Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461--464.
• Shen, X. T. and Ye, J. M. (2002). Adaptive model selection. J. Amer. Statist. Assoc. 97 210--221.
• Stone, C. J., Hansen, M., Kooperberg, C. and Truong, Y. K. (1997). Polynomial splines and their tensor products in extended linear modeling (with discussion). Ann. Statist. 25 1371--1470.
• Tibshirani, R. J. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267--288.
• Tibshirani, R. J. (1997). The Lasso method for variable selection in the Cox model. Statist. Medicine 16 385--395.
• van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press.
• Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.