The Annals of Statistics

Sure independence screening in generalized linear models with NP-dimensionality

Jianqing Fan and Rui Song

Full-text: Open access

Abstract

Ultrahigh-dimensional variable selection plays an increasingly important role in contemporary scientific discoveries and statistical research. Among others, Fan and Lv [J. R. Stat. Soc. Ser. B Stat. Methodol. 70 (2008) 849–911] proposed an independent screening framework by ranking the marginal correlations. They showed that the correlation ranking procedure possesses a sure independence screening property within the context of the linear model with Gaussian covariates and responses. In this paper, we propose a more general version of independence learning that ranks the maximum marginal likelihood estimates or the maximum marginal likelihood itself in generalized linear models. We show that the proposed methods, with Fan and Lv [J. R. Stat. Soc. Ser. B Stat. Methodol. 70 (2008) 849–911] as a very special case, also possess the sure screening property with vanishing false selection rate. The conditions under which independence learning possesses the sure screening property are surprisingly simple. This justifies the applicability of such a simple method across a wide spectrum of problems. We quantify explicitly the extent to which the dimensionality can be reduced by independence screening, which depends on the interactions of the covariance matrix of covariates and true parameters. Simulation studies are used to illustrate the utility of the proposed approaches. In addition, we establish an exponential inequality for the quasi-maximum likelihood estimator which is useful for high-dimensional statistical learning.
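As an illustration of ranking by maximum marginal likelihood estimates, the following is a minimal sketch for the logistic model only. It is not the authors' implementation: the function names `marginal_logistic_mle` and `sis_mmle`, the Newton solver, the simulated data, and the choice to keep a fixed number `d` of top-ranked covariates (rather than applying a threshold) are all assumptions made for this example.

```python
import numpy as np

def marginal_logistic_mle(x, y, n_iter=25, tol=1e-8):
    """Fit logit P(y=1 | x) = b0 + b1 * x by Newton's method; return (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = mu * (1.0 - mu)                      # logistic variance weights
        grad = X.T @ (y - mu)                    # score vector
        hess = X.T @ (X * w[:, None])            # observed information (2 x 2)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def sis_mmle(X, y, d):
    """Rank covariates by |marginal slope| and return the top-d indices.

    Covariates are standardized first so the slopes are comparable.
    Keeping a fixed number d is an assumption of this sketch.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    slopes = np.array(
        [marginal_logistic_mle(Xs[:, j], y)[1] for j in range(Xs.shape[1])]
    )
    order = np.argsort(-np.abs(slopes))
    return order[:d], slopes

# Hypothetical simulated example: only the first three covariates are active.
rng = np.random.default_rng(0)
n, p = 400, 200
X = rng.standard_normal((n, p))
eta = 2.0 * (X[:, 0] + X[:, 1] + X[:, 2])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

selected, slopes = sis_mmle(X, y, d=20)
print(sorted(selected.tolist()))
```

Each covariate is fitted in its own two-parameter model, so the cost grows linearly in the number of covariates, which is what makes this kind of screening practical before a more expensive joint fit on the retained set.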

Article information

Source
Ann. Statist. Volume 38, Number 6 (2010), 3567-3604.

Dates
First available: 30 November 2010

Permanent link to this document
http://projecteuclid.org/euclid.aos/1291126966

Digital Object Identifier
doi:10.1214/10-AOS798

Zentralblatt MATH identifier
05838981

Mathematical Reviews number (MathSciNet)
MR2766861

Subjects
Primary: 68Q32: Computational learning theory [See also 68T05]; 62J12: Generalized linear models
Secondary: 62E99: None of the above, but in this section; 60F10: Large deviations

Keywords
Generalized linear models; independent learning; sure independent screening; variable selection

Citation

Fan, Jianqing; Song, Rui. Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics 38 (2010), no. 6, 3567–3604. doi:10.1214/10-AOS798. http://projecteuclid.org/euclid.aos/1291126966.



References

  • Bickel, P. J. and Doksum, K. A. (1981). An analysis of transformations revisited. J. Amer. Statist. Assoc. 76 296–311.
  • Bickel, P. J. and Doksum, K. A. (2001). Mathematical Statistics: Basic Ideas and Selected Topics, 2nd ed. Prentice Hall, Upper Saddle River, NJ.
  • Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. J. R. Stat. Soc. Ser. B 26 211–246.
  • Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313–2404.
  • Cox, D. R. (1972). Regression models and life-tables (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 34 187–220.
  • Fahrmeir, L. and Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist. 13 342–368.
  • Fan, J. and Fan, Y. (2008). High-dimensional classification using features annealed independence rules. Ann. Statist. 36 2605–2637.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 849–911.
  • Fan, J., Samworth, R. and Wu, Y. (2009). Ultra-dimensional variable selection via independent learning: Beyond the linear model. J. Mach. Learn. Res. 10 1829–1853.
  • Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35 109–148.
  • Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817–823.
  • Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
  • Hall, P. and Miller, H. (2009). Using generalised correlation to effect variable selection in very high dimensional problems. J. Comput. Graph. Statist. 18 533.
  • Hall, P., Titterington, D. M. and Xue, J.-H. (2009). Tilting methods for assessing the influence of components in a classifier. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 783–803.
  • Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30.
  • Huang, J., Horowitz, J. and Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Statist. 36 587–613.
  • Kosorok, M. R., Lee, B. L. and Fine, J. P. (2004). Robust inference for univariate proportional hazards frailty regression models. Ann. Statist. 32 1448–1491.
  • Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Springer, Berlin.
  • Massart, P. (2000). About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 863–884.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267–288.
  • van de Geer, S. (2002). M-estimation using penalties or sieves. J. Statist. Plann. Inference 108 55–69.
  • van de Geer, S. (2008). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614–645.
  • van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
  • White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1–26.
  • Zeng, D. and Lin, D. Y. (2007). Maximum likelihood estimation in semiparametric regression models with censored data. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 507–564.
  • Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.