The Annals of Statistics

Fence methods for mixed model selection

Jiming Jiang, J. Sunil Rao, Zhonghua Gu, and Thuan Nguyen

Full-text: Open access

Abstract

Many model search strategies involve trading off model fit with model complexity in a penalized goodness of fit measure. Asymptotic properties for these types of procedures in settings like linear regression and ARMA time series have been studied, but these do not naturally extend to nonstandard situations such as mixed effects models, where simple definition of the sample size is not meaningful. This paper introduces a new class of strategies, known as fence methods, for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. In addition, we propose two variations of the fence. The first is a stepwise procedure to handle situations of many predictors; the second is an adaptive approach for choosing a tuning constant. We give sufficient conditions for consistency of fence and its variations, a desirable property for a good model selection procedure. The methods are illustrated through simulation studies and real data analysis.
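The two-stage idea in the abstract (first fence off the incorrect models, then pick the optimal model among those left inside) can be illustrated with a minimal sketch. The abstract does not state the fence inequality, so the form used here, Q(M) - Q(M~) <= c * sigma(M, M~), with M~ the candidate minimizing the lack-of-fit measure Q, is an assumption, as is the use of minimal dimension as the selection criterion. The names `Candidate`, `fence_select`, `q`, `dim`, `sigma` and `c` are hypothetical, and the numbers are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str     # hypothetical label for a candidate model
    q: float      # lack-of-fit measure Q(M); smaller is better (assumed precomputed)
    dim: int      # model dimension, used here as the within-fence criterion
    sigma: float  # assumed estimate of the s.d. of Q(M) - Q(M~)


def fence_select(candidates, c=1.0):
    """Return (selected, inside) under the assumed fence inequality.

    inside   : candidates with Q(M) - Q(M~) <= c * sigma, in input order
    selected : the candidate inside the fence with the smallest dimension,
               ties broken by the smaller Q
    """
    if not candidates:
        raise ValueError("need at least one candidate model")
    if c < 0 or any(m.sigma < 0 for m in candidates):
        raise ValueError("c and every sigma must be non-negative")

    # Q(M~): the smallest lack-of-fit over all candidates.
    q_min = min(m.q for m in candidates)

    # Stage 1: build the fence. The minimizer itself always passes,
    # so the list is never empty.
    inside = [m for m in candidates if m.q - q_min <= c * m.sigma]

    # Stage 2: choose within the fence by the (flexible) criterion;
    # minimal dimension is only one possible choice.
    selected = min(inside, key=lambda m: (m.dim, m.q))
    return selected, inside


# Illustrative, made-up values (not results from the paper).
candidates = [
    Candidate("x1", q=40.0, dim=1, sigma=2.0),
    Candidate("x1+x2", q=12.5, dim=2, sigma=1.5),
    Candidate("x1+x3", q=30.0, dim=2, sigma=2.0),
    Candidate("x1+x2+x3", q=12.0, dim=3, sigma=0.0),
]

selected, inside = fence_select(candidates, c=1.0)
print([m.name for m in inside])  # models that survive the fence
print(selected.name)             # model chosen within the fence
```

In this sketch the tuning constant `c` controls how wide the fence is: with `c=0` only the minimizer of Q survives, while larger values admit more candidates. Choosing it from the data is the role the abstract assigns to the adaptive variation, which is not implemented here.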

Article information

Source
Ann. Statist., Volume 36, Number 4 (2008), 1669-1692.

Dates
First available in Project Euclid: 16 July 2008

Permanent link to this document
https://projecteuclid.org/euclid.aos/1216237296

Digital Object Identifier
doi:10.1214/07-AOS517

Mathematical Reviews number (MathSciNet)
MR2435452

Zentralblatt MATH identifier
1142.62047

Subjects
Primary: 62F07: Ranking and selection; 62F35: Robustness and adaptive procedures
Secondary: 62F40: Bootstrap, jackknife and other resampling methods

Keywords
Adaptive fence; consistency; F-B fence; finite sample performance; GLMM; linear mixed model; model selection

Citation

Jiang, Jiming; Rao, J. Sunil; Gu, Zhonghua; Nguyen, Thuan. Fence methods for mixed model selection. Ann. Statist. 36 (2008), no. 4, 1669–1692. doi:10.1214/07-AOS517. https://projecteuclid.org/euclid.aos/1216237296

