The Annals of Statistics

Fence methods for mixed model selection

Jiming Jiang, J. Sunil Rao, Zhonghua Gu, and Thuan Nguyen

Source: Ann. Statist. Volume 36, Number 4 (2008), 1669-1692.

Abstract

Many model search strategies involve trading off model fit against model complexity in a penalized goodness-of-fit measure. Asymptotic properties for these types of procedures in settings like linear regression and ARMA time series have been studied, but these do not naturally extend to nonstandard situations such as mixed effects models, where a simple definition of the sample size is not meaningful. This paper introduces a new class of strategies, known as fence methods, for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. In addition, we propose two variations of the fence. The first is a stepwise procedure to handle situations with many predictors; the second is an adaptive approach for choosing a tuning constant. We give sufficient conditions for consistency of the fence and its variations, a desirable property for a good model selection procedure. The methods are illustrated through simulation studies and real data analysis.
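
The two-stage idea in the abstract (first fence off the incorrect models, then pick among the survivors) can be illustrated with a minimal sketch. The abstract does not state the fence inequality, so the rule below is an assumption: a model M stays inside the fence when its lack-of-fit measure q(M) is at most q(M_best) + c * sigma, where M_best has the smallest lack of fit. The names `Candidate`, `fence_select`, `c`, and `sigma`, the use of a single `sigma` for all models, the choice of "lowest dimension" as the selection criterion, and all numbers are hypothetical and serve only as illustration.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str   # hypothetical label for the candidate model
    dim: int    # number of parameters (model complexity)
    q: float    # lack-of-fit measure; smaller means better fit


def fence_select(candidates: Sequence[Candidate], c: float, sigma: float) -> Candidate:
    """Return the simplest candidate that lies inside the fence.

    Assumed fence rule: keep M if q(M) <= q(M_best) + c * sigma, where
    M_best minimizes q. One shared sigma is a simplification.
    """
    if not candidates:
        raise ValueError("need at least one candidate model")
    q_min = min(m.q for m in candidates)
    # Stage 1: build the fence and discard models that fit too poorly.
    inside = [m for m in candidates if m.q <= q_min + c * sigma]
    # Stage 2: among the survivors, prefer the lowest dimension, then the best fit.
    return min(inside, key=lambda m: (m.dim, m.q))


# Made-up lack-of-fit values for four nested candidate models.
models = [
    Candidate("x1", 1, 48.0),
    Candidate("x1+x2", 2, 21.5),
    Candidate("x1+x2+x3", 3, 20.9),
    Candidate("x1+x2+x3+x4", 4, 20.7),
]

chosen = fence_select(models, c=1.0, sigma=1.2)
print(chosen.name)
```

In this sketch the tuning constant `c` controls how wide the fence is: with `c=0` only the best-fitting model survives, while a very large `c` admits every candidate and the simplest one is returned. The adaptive variation mentioned in the abstract concerns how to choose such a constant; it is not implemented here.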

Primary Subjects: 62F07, 62F35
Secondary Subjects: 62F40
Keywords: Adaptive fence; consistency; F-B fence; finite sample performance; GLMM; linear mixed model; model selection

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1216237296
Digital Object Identifier: doi:10.1214/07-AOS517
Mathematical Reviews number (MathSciNet): MR2435452
Zentralblatt MATH identifier: 1142.62047


2009 © Institute of Mathematical Statistics