The Annals of Statistics

Analysis of variance—why it is more important than ever

Andrew Gelman
Source: Ann. Statist. Volume 33, Number 1 (2005), 1-53.

Abstract

Analysis of variance (ANOVA) is an extremely important method in exploratory and confirmatory data analysis. Unfortunately, in complex problems (e.g., split-plot designs), it is not always easy to set up an appropriate ANOVA. We propose a hierarchical analysis that automatically gives the correct ANOVA comparisons even in complex scenarios. The inferences for all means and variances are performed under a model with a separate batch of effects for each row of the ANOVA table.

We connect to classical ANOVA by working with finite-sample variance components: fixed and random effects models are characterized by inferences about existing levels of a factor and new levels, respectively. We also introduce a new graphical display showing inferences about the standard deviations of each batch of effects.
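
As a rough illustration of the distinction drawn above, consider a single batch of J effects. The symbols below (\alpha_j, \sigma_\alpha, s_\alpha, \bar\alpha) are notation assumed here for illustration and do not appear in the abstract. The superpopulation standard deviation \sigma_\alpha describes the distribution from which new levels would be drawn, while the finite-sample standard deviation s_\alpha summarizes only the J levels actually present in the data:

```latex
% Superpopulation model for one batch of effects (assumed notation):
\alpha_j \sim \mathrm{N}(0, \sigma_\alpha^2), \qquad j = 1, \dots, J

% Finite-sample standard deviation of the J existing effects:
s_\alpha = \sqrt{\frac{1}{J-1} \sum_{j=1}^{J} \left(\alpha_j - \bar{\alpha}\right)^2},
\qquad
\bar{\alpha} = \frac{1}{J} \sum_{j=1}^{J} \alpha_j
```

Under this sketch, inference about s_\alpha corresponds to the fixed-effects question (the existing levels of the factor), and inference about \sigma_\alpha corresponds to the random-effects question (new levels).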

We illustrate with two examples from our applied data analysis, first illustrating the usefulness of our hierarchical computations and displays, and second showing how the ideas of ANOVA are helpful in understanding a previously fit hierarchical model.

Primary Subjects: 62J10, 62J07, 62F15, 62J05, 62J12
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1112967698
Digital Object Identifier: doi:10.1214/009053604000001048
Mathematical Reviews number (MathSciNet): MR2157795
Zentralblatt MATH identifier: 1064.62082

2012 © Institute of Mathematical Statistics
