## The Annals of Statistics

- Ann. Statist.
- Volume 30, Number 5 (2002), 1225-1310.

### What is a statistical model?

**Full-text: Open access**

#### Abstract

This paper addresses two closely related questions, "What is a statistical model?" and "What is a parameter?" The notions that a model must "make sense," and that a parameter must "have a well-defined meaning" are deeply ingrained in applied statistical work, reasonably well understood at an instinctive level, but absent from most formal theories of modelling and inference. In this paper, these concepts are defined in algebraic terms, using morphisms, functors and natural transformations. It is argued that inference on the basis of a model is not possible unless the model admits a natural extension that includes the domain for which inference is required. For example, prediction requires that the domain include all future units, subjects or time points. Although it is usually not made explicit, every sensible statistical model admits such an extension. Examples are given to show why such an extension is necessary and why a formal theory is required. In the definition of a subparameter, it is shown that certain parameter functions are natural and others are not. Inference is meaningful only for natural parameters. This distinction has important consequences for the construction of prior distributions and also helps to resolve a controversy concerning the Box-Cox model.

#### Article information

**Source**

Ann. Statist. Volume 30, Number 5 (2002), 1225-1310.

**Dates**

First available in Project Euclid: 28 October 2002

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1035844977

**Digital Object Identifier**

doi:10.1214/aos/1035844977

**Mathematical Reviews number (MathSciNet)**

MR1936320

**Zentralblatt MATH identifier**

1039.62003

**Subjects**

Primary: 62A05

Secondary: 62F99: None of the above, but in this section

**Keywords**

Aggregation; agricultural field experiment; Bayes inference; Box-Cox model; category; causal inference; commutative diagram; conformal model; contingency table; embedding; exchangeability; extendability; extensive variable; fertility effect; functor; Gibbs model; harmonic model; intensive variable; interference; Kolmogorov consistency; lattice process; measure process; morphism; natural parameterization; natural subparameter; opposite category; quadratic exponential model; representation; spatial process; spline model; type III model

#### Citation

McCullagh, Peter. What is a statistical model? Ann. Statist. 30 (2002), no. 5, 1225-1310. doi:10.1214/aos/1035844977. https://projecteuclid.org/euclid.aos/1035844977

#### References

- ALDOUS, D. (1981). Representations for partially exchangeable arrays of random variables. J. Multivariate Analysis 11 581-598. Mathematical Reviews (MathSciNet): MR82m:60022; Zentralblatt MATH: 0474.60044; Digital Object Identifier: doi:10.1016/0047-259X(81)90099-3
- ANDREWS, D. F. and HERZBERG, A. (1985). Data. Springer, New York.
- BARNDORFF-NIELSEN, O. E. and COX, D. R. (1994). Inference and Asymptotics. Chapman and Hall, London.
- BARTLETT, M. S. (1978). Nearest neighbour models in the analysis of field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 40 147-174.
- BERGER, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, New York.
- BERNARDO, J. M. and SMITH, A. F. M. (1994). Bayesian Theory. Wiley, New York. Mathematical Reviews (MathSciNet): MR96a:62006
- BESAG, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192-236.
- BESAG, J. and HIGDON, D. (1999). Bayesian analysis of agricultural field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 61 691-746. Mathematical Reviews (MathSciNet): MR1722238; Zentralblatt MATH: 0951.62091; Digital Object Identifier: doi:10.1111/1467-9868.00201
- BESAG, J. and KOOPERBERG, C. (1995). On conditional and intrinsic autoregressions. Biometrika 82 733-746.
- BEST, N. G., ICKSTADT, K. and WOLPERT, R. L. (1999). Contribution to the discussion of Besag (1999). J. Roy. Statist. Soc. Ser. B 61 728-729.
- BICKEL, P. and DOKSUM, K. A. (1981). An analysis of transformations revisited. J. Amer. Statist. Assoc. 76 296-311. Mathematical Reviews (MathSciNet): MR83b:62048; Zentralblatt MATH: 0464.62058; Digital Object Identifier: doi:10.2307/2287831
- BILLINGSLEY, P. (1986). Probability and Measure, 2nd ed. Wiley, New York. Mathematical Reviews (MathSciNet): MR87f:60001
- BOX, G. E. P. and COX, D. R. (1964). An analysis of transformations (with discussion). J. Roy. Statist. Soc. Ser. B 26 211-252.
- BOX, G. E. P. and COX, D. R. (1982). An analysis of transformations revisited, rebutted. J. Amer. Statist. Assoc. 77 209-210. Mathematical Reviews (MathSciNet): MR648047; Zentralblatt MATH: 0504.62058; Digital Object Identifier: doi:10.2307/2287791
- COX, D. R. (1958). Planning of Experiments. Wiley, New York. Mathematical Reviews (MathSciNet): MR95561
- COX, D. R. (1986). Comment on Holland (1986). J. Amer. Statist. Assoc. 81 963-964.
- COX, D. R. and HINKLEY, D. V. (1974). Theoretical Statistics. Chapman and Hall, London. Mathematical Reviews (MathSciNet): MR370837
- COX, D. R. and REID, N. (1987). Parameter orthogonality and approximate conditional inference (with discussion). J. Roy. Statist. Soc. Ser. B 49 1-39.
- COX, D. R. and SNELL, E. J. (1981). Applied Statistics. Chapman and Hall, London.
- COX, D. R. and WERMUTH, N. (1996). Multivariate Dependencies. Chapman and Hall, London.
- DALE, J. R. (1984). Local versus global association for bivariate ordered responses. Biometrika 71 507-514. Mathematical Reviews (MathSciNet): MR86b:62091; Digital Object Identifier: doi:10.1093/biomet/71.3.507
- DE FINETTI, B. (1975). Theory of Probability 2. Wiley, New York.
- GELMAN, A., CARLIN, J. B., STERN, H. and RUBIN, D. B. (1995). Bayesian Data Analysis. Chapman and Hall, London. Mathematical Reviews (MathSciNet): MR97c:62059
- GOODMAN, L. A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. J. Amer. Statist. Assoc. 74 537-552.
- GOODMAN, L. A. (1981). Association models and canonical correlation in the analysis of cross-classifications having ordered categories. J. Amer. Statist. Assoc. 76 320-334.
- HAMADA, M. and WU, C. F. J. (1992). Analysis of designed experiments with complex aliasing. J. Qual. Technology 24 130-137.
- HARVILLE, D. A. and ZIMMERMANN, D. L. (1999). Contribution to the discussion of Besag (1999). J. Roy. Statist. Soc. Ser. B 61 733-734.
- HELLAND, I. S. (1999a). Quantum mechanics from symmetry and statistical modelling. Internat. J. Theoret. Phys. 38 1851-1881. Mathematical Reviews (MathSciNet): MR1704291; Zentralblatt MATH: 0953.81003; Digital Object Identifier: doi:10.1023/A:1026676913271
- HELLAND, I. S. (1999b). Quantum theory from symmetries in a general statistical parameter space. Technical report, Dept. Mathematics, Univ. Oslo.
- HINKLEY, D. V. and RUNGER, G. (1984). The analysis of transformed data (with discussion). J. Amer. Statist. Assoc. 79 302-320. Mathematical Reviews (MathSciNet): MR85m:62142; Zentralblatt MATH: 0553.62051; Digital Object Identifier: doi:10.2307/2288264
- HOLLAND, P. (1986). Statistics and causal inference (with discussion). J. Amer. Statist. Assoc. 81 945-970. Mathematical Reviews (MathSciNet): MR88k:62010; Zentralblatt MATH: 0607.62001; Digital Object Identifier: doi:10.2307/2289064
- HORA, R. B. and BUEHLER, R. J. (1966). Fiducial theory and invariant estimation. Ann. Math. Statist. 37 643-656. Mathematical Reviews (MathSciNet): MR33:8078; Zentralblatt MATH: 0148.13805; Digital Object Identifier: doi:10.1214/aoms/1177699458; Project Euclid: euclid.aoms/1177699458
- KINGMAN, J. F. C. (1984). Present position and potential developments: Some personal views. Probability and random processes. J. Roy. Statist. Soc. Ser. A 147 233-244.
- KINGMAN, J. F. C. (1993). Poisson Processes. Oxford Univ. Press. Mathematical Reviews (MathSciNet): MR94a:60052
- LAURITZEN, S. (1988). Extremal Families and Systems of Sufficient Statistics. Lecture Notes in Statist. 49. Springer, New York.
- LEHMANN, E. L. (1983). Theory of Point Estimation. Wiley, New York. Mathematical Reviews (MathSciNet): MR85a:62001
- LEHMANN, E. L. and CASELLA, G. (1998). Theory of Point Estimation, 2nd ed. Springer, New York. Mathematical Reviews (MathSciNet): MR99g:62025
- LITTELL, R., FREUND, R. J. and SPECTOR, P. C. (1991). SAS System for Linear Models, 3rd ed. SAS Institute, Cary, NC.
- MAC LANE, S. (1998). Categories for the Working Mathematician, 2nd ed. Springer, New York. Mathematical Reviews (MathSciNet): MR2001j:18001
- MCCULLAGH, P. (1980). Regression models for ordinal data (with discussion). J. Roy. Statist. Soc. Ser. B 42 109-142.
- MCCULLAGH, P. (1992). Conditional inference and Cauchy models. Biometrika 79 247-259. Mathematical Reviews (MathSciNet): MR93h:62048; Zentralblatt MATH: 0753.62002; Digital Object Identifier: doi:10.1093/biomet/79.2.247
- MCCULLAGH, P. (1996). Möbius transformation and Cauchy parameter estimation. Ann. Statist. 24 787-808. Mathematical Reviews (MathSciNet): MR1394988; Zentralblatt MATH: 0859.62007; Digital Object Identifier: doi:10.1214/aos/1032894465; Project Euclid: euclid.aos/1032894465
- MCCULLAGH, P. (1999). Quotient spaces and statistical models. Canad. J. Statist. 27 447-456. Mathematical Reviews (MathSciNet): MR1745814; Digital Object Identifier: doi:10.2307/3316103
- MCCULLAGH, P. (2000). Invariance and factorial models (with discussion). J. Roy. Statist. Soc. Ser. B 62 209-256. Mathematical Reviews (MathSciNet): MR2002a:62102; Digital Object Identifier: doi:10.1111/1467-9868.00229
- MCCULLAGH, P. and NELDER, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London. Mathematical Reviews (MathSciNet): MR727836
- MCCULLAGH, P. and WIT, E. (2000). Natural transformation and the Bayes map. Technical report.
- MERCER, W. B. and HALL, A. D. (1911). The experimental error of field trials. J. Agric. Research 50 331-357.
- NELDER, J. A. (1977). A re-formulation of linear models (with discussion). J. Roy. Statist. Soc. Ser. A 140 48-77. Mathematical Reviews (MathSciNet): MR56:16943; Digital Object Identifier: doi:10.2307/2344517
- PEARSON, K. (1913). Note on the surface of constant association. Biometrika 9 534-537.
- PLACKETT, R. L. (1965). A class of bivariate distributions. J. Amer. Statist. Assoc. 60 516-522. Mathematical Reviews (MathSciNet): MR32:524; Digital Object Identifier: doi:10.2307/2282685
- RUBIN, D. (1978). Bayesian inference for causal effects: The role of randomization. Ann. Statist. 6 34-58. Mathematical Reviews (MathSciNet): MR472152; Zentralblatt MATH: 0383.62021; Digital Object Identifier: doi:10.1214/aos/1176344064; Project Euclid: euclid.aos/1176344064
- RUBIN, D. (1986). Comment on Holland (1986). J. Amer. Statist. Assoc. 81 961-962.
- SMITH, A. F. M. (1984). Present position and potential developments: some personal views. Bayesian statistics. J. Roy. Statist. Soc. Ser. A 147 245-259.
- TJUR, T. (2000). Contribution to the discussion of McCullagh (2000). J. Roy. Statist. Soc. Ser. B 62 238-239.
- WHITTLE, P. (1974). Contribution to the discussion of Besag (1974). J. Roy. Statist. Soc. Ser. B 36 228.
- YANDELL, B. S. (1997). Practical Data Analysis for Designed Experiments. Chapman and Hall, London.
- CHICAGO, ILLINOIS 60637-1514 E-MAIL: pmcc@galton.uchicago.edu
- berg (1995), Besag and Higdon (1999) and Rue and Tjelmeland (2002). However, spatial effects are often of secondary importance, as in variety trials, and the main intention is to absorb an appropriate level of spatial variation in the formulation, rather than produce a spatial model with scientifically interpretable parameters. Nevertheless, McCullagh's basic point is well taken. For example, I view the use of MRFs in geographical epidemiology [e.g., Besag, York and Mollié (1991)] as mainly of exploratory value, in suggesting additional spatially related covariates whose inclusion would ideally dispense with the need for a spatial formulation;
- uniformity trials in Fairfield Smith (1938) and Pearce (1976). Of course, in a genuine variety trial, one might want to predict what the aggregate yield over the entire field would have been for a few individual varieties but this does not require any extension of the formulation to McCullagh's conceptual plots. Indeed, such calculations are especially well suited to the Bayesian paradigm, both theoretically, because one is supposed to deal with potentially observable quantities rather than merely with parameters, and in practice, via MCMC, because the posterior predictive distributions are available rigorously. That is, for the aggregate yield of variety A, one uses the observed yields on plots that were sown with A and generates a set of observations from the likelihood for those that were not for each MCMC sample of parameter values, hence building a corresponding distribution of total yield. One may also construct credible intervals for the difference in total yields between varieties A and B and easily address all manner of questions in ranking and selection that simply cannot be considered in a frequentist framework; for example, the posterior probability that the total yield obtained by sowing any particular variety (perhaps chosen in the light of the experiment) would have been at least 10% greater than that of growing any other test variety in the field.
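The posterior predictive calculation described above for the aggregate yield of one variety can be sketched in a few lines. This is only an illustration under loud simplifying assumptions: plot yields under variety A are taken as independent normal draws with no spatial fertility term, and the "posterior draws" are simulated stand-ins rather than real MCMC output. The function names `predictive_totals` and `credible_interval` are hypothetical, not part of any published code.

```python
import random

def predictive_totals(observed_a, n_unsown, posterior_draws, rng):
    """Simulate the whole-field total yield of variety A, one value per draw.

    observed_a      -- yields actually observed on plots sown with A
    n_unsown        -- number of plots that were not sown with A
    posterior_draws -- iterable of (mean_a, sd) pairs, e.g. MCMC output
    """
    observed_sum = sum(observed_a)
    totals = []
    for mean_a, sd in posterior_draws:
        # Impute what each remaining plot would have yielded under A
        # (assumption: independent normal likelihood, no fertility surface).
        imputed = sum(rng.gauss(mean_a, sd) for _ in range(n_unsown))
        totals.append(observed_sum + imputed)
    return totals

def credible_interval(samples, level=0.95):
    """Equal-tailed interval read off the sorted samples."""
    ordered = sorted(samples)
    lo_idx = int((1.0 - level) / 2.0 * (len(ordered) - 1))
    hi_idx = int((1.0 + level) / 2.0 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

rng = random.Random(0)
# Hypothetical stand-in for MCMC output: means near 4.0, fixed sd 0.5.
draws = [(rng.gauss(4.0, 0.1), 0.5) for _ in range(2000)]
totals = predictive_totals([3.9, 4.2, 4.1], n_unsown=17,
                           posterior_draws=draws, rng=rng)
low, high = credible_interval(totals)
```

The same list of totals, computed for two varieties from the same posterior draws, would give a distribution for the difference in total yields, which is the kind of comparison the paragraph above has in mind.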
- ton (1986). The findings typically suggest that the gains from spatial analysis in a badly designed experiment provide improvements commensurate with standard analysis and optimal design. This is not a reason to adopt poor designs but the simple fact is that, despite the efforts of statisticians, many experiments are carried out using nothing better than randomized complete blocks. It is highly desirable that the representation of fertility is flexible but is also parsimonious because there are many variety effects to be estimated, with very limited replication. McCullagh's use of discrete approximations to harmonic functions in Section 8 fails on both counts: first, local maxima or minima cannot exist except (artificially) at plots on the edge of the trial; second, the degrees of freedom lost in the fit equals the number of such plots and is therefore substantial (in fact, four less in a rectangular layout because the corner plots are ignored throughout the analysis!). Nevertheless, there is something appealing about the averaging property of harmonic functions, if only it were a little more flexible. What is required is a random effects (in frequentist terms) version and that is precisely the thinking behind the use of intrinsic autoregressions in BH and elsewhere. Indeed, such schemes fit McCullagh's discretized harmonic functions perfectly, except for edge effects (because BH embeds the array in a larger one to cater for such effects), and they also provide a good fit to more plausible fertility functions. For specific comments on the Mercer and Hall data, see below. Of course, spatial scale remains an important issue for variety trials and indeed is discussed empirically in Section 2.3 and in the rejoinder of BH. For one-dimensional adjustment, the simplest plausible continuum process is Brownian motion with an arbitrary level, for which the necessary integrations can be
- ATKINSON, A. C. and BAILEY, R. A. (2001). One hundred years of the design of experiments on and off the pages of Biometrika. Biometrika 88 53-97. Mathematical Reviews (MathSciNet): MR2002b:62001; Zentralblatt MATH: 1037.62069; Digital Object Identifier: doi:10.1093/biomet/88.1.53
- BESAG, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192-236.
- BESAG, J. E. (1975). Statistical analysis of non-lattice data. The Statistician 24 179-195.
- BESAG, J. E., GREEN, P. J., HIGDON, D. M. and MENGERSEN, K. L. (1995). Bayesian computation and stochastic systems (with discussion). Statist. Sci. 10 3-66. Mathematical Reviews (MathSciNet): MR96m:62048; Digital Object Identifier: doi:10.1214/ss/1177010123; Project Euclid: euclid.ss/1177010123
- BESAG, J. E. and HIGDON, D. M. (1993). Bayesian inference for agricultural field experiments. Bull. Internat. Statist. Inst. 55 121-136.
- BESAG, J. E. and HIGDON, D. M. (1999). Bayesian analysis of agricultural field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 61 691-746. Mathematical Reviews (MathSciNet): MR1722238; Zentralblatt MATH: 0951.62091; Digital Object Identifier: doi:10.1111/1467-9868.00201
- BESAG, J. E. and KEMPTON, R. A. (1986). Statistical analysis of field experiments using neighbouring plots. Biometrics 42 231-251. Zentralblatt MATH: 0658.62129
- BESAG, J. E. and KOOPERBERG, C. L. (1995). On conditional and intrinsic autoregressions. Biometrika 82 733-746.
- BESAG, J. E., YORK, J. C. and MOLLIÉ, A. (1991). Bayesian image restoration, with two applications in spatial statistics (with discussion). Ann. Inst. Statist. Math. 43 1-59. Mathematical Reviews (MathSciNet): MR92d:62032; Zentralblatt MATH: 0760.62029; Digital Object Identifier: doi:10.1007/BF00116466
- BREIMAN, L. (2001). Statistical modeling: the two cultures (with discussion). Statist. Sci. 16 199-231. Mathematical Reviews (MathSciNet): MR1874152; Digital Object Identifier: doi:10.1214/ss/1009213726; Project Euclid: euclid.ss/1009213726
- BYERS, S. D. and BESAG, J. E. (2000). Inference on a collapsed margin in disease mapping. Statistics in Medicine 19 2243-2249.
- FAIRFIELD SMITH, H. (1938). An empirical law describing heterogeneity in the yields of agricultural crops. J. Agric. Sci. 28 1-23.
- FISHER, R. A. (1922). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London Ser. A 222 309-368.
- FISHER, R. A. (1928). Statistical Methods for Research Workers, 2nd ed. Oliver and Boyd, Edinburgh.
- GILMOUR, A. R., CULLIS, B. R., SMITH, A. B. and VERBYLA, A. P. (1999). Discussion of paper by J. E. Besag and D. M. Higdon. J. Roy. Statist. Soc. Ser. B 61 731-732.
- HEINE, V. (1955). Models for two-dimensional stationary stochastic processes. Biometrika 42 170-178.
- KÜNSCH, H. R. (1987). Intrinsic autoregressions and related models on the two-dimensional lattice. Biometrika 74 517-524. Zentralblatt MATH: 0671.62082
- MATÉRN, B. (1986). Spatial Variation. Springer, New York. Mathematical Reviews (MathSciNet): MR867886
- MCBRATNEY, A. B. and WEBSTER, R. (1981). Detection of ridge and furrow pattern by spectral analysis of crop yield. Internat. Statist. Rev. 49 45-52.
- PEARCE, S. C. (1976). An examination of Fairfield Smith's law of environmental variation. J. Agric. Sci. 87 21-24.
- RUE, H. and TJELMELAND, H. (2002). Fitting Gaussian Markov random fields to Gaussian fields. Scand. J. Statist. 29 31-49.
- WHITTLE, P. (1962). Topographic correlation, power-law covariance functions, and diffusion. Biometrika 49 305-314.
- SEATTLE, WASHINGTON 98195-4322 E-MAIL: julian@stat.washington.edu
- recently by Chen, Lockhart and Stephens (2002). One reason for its attractiveness to me is that if one considers the more realistic semiparametric model, a(Y) = βX + ε, (6) where a is an arbitrary monotone transformation and ε has a N(µ, σ²) distribution, then β/σ is identifiable and estimable at the n^(-1/2) rate while β is not identifiable. Bickel and Ritov (1997) discuss ways of estimating β/σ and a which is also estimable at rate n^(-1/2) optimally and suggest approaches to algorithms in their paper. The choice (β, λ) is of interest to me because its consideration is the appropriate response to the Hinkley-Runger critique. One needs to specify a joint confidence region for (β, λ) making statements such as "the effect magnitude β on the λ scale is consistent with the data." The effect of lack of knowledge of λ on the variance of β remains interpretable. It would be more attractive if McCullagh could somehow divorce the calculus of this paper from the language of functors, morphisms and canonical diagrams for more analysis-oriented statisticians such as myself.
- BICKEL, P. and RITOV, Y. (1997). Local asymptotic normality of ranks and covariates in the transformation models. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. Yang, eds.) 43-54. Springer, New York.
- CHEN, G., LOCKHART, R. A. and STEPHENS, M. A. (2002). Box-Cox transformations in linear models: Large sample theory and tests of normality (with discussion). Canad. J. Statist. 30 177-234. Mathematical Reviews (MathSciNet): MR1926062; Digital Object Identifier: doi:10.2307/3315946
- BERKELEY, CALIFORNIA 94720-3860 E-MAIL: bickel@stat.berkeley.edu
- MAC LANE, S. (1998). Categories for the Working Mathematician, 2nd ed. Springer, New York. Mathematical Reviews (MathSciNet): MR2001j:18001
- FRASER, D. A. S. (1968a). A black box or a comprehensive model. Technometrics 10 219-229. Mathematical Reviews (MathSciNet): MR237024; Digital Object Identifier: doi:10.2307/1267040
- FRASER, D. A. S. (1968b). The Structure of Inference. Wiley, New York.
- MCCULLAGH, P. (1992). Conditional inference and Cauchy models. Biometrika 79 247-259. Mathematical Reviews (MathSciNet): MR93h:62048; Zentralblatt MATH: 0753.62002; Digital Object Identifier: doi:10.1093/biomet/79.2.247
- TORONTO, ONTARIO M5S 3G3 CANADA E-MAIL: reid@utstat.utoronto.ca
- from Helland (2002). Let a group G be defined on the parameter space Θ of a model. A measurable function ξ from Θ to another space Ξ is called a natural subparameter if ξ(θ₁) = ξ(θ₂) implies ξ(gθ₁) = ξ(gθ₂) for all g ∈ G. For example, in the location and scale case the location parameter µ and the scale parameter σ are natural, while the coefficient of variation µ/σ is not natural (it is if the group is changed to the pure scale group). In general the parameter ξ is natural iff the level sets of the function ξ = ξ(θ) are transformed onto other
- inconsistency discussed in detail by Dawid, Stone and Zidek (1973). Their main problem is a violation of the plausible reduction principle: assume that a general method of inference, applied to data (y, z), leads to an answer that in fact depends on z alone. Then the same answer should appear if the same method is applied to z alone. A Bayesian implementation of this principle runs as follows: assume first that the probability density p(y, z | η, ζ) depends on the parameter θ = (η, ζ) in such a way that the marginal density p(z | ζ) only depends upon ζ. Then the following implication should hold: if (a) the marginal posterior density π(ζ | y, z) depends on the data (y, z) only through z, then (b) this π(ζ | z) should be proportional to a(ζ)p(z | ζ) for some function a(ζ), so that it is proportional to a posterior based solely on the z data. For a proper prior π(η, ζ) this can be shown to hold with a(ζ) being the appropriate marginal prior π(ζ). Dawid, Stone and Zidek (1973) gave several examples where the implication above is violated by improper priors of the kind that we sometimes expect to have in objective Bayes inference. For our purpose, the interesting case is when there is a transformation group G defined on the parameter space. Under the assumption that ζ is maximal invariant under G and making some regularity conditions, it is then first shown by Dawid, Stone and Zidek (1973) that it necessarily follows that p(z | η, ζ) only depends upon ζ, next (a) is shown to hold always, and finally (b) holds if and only if the prior is of the form ν_G(dη) dζ, where ν_G is right Haar measure, and the measure
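The natural-subparameter condition quoted from Helland's discussion above can be probed numerically on a small grid. The sketch below assumes the usual location-scale action, in which a group element g = (a, b) with b > 0 sends (µ, σ) to (a + bµ, bσ); that action, the sample points and the helper names `act` and `is_natural_on` are illustrative choices, not taken from the paper. A finite check of this kind can refute naturality by exhibiting a counterexample, but a `True` result is only consistent with naturality and does not prove it.

```python
def act(g, theta):
    """Apply g = (a, b) to theta = (mu, sigma): (a + b*mu, b*sigma)."""
    a, b = g
    mu, sigma = theta
    return (a + b * mu, b * sigma)

def is_natural_on(xi, thetas, group_elements, tol=1e-12):
    """Check xi(t1) == xi(t2)  =>  xi(g t1) == xi(g t2) on finite samples."""
    for t1 in thetas:
        for t2 in thetas:
            if abs(xi(t1) - xi(t2)) > tol:
                continue  # premise fails, nothing to check for this pair
            for g in group_elements:
                if abs(xi(act(g, t1)) - xi(act(g, t2))) > tol:
                    return False  # same level set before g, different after
    return True

def location(theta):
    return theta[0]

def scale(theta):
    return theta[1]

def ratio(theta):
    # The ratio mu/sigma, written as in the passage above.
    return theta[0] / theta[1]

thetas = [(1.0, 1.0), (2.0, 2.0), (1.0, 2.0), (3.0, 1.0)]
location_scale = [(0.0, 2.0), (1.0, 1.0), (-2.0, 0.5)]
pure_scale = [(0.0, 2.0), (0.0, 0.5)]

location_ok = is_natural_on(location, thetas, location_scale)
scale_ok = is_natural_on(scale, thetas, location_scale)
ratio_ok = is_natural_on(ratio, thetas, location_scale)
ratio_ok_scale_only = is_natural_on(ratio, thetas, pure_scale)
```

The counterexample for the ratio is easy to read off by hand: (1, 1) and (2, 2) share the ratio 1, but the shift g = (1, 1) sends them to (2, 1) and (3, 2), whose ratios are 2 and 1.5. Under the pure scale group the ratio is unchanged by every g, so no such pair exists.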
- DAWID, A. P., STONE, M. and ZIDEK, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference (with discussion). J. Roy. Statist. Soc. Ser. B 35 189-233.
- HELLAND, I. S. (2001). Reduction of regression models under symmetry. In Algebraic Methods in Statistics and Probability (M. Viana and D. Richards, eds.) 139-153. Amer. Math. Soc., Providence, RI.
- HELLAND, I. S. (2002). Statistical inference under a fixed symmetry group. Available at http://www.math.uio.no/ ingeh/.
- BROWN, L. D. (1984). The research of Jack Kiefer outside the area of experimental design. Ann. Statist. 12 406-415. Mathematical Reviews (MathSciNet): MR740901; Zentralblatt MATH: 0549.01017; Digital Object Identifier: doi:10.1214/aos/1176346495; Project Euclid: euclid.aos/1176346495
- CARTIER, P. (2001). A mad day's work: From Grothendieck to Connes and Kontsevich. The evolution of concepts of space and symmetry. Bull. Amer. Math. Soc. 38 389-408. Mathematical Reviews (MathSciNet): MR1848254; Digital Object Identifier: doi:10.1090/S0273-0979-01-00913-2
- GROTHENDIECK, A. (1955). Produits tensoriels topologiques et espaces nucléaires. Mem. Amer. Math. Soc. 16. Mathematical Reviews (MathSciNet): MR17,763c
- HUBER, P. J. (1961). Homotopy theory in general categories. Math. Ann. 144 361-385. Mathematical Reviews (MathSciNet): MR27:187; Zentralblatt MATH: 0099.17905; Digital Object Identifier: doi:10.1007/BF01396534
- LE CAM, L. (1964). Sufficiency and approximate sufficiency. Ann. Math. Statist. 35 1419-1455. Mathematical Reviews (MathSciNet): MR34:6909; Zentralblatt MATH: 0129.11202; Digital Object Identifier: doi:10.1214/aoms/1177700372; Project Euclid: euclid.aoms/1177700372
- ARBUTHNOTT, J. (1712). An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes. Philos. Trans. Roy. Soc. London 27 186-190.
- BAILEY, R. A. (1981). A unified approach to design of experiments. J. Roy. Statist. Soc. Ser. A 144 214-223. Mathematical Reviews (MathSciNet): MR82h:62129; Digital Object Identifier: doi:10.2307/2981920
- BAILEY, R. A. (1991). Strata for randomized experiments (with discussion). J. Roy. Statist. Soc. Ser. B 53 27-78.
- COX, D. R. (1990). Roles of models in statistical analysis. Statist. Sci. 5 169-174. Mathematical Reviews (MathSciNet): MR1062575; Digital Object Identifier: doi:10.1214/ss/1177012165; Project Euclid: euclid.ss/1177012165
- DIACONIS, P. (1988). Group Representations in Probability and Statistics. IMS, Hayward, CA.
- DIACONIS, P., GRAHAM, R. L. and KANTOR, W. M. (1983). The mathematics of perfect shuffles. Adv. in Appl. Math. 4 175-196. Mathematical Reviews (MathSciNet): MR84j:20040; Zentralblatt MATH: 0521.05005; Digital Object Identifier: doi:10.1016/0196-8858(83)90009-X
- FURSTENBERG, H. (1963). Noncommuting random products. Trans. Amer. Math. Soc. 108 377-428. Mathematical Reviews (MathSciNet): MR163345; Zentralblatt MATH: 0203.19102; Digital Object Identifier: doi:10.2307/1993589
- GRENANDER, U. (1963). Probabilities on Algebraic Structures. Wiley, New York. Mathematical Reviews (MathSciNet): MR34:6810
- MCCULLAGH, P. (1999). Quotient spaces and statistical models. Canad. J. Statist. 27 447-456. Mathematical Reviews (MathSciNet): MR1745814; Digital Object Identifier: doi:10.2307/3316103
- MCCULLAGH, P. (2000). Invariance and factorial models (with discussion). J. Roy. Statist. Soc. Ser. B 62 209-256. Mathematical Reviews (MathSciNet): MR2002a:62102; Digital Object Identifier: doi:10.1111/1467-9868.00229
- PINCUS, S. and KALMAN, R. E. (1997). Not all (possibly) "random" sequences are created equal. Proc. Nat. Acad. Sci. U.S.A. 94 3513-3518. Mathematical Reviews (MathSciNet): MR99d:68179; Zentralblatt MATH: 0873.11047; Digital Object Identifier: doi:10.1073/pnas.94.8.3513
- PINCUS, S. and SINGER, B. H. (1996). Randomness and degrees of irregularity. Proc. Nat. Acad. Sci. U.S.A. 93 2083-2088. Mathematical Reviews (MathSciNet): MR97g:65025; Zentralblatt MATH: 0849.60002; Digital Object Identifier: doi:10.1073/pnas.93.5.2083
- GUILFORD, CONNECTICUT 06437 E-MAIL: stevepincus@alum.mit.edu
- in McCullagh (1980). Suppose we are dealing with a universe where the natural models for handling of binary responses are the logistic regression models. This could be some socioeconomic research area where peoples' attitudes to various features of brands or service levels are recorded on a binary scale, and the interest lies in the dependence of these attitudes on all sorts of background variables. How do we extend this universe to deal with ordered categorical responses, for example, on three-point positive/indifferent/negative scales? A natural requirement seems to be that if data are dichotomized by the (arbitrary) selection of a cutpoint (putting, for example, negative and indifferent together in a single category), then the marginal model coming out of this is a logistic regression model. This is, after all, just a way of recording a binary response, and even though it would hurt any statistician to throw away information in this way, it is done all the time on more invisible levels. Another natural requirement is that the parameters of interest-with the constant term as an obvious exception-should not depend on how the cutpoint is selected. It is easy to show that these two requirements are met by one and only one class of models for ordered responses, namely the models that can
- and Nelder (1989). Thus, we have here the absurd situation that the potentially canonical-but unfortunately nonexisting-answer to a simple and canonical question results in a collection of very useful methods. The overdispersion models exist as perfectly respectable operational objects, but not as mathematical objects. My personal opinion [Tjur (1998)] is that the simplest way of giving these models a concrete interpretation goes via approximation by nonlinear models for normal data and a small adjustment of the usual estimation method for these models. But neither this, nor the concept of quasi-likelihood, answers the fundamental question whether there is a way of modifying the conditions (1) and (2) above in such a way that a meaningful theory of generalized linear models with overdispersion comes out as the unique answer. It is tempting to ask, in the present context, whether it is a necessity at all that these models "exist" in the usual sense. Is it so, perhaps, that after a century or two people will find this question irrelevant, just as we find old discussions about existence of the number + irrelevant? If this is the case, a new attitude to statistical models is certainly required.
- MCCULLAGH, P. (1980). Regression models for ordinal data (with discussion). J. Roy. Statist. Soc. Ser. B 42 109-142.
- MCCULLAGH, P. and NELDER, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London. Mathematical Reviews (MathSciNet): MR727836
- NELDER, J. A. and WEDDERBURN, R. W. M. (1972). Generalized linear models. J. Roy. Statist. Soc. Ser. A 135 370-384.
- TJUR, T. (1998). Nonlinear regression, quasi likelihood, and overdispersion in generalized linear models. Amer. Statist. 52 222-227. Mathematical Reviews (MathSciNet): MR99g:62089; Digital Object Identifier: doi:10.2307/2685928
- has recently been obtained by Wichura (2001). Fraser and Reid ask whether category theory can do more than provide a framework. My experience here is similar to Huber's, namely that category theory is well suited for this purpose but, as a branch of logic, that is all we can expect from it. Regarding the coefficient of variation, I agree that there are applications in which this is a useful and natural parameter or statistic, just as there are (a few) applications in which the correlation coefficient is useful. The groups used in this paper are such that the origin is either fixed or completely arbitrary. In either case there is no room for hedging. In practice, things are rarely so clear cut. In order to justify the coefficient of variation, it seems to me that the applications must be such that the scale of measurement has a reasonably well-defined origin relevant to the problem. The Cauchy model with the real fractional linear group was originally used as an example to highlight certain inferential problems. I do not believe I have encountered an application in which it would be easy to make a convincing case for the relevance of this group. Nevertheless, I think it is helpful to study such examples for the light they may shed on foundational matters. The fact that the median is not a natural subparameter is an insight that casts serious doubt on the relevance of the group in "conventional" applications. To turn the argument around, the fact that the Cauchy model is closed under real fractional linear transformation is not, in itself, an adequate reason to choose that group as the base category. In that sense, I agree with a primary thesis of Fraser's Structure of Inference that the group supersedes the probability model. Tjur's remarks capture the spirit of what I am attempting to do.
In the cumulative logit model, it is clear intuitively what is meant by the statement that the parameter of interest should not depend on how the cutpoints are selected. As is often the case, what is intuitively clear is not so easy to express in mathematical terms. It does not mean that the maximum-likelihood estimate is unaffected by this choice. For that reason, although Tjur's second condition on overdispersion models has a certain appeal, I do not think it carries the same force as the first. His description of natural subparameters in regression is a model of clarity.
- given the values on the contour (Matheron, 1971). Both processes are also conformal, but the similarity ends there. The set of conformal processes is also closed under addition of independent processes. Thus, the sum of white noise and W is conformal but not Markov. Beyond convolutions of white noise and W, it appears most unlikely that there exists another conformal process with Gaussian increments. Whittle's (1954) family of stationary Gaussian processes has the Markov property [Chilès and Delfiner (1999)] but the family is not closed under conformal maps nor under convolution.
- CHILÈS, J.-P. and DELFINER, P. (1999). Geostatistics. Wiley, New York.
- FEYNMAN, R. P., LEIGHTON, R. B. and SANDS, M. (1964). The Feynman Lectures on Physics. Addison-Wesley, Reading, MA. Mathematical Reviews (MathSciNet): MR213078
- FRASER, D. A. S. (1968b). The Structure of Inference. Wiley, New York.
- HELLAND, I. S. (1999a). Quantum mechanics from symmetry and statistical modelling. Internat. J. Theoret. Phys. 38 1851-1881. Mathematical Reviews (MathSciNet): MR1704291

Zentralblatt MATH: 0953.81003

Digital Object Identifier: doi:10.1023/A:1026676913271
- KINGMAN, J. F. C. (1972). On random sequences with spherical symmetry. Biometrika 59 492-494. Mathematical Reviews (MathSciNet): MR49:8161

Zentralblatt MATH: 0238.60025

Digital Object Identifier: doi:10.1093/biomet/59.2.492

JSTOR: links.jstor.org
- MACCULLAGH, J. (1839). An essay towards the dynamical theory of crystalline reflexion and refraction. Trans. Roy. Irish Academy 21 17-50.
- MATHERON, G. (1971). The theory of regionalized variables and its applications. Cahiers du Centre de Morphologie Mathématique de Fontainebleau 5.
- WHITTLE, P. (1954). On stationary processes in the plane. Biometrika 41 434-449.
- WICHURA, M. (2001). Some de Finetti type theorems. Preprint.
- CHICAGO, ILLINOIS 60637-1514 E-MAIL: pmcc@galton.uchicago.edu

#### See also

- Includes: Julian Besag. Comment.
- Includes: Peter J. Bickel. Comment.
- Includes: Hans Brøns. Comment.
- Includes: D. A. S. Fraser, N. Reid. Comment.
- Includes: Inge S. Helland. Comment.
- Includes: Peter J. Huber. Comment.
- Includes: Rudolf Kalman. Comment.
- Includes: Steve Pincus. Comment.
- Includes: Tue Tjur. Comment.
- Includes: Peter McCullagh. Rejoinder.
