The Annals of Statistics

What is a statistical model?

Peter McCullagh

Abstract

This paper addresses two closely related questions, "What is a statistical model?" and "What is a parameter?" The notions that a model must "make sense," and that a parameter must "have a well-defined meaning" are deeply ingrained in applied statistical work, reasonably well understood at an instinctive level, but absent from most formal theories of modelling and inference. In this paper, these concepts are defined in algebraic terms, using morphisms, functors and natural transformations. It is argued that inference on the basis of a model is not possible unless the model admits a natural extension that includes the domain for which inference is required. For example, prediction requires that the domain include all future units, subjects or time points. Although it is usually not made explicit, every sensible statistical model admits such an extension. Examples are given to show why such an extension is necessary and why a formal theory is required. In the definition of a subparameter, it is shown that certain parameter functions are natural and others are not. Inference is meaningful only for natural parameters. This distinction has important consequences for the construction of prior distributions and also helps to resolve a controversy concerning the Box-Cox model.
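
The distinction between natural and non-natural parameter functions can be illustrated numerically. The sketch below uses the group-based criterion quoted in Helland's comment further down this page: if two parameter values give the same value of the subparameter, they must still do so after any group element acts on them. It applies this to the location-scale group acting on (µ, σ). The function names, the finite grid of parameter values and the handful of group elements are illustrative assumptions, not part of the paper. A finite search of this kind can only exhibit a counterexample; it cannot prove that a subparameter is natural.

```python
import itertools
import math


def act(g, theta):
    """Apply the group element g = (a, b), b > 0, to theta = (mu, sigma)."""
    a, b = g
    mu, sigma = theta
    return (a + b * mu, b * sigma)


def no_counterexample(xi, thetas, group_elements):
    """Return False if some pair with xi(t1) == xi(t2) is separated by a group element."""
    for t1, t2 in itertools.combinations(thetas, 2):
        if not math.isclose(xi(t1), xi(t2), abs_tol=1e-12):
            continue  # the criterion only constrains pairs on the same level set
        for g in group_elements:
            if not math.isclose(xi(act(g, t1)), xi(act(g, t2)), abs_tol=1e-12):
                return False
    return True


# Hypothetical finite grids; dyadic values keep the floating-point arithmetic exact.
thetas = [(mu, sigma) for mu in (-2.0, -1.0, 1.0, 2.0) for sigma in (0.5, 1.0, 2.0, 4.0)]
location_scale = [(a, b) for a in (-1.0, 0.0, 1.5) for b in (0.5, 1.0, 2.0)]
pure_scale = [(0.0, b) for b in (0.5, 1.0, 2.0)]


def location(theta):
    return theta[0]


def scale(theta):
    return theta[1]


def ratio(theta):
    # The ratio mu / sigma, written in the order used in Helland's comment.
    return theta[0] / theta[1]


if __name__ == "__main__":
    print("location, location-scale group:", no_counterexample(location, thetas, location_scale))
    print("scale, location-scale group:", no_counterexample(scale, thetas, location_scale))
    print("ratio, location-scale group:", no_counterexample(ratio, thetas, location_scale))
    print("ratio, pure scale group:", no_counterexample(ratio, thetas, pure_scale))
```

On this grid the search finds a counterexample only for the ratio under the full location-scale group, which agrees with the remark that the ratio becomes natural once the group is restricted to pure scale changes.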

Article information

Source
Ann. Statist. Volume 30, Number 5 (2002), 1225-1310.

Dates
First available: 28 October 2002

Permanent link to this document
http://projecteuclid.org/euclid.aos/1035844977

Digital Object Identifier
doi:10.1214/aos/1035844977

Mathematical Reviews number (MathSciNet)
MR1936320

Zentralblatt MATH identifier
01916779

Subjects
Primary: 62A05
Secondary: 62F99: None of the above, but in this section

Keywords
Aggregation; agricultural field experiment; Bayes inference; Box-Cox model; category; causal inference; commutative diagram; conformal model; contingency table; embedding; exchangeability; extendability; extensive variable; fertility effect; functor; Gibbs model; harmonic model; intensive variable; interference; Kolmogorov consistency; lattice process; measure process; morphism; natural parameterization; natural subparameter; opposite category; quadratic exponential model; representation; spatial process; spline model; type III model

Citation

McCullagh, Peter. What is a statistical model? The Annals of Statistics 30 (2002), no. 5, 1225-1310. doi:10.1214/aos/1035844977. http://projecteuclid.org/euclid.aos/1035844977.


References

  • ALDOUS, D. (1981). Representations for partially exchangeable arrays of random variables. J. Multivariate Analysis 11 581-598.
  • ANDREWS, D. F. and HERZBERG, A. (1985). Data. Springer, New York.
  • BARNDORFF-NIELSEN, O. E. and COX, D. R. (1994). Inference and Asymptotics. Chapman and Hall, London.
  • BARTLETT, M. S. (1978). Nearest neighbour models in the analysis of field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 40 147-174.
  • BERGER, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, New York.
  • BERNARDO, J. M. and SMITH, A. F. M. (1994). Bayesian Theory. Wiley, New York.
  • BESAG, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192-236.
  • BESAG, J. and HIGDON, D. (1999). Bayesian analysis of agricultural field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 61 691-746.
  • BESAG, J. and KOOPERBERG, C. (1995). On conditional and intrinsic autoregressions. Biometrika 82 733-746.
  • BEST, N. G., ICKSTADT, K. and WOLPERT, R. L. (1999). Contribution to the discussion of Besag (1999). J. Roy. Statist. Soc. Ser. B 61 728-729.
  • BICKEL, P. and DOKSUM, K. A. (1981). An analysis of transformations revisited. J. Amer. Statist. Assoc. 76 296-311.
  • BILLINGSLEY, P. (1986). Probability and Measure, 2nd ed. Wiley, New York.
  • BOX, G. E. P. and COX, D. R. (1964). An analysis of transformations (with discussion). J. Roy. Statist. Soc. Ser. B 26 211-252.
  • BOX, G. E. P. and COX, D. R. (1982). An analysis of transformations revisited, rebutted. J. Amer. Statist. Assoc. 77 209-210.
  • COX, D. R. (1958). Planning of Experiments. Wiley, New York.
  • COX, D. R. (1986). Comment on Holland (1986). J. Amer. Statist. Assoc. 81 963-964.
  • COX, D. R. and HINKLEY, D. V. (1974). Theoretical Statistics. Chapman and Hall, London.
  • COX, D. R. and REID, N. (1987). Parameter orthogonality and approximate conditional inference (with discussion). J. Roy. Statist. Soc. Ser. B 49 1-39.
  • COX, D. R. and SNELL, E. J. (1981). Applied Statistics. Chapman and Hall, London.
  • COX, D. R. and WERMUTH, N. (1996). Multivariate Dependencies. Chapman and Hall, London.
  • DALE, J. R. (1984). Local versus global association for bivariate ordered responses. Biometrika 71 507-514.
  • DE FINETTI, B. (1975). Theory of Probability 2. Wiley, New York.
  • GELMAN, A., CARLIN, J. B., STERN, H. and RUBIN, D. B. (1995). Bayesian Data Analysis. Chapman and Hall, London.
  • GOODMAN, L. A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. J. Amer. Statist. Assoc. 74 537-552.
  • GOODMAN, L. A. (1981). Association models and canonical correlation in the analysis of cross-classifications having ordered categories. J. Amer. Statist. Assoc. 76 320-334.
  • HAMADA, M. and WU, C. F. J. (1992). Analysis of designed experiments with complex aliasing. J. Qual. Technology 24 130-137.
  • HARVILLE, D. A. and ZIMMERMANN, D. L. (1999). Contribution to the discussion of Besag (1999). J. Roy. Statist. Soc. Ser. B 61 733-734.
  • HELLAND, I. S. (1999a). Quantum mechanics from symmetry and statistical modelling. Internat. J. Theoret. Phys. 38 1851-1881.
  • HELLAND, I. S. (1999b). Quantum theory from symmetries in a general statistical parameter space. Technical report, Dept. Mathematics, Univ. Oslo.
  • HINKLEY, D. V. and RUNGER, G. (1984). The analysis of transformed data (with discussion). J. Amer. Statist. Assoc. 79 302-320.
  • HOLLAND, P. (1986). Statistics and causal inference (with discussion). J. Amer. Statist. Assoc. 81 945-970.
  • HORA, R. B. and BUEHLER, R. J. (1966). Fiducial theory and invariant estimation. Ann. Math. Statist. 37 643-656.
  • KINGMAN, J. F. C. (1984). Present position and potential developments: Some personal views. Probability and random processes. J. Roy. Statist. Soc. Ser. A 147 233-244.
  • KINGMAN, J. F. C. (1993). Poisson Processes. Oxford Univ. Press.
  • LAURITZEN, S. (1988). Extremal Families and Systems of Sufficient Statistics. Lecture Notes in Statist. 49. Springer, New York.
  • LEHMANN, E. L. (1983). Theory of Point Estimation. Wiley, New York.
  • LEHMANN, E. L. and CASELLA, G. (1998). Theory of Point Estimation, 2nd ed. Springer, New York.
  • LITTELL, R., FREUND, R. J. and SPECTOR, P. C. (1991). SAS System for Linear Models, 3rd ed. SAS Institute, Cary, NC.
  • MAC LANE, S. (1998). Categories for the Working Mathematician, 2nd ed. Springer, New York.
  • MCCULLAGH, P. (1980). Regression models for ordinal data (with discussion). J. Roy. Statist. Soc. Ser. B 42 109-142.
  • MCCULLAGH, P. (1992). Conditional inference and Cauchy models. Biometrika 79 247-259.
  • MCCULLAGH, P. (1996). Möbius transformation and Cauchy parameter estimation. Ann. Statist. 24 787-808.
  • MCCULLAGH, P. (1999). Quotient spaces and statistical models. Canad. J. Statist. 27 447-456.
  • MCCULLAGH, P. (2000). Invariance and factorial models (with discussion). J. Roy. Statist. Soc. Ser. B 62 209-256.
  • MCCULLAGH, P. and NELDER, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.
  • MCCULLAGH, P. and WIT, E. (2000). Natural transformation and the Bayes map. Technical report.
  • MERCER, W. B. and HALL, A. D. (1911). The experimental error of field trials. J. Agric. Research 50 331-357.
  • NELDER, J. A. (1977). A re-formulation of linear models (with discussion). J. Roy. Statist. Soc. Ser. A 140 48-77.
  • PEARSON, K. (1913). Note on the surface of constant association. Biometrika 9 534-537.
  • PLACKETT, R. L. (1965). A class of bivariate distributions. J. Amer. Statist. Assoc. 60 516-522.
  • RUBIN, D. (1978). Bayesian inference for causal effects: The role of randomization. Ann. Statist. 6 34-58.
  • RUBIN, D. (1986). Comment on Holland (1986). J. Amer. Statist. Assoc. 81 961-962.
  • SMITH, A. F. M. (1984). Present position and potential developments: some personal views. Bayesian statistics. J. Roy. Statist. Soc. Ser. A 147 245-259.
  • TJUR, T. (2000). Contribution to the discussion of McCullagh (2000). J. Roy. Statist. Soc. Ser. B 62 238-239.
  • WHITTLE, P. (1974). Contribution to the discussion of Besag (1974). J. Roy. Statist. Soc. Ser. B 36 228.
  • YANDELL, B. S. (1997). Practical Data Analysis for Designed Experiments. Chapman and Hall, London.
  • CHICAGO, ILLINOIS 60637-1514 E-MAIL: pmcc@galton.uchicago.edu
  • berg (1995), Besag and Higdon (1999) and Rue and Tjelmeland (2002). However, spatial effects are often of secondary importance, as in variety trials, and the main intention is to absorb an appropriate level of spatial variation in the formulation, rather than produce a spatial model with scientifically interpretable parameters. Nevertheless, McCullagh's basic point is well taken. For example, I view the use of MRFs in geographical epidemiology [e.g., Besag, York and Mollié (1991)] as mainly of exploratory value, in suggesting additional spatially related covariates whose inclusion would ideally dispense with the need for a spatial formulation;
  • uniformity trials in Fairfield Smith (1938) and Pearce (1976). Of course, in a genuine variety trial, one might want to predict what the aggregate yield over the entire field would have been for a few individual varieties but this does not require any extension of the formulation to McCullagh's conceptual plots. Indeed, such calculations are especially well suited to the Bayesian paradigm, both theoretically, because one is supposed to deal with potentially observable quantities rather than merely with parameters, and in practice, via MCMC, because the posterior predictive distributions are available rigorously. That is, for the aggregate yield of variety A, one uses the observed yields on plots that were sown with A and generates a set of observations from the likelihood for those that were not for each MCMC sample of parameter values, hence building a corresponding distribution of total yield. One may also construct credible intervals for the difference in total yields between varieties A and B and easily address all manner of questions in ranking and selection that simply cannot be considered in a frequentist framework; for example, the posterior probability that the total yield obtained by sowing any particular variety (perhaps chosen in the light of the experiment) would have been at least 10% greater than that of growing any other test variety in the field.
  • ton (1986). The findings typically suggest that the gains from spatial analysis in a badly designed experiment provide improvements commensurate with standard analysis and optimal design. This is not a reason to adopt poor designs but the simple fact is that, despite the efforts of statisticians, many experiments are carried out using nothing better than randomized complete blocks. It is highly desirable that the representation of fertility is flexible but is also parsimonious because there are many variety effects to be estimated, with very limited replication. McCullagh's use of discrete approximations to harmonic functions in Section 8 fails on both counts: first, local maxima or minima cannot exist except (artificially) at plots on the edge of the trial; second, the degrees of freedom lost in the fit equals the number of such plots and is therefore substantial (in fact, four less in a rectangular layout because the corner plots are ignored throughout the analysis!). Nevertheless, there is something appealing about the averaging property of harmonic functions, if only it were a little more flexible. What is required is a random effects (in frequentist terms) version and that is precisely the thinking behind the use of intrinsic autoregressions in BH and elsewhere. Indeed, such schemes fit McCullagh's discretized harmonic functions perfectly, except for edge effects (because BH embeds the array in a larger one to cater for such effects), and they also provide a good fit to more plausible fertility functions. For specific comments on the Mercer and Hall data, see below. Of course, spatial scale remains an important issue for variety trials and indeed is discussed empirically in Section 2.3 and in the rejoinder of BH. For one-dimensional adjustment, the simplest plausible continuum process is Brownian motion with an arbitrary level, for which the necessary integrations can be
  • ATKINSON, A. C. and BAILEY, R. A. (2001). One hundred years of the design of experiments on and off the pages of Biometrika. Biometrika 88 53-97.
  • BESAG, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192-236.
  • BESAG, J. E. (1975). Statistical analysis of non-lattice data. The Statistician 24 179-195.
  • BESAG, J. E., GREEN, P. J., HIGDON, D. M. and MENGERSEN, K. L. (1995). Bayesian computation and stochastic systems (with discussion). Statist. Sci. 10 3-66.
  • BESAG, J. E. and HIGDON, D. M. (1993). Bayesian inference for agricultural field experiments. Bull. Internat. Statist. Inst. 55 121-136.
  • BESAG, J. E. and HIGDON, D. M. (1999). Bayesian analysis of agricultural field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 61 691-746.
  • BESAG, J. E. and KEMPTON, R. A. (1986). Statistical analysis of field experiments using neighbouring plots. Biometrics 42 231-251.
  • BESAG, J. E. and KOOPERBERG, C. L. (1995). On conditional and intrinsic autoregressions. Biometrika 82 733-746.
  • BESAG, J. E., YORK, J. C. and MOLLIÉ, A. (1991). Bayesian image restoration, with two applications in spatial statistics (with discussion). Ann. Inst. Statist. Math. 43 1-59.
  • BREIMAN, L. (2001). Statistical modeling: the two cultures (with discussion). Statist. Sci. 16 199-231.
  • BYERS, S. D. and BESAG, J. E. (2000). Inference on a collapsed margin in disease mapping. Statistics in Medicine 19 2243-2249.
  • FAIRFIELD SMITH, H. (1938). An empirical law describing heterogeneity in the yields of agricultural crops. J. Agric. Sci. 28 1-23.
  • FISHER, R. A. (1922). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London Ser. A 222 309-368.
  • FISHER, R. A. (1928). Statistical Methods for Research Workers, 2nd ed. Oliver and Boyd, Edinburgh.
  • GILMOUR, A. R., CULLIS, B. R., SMITH, A. B. and VERBYLA, A. P. (1999). Discussion of paper by J. E. Besag and D. M. Higdon. J. Roy. Statist. Soc. B 61 731-732.
  • HEINE, V. (1955). Models for two-dimensional stationary stochastic processes. Biometrika 42 170-178.
  • KÜNSCH, H. R. (1987). Intrinsic autoregressions and related models on the two-dimensional lattice. Biometrika 74 517-524.
  • MATÉRN, B. (1986). Spatial Variation. Springer, New York.
  • MCBRATNEY, A. B. and WEBSTER, R. (1981). Detection of ridge and furrow pattern by spectral analysis of crop yield. Internat. Statist. Rev. 49 45-52.
  • PEARCE, S. C. (1976). An examination of Fairfield Smith's law of environmental variation. J. Agric. Sci. 87 21-24.
  • RUE, H. and TJELMELAND, H. (2002). Fitting Gaussian Markov random fields to Gaussian fields. Scand. J. Statist. 29 31-49.
  • WHITTLE, P. (1962). Topographic correlation, power-law covariance functions, and diffusion. Biometrika 49 305-314.
  • SEATTLE, WASHINGTON 98195-4322 E-MAIL: julian@stat.washington.edu
  • recently by Chen, Lockhart and Stephens (2002). One reason for its attractiveness to me is that if one considers the more realistic semiparametric model, a(Y) = Xβ + ε, (6) where a is an arbitrary monotone transformation and ε has a N(µ, σ²) distribution, then β/σ is identifiable and estimable at the n^(-1/2) rate while β is not identifiable. Bickel and Ritov (1997) discuss ways of estimating β/σ and a, which is also estimable at rate n^(-1/2) optimally, and suggest approaches to algorithms in their paper. The choice (β, λ) is of interest to me because its consideration is the appropriate response to the Hinkley-Runger critique. One needs to specify a joint confidence region for (β, λ) making statements such as "the effect magnitude β on the λ scale is consistent with the data." The effect of lack of knowledge of λ on the variance of β remains interpretable. It would be more attractive if McCullagh could somehow divorce the calculus of this paper from the language of functors, morphisms and canonical diagrams for more analysis-oriented statisticians such as myself.
  • BICKEL, P. and RITOV, Y. (1997). Local asymptotic normality of ranks and covariates in the transformation models. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. Yang, eds.) 43-54. Springer, New York.
  • CHEN, G., LOCKHART, R. A. and STEPHENS, M. A. (2002). Box-Cox transformations in linear models: Large sample theory and tests of normality (with discussion). Canad. J. Statist. 30 177-234.
  • BERKELEY, CALIFORNIA 94720-3860 E-MAIL: bickel@stat.berkeley.edu
  • MAC LANE, S. (1998). Categories for the Working Mathematician, 2nd ed. Springer, New York.
  • FRASER, D. A. S. (1968a). A black box or a comprehensive model. Technometrics 10 219-229.
  • FRASER, D. A. S. (1968b). The Structure of Inference. Wiley, New York.
  • MCCULLAGH, P. (1992). Conditional inference and Cauchy models. Biometrika 79 247-259.
  • TORONTO, ONTARIO M5S 3G3 CANADA E-MAIL: reid@utstat.utoronto.ca
  • from Helland (2002). Let a group G be defined on the parameter space Θ of a model. A measurable function ξ from Θ to another space is called a natural subparameter if ξ(θ₁) = ξ(θ₂) implies ξ(gθ₁) = ξ(gθ₂) for all g ∈ G. For example, in the location and scale case the location parameter µ and the scale parameter σ are natural, while the coefficient of variation µ/σ is not natural (it is if the group is changed to the pure scale group). In general the parameter ξ is natural iff the level sets of the function ξ = ξ(θ) are transformed onto other
  • inconsistency discussed in detail by Dawid, Stone and Zidek (1973). Their main problem is a violation of the plausible reduction principle: assume that a general method of inference, applied to data (y, z), leads to an answer that in fact depends on z alone. Then the same answer should appear if the same method is applied to z alone. A Bayesian implementation of this principle runs as follows: assume first that the probability density p(y, z | η, ζ) depends on the parameter θ = (η, ζ) in such a way that the marginal density p(z | ζ) only depends upon ζ. Then the following implication should hold: if (a) the marginal posterior density π(ζ | y, z) depends on the data (y, z) only through z, then (b) this π(ζ | z) should be proportional to a(ζ)p(z | ζ) for some function a(ζ), so that it is proportional to a posterior based solely on the z data. For a proper prior π(η, ζ) this can be shown to hold with a(ζ) being the appropriate marginal prior π(ζ). Dawid, Stone and Zidek (1973) gave several examples where the implication above is violated by improper priors of the kind that we sometimes expect to have in objective Bayes inference. For our purpose, the interesting case is when there is a transformation group G defined on the parameter space. Under the assumption that ζ is maximal invariant under G and under some regularity conditions, it is then first shown by Dawid, Stone and Zidek (1973) that it necessarily follows that p(z | η, ζ) only depends upon ζ, next (a) is shown to hold always, and finally (b) holds if and only if the prior is of the form ν_G(dη) dζ, where ν_G is right Haar measure, and the measure
  • DAWID, A. P., STONE, M. and ZIDEK, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference (with discussion). J. Roy. Statist. Soc. Ser. B 35 189-233.
  • HELLAND, I. S. (2001). Reduction of regression models under symmetry. In Algebraic Methods in Statistics and Probability (M. Viana and D. Richards, eds.) 139-153. Amer. Math. Soc., Providence, RI.
  • HELLAND, I. S. (2002). Statistical inference under a fixed symmetry group. Available at http://www.math.uio.no/~ingeh/.
  • BROWN, L. D. (1984). The research of Jack Kiefer outside the area of experimental design. Ann. Statist. 12 406-415.
  • CARTIER, P. (2001). A mad day's work: From Grothendieck to Connes and Kontsevich. The evolution of concepts of space and symmetry. Bull. Amer. Math. Soc. 38 389-408.
  • GROTHENDIECK, A. (1955). Produits tensoriels topologiques et espaces nucléaires. Mem. Amer. Math. Soc. 16.
  • HUBER, P. J. (1961). Homotopy theory in general categories. Math. Ann. 144 361-385.
  • LE CAM, L. (1964). Sufficiency and approximate sufficiency. Ann. Math. Statist. 35 1419-1455.
  • ARBUTHNOTT, J. (1712). An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes. Philos. Trans. Roy. Soc. London 27 186-190.
  • BAILEY, R. A. (1981). A unified approach to design of experiments. J. Roy. Statist. Soc. Ser. A 144 214-223.
  • BAILEY, R. A. (1991). Strata for randomized experiments (with discussion). J. Roy. Statist. Soc. Ser. B 53 27-78.
  • COX, D. R. (1990). Roles of models in statistical analysis. Statist. Sci. 5 169-174.
  • DIACONIS, P. (1988). Group Representations in Probability and Statistics. IMS, Hayward, CA.
  • DIACONIS, P., GRAHAM, R. L. and KANTOR, W. M. (1983). The mathematics of perfect shuffles. Adv. in Appl. Math. 4 175-196.
  • FURSTENBERG, H. (1963). Noncommuting random products. Trans. Amer. Math. Soc. 108 377-428.
  • GRENANDER, U. (1963). Probabilities on Algebraic Structures. Wiley, New York.
  • MCCULLAGH, P. (1999). Quotient spaces and statistical models. Canad. J. Statist. 27 447-456.
  • MCCULLAGH, P. (2000). Invariance and factorial models (with discussion). J. Roy. Statist. Soc. Ser. B 62 209-256.
  • PINCUS, S. and KALMAN, R. E. (1997). Not all (possibly) "random" sequences are created equal. Proc. Nat. Acad. Sci. U.S.A. 94 3513-3518.
  • PINCUS, S. and SINGER, B. H. (1996). Randomness and degrees of irregularity. Proc. Nat. Acad. Sci. U.S.A. 93 2083-2088.
  • GUILFORD, CONNECTICUT 06437 E-MAIL: stevepincus@alum.mit.edu
  • in McCullagh (1980). Suppose we are dealing with a universe where the natural models for handling of binary responses are the logistic regression models. This could be some socioeconomic research area where people's attitudes to various features of brands or service levels are recorded on a binary scale, and the interest lies in the dependence of these attitudes on all sorts of background variables. How do we extend this universe to deal with ordered categorical responses, for example, on three-point positive/indifferent/negative scales? A natural requirement seems to be that if data are dichotomized by the (arbitrary) selection of a cutpoint (putting, for example, negative and indifferent together in a single category), then the marginal model coming out of this is a logistic regression model. This is, after all, just a way of recording a binary response, and even though it would hurt any statistician to throw away information in this way, it is done all the time on more invisible levels. Another natural requirement is that the parameters of interest (with the constant term as an obvious exception) should not depend on how the cutpoint is selected. It is easy to show that these two requirements are met by one and only one class of models for ordered responses, namely the models that can
  • and Nelder (1989). Thus, we have here the absurd situation that the potentially canonical-but unfortunately nonexisting-answer to a simple and canonical question results in a collection of very useful methods. The overdispersion models exist as perfectly respectable operational objects, but not as mathematical objects. My personal opinion [Tjur (1998)] is that the simplest way of giving these models a concrete interpretation goes via approximation by nonlinear models for normal data and a small adjustment of the usual estimation method for these models. But neither this, nor the concept of quasi-likelihood, answers the fundamental question whether there is a way of modifying the conditions (1) and (2) above in such a way that a meaningful theory of generalized linear models with overdispersion comes out as the unique answer. It is tempting to ask, in the present context, whether it is a necessity at all that these models "exist" in the usual sense. Is it so, perhaps, that after a century or two people will find this question irrelevant, just as we find old discussions about existence of the number + irrelevant? If this is the case, a new attitude to statistical models is certainly required.
  • MCCULLAGH, P. (1980). Regression models for ordinal data (with discussion). J. Roy. Statist. Soc. Ser. B 42 109-142.
  • MCCULLAGH, P. and NELDER, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.
  • NELDER, J. A. and WEDDERBURN, R. W. M. (1972). Generalized linear models. J. Roy. Statist. Soc. Ser. A 135 370-384.
  • TJUR, T. (1998). Nonlinear regression, quasi likelihood, and overdispersion in generalized linear models. Amer. Statist. 52 222-227.
  • has recently been obtained by Wichura (2001). Fraser and Reid ask whether category theory can do more than provide a framework. My experience here is similar to Huber's, namely that category theory is well suited for this purpose but, as a branch of logic, that is all we can expect from it. Regarding the coefficient of variation, I agree that there are applications in which this is a useful and natural parameter or statistic, just as there are (a few) applications in which the correlation coefficient is useful. The groups used in this paper are such that the origin is either fixed or completely arbitrary. In either case there is no room for hedging. In practice, things are rarely so clear cut. In order to justify the coefficient of variation, it seems to me that the applications must be such that the scale of measurement has a reasonably well-defined origin relevant to the problem. The Cauchy model with the real fractional linear group was originally used as an example to highlight certain inferential problems. I do not believe I have encountered an application in which it would be easy to make a convincing case for the relevance of this group. Nevertheless, I think it is helpful to study such examples for the light they may shed on foundational matters. The fact that the median is not a natural subparameter is an insight that casts serious doubt on the relevance of the group in "conventional" applications. To turn the argument around, the fact that the Cauchy model is closed under real fractional linear transformation is not, in itself, an adequate reason to choose that group as the base category. In that sense, I agree with a primary thesis of Fraser's Structure of Inference that the group supersedes the probability model. Tjur's remarks capture the spirit of what I am attempting to do. In the cumulative logit model, it is clear intuitively what is meant by the statement that the parameter of interest should not depend on how the cutpoints are selected. 
As is often the case, what is intuitively clear is not so easy to express in mathematical terms. It does not mean that the maximum-likelihood estimate is unaffected by this choice. For that reason, although Tjur's second condition on overdispersion models has a certain appeal, I do not think it carries the same force as the first. His description of natural subparameters in regression is a model of clarity.
  • given the values on the contour (Matheron, 1971). Both processes are also conformal, but the similarity ends there. The set of conformal processes is also closed under addition of independent processes. Thus, the sum of white noise and W is conformal but not Markov. Beyond convolutions of white noise and W, it appears most unlikely that there exists another conformal process with Gaussian increments. Whittle's (1954) family of stationary Gaussian processes has the Markov property [Chilès and Delfiner (1999)] but the family is not closed under conformal maps nor under convolution.
  • CHILÈS, J.-P. and DELFINER, P. (1999). Geostatistics. Wiley, New York.
  • FEYNMAN, R. P., LEIGHTON, R. B. and SANDS, M. (1964). The Feynman Lectures on Physics. Addison-Wesley, Reading, MA.
  • FRASER, D. A. S. (1968b). The Structure of Inference. Wiley, New York.
  • HELLAND, I. S. (1999a). Quantum mechanics from symmetry and statistical modelling. Internat. J. Theoret. Phys. 38 1851-1881.
  • KINGMAN, J. F. C. (1972). On random sequences with spherical symmetry. Biometrika 59 492-494.
  • MACCULLAGH, J. (1839). An essay towards the dynamical theory of crystalline reflexion and refraction. Trans. Roy. Irish Academy 21 17-50.
  • MATHERON, G. (1971). The theory of regionalized variables and its applications. Cahiers du Centre de Morphologie Mathématique de Fontainebleau 5.
  • WHITTLE, P. (1954). On stationary processes in the plane. Biometrika 41 434-449.
  • WICHURA, M. (2001). Some de Finetti type theorems. Preprint.
  • CHICAGO, ILLINOIS 60637-1514 E-MAIL: pmcc@galton.uchicago.edu

See also

  • Includes: Julian Besag. Comment.
  • Includes: Peter J. Bickel. Comment.
  • Includes: Hans Brøns. Comment.
  • Includes: D. A. S. Fraser, N. Reid. Comment.
  • Includes: Inge S. Helland. Comment.
  • Includes: Peter J. Huber. Comment.
  • Includes: Rudolf Kalman. Comment.
  • Includes: Steve Pincus. Comment.
  • Includes: Tue Tjur. Comment.
  • Includes: Peter McCullagh. Rejoinder.