Statistical Science

Fisher and Regression

John Aldrich

Abstract

In 1922 R. A. Fisher introduced the modern regression model, synthesizing the regression theory of Pearson and Yule and the least squares theory of Gauss. The innovation was based on Fisher’s realization that the distribution associated with the regression coefficient was unaffected by the distribution of X. Subsequently Fisher interpreted the fixed X assumption in terms of his notion of ancillarity. This paper considers these developments against the background of the development of statistical theory in the early twentieth century.

Article information

Source
Statist. Sci. Volume 20, Number 4 (2005), 401-417.

Dates
First available in Project Euclid: 12 January 2006

http://projecteuclid.org/euclid.ss/1137076660

Digital Object Identifier
doi:10.1214/088342305000000331

Mathematical Reviews number (MathSciNet)
MR2210227

Zentralblatt MATH identifier
1130.62300

Citation

Aldrich, John. Fisher and Regression. Statist. Sci. 20 (2005), no. 4, 401--417. doi:10.1214/088342305000000331. http://projecteuclid.org/euclid.ss/1137076660.

References

• Fisher's published papers appear in J. H. Bennett, ed. (1971--1974). Collected Papers of R. A. Fisher, 5 vols. Adelaide Univ. Press. Bennett (?) and nearly all of the papers referred to here are available from the University of Adelaide R. A. Fisher Digital Archive at http://www.library.adelaide.edu.au/ual/special/fisher.html.
• Aldrich, J. (1993). Cowles exogeneity and CORE exogeneity. Discussion Paper 9308, Dept. Economics, Southampton Univ.
• Aldrich, J. (1995). Correlations genuine and spurious in Pearson and Yule. Statist. Sci. 10 364--376.
• Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912--1922. Statist. Sci. 12 162--176.
• Aldrich, J. (1998). Doing least squares: Perspectives from Gauss and Yule. Internat. Statist. Rev. 66 61--81.
• Aldrich, J. (1999). Determinacy in the linear model: Gauss to Bose and Koopmans. Internat. Statist. Rev. 67 211--219.
• Aldrich, J. (2003--2005). A guide to R. A. Fisher. Available at http://www.economics.soton.ac.uk/staff/aldrich/fisherguide/rafreader.htm.
• Aldrich, J. (2003). The language of the English biometric school. Internat. Statist. Rev. 71 109--129.
• Aldrich, J. (2005). The statistical education of Harold Jeffreys. Internat. Statist. Rev. 73 289--308.
• Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. Wiley, Chichester.
• Barndorff-Nielsen, O. E. and Cox, D. R. (1994). Inference and Asymptotics. Chapman and Hall, London.
• Bartlett, M. S. (1933a). On the theory of statistical regression. Proc. Royal Soc. Edinburgh 53 260--283.
• Bartlett, M. S. (1933b). Probability and chance in the theory of statistics. Proc. Roy. Soc. London Ser. A 141 518--534.
• Bartlett, M. S. (1936). Statistical information and properties of sufficiency. Proc. Roy. Soc. London Ser. A 154 124--137.
• Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. London Ser. A 160 268--282.
• Bartlett, M. S. (1940). A note on the interpretation of quasi-sufficiency. Biometrika 31 391--392.
• Bartlett, M. S. (1965). R. A. Fisher and the last fifty years of statistical methodology. J. Amer. Statist. Assoc. 60 395--409.
• Bartlett, M. S. (1981). Egon Sharpe Pearson, 1895--1980. Biometrika 68 1--7.
• Bartlett, M. S. (1982). Chance and change. In The Making of Statisticians (J. Gani, ed.) 42--60. Springer, New York.
• Bennett, J. H., ed. (1990). Statistical Inference and Analysis: Selected Correspondence of R. A. Fisher. Oxford Univ. Press.
• Berkson, J. (1950). Are there two regressions? J. Amer. Statist. Assoc. 45 164--180.
• Birnbaum, A. (1962). On the foundations of statistical inference. J. Amer. Statist. Assoc. 57 269--326.
• Bjerve, S. and Doksum, K. A. (1993). Correlation curves: Measures of association as functions of covariate values. Ann. Statist. 21 890--902.
• Blakeman, J. (1905). On tests for linearity of regression in frequency distributions. Biometrika 4 332--350.
• Blyth, S. (1994). Karl Pearson and the correlation curve. Internat. Statist. Rev. 62 393--403.
• Bowley, A. L. (1901). Elements of Statistics. King, London.
• Box, J. F. (1978). R. A. Fisher: The Life of a Scientist. Wiley, New York.
• Brown, L. D. (1990). An ancillarity paradox which appears in multiple linear regression (with discussion). Ann. Statist. 18 471--538.
• Brunt, D. (1917). The Combination of Observations. Cambridge Univ. Press.
• Campbell, N. (1924). The adjustment of observations. Philosophical Magazine (6) 47 816--826.
• Cox, D. R. (1958). Some problems connected with statistical inference. Ann. Math. Statist. 29 357--372.
• Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. Chapman and Hall, London.
• Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press, Princeton, NJ.
• Edgeworth, F. Y. (1893). Exercises in the calculation of errors. Philosophical Magazine (5) 36 98--111.
• Eisenhart, C. (1979). On the transition from "Student's" $z$ to "Student's" $t$. Amer. Statist. 33 6--10.
• Elderton, W. P. (1906). Frequency Curves and Correlation. Layton, London.
• Ezekiel, M. (1930). Methods of Correlation Analysis. Wiley, London.
• Farebrother, R. W. (1999). Fitting Linear Relationships: A History of the Calculus of Observations. Springer, New York.
• Fienberg, S. E. (1980). Fisher's contribution to the analysis of categorical data. R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1 75--84. Springer, New York.
• Fienberg, S. E. and Hinkley, D. V., eds. (1980). R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1. Springer, New York.
• Fisher, R. A. (1912). On an absolute criterion for fitting frequency curves. Messenger of Mathematics 41 155--160.
• Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10 507--521.
• Fisher, R. A. (1921a). On the "probable error" of a coefficient of correlation deduced from a small sample. Metron 1 3--32.
• Fisher, R. A. (1921b). Studies in crop variation. I. An examination of the yield of dressed grain from Broadbalk. J. Agricultural Science 11 107--135.
• Fisher, R. A. (1922a). The goodness of fit of regression formulae, and the distribution of regression coefficients. J. Roy. Statist. Soc. 85 597--612.
• Fisher, R. A. (1922b). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London Ser. A 222 309--368.
• Fisher, R. A. (1922c). On the interpretation of $\chi^2$ from contingency tables, and the calculation of $P$. J. Roy. Statist. Soc. 85 87--94.
• Fisher, R. A. (1924--1925). Note on Dr. Campbell's alternative to the method of least squares. Unpublished manuscript, Barr Smith Library, Univ. Adelaide.
• Fisher, R. A. (1924--1928). On a distribution yielding the error functions of several well known statistics. In Proc. International Mathematical Congress 2 805--813. Univ. Toronto Press, Toronto.
• Fisher, R. A. (1925a). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh.
• Fisher, R. A. (1925b). Theory of statistical estimation. Proc. Cambridge Philos. Soc. 22 700--725.
• Fisher, R. A. (1925c). Applications of "Student's" distribution. Metron 5 90--104.
• Fisher, R. A. (1925d). The influence of rainfall on the yield of wheat at Rothamsted. Philos. Trans. Roy. Soc. London Ser. B 213 89--142.
• Fisher, R. A. (1934). Two new properties of mathematical likelihood. Proc. Roy. Soc. London Ser. A 144 285--307.
• Fisher, R. A. (1935). The logic of inductive inference (with discussion). J. Roy. Statist. Soc. 98 39--82.
• Fisher, R. A. (1946). Testing the difference between two means of observations of unequal precision. Nature 158 713.
• Fisher, R. A. (1948). Conclusions fiduciaires. Ann. Inst. H. Poincaré 10 191--213.
• Fisher, R. A. (1955). Statistical methods and scientific induction. J. Roy. Statist. Soc. Ser. B 17 69--78.
• Fisher, R. A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh.
• Fisher, R. A. and Mackenzie, W. A. (1923). Studies in crop variation. II. The manurial response of different potato varieties. J. Agricultural Science 13 311--320.
• Fraser, D. A. S. (1992). Introduction to reprint of "Properties of sufficiency and statistical tests" [Bartlett (1937)]. In Breakthroughs in Statistics (S. Kotz and N. L. Johnson, eds.) 1 109--112. Springer, New York.
• Fraser, D. A. S. (2004). Ancillaries and conditional inference (with discussion). Statist. Sci. 19 333--369.
• Galton, F. (1877). Typical laws of heredity. Nature 15 492--495, 512--514, 532--533.
• Galton, F. (1886). Family likeness in stature. Proc. Roy. Soc. London 40 42--73.
• Gauss, C. F. (1809/1963). Theoria Motus Corporum Coelestium (C. H. Davis, transl.). Dover, New York, reprinted 1963.
• Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis. Chapman and Hall, London.
• Hald, A. (1998). A History of Mathematical Statistics from 1750 to 1930. Wiley, New York.
• Hald, A. (1999). On the history of maximum likelihood in relation to inverse probability and least squares. Statist. Sci. 14 214--222.
• Hinkley, D. V. (1980a). Theory of statistical estimation: The 1925 paper. R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1 85--94. Springer, New York.
• Hinkley, D. V. (1980b). Fisher's development of conditional inference. R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1 101--108. Springer, New York.
• Hooker, R. H. (1907). Correlation of the weather and crops. J. Roy. Statist. Soc. 70 1--51.
• Hotelling, H. (1940). The selection of variates for use in prediction with some comments on the general problem of nuisance parameters. Ann. Math. Statist. 11 271--283.
• Hotelling, H. (1948). Review of The Advanced Theory of Statistics 2, by M. G. Kendall. Bull. Amer. Math. Soc. 54 863--868.
• Howie, D. (2002). Interpreting Probability: Controversies and Developments in the Early Twentieth Century. Cambridge Univ. Press.
• Kalbfleisch, J. (1982). Ancillary statistics. Encyclopedia of Statistical Sciences 1 77--81. Wiley, New York.
• Kendall, M. G. (1946). The Advanced Theory of Statistics 2. Griffin, London.
• Kendall, M. G. (1951). Regression, structure and functional relationship. I. Biometrika 38 11--25.
• Kołodziejczyk, S. (1935). On an important class of statistical hypotheses. Biometrika 27 161--190.
• Koopmans, T. C. (1937). Linear Regression Analysis of Economic Time Series. Bohn, Haarlem, Netherlands.
• Lancaster, H. O. (1969). The Chi-Squared Distribution. Wiley, New York.
• Lehmann, E. L. (1999). "Student" and small-sample theory. Statist. Sci. 14 418--426.
• McMullen, L. (1970). Letters from W. S. Gosset to R. A. Fisher 1915--1936: Summaries by R. A. Fisher with a Foreword by L. McMullen, 2nd ed. Printed by Arthur Guinness for private circulation and placed in a few libraries.
• Merriman, M. (1884/1911). A Textbook on the Method of Least Squares. Wiley, New York. References are to the eighth edition, 1911.
• Miller, J., ed. (1999--2005). Earliest uses of symbols in probability and statistics. Available at http://members.aol.com/jeff570/stat.html.
• Morgan, M. S. (1990). The History of Econometric Ideas. Cambridge Univ. Press.
• Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection (with discussion). J. Roy. Statist. Soc. 97 558--625.
• Neyman, J. and Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference. I, II. Biometrika 20A 175--240, 263--294.
• Neyman, J. and Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Roy. Soc. London Ser. A 231 289--337.
• Olkin, I. (1989). A conversation with Maurice Bartlett. Statist. Sci. 4 151--163.
• Pearson, E. S. (1926). Review of Statistical Methods for Research Workers, by R. A. Fisher. Science Progress 20 733--734.
• Pearson, E. S. (1990). "Student", A Statistical Biography of William Sealy Gosset (R. L. Plackett, ed.; G. A. Barnard, assist.). Oxford Univ. Press.
• Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philos. Trans. Roy. Soc. London Ser. A 186 343--414.
• Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia. Philos. Trans. Roy. Soc. London Ser. A 187 253--318.
• Pearson, K. (1899). Mathematical contributions to the theory of evolution. V. On the reconstruction of the stature of prehistoric races. Philos. Trans. Roy. Soc. London Ser. A 192 169--244.
• Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine (5) 50 157--175.
• Pearson, K. (1902a). On the systematic fitting of curves to observations and measurements. I, II. Biometrika 1 265--303, 2 1--23.
• Pearson, K. (1902b). On the mathematical theory of errors of judgment, with special reference to the personal equation. Philos. Trans. Roy. Soc. London Ser. A 198 235--299.
• Pearson, K. (1905). On the general theory of skew correlation and non-linear regression. Drapers' Company Research Memoirs, Biometric Series II. Cambridge Univ. Press.
• Pearson, K., ed. (1914). Biometrika Tables for Statisticians and Biometricians. Cambridge Univ. Press.
• Pearson, K. (1916). On the application of "goodness of fit" tables to test regression curves and theoretical curves used to describe observational or experimental data. Biometrika 11 239--261.
• Pearson, K. (1920). Notes on the history of correlation. Biometrika 13 25--45.
• Pearson, K. (1923). Notes on skew frequency surfaces. Biometrika 15 222--230.
• Pearson, K. (1925). Further contributions to the theory of small samples. Biometrika 17 176--200.
• Pearson, K. (1926). Researches on the mode of distribution of the constants of samples taken at random from a bivariate normal population. Proc. Roy. Soc. London Ser. A 112 1--14.
• Pearson, K., ed. (1931). Tables for Statisticians and Biometricians, Part II. Cambridge Univ. Press.
• Pearson, K., ed. (1934). Tables of the Incomplete Beta-Function. Cambridge Univ. Press.
• Pearson, K. (1935). Thoughts suggested by the papers of Messrs. Welch and Kołodziejczyk. Biometrika 27 227--259.
• Pearson, K. and Filon, L. N. G. (1898). Mathematical contributions to the theory of evolution. IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation. Philos. Trans. Roy. Soc. London Ser. A 191 229--311.
• Reid, N. (1994). A conversation with Sir David Cox. Statist. Sci. 9 439--455.
• Reid, N. (1995). The roles of conditioning in inference (with discussion). Statist. Sci. 10 138--157, 173--196.
• Sampson, A. R. (1974). A tale of two regressions. J. Amer. Statist. Assoc. 69 682--689.
• Savage, L. J. (1962). Subjective probability and statistical practice. In The Foundations of Statistical Inference: A Discussion (L. J. Savage et al., eds.) 9--35. Methuen, London.
• Schultz, H. (1929). Applications of the theory of error to the interpretation of trends: Discussion. J. Amer. Statist. Assoc. Suppl. 24 86--89.
• Seal, H. (1967). The historical development of the Gauss linear model. Biometrika 54 1--24.
• Seneta, E. (1988). Slutsky (Slutskii), Evgenii Evgenievich. Encyclopedia of Statistical Sciences 8 512--515. Wiley, New York.
• Slutsky, E. E. (1913). On the criterion of goodness of fit of the regression lines and on the best method of fitting them to the data. J. Roy. Statist. Soc. 77 78--84.
• Stigler, S. M. (1986). The History of Statistics. The Measurement of Uncertainty before 1900. Belknap, Cambridge, MA.
• Stigler, S. M. (2001). Ancillary history. In State of the Art in Probability and Statistics: Festschrift for Willem R. van Zwet (M. deGunst, C. Klaassen and A. van der Vaart, eds.) 555--567. IMS, Beachwood, OH.
• Student (1908a). The probable error of a mean. Biometrika 6 1--25.
• Student (1908b). Probable error of a correlation coefficient. Biometrika 6 302--310.
• Student (1926). Review of Statistical Methods for Research Workers, by R. A. Fisher. Eugenics Review 18 148--150.
• Tolley, H. R. and Ezekiel, M. J. B. (1923). A method of handling multiple correlation problems. J. Amer. Statist. Assoc. 18 993--1003.
• Welch, B. L. (1935). Some problems in the analysis of regression among $k$ samples of two variables. Biometrika 27 145--160.
• Welch, B. L. (1939). On confidence limits and sufficiency, with particular reference to parameters of location. Ann. Math. Statist. 10 58--69.
• Working, H. and Hotelling, H. (1929). Applications of the theory of error to the interpretation of trends. J. Amer. Statist. Assoc. Suppl. 24 73--85.
• Yule, G. U. (1897). On the theory of correlation. J. Roy. Statist. Soc. 60 812--854.
• Yule, G. U. (1899). An investigation into the causes of changes in pauperism in England, chiefly during the last two intercensal decades (part I). J. Roy. Statist. Soc. 62 249--295.
• Yule, G. U. (1907). On the theory of correlation for any number of variables, treated by a new system of notation. Proc. Roy. Soc. London Ser. A 79 182--193.
• Yule, G. U. (1909). The applications of the method of correlation to social and economic statistics. J. Roy. Statist. Soc. 72 721--730.
• Yule, G. U. (1911). An Introduction to the Theory of Statistics. Griffin, London.
• Zabell, S. (1992). R. A. Fisher and the fiducial argument. Statist. Sci. 7 369--387.