## Statistical Science

### Fisher Lecture: Dimension Reduction in Regression

R. Dennis Cook

#### Abstract

Beginning with a discussion of R. A. Fisher’s early written remarks that relate to dimension reduction, this article revisits principal components as a reductive method in regression, develops several model-based extensions and ends with descriptions of general approaches to model-based and model-free dimension reduction in regression. It is argued that the role for principal components and related methodology may be broader than previously seen and that the common practice of conditioning on observed values of the predictors may unnecessarily limit the choice of regression methodology.

#### Article information

Source
Statist. Sci., Volume 22, Number 1 (2007), 1-26.

Dates
First available in Project Euclid: 1 August 2007

https://projecteuclid.org/euclid.ss/1185975631

Digital Object Identifier
doi:10.1214/088342306000000682

Mathematical Reviews number (MathSciNet)
MR2408655

Zentralblatt MATH identifier
1246.62149

#### Citation

Cook, R. Dennis. Fisher Lecture: Dimension Reduction in Regression. Statist. Sci. 22 (2007), no. 1, 1-26. doi:10.1214/088342306000000682. https://projecteuclid.org/euclid.ss/1185975631
