## The Annals of Statistics

### Nonlinear confounding in high-dimensional regression

Ker-Chau Li

#### Abstract

It is not uncommon to find nonlinear patterns in the scatterplots of regressor variables, but how such findings affect standard regression analysis remains largely unexplored. This article offers a theory of nonlinear confounding, a term describing the situation where a nonlinear relationship among regressors leads to difficulties in modeling and related analysis of the data. The theory begins with a measure of nonlinearity between two regressor variables. This measure is then used to assess nonlinearity between any two projections of the high-dimensional regressor, and a method of finding the most nonlinear projections is given. Nonlinear confounding is addressed by taking a fresh look at fundamental issues such as the validity of prediction and inference, diagnostics, regression surface approximation, model uncertainty and Fisher information loss.

#### Article information

Source
Ann. Statist., Volume 25, Number 2 (1997), 577-612.

Dates
First available in Project Euclid: 12 September 2002

https://projecteuclid.org/euclid.aos/1031833665

Digital Object Identifier
doi:10.1214/aos/1031833665

Mathematical Reviews number (MathSciNet)
MR1439315

Zentralblatt MATH identifier
0873.62071

Subjects
Primary: 62J20: Diagnostics; 62J99: None of the above, but in this section

#### Citation

Li, Ker-Chau. Nonlinear confounding in high-dimensional regression. Ann. Statist. 25 (1997), no. 2, 577-612. doi:10.1214/aos/1031833665. https://projecteuclid.org/euclid.aos/1031833665

Los Angeles, California 90024. E-mail: kcli@math.ucla.edu