The Annals of Statistics

Nonlinear confounding in high-dimensional regression

Ker-Chau Li


Abstract

It is not uncommon to find nonlinear patterns in the scatterplots of regressor variables. But how such findings affect standard regression analysis remains largely unexplored. This article offers a theory on nonlinear confounding, a term for describing the situation where a certain nonlinear relationship in regressors leads to difficulties in modeling and related analysis of the data. The theory begins with a measure of nonlinearity between two regressor variables. It is then used to assess nonlinearity between any two projections from the high-dimensional regressor and a method of finding most nonlinear projections is given. Nonlinear confounding is addressed by taking a fresh new look at fundamental issues such as the validity of prediction and inference, diagnostics, regression surface approximation, model uncertainty and Fisher information loss.

Article information

Source
Ann. Statist., Volume 25, Number 2 (1997), 577-612.

Dates
First available in Project Euclid: 12 September 2002

Permanent link to this document
https://projecteuclid.org/euclid.aos/1031833665

Digital Object Identifier
doi:10.1214/aos/1031833665

Mathematical Reviews number (MathSciNet)
MR1439315

Zentralblatt MATH identifier
0873.62071

Subjects
Primary: 62J20: Diagnostics; 62J99: None of the above, but in this section

Keywords
Adaptiveness; dimension reduction; graphics; nonlinear regression; overlinearization; quasi-helical confounding; information matrices; regression diagnostics; semi-parametrics; sliced inverse regression

Citation

Li, Ker-Chau. Nonlinear confounding in high-dimensional regression. Ann. Statist. 25 (1997), no. 2, 577-612. doi:10.1214/aos/1031833665. https://projecteuclid.org/euclid.aos/1031833665



Author address: Los Angeles, California 90024. E-mail: kcli@math.ucla.edu