The Annals of Statistics

Dimension reduction based on constrained canonical correlation and variable filtering

Jianhui Zhou and Xuming He

Source: Ann. Statist. Volume 36, Number 4 (2008), 1649-1668.

Abstract

The “curse of dimensionality” has remained a challenge for high-dimensional data analysis in statistics. The sliced inverse regression (SIR) and canonical correlation (CANCOR) methods aim to reduce the dimensionality of data by replacing the explanatory variables with a small number of composite directions without losing much information. However, the estimated composite directions generally involve all of the variables, making their interpretation difficult. To simplify the direction estimates, Ni, Cook and Tsai [Biometrika 92 (2005) 242–247] proposed the shrinkage sliced inverse regression (SSIR) based on SIR. In this paper, we propose the constrained canonical correlation (C3) method based on CANCOR, followed by a simple variable filtering method. As a result, each composite direction consists of a subset of the variables for interpretability as well as predictive power. The proposed method aims to identify simple structures without sacrificing the desirable properties of the unconstrained CANCOR estimates. The simulation studies demonstrate the performance advantage of the proposed C3 method over the SSIR method. We also use the proposed method in two examples for illustration.
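
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' C3 procedure. It estimates one composite direction by unconstrained canonical correlation between the explanatory variables and a few basis functions of the response, then zeroes out small coefficients. The polynomial basis, the hard-threshold rule standing in for the L_1-norm constraint and variable filtering, the function names `cancor_directions` and `filter_small`, and the simulated data are all assumptions made here for illustration.

```python
import numpy as np


def cancor_directions(X, y, n_dirs=1, degree=3):
    """Canonical correlation between X and a polynomial basis of y.

    Assumption: the polynomial basis is an illustrative choice only.
    Returns unit-norm direction estimates (columns) and the
    corresponding canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    ys = (y - y.mean()) / y.std()
    B = np.column_stack([ys ** k for k in range(1, degree + 1)])
    Bc = B - B.mean(axis=0)

    # Orthonormalise both blocks; the singular values of Qx' Qb are
    # the canonical correlations.
    Qx, Rx = np.linalg.qr(Xc)
    Qb, _ = np.linalg.qr(Bc)
    U, s, _ = np.linalg.svd(Qx.T @ Qb, full_matrices=False)

    # Map back to coefficients on the original variables: Rx @ beta = u.
    betas = np.linalg.solve(Rx, U[:, :n_dirs])
    betas /= np.linalg.norm(betas, axis=0)
    return betas, s[:n_dirs]


def filter_small(beta, tol=0.2):
    """Hypothetical stand-in for variable filtering: drop coefficients
    smaller than tol times the largest one, then renormalise."""
    kept = np.where(np.abs(beta) >= tol * np.abs(beta).max(), beta, 0.0)
    return kept / np.linalg.norm(kept)


# Simulated single-index data: only the first two of six variables matter.
rng = np.random.default_rng(0)
n, p = 500, 6
X = rng.standard_normal((n, p))
true_beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = np.exp(0.5 * (X @ true_beta)) + 0.1 * rng.standard_normal(n)

betas, corrs = cancor_directions(X, y)
beta_hat = filter_small(betas[:, 0])
print("leading canonical correlation:", corrs[0])
print("filtered direction estimate:", np.round(beta_hat, 3))
```

The sign of the estimated direction is arbitrary, so comparisons with a known direction should use the absolute inner product. The threshold `tol` is a tuning choice in this sketch and has no counterpart stated in the abstract.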

Primary Subjects: 62J07
Secondary Subjects: 62H20
Keywords: Canonical correlation; dimension reduction; L_1-norm constraint

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1216237295
Digital Object Identifier: doi:10.1214/07-AOS529
Mathematical Reviews number (MathSciNet): MR2435451
Zentralblatt MATH identifier: 1142.62045

References

[1] Chen, C.-H. and Li, K.-C. (1998). Can SIR be as popular as multiple linear regression? Statist. Sinica 8 289–316.
Mathematical Reviews (MathSciNet): MR1624402
[2] Cook, R. D. (1994). Using dimension-reduction subspaces to identify important inputs in models of physical systems. In 1994 Proceedings of the Section on Physical Engineering Sciences 18–25. Amer. Statist. Assoc., Alexandria, VA.
[3] Cook, R. D. (2004). Testing predictor contributions in sufficient dimension reduction. Ann. Statist. 32 1062–1092.
Mathematical Reviews (MathSciNet): MR2065198
Digital Object Identifier: doi:10.1214/009053604000000292
Project Euclid: euclid.aos/1085408495
[4] Cook, R. D. and Critchley, F. (2000). Identifying outliers and regression mixtures graphically. J. Amer. Statist. Assoc. 95 781–794.
[5] Cook, R. D. and Weisberg, S. (1991). Discussion of “Sliced inverse regression for dimension reduction” by K. C. Li. J. Amer. Statist. Assoc. 86 328–332.
Mathematical Reviews (MathSciNet): MR1137117
Digital Object Identifier: doi:10.2307/2290563
[6] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
Mathematical Reviews (MathSciNet): MR1946581
Digital Object Identifier: doi:10.1198/016214501753382273
[7] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
Mathematical Reviews (MathSciNet): MR2065194
Digital Object Identifier: doi:10.1214/009053604000000256
Project Euclid: euclid.aos/1085408491
[8] Fung, W. K., He, X., Liu, L. and Shi, P. (2002). Dimension reduction based on canonical correlation. Statist. Sinica 12 1093–1113.
Mathematical Reviews (MathSciNet): MR1947065
Zentralblatt MATH: 1004.62058
[9] Li, B., Zha, H. and Chiaromonte, F. (2005). Contour regression: A general approach to dimension reduction. Ann. Statist. 33 1580–1616.
Mathematical Reviews (MathSciNet): MR2166556
Digital Object Identifier: doi:10.1214/009053605000000192
Project Euclid: euclid.aos/1123250223
[10] Li, L. (2007). Sparse sufficient dimension reduction. Biometrika 94 603–613.
[11] Li, K.-C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Amer. Statist. Assoc. 86 316–327.
Mathematical Reviews (MathSciNet): MR1137117
Digital Object Identifier: doi:10.2307/2290563
[12] Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. J. Amer. Statist. Assoc. 87 1025–1039.
Mathematical Reviews (MathSciNet): MR1209564
Digital Object Identifier: doi:10.2307/2290640
[13] Li, K.-C. (2000). High dimensional data analysis via the SIR/PHD approach. Available at http://www.stat.ucla.edu/~kcli/sir-PHD.pdf.
[14] Li, K.-C. and Duan, N. (1989). Regression analysis under link violation. Ann. Statist. 17 1009–1052.
Mathematical Reviews (MathSciNet): MR1015136
Digital Object Identifier: doi:10.1214/aos/1176347254
Project Euclid: euclid.aos/1176347254
[15] Muirhead, R. J. and Waternaux, C. M. (1980). Asymptotic distributions in canonical correlation analysis and other multivariate procedures for nonnormal populations. Biometrika 67 31–43.
Mathematical Reviews (MathSciNet): MR570502
Zentralblatt MATH: 0448.62037
Digital Object Identifier: doi:10.1093/biomet/67.1.31
[16] Naik, P. A. and Tsai, C.-L. (2001). Single-index model selections. Biometrika 88 821–832.
Mathematical Reviews (MathSciNet): MR1859412
Zentralblatt MATH: 0988.62042
Digital Object Identifier: doi:10.1093/biomet/88.3.821
[17] Ni, L., Cook, R. D. and Tsai, C.-L. (2005). A note on shrinkage sliced inverse regression. Biometrika 92 242–247.
Mathematical Reviews (MathSciNet): MR2158624
Zentralblatt MATH: 1068.62080
Digital Object Identifier: doi:10.1093/biomet/92.1.242
[18] Shi, P. and Tsai, C.-L. (2002). Regression model selection—a residual likelihood approach. J. Roy. Statist. Soc. Ser. B 64 237–252.
Mathematical Reviews (MathSciNet): MR1904703
Digital Object Identifier: doi:10.1111/1467-9868.00335
[19] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
Mathematical Reviews (MathSciNet): MR1379242
[20] Xia, Y., Tong, H., Li, W. K. and Zhu, L.-X. (2002). An adaptive estimation of dimension reduction space. J. Roy. Statist. Soc. Ser. B 64 363–410.
Mathematical Reviews (MathSciNet): MR1924297
Digital Object Identifier: doi:10.1111/1467-9868.03411
[21] Zhou, J. (2008). Robust dimension reduction based on canonical correlation. Preprint.

2009 © Institute of Mathematical Statistics