We present a new methodology for sufficient dimension reduction (SDR). Our methodology derives directly from the formulation of SDR in terms of the conditional independence of the covariate X from the response Y, given the projection of X on the central subspace [cf. J. Amer. Statist. Assoc. 86 (1991) 316–342 and Regression Graphics (1998) Wiley]. We show that this conditional independence assertion can be characterized in terms of conditional covariance operators on reproducing kernel Hilbert spaces and we show how this characterization leads to an M-estimator for the central subspace. The resulting estimator is shown to be consistent under weak conditions; in particular, we do not have to impose linearity or ellipticity conditions of the kinds that are generally invoked for SDR methods. We also present empirical results showing that the new methodology is competitive in practice.
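As a rough illustration of how a conditional-covariance characterization can become an M-estimation objective, the sketch below evaluates a regularized trace contrast on centered Gram matrices for a candidate projection matrix B. The Gaussian kernel, the bandwidths `sigma_x` and `sigma_y`, the regularization constant `eps`, and the function names are all assumptions made for this illustration; the abstract does not specify them, and this should not be read as the authors' exact estimator.

```python
import numpy as np


def centered_gram(Z, sigma):
    """Centered Gaussian-kernel Gram matrix for the rows of Z (shape n x d)."""
    sq = np.sum(Z ** 2, axis=1)
    # Pairwise squared distances, clipped at zero to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = Z.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return H @ K @ H


def kdr_objective(B, X, Y, sigma_x=1.0, sigma_y=1.0, eps=1e-3):
    """Assumed contrast: trace of G_Y (G_{XB} + n*eps*I)^{-1}.

    Smaller values suggest that the projection X @ B retains more of the
    information in X that is relevant to Y. B has shape (p, d), X has shape
    (n, p), and Y has shape (n, q).
    """
    n = X.shape[0]
    G_z = centered_gram(X @ B, sigma_x)
    G_y = centered_gram(Y, sigma_y)
    # trace(A^{-1} G_y) equals trace(G_y A^{-1}) by cyclicity of the trace.
    return np.trace(np.linalg.solve(G_z + n * eps * np.eye(n), G_y))
```

In practice, an estimate of the central subspace would be obtained by minimizing such a contrast over matrices B with orthonormal columns, for example by gradient descent on the Stiefel manifold. The optimizer is omitted here.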
References
[1] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337–404. MR51437
[2] Bach, F. R. and Jordan, M. I. (2002). Kernel independent component analysis. J. Mach. Learn. Res. 3 1–48.
[3] Baker, C. R. (1973). Joint measures and cross-covariance operators. Trans. Amer. Math. Soc. 186 273–289. MR336795
[4] Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. J. Amer. Statist. Assoc. 80 580–598. MR803258
[5] Chiaromonte, F. and Cook, R. D. (2002). Sufficient dimension reduction and graphics in regression. Ann. Inst. Statist. Math. 54 768–795.
[6] Cook, R. D. (1998). Regression Graphics. Wiley, New York.
[7] Cook, R. D. and Lee, H. (1999). Dimension reduction in regression with a binary response. J. Amer. Statist. Assoc. 94 1187–1200.
[8] Cook, R. D. and Li, B. (2002). Dimension reduction for conditional mean in regression. Ann. Statist. 30 455–474.
[9] Cook, R. D. and Weisberg, S. (1991). Discussion of Li. J. Amer. Statist. Assoc. 86 328–332.
[10] Cook, R. D. and Yin, X. (2001). Dimension reduction and visualization in discriminant analysis (with discussion). Aust. N. Z. J. Stat. 43 147–199.
[11] Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A Practical Approach. Chapman and Hall, London.
[12] Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817–823. MR650892
[13] Fukumizu, K., Bach, F. R. and Gretton, A. (2007). Statistical consistency of kernel canonical correlation analysis. J. Mach. Learn. Res. 8 361–383.
[14] Fukumizu, K., Bach, F. R. and Jordan, M. I. (2004). Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res. 5 73–99.
[15] Fukumizu, K., Gretton, A., Sun, X. and Schölkopf, B. (2008). Kernel measures of conditional dependence. In Advances in Neural Information Processing Systems 20 (J. Platt, D. Koller, Y. Singer and S. Roweis, eds.) 489–496. MIT Press, Cambridge, MA.
[16] Gretton, A., Bousquet, O., Smola, A. J. and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert–Schmidt norms. In 16th International Conference on Algorithmic Learning Theory (S. Jain, H. U. Simon and E. Tomita, eds.) 63–77. Springer, Berlin.
[17] Groetsch, C. W. (1984). The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Pitman, Boston, MA. MR742928
[18] Hristache, M., Juditsky, A., Polzehl, J. and Spokoiny, V. (2001). Structure adaptive approach for dimension reduction. Ann. Statist. 29 1537–1566.
[19] Kobayashi, S. and Nomizu, K. (1963). Foundations of Differential Geometry, Vol. 1. Wiley, New York.
[20] Lax, P. D. (2002). Functional Analysis. Wiley, New York.
[21] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Springer, Berlin.
[22] Li, B., Zha, H. and Chiaromonte, F. (2005). Contour regression: A general approach to dimension reduction. Ann. Statist. 33 1580–1616.
[23] Li, K.-C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Amer. Statist. Assoc. 86 316–342.
[24] Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. J. Amer. Statist. Assoc. 87 1025–1039.
[25] Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York. MR762984
[26] Reed, M. and Simon, B. (1980). Functional Analysis. Academic Press, New York. MR751959
[27] Samarov, A. M. (1993). Exploring regression structure using nonparametric functional estimation. J. Amer. Statist. Assoc. 88 836–847.
[28] Sriperumbudur, B., Gretton, A., Fukumizu, K., Lanckriet, G. and Schölkopf, B. (2008). Injective Hilbert space embeddings of probability measures. In Proceedings of the 21st Annual Conference on Learning Theory (COLT 2008) (R. A. Servedio and T. Zhang, eds.) 111–122. Omnipress, Madison, WI.
[29] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
[30] Vakhania, N. N., Tarieladze, V. I. and Chobanyan, S. A. (1987). Probability Distributions on Banach Spaces. Reidel, Dordrecht.
[31] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Cambridge.
[32] Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics 59. SIAM, Philadelphia, PA.
[33] Xia, Y., Tong, H., Li, W. and Zhu, L.-X. (2002). An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 363–410.
[34] Yin, X. and Bura, E. (2006). Moment-based dimension reduction for multivariate response regression. J. Statist. Plann. Inference 136 3675–3688.
[35] Yin, X. and Cook, R. D. (2005). Direction estimation in single-index regressions. Biometrika 92 371–384.
[36] Zhu, Y. and Zeng, P. (2006). Fourier methods for estimating the central subspace and the central mean subspace in regression. J. Amer. Statist. Assoc. 101 1638–1651.