## Electronic Journal of Statistics

### Slice inverse regression with score functions

#### Abstract

We consider non-linear regression problems where we assume that the response depends non-linearly on a linear projection of the covariates. We propose score function extensions to sliced inverse regression problems, both for the first-order and second-order score functions. We show that they provably improve estimation in the population case over the non-sliced versions and we study finite sample estimators and their consistency given the exact score functions. We also propose to learn the score function, either in two steps, i.e., first learning the score function and then learning the effective dimension reduction space, or directly, by solving a convex optimization problem regularized by the nuclear norm. We illustrate our results on a series of experiments.

#### Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 1507-1543.

Dates
First available in Project Euclid: 21 May 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1526889626

Digital Object Identifier
doi:10.1214/18-EJS1428

Mathematical Reviews number (MathSciNet)
MR3804844

Zentralblatt MATH identifier
06875407

Subjects
Primary: 62J02: General nonlinear regression; 62G20: Asymptotic properties
Secondary: 62G05: Estimation

#### Citation

Babichev, Dmitry; Bach, Francis. Slice inverse regression with score functions. Electron. J. Statist. 12 (2018), no. 1, 1507-1543. doi:10.1214/18-EJS1428. https://projecteuclid.org/euclid.ejs/1526889626
