• Bernoulli
  • Volume 16, Number 1 (2010), 181-207.

Learning gradients on manifolds

Sayan Mukherjee, Qiang Wu, and Ding-Xuan Zhou


A common belief in high-dimensional data analysis is that data are concentrated on a low-dimensional manifold. This motivates simultaneous dimension reduction and regression on manifolds. We provide an algorithm for learning gradients on manifolds for dimension reduction for high-dimensional data with few observations. We obtain generalization error bounds for the gradient estimates and show that the convergence rate depends on the intrinsic dimension of the manifold and not on the dimension of the ambient space. We illustrate the efficacy of this approach empirically on simulated and real data and compare the method to other dimension reduction procedures.

Article information

First available in Project Euclid: 12 February 2010

Keywords: classification; feature selection; manifold learning; regression; shrinkage estimator; Tikhonov regularization


Mukherjee, Sayan; Wu, Qiang; Zhou, Ding-Xuan. Learning gradients on manifolds. Bernoulli 16 (2010), no. 1, 181–207. doi:10.3150/09-BEJ206.

