The Annals of Statistics

Regression on manifolds: Estimation of the exterior derivative

Anil Aswani, Peter Bickel, and Claire Tomlin

Full-text: Open access


Collinearity and near-collinearity of predictors cause difficulties when doing regression. In these cases, variable selection becomes untenable because of mathematical issues concerning the existence and numerical stability of the regression coefficients, and interpretation of the coefficients is ambiguous because gradients are not defined. Using a differential geometric interpretation, in which the regression coefficients are interpreted as estimates of the exterior derivative of a function, we develop a new method to do regression in the presence of collinearities. Our regularization scheme can improve estimation error, and it can be easily modified to include lasso-type regularization. These estimators also have simple extensions to the “large p, small n” context.
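The failure mode the abstract describes can be seen in a small generic example (this is an illustration of collinearity and ridge-type regularization in general, not the paper's exterior-derivative estimator): when two predictors are nearly collinear, the data concentrate near a lower-dimensional manifold, the Gram matrix X'X is nearly singular, and ordinary least squares coefficients become numerically unstable, while a regularized solve stays well conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = rng.normal(size=n)

# Two nearly collinear predictors: both are noisy copies of one latent
# variable t, so the data lie near a 1-dimensional manifold in R^2.
X = np.column_stack([t, t + 1e-6 * rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=n)

# The Gram matrix is nearly singular: its condition number is enormous,
# so individual OLS coefficients are not stably identified.
cond = np.linalg.cond(X.T @ X)

# OLS still "runs", but small noise in y is hugely amplified along the
# near-null direction of X.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge-type regularization: adding lam * I makes the normal equations
# well conditioned. Only the identifiable direction (here, the sum of
# the two coefficients, true value 1 + 2 = 3) is estimated reliably.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

Ridge shrinks the unidentified difference between the two coefficients toward zero while preserving their identified sum, which is one concrete sense in which regularization "can improve estimation error" under collinearity.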

Article information

Ann. Statist., Volume 39, Number 1 (2011), 48-81.

First available in Project Euclid: 3 December 2010

Digital Object Identifier: doi:10.1214/10-AOS823

Primary: 62G08: Nonparametric regression 58A10: Differential forms
Secondary: 62G20: Asymptotic properties 62J07: Ridge regression; shrinkage estimators

Keywords: nonparametric regression; manifold; collinearity; model selection; regularization


Aswani, Anil; Bickel, Peter; Tomlin, Claire. Regression on manifolds: Estimation of the exterior derivative. Ann. Statist. 39 (2011), no. 1, 48--81. doi:10.1214/10-AOS823.



  • [1] Aitchison, P. W. (1982). Generalized inverse matrices and their applications. Internat. J. Math. Ed. Sci. Tech. 13 99–109.
  • [2] Andersson, M. (2009). A comparison of nine PLS1 algorithms. J. Chemometrics 23 518–529.
  • [3] Aswani, A., Bickel, P. and Tomlin, C. (2009). Statistics for sparse, high-dimensional, and nonparametric system identification. In IEEE International Conference on Robotics and Automation 2133–2138. IEEE Press, Piscataway, NJ.
  • [4] Aswani, A., Keränen, S., Brown, J., Fowlkes, C., Knowles, D., Biggin, M., Bickel, P. and Tomlin, C. (2010). Nonparametric identification of regulatory interactions from spatial and temporal gene expression data. BMC Bioinformatics 11 413.
  • [5] Belkin, M., Niyogi, P. and Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7 2399–2434.
  • [6] Bhatia, R. (2007). Perturbation Bounds for Matrix Eigenvalues. Classics in Applied Mathematics 53. SIAM, Philadelphia, PA.
  • [7] Bickel, P. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
  • [8] Bickel, P. and Li, B. (2007). Local polynomial regression on unknown manifolds. In Complex Datasets and Inverse Problems: Tomography, Networks and Beyond. Institute of Mathematical Statistics Lecture Notes—Monograph Series 54 177–186. Inst. Math. Statist., Beachwood, OH.
  • [9] Bickel, P. and Freedman, D. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9 1196–1217.
  • [10] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313–2351.
  • [11] Carroll, R., Maca, J. and Ruppert, D. (1999). Nonparametric regression in the presence of measurement error. Biometrika 86 541–554.
  • [12] Costa, J. and Hero, A. (2004). Geodesic entropic graphs for dimension and entropy estimation in manifold learning. IEEE Trans. Signal Process. 52 2210–2221.
  • [13] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • [14] Fan, J. and Truong, Y. (1993). Nonparametric regression with errors in variables. Ann. Statist. 21 1900–1925.
  • [15] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and its Applications. Monographs on Statistics and Applied Probability 66. Chapman and Hall, London.
  • [16] Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics 35 109–135.
  • [17] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Statist. 1 302–332.
  • [18] Fu, W. (2000). Ridge estimator in singular design with application to age-period-cohort analysis of disease rates. Comm. Statist. Theory Methods 29 263–278.
  • [19] Fu, W. (2008). A smooth cohort model in age-period-cohort analysis with applications to homicide arrest rates and lung cancer mortality rates. Sociol. Methods Res. 36 327–361.
  • [20] Geyer, C. (1994). On the asymptotics of constrained M-estimation. Ann. Statist. 22 1993–2010.
  • [21] Goldberg, Y., Zakai, A., Kushnir, D. and Ritov, Y. (2008). Manifold learning: The price of normalization. J. Mach. Learn. Res. 9 1909–1939.
  • [22] Golub, G. and Van Loan, C. (1996). Matrix Computations, 3rd ed. Johns Hopkins Univ. Press, Baltimore, MD.
  • [23] Hein, M. and Audibert, J.-Y. (2005). Intrinsic dimensionality estimation of submanifolds in ℝ^d. In International Conference on Machine Learning 289–296. ACM, New York.
  • [24] Helland, I. (1988). On the structure of partial least squares regression. Comm. Statist. Simulation Comput. 17 581–607.
  • [25] Hocking, R. (1976). The analysis and selection of variables in linear regression. Biometrics 32 431–453.
  • [26] Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55–67.
  • [27] Huffel, S. V. and Vandewalle, J. (1991). The Total Least Squares Problem: Computational Aspects and Analysis. SIAM, Philadelphia, PA.
  • [28] Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
  • [29] Karoui, N. E. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
  • [30] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
  • [31] Kritchman, S. and Nadler, B. (2008). Determining the number of components in a factor model from limited noisy data. Chemometrics and Intelligent Laboratory Systems 94 19–32.
  • [32] Lafferty, J. and Wasserman, L. (2006). Rodeo: Sparse nonparametric regression in high dimensions. In Advances in Neural Information Processing Systems (NIPS) 18 707–714. MIT Press, Cambridge, MA.
  • [33] Ledoit, O. and Wolf, M. (2003). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88 365–411.
  • [34] Lee, J. (2003). Introduction to Smooth Manifolds. Springer, New York.
  • [35] Levina, E. and Bickel, P. (2005). Maximum likelihood estimation of intrinsic dimension. In Advances in NIPS 17 777–784. MIT Press, Cambridge, MA.
  • [36] Lugosi, G. (2006). Concentration-of-measure inequalities. Technical report, Pompeu Fabra Univ.
  • [37] Massy, W. F. (1965). Principal components regression in exploratory statistical research. J. Amer. Statist. Assoc. 60 234–246.
  • [38] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
  • [39] Meinshausen, N. and Buehlmann, P. (2010). Stability selection. J. Roy. Statist. Soc. Ser. B 72 417–473.
  • [40] Misner, C. W., Thorne, K. S. and Wheeler, J. A. (1973). Gravitation. W. H. Freeman and Co., San Francisco, CA.
  • [41] Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791–2817.
  • [42] Ng, A. (1997). Preventing overfitting of cross-validation data. In 14th International Conference on Machine Learning 245–253. Morgan Kaufmann, San Francisco, CA.
  • [43] Niyogi, P. (2008). Manifold regularization and semi-supervised learning: Some theoretical analyses. Technical Report TR-2008-01, Univ. Chicago, Computer Science Dept.
  • [44] Rao, R., Fung, G. and Rosales, R. (2008). On the dangers of cross-validation: An experimental evaluation. In SIAM International Conference on Data Mining. SIAM, Philadelphia, PA.
  • [45] Reunanen, J. (2003). Overfitting in making comparisons between variable selection methods. J. Mach. Learn. Res. 3 1371–1382.
  • [46] Ruppert, D. and Wand, M. (1994). Multivariate locally weighted least squares regression. Ann. Statist. 22 1346–1370.
  • [47] Sastry, S. (1999). Nonlinear Systems. Springer, New York.
  • [48] Shao, J. (1993). Linear model selection by cross-validation. J. Amer. Statist. Assoc. 88 486–494.
  • [49] Shao, J. (1994). Bootstrap sample size in nonregular cases. Proc. Amer. Math. Soc. 122 1251–1262.
  • [50] Shao, J. (1996). Bootstrap model selection. J. Amer. Statist. Assoc. 91 655–665.
  • [51] Spivak, M. (1965). Calculus on Manifolds. A Modern Approach to Classical Theorems of Advanced Calculus. W. A. Benjamin, Inc., New York.
  • [52] Stewart, G. and Sun, J. (1990). Matrix Perturbation Theory. Academic Press, Boston, MA.
  • [53] Tenenbaum, J. B., de Silva, V. and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290 2319–2323.
  • [54] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • [55] Wold, H. (1975). Soft modeling by latent variables: the nonlinear iterative partial least squares approach. In Perspectives in Probability and Statistics, Papers in Honour of M. S. Bartlett (J. Gani, ed.) 117–142. Univ. Sheffield, Sheffield.
  • [56] Wright, J., Yang, A., Ganesh, A., Sastry, S. and Ma, Y. (2009). Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 210–227.
  • [57] Wu, T. T. and Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Statist. 2 224–244.
  • [58] Yang, Y., Fu, W. and Land, K. (2004). A methodological comparison of age-period-cohort models: The intrinsic estimator and conventional generalized linear models. Sociological Methodology 34 75–110.
  • [59] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • [60] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301–320.
  • [61] Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.