Electronic Journal of Statistics

A scale-based approach to finding effective dimensionality in manifold learning

Xiaohui Wang and J. S. Marron

Full-text: Open access

Abstract

The discovery of low-dimensional manifolds in high-dimensional data is one of the main goals in manifold learning. We propose a new approach to identifying the effective dimension (intrinsic dimension) of low-dimensional manifolds. The scale-space viewpoint is the key to our approach, enabling us to meet the challenge of noisy data. Our approach finds the effective dimensionality of the data over all scales without any prior knowledge. It performs better than competing methods, especially in the presence of relatively large noise, and is computationally efficient.
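The abstract describes estimating effective dimensionality as a function of scale. As a rough illustration only (not the authors' method from the paper), the sketch below estimates a local effective dimension at a given neighborhood scale via local PCA, counting the principal components needed to explain most of the local variance. The function name, the 95% variance threshold, and the synthetic noisy circle are all illustrative assumptions.

```python
import numpy as np

def local_effective_dim(data, center, scale, var_threshold=0.95):
    """Estimate the effective dimension of `data` near `center` at the given
    `scale` (neighborhood radius): count the local principal components
    needed to explain `var_threshold` of the neighborhood's variance."""
    nbrs = data[np.linalg.norm(data - center, axis=1) <= scale]
    if len(nbrs) < 2:
        return 0
    centered = nbrs - nbrs.mean(axis=0)
    # eigenvalues of the local covariance, in decreasing order
    eigvals = np.linalg.svd(centered, compute_uv=False) ** 2
    ratios = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratios, var_threshold) + 1)

# A noisy 1-d manifold (a circle) embedded in 3-d.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 2000)
X = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)]
X += 0.01 * rng.normal(size=X.shape)

# At very fine scales the noise dominates (dimension looks high); at an
# intermediate scale the 1-d curve structure emerges; at coarse scales
# the curvature of the circle inflates the estimate again.
for scale in (0.05, 0.5, 2.0):
    print(scale, local_effective_dim(X, X[0], scale))
```

The scale sweep is the point: a single neighborhood size would give a misleading answer on noisy data, which is exactly the motivation for examining dimensionality across all scales.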

Article information

Source
Electron. J. Statist., Volume 2 (2008), 127-148.

Dates
First available in Project Euclid: 17 March 2008

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1205761031

Digital Object Identifier
doi:10.1214/07-EJS137

Mathematical Reviews number (MathSciNet)
MR2386090

Zentralblatt MATH identifier
1320.62115

Keywords
manifold learning; intrinsic dimension; scale space; hypothesis test; multivariate analysis

Citation

Wang, Xiaohui; Marron, J. S. A scale-based approach to finding effective dimensionality in manifold learning. Electron. J. Statist. 2 (2008), 127--148. doi:10.1214/07-EJS137. https://projecteuclid.org/euclid.ejs/1205761031



References

  • [1] Balasubramanian, M., Schwartz, E.L. (2002). The Isomap algorithm and topological stability. Science, 295, 7a.
  • [2] Becker, R., Chambers, J., Wilks, A. (1988). The New S Language. Belmont, CA: Wadsworth.
  • [3] Bruske, J., Sommer, G. (1998). Intrinsic dimensionality estimation with optimally topology preserving maps. IEEE Trans. on PAMI, 20(5), 572–575.
  • [4] Camastra, F., Vinciarelli, A. (2002). Estimating the intrinsic dimension of data with a fractal-based approach. IEEE Trans. on PAMI, 24(10), 1404–1407.
  • [5] Chaudhuri, P., Marron, J.S. (1999). SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807–823.
  • [6] Costa, J., Hero, A.O. (2004). Geodesic entropic graphs for dimension and entropy estimation in manifold learning. IEEE Trans. on Signal Processing, to appear.
  • [7] Devroye, L., Györfi, L., Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer.
  • [8] DeMers, D., Cottrell, G. (1993). Nonlinear dimensionality reduction. Advances in Neural Information Processing Systems, 5, 580–587.
  • [9] Donoho, D.L., Grimes, C. (2003). Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. PNAS, 100(10), 5591–5596.
  • [10] Fukunaga, K., Olsen, D.R. (1971). An algorithm for finding intrinsic dimensionality of data. IEEE Trans. on Computers, C-20, 176–183.
  • [11] Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. John Wiley & Sons.
  • [12] Hastie, T. (1984). Principal curves and surfaces. Technical report, Stanford University, Dept. of Statistics.
  • [13] Hastie, T., Stuetzle, W. (1989). Principal curves. Journal of the American Statistical Association, 84(406), 502–516.
  • [14] Irie, B., Kawato, M. (1990). Acquisition of internal representation by multi-layered perceptrons. IEICE Trans. Inf. & Syst. (Japanese Edition), J73-D-II(8), 1173–1178.
  • [15] Jones, M.C., Marron, J.S., Sheather, S.J. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91(433), 401–407.
  • [16] LeBlanc, M., Tibshirani, R. (1994). Adaptive principal surfaces. Journal of the American Statistical Association, 89(425), 53–64.
  • [17] Levina, E., Bickel, P.J. (2005). Maximum likelihood estimation of intrinsic dimension. Advances in NIPS, 17, to appear.
  • [18] Lindeberg, T. (1993). Scale-Space Theory in Computer Vision. Kluwer Academic Publishers.
  • [19] Grassberger, P., Procaccia, I. (1983). Measuring the strangeness of strange attractors. Physica D, 9, 189–208.
  • [20] Ramsay, J.O., Silverman, B.W. (2002). Applied Functional Data Analysis. Springer, New York.
  • [21] ter Haar Romeny, B.M. (2002). Front-End Vision and Multi-Scale Image Analysis. Kluwer Academic Publishers.
  • [22] Roweis, S., Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.
  • [23] Shepard, R.N. (1974). Representation of structure in similarity data: problems and prospects. Psychometrika, 39(4), 373–421.
  • [24] Schölkopf, B., Smola, A., Müller, K. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299–1319.
  • [25] Smith, R.L. (1992). Optimal estimation of fractal dimension. In Nonlinear Modeling and Forecasting, SFI Studies in the Sciences of Complexity, Proc. Vol. XII, Eds. M. Casdagli & S. Eubank, Addison-Wesley, 115–135.
  • [26] Smith, R.L. (1992). Estimating dimension in noisy chaotic time series. Journal of the Royal Statistical Society, Series B, 54, 329–351.
  • [27] Tenenbaum, J.B., de Silva, V., Langford, J.C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319–2322.
  • [28] Wang, H., Iyer, H. (2006). Application of local linear embedding to nonlinear exploratory latent structure. To appear in Psychometrika.
  • [29] Wang, X. (2004). A Scale-Based Approach to Finding Effective Dimensionality. Dissertation.