The Annals of Applied Statistics

A multivariate adaptive stochastic search method for dimensionality reduction in classification

Tian Siva Tian, Gareth M. James, and Rand R. Wilcox

Full-text: Open access


High-dimensional classification has become an increasingly important problem. In this paper we propose a “Multivariate Adaptive Stochastic Search” (MASS) approach which first reduces the dimension of the data space and then applies a standard classification method to the reduced space. One key advantage of MASS is that it automatically adjusts to mimic variable-selection methods, such as the Lasso, variable-combination methods, such as PCA, or methods that combine these two approaches. The adaptivity of MASS allows it to perform well in situations where pure variable-selection or variable-combination methods fail. Another major advantage of our approach is that MASS can accurately project the data into very low-dimensional non-linear, as well as linear, spaces. MASS uses a stochastic search algorithm to select a handful of optimal projection directions from a large number of random directions in each iteration. We provide some theoretical justification for MASS and demonstrate its strengths on an extensive range of simulation studies and real-world data sets by comparing it to many classical and modern classification methods.
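The search loop described above can be sketched in a few lines. The code below is a toy illustration of the general idea, not the authors' algorithm: in each iteration it draws many random projection matrices, randomly zeroes entries so that candidates range from sparse (variable-selection-like) to dense (PCA-like combinations), scores each candidate reduced space with a simple nearest-centroid classifier, and crudely adapts the sparsity of the proposal distribution toward the best candidate found. The function names (`mass_sketch`, `centroid_accuracy`) and all tuning choices are invented for this sketch.

```python
import numpy as np

def centroid_accuracy(Z, y):
    """Training accuracy of a nearest-centroid classifier in the reduced space Z."""
    classes = np.unique(y)
    centroids = np.array([Z[y == c].mean(axis=0) for c in classes])
    # Squared distance of every point to every class centroid.
    dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return (classes[dist.argmin(axis=1)] == y).mean()

def mass_sketch(X, y, d=2, n_candidates=200, n_iter=20, sparsity=0.5, rng=None):
    """Illustrative stochastic search for a d-dimensional projection B (p x d)."""
    rng = np.random.default_rng(rng)
    p = X.shape[1]
    best_B, best_score = None, -np.inf
    for _ in range(n_iter):
        for _ in range(n_candidates):
            B = rng.standard_normal((p, d))
            # Random mask: sparse candidates mimic variable selection,
            # dense candidates mimic variable combination (PCA-like).
            B = B * (rng.random((p, d)) < sparsity)
            if not B.any():
                continue
            score = centroid_accuracy(X @ B, y)
            if score > best_score:
                best_score, best_B = score, B
        # Crude adaptation: nudge the proposal sparsity toward the
        # fraction of nonzero entries in the best projection so far.
        sparsity = 0.5 * sparsity + 0.5 * (best_B != 0).mean()
    return best_B, best_score
```

In the actual method the search, scoring, and adaptation are considerably more refined; this sketch only conveys how a keep-the-best stochastic search over random directions can interpolate between selection and combination of variables.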

Article information

Ann. Appl. Stat., Volume 4, Number 1 (2010), 340-365.

First available in Project Euclid: 11 May 2010


Keywords: dimensionality reduction; classification; variable selection; variable combination; Lasso


Tian, Tian Siva; James, Gareth M.; Wilcox, Rand R. A multivariate adaptive stochastic search method for dimensionality reduction in classification. Ann. Appl. Stat. 4 (2010), no. 1, 340--365. doi:10.1214/09-AOAS284.



  • Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313–2351.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–451.
  • Fan, J. and Fan, Y. (2008). High dimensional classification using features annealed independence rules. Ann. Statist. 36 2605–2637.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J. and Lv, J. (2008). Sure independence screening for ultra-high dimensional feature space. J. Roy. Statist. Soc. Ser. B 70 849–911.
  • Field, C. and Genton, M. G. (2006). The multivariate g-and-h distribution. Technometrics 48 104–111.
  • George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881–889.
  • George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statistica Sinica 7 339–373.
  • Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C. and Lander, E. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286 531–537.
  • Gosavi, A. (2003). Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning. Kluwer, Boston.
  • Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
  • Liberti, L. and Kucherenko, S. (2005). Comparison of deterministic and stochastic approaches to global optimization. Int. Trans. Oper. Res. 12 263–285.
  • Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression (with discussion). J. Amer. Statist. Assoc. 83 1023–1036.
  • Park, M.-Y. and Hastie, T. (2007). An L1 regularization-path algorithm for generalized linear models. J. Roy. Statist. Soc. Ser. B 69 659–677.
  • Radchenko, P. and James, G. (2008). Variable inclusion and shrinkage algorithms. J. Amer. Statist. Assoc. 103 1304–1315.
  • Radchenko, P. and James, G. (2009). Forward-lasso with adaptive shrinkage. Under review.
  • Reinsch, C. (1967). Smoothing by spline functions. Numer. Math. 10 177–183.
  • Roweis, S. and Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 290 2323–2326.
  • Tenenbaum, J., de Silva, V. and Langford, J. (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290 2319–2323.
  • Tian, T. S., Wilcox, R. R. and James, G. M. (2009). Data reduction in classification: A simulated annealing based projection method. Under review.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. USA 99 6567–6572.
  • Tsybakov, A. and van de Geer, S. (2005). Square root penalty: Adaptation to the margin in classification and in edge estimation. Ann. Statist. 33 1203–1224.
  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301–320.