Source: Ann. Appl. Stat. Volume 4, Number 1 (2010), 340–365.
High-dimensional classification has become an increasingly
important problem. In this paper we propose a “Multivariate
Adaptive Stochastic Search” (MASS) approach which first reduces
the dimension of the data space and then applies a standard
classification method to the reduced space. One key advantage of
MASS is that it automatically adapts to mimic variable selection
methods such as the Lasso, variable combination methods such as
PCA, or methods that combine the two approaches. The adaptivity
of MASS allows it to perform well in
situations where pure variable selection or variable combination
methods fail. Another major advantage of our approach is that
MASS can accurately project the data into very low-dimensional
spaces, both linear and nonlinear. In each iteration, MASS uses
a stochastic search algorithm to select a handful of optimal
projection directions from a large number of random directions.
We provide some theoretical justification for MASS and
demonstrate its strengths in an extensive range of simulation
studies and on real-world data sets, comparing it with many
classical and modern classification methods.
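
The two-stage idea in the abstract (search over random projection directions, then classify in the reduced space) can be illustrated with a minimal Python sketch. This is not the MASS algorithm itself, whose details are not given here. It makes several assumptions of its own: the two classes are labeled 0 and 1, each direction is scored by a simple Fisher-type separation ratio, the search is linear only, and a nearest-centroid rule stands in for the "standard classification method". The names `fisher_score`, `random_direction_search`, and `nearest_centroid_predict` are hypothetical.

```python
import numpy as np

def fisher_score(z, y):
    """Separation of two classes along 1-D projected data z:
    squared gap between class means over the sum of class variances.
    (An assumed scoring rule, chosen only for illustration.)"""
    z0, z1 = z[y == 0], z[y == 1]
    return (z0.mean() - z1.mean()) ** 2 / (z0.var() + z1.var() + 1e-12)

def random_direction_search(X, y, d=2, n_candidates=200, n_iter=10, seed=0):
    """In each iteration, draw many random unit directions, pool them with
    the current best ones, and keep the d highest-scoring directions.
    Directions are scored one at a time, so the kept directions may be
    similar to each other; a fuller method would score them jointly."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    best = np.empty((0, p))
    for _ in range(n_iter):
        cand = rng.standard_normal((n_candidates, p))
        cand /= np.linalg.norm(cand, axis=1, keepdims=True)  # unit length
        pool = np.vstack([best, cand])
        scores = np.array([fisher_score(X @ a, y) for a in pool])
        best = pool[np.argsort(scores)[-d:]]  # keep the top d directions
    return best.T  # shape (p, d): columns are projection directions

def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Assign each test point to the class (0 or 1) with the closer centroid."""
    centroids = np.stack([Z_train[y_train == k].mean(axis=0) for k in (0, 1)])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
```

A typical use would be `A = random_direction_search(X, y)` on training data, followed by `nearest_centroid_predict(X @ A, y, X_new @ A)` to classify new observations in the reduced space.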