The Annals of Applied Statistics

Structured, sparse regression with application to HIV drug resistance

Daniel Percival, Kathryn Roeder, Roni Rosenfeld, and Larry Wasserman
Source: Ann. Appl. Stat. Volume 5, Number 2A (2011), 628-644.

Abstract

We introduce a new version of forward stepwise regression. Our modification finds solutions to regression problems where the selected predictors appear in a structured pattern, with respect to a predefined distance measure over the candidate predictors. Our method is motivated by the problem of predicting HIV-1 drug resistance from protein sequences. We find that our method improves the interpretability of drug resistance while producing comparable predictive accuracy to standard methods. We also demonstrate our method in a simulation study and present some theoretical results and connections.
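
As an illustration of the idea in the abstract, here is a minimal sketch of forward stepwise selection in which each candidate's gain is reduced by its distance to the predictors already chosen. The abstract does not state the paper's actual selection criterion, so the penalty form, the function name `structured_forward_stepwise`, and the parameters `positions` and `penalty` are all assumptions made for this example. The sketch fits no intercept and uses only the Python standard library.

```python
# Sketch only: the distance penalty and all names here are illustrative
# assumptions, not the criterion defined in the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def structured_forward_stepwise(columns, y, positions, n_steps, penalty=0.0):
    """Greedy forward selection with a hypothetical distance penalty.

    columns   : list of predictor columns (each a list of floats)
    y         : response values (no intercept is fitted)
    positions : one coordinate per predictor; distance is |p_i - p_j|
    penalty   : weight on the distance to the nearest selected predictor
    """
    cols = [list(c) for c in columns]  # working copies, orthogonalized below
    resid = list(y)
    selected = []
    for _ in range(n_steps):
        best_j, best_score = None, None
        for j, x in enumerate(cols):
            if j in selected:
                continue
            norm2 = dot(x, x)
            if norm2 < 1e-12:  # column already explained by the model
                continue
            # Drop in residual sum of squares from adding predictor j.
            gain = dot(x, resid) ** 2 / norm2
            # Distance to the nearest selected predictor (0 for the first pick).
            dist = min(
                (abs(positions[j] - positions[k]) for k in selected),
                default=0.0,
            )
            score = gain - penalty * dist
            if best_score is None or score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break
        selected.append(best_j)
        q = cols[best_j]
        qq = dot(q, q)
        # Project the chosen direction out of the residual ...
        coef = dot(q, resid) / qq
        resid = [r - coef * a for r, a in zip(resid, q)]
        # ... and out of every remaining candidate column.
        for j, x in enumerate(cols):
            if j not in selected:
                c = dot(q, x) / qq
                cols[j] = [b - c * a for a, b in zip(q, x)]
    return selected


# Toy data: predictors 0 and 1 are neighbours, predictor 2 is far away.
toy_columns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
toy_y = [3.0, 1.0, 1.1, 0.0]
toy_positions = [0, 1, 10]
print(structured_forward_stepwise(toy_columns, toy_y, toy_positions, 2))
print(structured_forward_stepwise(toy_columns, toy_y, toy_positions, 2, penalty=0.1))
```

With `penalty=0.0` this reduces to ordinary forward stepwise regression and picks the distant predictor second because its raw gain is slightly larger. With a positive penalty, the neighbouring predictor is preferred, which gives the kind of structured selection pattern the abstract describes.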

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1310562198
Digital Object Identifier: doi:10.1214/10-AOAS428
Mathematical Reviews number (MathSciNet): MR2840168
Zentralblatt MATH identifier: 1223.62164

2013 © Institute of Mathematical Statistics
