Electronic Journal of Statistics

Randomized maximum-contrast selection: Subagging for large-scale regression

Jelena Bradic

Full-text: Open access

Abstract

We introduce a general method for variable selection in a large-scale regression setting where both the number of parameters and the number of samples are extremely large. The proposed method is based on a careful combination of penalized estimators, each applied to a random projection of the sample space into a low-dimensional space. In one special case that we study in detail, the random projections are divided into non-overlapping blocks, each consisting of only a small portion of the original data. Within each block we select the projection yielding the smallest out-of-sample error. Our random ensemble estimator then aggregates the results according to a new maximal-contrast voting scheme to determine the final selected set. Our theoretical results illustrate the effect on performance of increasing the number of non-overlapping blocks. Moreover, we demonstrate that statistical optimality is retained alongside the computational speedup: the proposed method achieves the minimax rates for approximate recovery over all estimators that use the full set of samples. Furthermore, our theoretical results allow the number of subsamples to grow with the subsample size and do not require the irrepresentable condition. The estimator is also compared empirically with several other popular high-dimensional estimators via an extensive simulation study, which reveals its excellent finite-sample performance.
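To fix intuition, here is a minimal Python sketch of the block-and-vote idea described above. It is an illustration only, not the paper's estimator: the Lasso stands in for the generic penalized estimator, the names n_blocks, per_block, subsample_size, and vote_frac are assumptions of this sketch, a simple majority threshold stands in for the paper's maximal-contrast voting rule, and, for simplicity, subsamples are drawn independently rather than partitioned into non-overlapping blocks as in the paper.

import numpy as np
from sklearn.linear_model import Lasso

def subagged_selection(X, y, n_blocks=10, per_block=5, subsample_size=200,
                       alpha=0.1, vote_frac=0.5, seed=0):
    """Illustrative subagging: vote over the per-block best subsample fits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    votes = np.zeros(p)
    for _ in range(n_blocks):
        best_err, best_support = np.inf, np.zeros(p, dtype=bool)
        for _ in range(per_block):
            # one "random projection of the sample space": a small subsample
            idx = rng.choice(n, size=subsample_size, replace=False)
            held_out = np.setdiff1d(np.arange(n), idx)
            model = Lasso(alpha=alpha).fit(X[idx], y[idx])
            # out-of-sample error, computed on data not used for fitting
            err = np.mean((y[held_out] - model.predict(X[held_out])) ** 2)
            if err < best_err:  # keep the block's best projection
                best_err, best_support = err, model.coef_ != 0
        votes += best_support  # the block winner casts one vote per variable
    # final selected set: variables chosen by enough block winners
    # (simple majority rule standing in for the maximal-contrast scheme)
    return np.flatnonzero(votes >= vote_frac * n_blocks)

Since each penalized fit touches only subsample_size observations, the per-fit cost is small and the blocks can be processed in parallel, which is the source of the computational speedup discussed in the abstract.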

Article information

Source
Electron. J. Statist. Volume 10, Number 1 (2016), 121-170.

Dates
Received: July 2015
First available in Project Euclid: 17 February 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1455715959

Digital Object Identifier
doi:10.1214/15-EJS1085

Mathematical Reviews number (MathSciNet)
MR3466179

Zentralblatt MATH identifier
1333.62175

Citation

Bradic, Jelena. Randomized maximum-contrast selection: Subagging for large-scale regression. Electron. J. Statist. 10 (2016), no. 1, 121--170. doi:10.1214/15-EJS1085. https://projecteuclid.org/euclid.ejs/1455715959.


References

  • A. B. Antognini and S. Giannerini. Generalized Pólya urn designs with null balance., Journal of Applied Probability, 44(3):661–669, 2007. ISSN 0021-9002.
  • F. Bach. Bolasso: model consistent lasso estimation through the bootstrap., CoRR, abs/0804.1302, 2008.
  • M. Banerjee and T. Richardson. Exchangeable Bernoulli random variables and Bayes' postulate., Electron. J. Statist., 7:2193–2208, 2013.
  • H. Battey, J. Fan, H. Liu, J. Lu, and Z. Zhu. Distributed Estimation and Inference with Statistical Guarantees., ArXiv e-prints, September 2015.
  • R. Bhatia., Matrix analysis, volume 169 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1997. ISBN 0-387-94846-5.
  • P. J. Bickel, F. Götze, and W. R. van Zwet. Resampling fewer than $n$ observations: gains, losses, and remedies for losses., Statist. Sinica, 7(1):1–31, 1997. ISSN 1017-0405. Empirical Bayes, sequential analysis and related topics in statistics and probability (New Brunswick, NJ, 1995).
  • P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector., Ann. Statist., 37(4):1705–1732, 2009. ISSN 0090-5364.
  • L. Breiman. Bagging predictors., Machine Learning, 24(2):123–140, 1996. ISSN 0885-6125.
  • P. Bühlmann and B. Yu. Analyzing bagging., Ann. Statist., 30(4):927–961, 2002. ISSN 0090-5364.
  • F. Bunea., Consistent selection via the Lasso for high dimensional approximating regression models, volume 3 of IMS Collections, pages 122–137. Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2008.
  • L. Comminges and A. S. Dalalyan. Tight conditions for consistency of variable selection in the context of high dimensionality., Ann. Statist., 40(5):2667–2696, 2012.
  • P. Diaconis and D. Freedman. Finite exchangeable sequences., Ann. Probab., 8(4):745–764, 1980.
  • B. Efron. Bootstrap methods: another look at the jackknife., Ann. Statist., 7(1):1–26, 1979. ISSN 0090-5364.
  • J. Fan, F. Han, and H. Liu. Challenges of Big Data Analysis., ArXiv e-prints, August 2013.
  • W. Fithian and T. Hastie. Local case-control sampling: Efficient subsampling in imbalanced data sets., The Annals of Statistics, 42(5):1693–1724, 2014.
  • A. Kleiner, A. Talwalkar, P. Sarkar, and M. I. Jordan. A scalable bootstrap for massive data., Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(4):795–816, 2014. ISSN 1467-9868.
  • L. Kontorovich and K. Ramanan. Concentration inequalities for dependent random variables via the martingale method., Ann. Probab., 36(6):2126–2158, 2008. URL http://dx.doi.org/10.1214/07-AOP384.
  • B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection., Ann. Statist., 28(5):1302–1338, 2000.
  • E. L. Lehmann and G. Casella., Theory of point estimation. Springer Texts in Statistics. Springer-Verlag, New York, second edition, 1998. ISBN 0-387-98502-6.
  • K. Lounici. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators., Electronic Journal of Statistics, 2:90–102, 2008.
  • K. Lounici, M. Pontil, S. van de Geer, and A. B. Tsybakov. Oracle inequalities and optimal inference under group sparsity., Ann. Statist., 39(4):2164–2204, 2011.
  • R. McDonald, K. Hall, and G. Mann. Distributed training strategies for the structured perceptron. In, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 456–464, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics. ISBN 1-932432-65-5.
  • N. Meinshausen and P. Bühlmann. Stability selection., J. R. Stat. Soc. Ser. B Stat. Methodol., 72(4):417–473, 2010. ISSN 1369-7412.
  • S. Minsker. Geometric Median and Robust Estimation in Banach Spaces., ArXiv e-prints, August 2013.
  • S. Minsker, S. Srivastava, L. Lin, and D. B. Dunson. Robust and Scalable Bayes via a Median of Subset Posterior Measures., ArXiv e-prints, March 2014.
  • R. Pemantle and Y. Peres. Concentration of Lipschitz functionals of determinantal and other strong Rayleigh measures., Combinatorics, Probability and Computing, 23:140–160, 2014. ISSN 1469-2163. URL http://journals.cambridge.org/article_S0963548313000345.
  • D. N. Politis, J. P. Romano, and M. Wolf. On the asymptotic theory of subsampling., Statist. Sinica, 11(4):1105–1124, 2001.
  • J. Præstgaard and J. A. Wellner. Exchangeably weighted bootstraps of the general empirical process., Ann. Probab., 21(4):2053–2086, 1993. ISSN 0091-1798.
  • G. Raskutti, M. J. Wainwright, and B. Yu. Restricted eigenvalue properties for correlated Gaussian designs., J. Mach. Learn. Res., 11:2241–2259, 2010. ISSN 1532-4435.
  • R. Samworth. A note on methods of restoring consistency to the bootstrap., Biometrika, 90(4):985–990, 2003.
  • R. D. Shah and R. J. Samworth. Variable selection with error control: another look at stability selection., J. R. Stat. Soc. Ser. B. Stat. Methodol., 75(1):55–80, 2013. ISSN 1369-7412.
  • A. B. Tsybakov., Introduction to nonparametric estimation. Springer Series in Statistics. Springer, New York, 2009. ISBN 978-0-387-79051-0. Revised and extended from the 2004 French original, Translated by Vladimir Zaiats.
  • S. A. van de Geer and P. Bühlmann. On the conditions used to prove oracle results for the Lasso., Electron. J. Stat., 3:1360–1392, 2009. ISSN 1935-7524.
  • X. Wang, P. Peng, and D. B. Dunson. Median selection subset aggregation for parallel inference. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2195–2203. Curran Associates, Inc., 2014. URL http://papers.nips.cc/paper/5328-median-selection-subset-aggregation-for-parallel-inference.pdf.
  • Y. Zhang, J. C. Duchi, and M. Wainwright. Communication-Efficient Algorithms for Statistical Optimization., ArXiv e-prints, September 2012.
  • Y. Zhang, J. C. Duchi, and M. J. Wainwright. Divide and Conquer Kernel Ridge Regression: A Distributed Algorithm with Minimax Optimal Rates., ArXiv e-prints, May 2013.