Bernoulli

Volume 19, Number 4 (2013), 1378–1390.

On statistics, computation and scalability

Michael I. Jordan


Abstract

How should statistical procedures be designed so as to be scalable computationally to the massive datasets that are increasingly the norm? When coupled with the requirement that an answer to an inferential question be delivered within a certain time budget, this question has significant repercussions for the field of statistics. With the goal of identifying “time-data tradeoffs,” we investigate some of the statistical consequences of computational perspectives on scalability, in particular divide-and-conquer methodology and hierarchies of convex relaxations.
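The divide-and-conquer methodology mentioned in the abstract can be illustrated with a minimal sketch: partition the data into blocks, compute an estimate on each block independently (and hence in parallel), then combine the per-block estimates. The function below is an illustrative toy for mean estimation only — the name `divide_and_conquer_mean` and its signature are assumptions for this sketch, not an implementation from the paper.

```python
import numpy as np

def divide_and_conquer_mean(data, num_blocks):
    """Toy divide-and-conquer estimator: split the data into blocks,
    estimate the mean on each block, and average the block estimates.

    Illustrative sketch only; not taken from the paper.
    """
    blocks = np.array_split(data, num_blocks)
    # Each per-block estimate depends only on its own block,
    # so these could be computed in parallel on separate machines.
    block_estimates = [block.mean() for block in blocks]
    # Combine by simple averaging of the subset estimates.
    return float(np.mean(block_estimates))

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=10_000)
est = divide_and_conquer_mean(data, num_blocks=10)
```

With equal-sized blocks, averaging the block means reproduces the full-sample mean exactly; the statistical questions studied in work of this kind concern what is gained or lost for more complex estimators, where the combined estimate need not match the full-data one.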

Article information

Source
Bernoulli Volume 19, Number 4 (2013), 1378–1390.

Dates
First available in Project Euclid: 27 August 2013

Permanent link to this document
https://projecteuclid.org/euclid.bj/1377612856

Digital Object Identifier
doi:10.3150/12-BEJSP17

Mathematical Reviews number (MathSciNet)
MR3102908

Zentralblatt MATH identifier
1273.62030

Citation

Jordan, Michael I. On statistics, computation and scalability. Bernoulli 19 (2013), no. 4, 1378–1390. doi:10.3150/12-BEJSP17. https://projecteuclid.org/euclid.bj/1377612856.

References

  • Agarwal, A., Duchi, J., Bartlett, P. and Levrard, C. (2011). Oracle inequalities for computationally budgeted model selection. In 24th Annual Conference on Learning Theory, Budapest, Hungary.
  • Amini, A.A. and Wainwright, M.J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877–2921.
  • Bickel, P.J., Götze, F. and van Zwet, W.R. (1997). Resampling fewer than $n$ observations: Gains, losses, and remedies for losses. Statist. Sinica 7 1–31. Empirical Bayes, sequential analysis and related topics in statistics and probability (New Brunswick, NJ, 1995).
  • Candès, E.J. and Plan, Y. (2010). Matrix completion with noise. Proceedings of the IEEE 98 925–936.
  • Chandrasekaran, V. and Jordan, M.I. (2013). Computational and statistical tradeoffs via convex relaxation. Proc. Natl. Acad. Sci. USA 110 E1181–E1190.
  • Chen, X. and Xie, M. (2012). A split-and-conquer approach for analysis of extraordinarily large data. Technical Report 2012-01, Dept. Statistics, Rutgers Univ.
  • Donoho, D.L. and Johnstone, I.M. (1998). Minimax estimation via wavelet shrinkage. Ann. Statist. 26 879–921.
  • Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1–26.
  • Kleiner, A., Talwalkar, A., Sarkar, P. and Jordan, M.I. (2013). A scalable bootstrap for massive data. J. R. Stat. Soc. Ser. B Stat. Methodol. To appear.
  • Mackey, L., Talwalkar, A. and Jordan, M.I. (2012). Divide-and-conquer matrix factorization. Available at arXiv:1107.0789.
  • Politis, D.N., Romano, J.P. and Wolf, M. (1999). Subsampling. Springer Series in Statistics. New York: Springer.
  • Recht, B. (2011). A simpler approach to matrix completion. J. Mach. Learn. Res. 12 3413–3430.
  • Samworth, R. (2003). A note on methods of restoring consistency to the bootstrap. Biometrika 90 985–990.
  • Shalev-Shwartz, S., Shamir, O. and Tromer, E. (2012). Using more data to speed up training time. In Conference on Artificial Intelligence and Statistics, La Palma, Canary Islands.
  • Vazirani, V. (2004). Approximation Algorithms. New York: Springer.