The Annals of Statistics

Maximin effects in inhomogeneous large-scale data

Nicolai Meinshausen and Peter Bühlmann

Full-text: Open access

Abstract

Large-scale data are often characterized by some degree of inhomogeneity, as data are either recorded in different time regimes or taken from multiple sources. We look at regression models with randomly changing coefficients, where the coefficients change either smoothly in time or along some other dimension, or without any such structure. Fitting varying-coefficient models or mixture models can be an appropriate solution, but these approaches are computationally very demanding and often return more information than necessary. If we only ask for a model estimator with good predictive properties across all regimes of the data, then we are aiming for a simple linear model that is reliable for all possible subsets of the data. We propose the concept of “maximin effects” together with a suitable estimator and study its prediction accuracy from a theoretical point of view in a mixture model with known or unknown group structure. Under certain circumstances the estimator can be computed orders of magnitude faster than standard penalized regression estimators, making computations on large-scale data feasible. Empirical examples complement the novel methodology and theory.
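
To make the concept concrete, here is a sketch in our own notation (not taken verbatim from the paper): in a mixture of linear regressions with group-specific coefficient vectors b_g and common predictor covariance Sigma = E[X X^T], the explained variance of a candidate vector b within group g can be written as

    V_g(b) = 2 b^T Sigma b_g - b^T Sigma b,

and the maximin effect is the maximizer of the worst case across groups, argmax_b min_g V_g(b). The Python sketch below solves an empirical version of this convex problem with cvxpy for a known group structure; the function name maximin_estimator, the optional l1-penalty parameter lam and the toy data are our own illustration, not code or notation from the paper.

import numpy as np
import cvxpy as cp

def maximin_estimator(groups, lam=0.0):
    """Maximize the smallest empirical explained variance across groups.

    groups: list of (X_g, y_g) pairs, one pair per known group/regime.
    lam:    optional l1-penalty level for high-dimensional settings.
    """
    p = groups[0][0].shape[1]
    b = cp.Variable(p)
    # empirical explained variance of b in group g:
    #   V_g(b) = (2 b' X_g' y_g - b' X_g' X_g b) / n_g
    V = [(2 * (X.T @ y) @ b - cp.quad_form(b, X.T @ X)) / X.shape[0]
         for X, y in groups]
    cp.Problem(cp.Maximize(cp.minimum(*V) - lam * cp.norm1(b))).solve()
    return b.value

# Toy illustration: two regimes whose second coefficient flips sign.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(200, 3)), rng.normal(size=(200, 3))
y1 = X1 @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=200)
y2 = X2 @ np.array([1.0, -2.0, 0.0]) + rng.normal(size=200)
# The estimate should be roughly (1, 0, 0): only the effect shared by
# both regimes is retained, the sign-flipping effect is shrunk to zero.
print(maximin_estimator([(X1, y1), (X2, y2)]))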

Article information

Source
Ann. Statist., Volume 43, Number 4 (2015), 1801-1830.

Dates
Received: June 2014
Revised: November 2014
First available in Project Euclid: 17 June 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1434546223

Digital Object Identifier
doi:10.1214/15-AOS1325

Mathematical Reviews number (MathSciNet)
MR3357879

Zentralblatt MATH identifier
1317.62059

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators

Keywords
Mixture models; regularization; big data; aggregation; robustness

Citation

Meinshausen, Nicolai; Bühlmann, Peter. Maximin effects in inhomogeneous large-scale data. Ann. Statist. 43 (2015), no. 4, 1801--1830. doi:10.1214/15-AOS1325. https://projecteuclid.org/euclid.aos/1434546223
