Statistical Science

Flexible Low-Rank Statistical Modeling with Missing Data and Side Information

William Fithian and Rahul Mazumder


Abstract

We explore a general statistical framework for low-rank modeling of matrix-valued data, based on convex optimization with a generalized nuclear norm penalty. We study several related problems: the usual low-rank matrix completion problem with flexible loss functions arising from generalized linear models; reduced-rank regression and multi-task learning; and generalizations of both problems where side information about rows and columns is available, in the form of features or smoothing kernels. We show that our approach encompasses maximum a posteriori estimation arising from Bayesian hierarchical modeling with latent factors, and discuss ramifications of the missing-data mechanism in the context of matrix completion. While the above problems can be naturally posed as rank-constrained optimization problems, which are nonconvex and computationally difficult, we show how to relax them via generalized nuclear norm regularization to obtain convex optimization problems. We discuss algorithms, drawing inspiration from modern convex optimization methods, to address these large-scale computational tasks. Finally, we illustrate our flexible approach in problems arising in functional data reconstruction and ecological species distribution modeling.
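To make the relaxation concrete (the notation here is ours, not necessarily the paper's): writing $\Omega$ for the set of observed entries and $\ell$ for a loss derived from a generalized linear model, the rank-constrained problem is

    \min_{\Theta} \sum_{(i,j) \in \Omega} \ell(x_{ij}, \theta_{ij}) \quad \text{subject to} \quad \operatorname{rank}(\Theta) \le r,

which is nonconvex; replacing the rank constraint with a nuclear norm penalty yields the convex surrogate

    \min_{\Theta} \sum_{(i,j) \in \Omega} \ell(x_{ij}, \theta_{ij}) + \lambda \lVert \Theta \rVert_*,

where $\lVert \Theta \rVert_*$ denotes the sum of the singular values of $\Theta$. The generalized nuclear norm studied in the paper further incorporates side information about rows and columns, such as features or smoothing kernels, into the penalty.

As a rough illustration of how such problems can be solved (a minimal sketch under simplifying assumptions, not the authors' implementation), the squared-error case admits a proximal gradient iteration whose proximal step soft-thresholds singular values:

    import numpy as np

    def soft_impute(X, observed, lam, n_iters=100):
        """Nuclear-norm-regularized matrix completion, squared-error loss.
        X: data matrix (values at unobserved entries are ignored).
        observed: boolean mask of observed entries.
        lam: nuclear norm penalty weight."""
        Theta = np.zeros_like(X, dtype=float)
        for _ in range(n_iters):
            # Gradient step (unit step size): keep the observed entries of X,
            # impute the rest from the current estimate.
            filled = np.where(observed, X, Theta)
            # Proximal step: soft-threshold the singular values by lam.
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            Theta = (U * np.maximum(s - lam, 0.0)) @ Vt
        return Theta

    # Toy example: recover a rank-2 matrix from half of its entries.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
    mask = rng.random(M.shape) < 0.5
    Theta_hat = soft_impute(np.where(mask, M, 0.0), mask, lam=1.0)

At scale, the full SVD in each iteration would be replaced by low-rank or iterative approximations; algorithmic choices of this kind are among the computational questions the paper addresses.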

Article information

Source
Statist. Sci., Volume 33, Number 2 (2018), 238-260.

Dates
First available in Project Euclid: 3 May 2018

Permanent link to this document
https://projecteuclid.org/euclid.ss/1525313144

Digital Object Identifier
doi:10.1214/18-STS642

Mathematical Reviews number (MathSciNet)
MR3797712

Zentralblatt MATH identifier
1397.62180

Keywords
Matrix completion; nuclear norm regularization; matrix factorization; convex optimization; missing data

Citation

Fithian, William; Mazumder, Rahul. Flexible Low-Rank Statistical Modeling with Missing Data and Side Information. Statist. Sci. 33 (2018), no. 2, 238–260. doi:10.1214/18-STS642. https://projecteuclid.org/euclid.ss/1525313144


