The Annals of Statistics

Multilayer tensor factorization with applications to recommender systems

Xuan Bi, Annie Qu, and Xiaotong Shen

Recommender systems have been widely adopted by the electronic commerce and entertainment industries for individualized prediction and recommendation, which benefit consumers and improve business intelligence. In this article, we propose an innovative method, the recommendation engine of multilayers (REM), for tensor recommender systems. The proposed method utilizes the structure of a tensor response to integrate information from multiple modes, and creates an additional layer of nested latent factors to accommodate between-subjects dependency. One major advantage is that the proposed method is able to address the “cold-start” issue, in which no information is available on new customers, new products or new contexts. Specifically, it provides more effective recommendations through subgroup information. To achieve scalable computation, we develop a new algorithm for the proposed method, which incorporates a maximum block improvement strategy into the cyclic blockwise-coordinate-descent algorithm. In theory, we investigate the algorithm's convergence from an arbitrary initial point and its local convergence, along with the asymptotic consistency of the estimated parameters. Finally, the proposed method is applied to simulated data and to IRI marketing data with 116 million observations of product sales. Numerical studies demonstrate that the proposed method outperforms existing competitors in the literature.
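To make the optimization idea concrete, here is a minimal sketch of maximum block improvement (MBI) [9, 27]. It is applied to a plain ridge-penalized rank-R CP factorization of a partially observed three-way tensor (customers × products × contexts). In each sweep, the sketch computes the exact best update for every block (one factor matrix per mode) but accepts only the block that reduces the objective the most.

This is not REM. The sketch omits REM's nested group-level latent factors, its exact penalty, and the paper's hybrid of MBI with cyclic blockwise coordinate descent. All names (`mbi_fit`, `block_update`, `lam`, `rank`) are hypothetical and chosen for illustration only.

```python
import random


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def predict(factors, idx):
    """CP prediction: sum over components r of the product of each mode's factor entry."""
    total = 0.0
    for r in range(len(factors[0][0])):
        p = 1.0
        for m, f in enumerate(factors):
            p *= f[idx[m]][r]
        total += p
    return total


def objective(obs, factors, lam):
    """Squared error on observed cells plus a ridge penalty on all factor entries."""
    loss = sum((y - predict(factors, idx)) ** 2 for idx, y in obs.items())
    penalty = sum(v * v for f in factors for row in f for v in row)
    return loss + lam * penalty


def block_update(obs, factors, mode, lam):
    """Exact minimizer over factor matrix `mode` with all other blocks held fixed.

    Each row solves the ridge system (Z^T Z + lam I) a = Z^T y over the cells observed
    in that row; lam > 0 keeps the system invertible.
    """
    R = len(factors[0][0])
    new = [[0.0] * R for _ in factors[mode]]  # rows with no observations: minimizer is 0
    by_row = {}
    for idx, y in obs.items():
        by_row.setdefault(idx[mode], []).append((idx, y))
    for i, entries in by_row.items():
        A = [[lam if a == b else 0.0 for b in range(R)] for a in range(R)]
        rhs = [0.0] * R
        for idx, y in entries:
            # z[r] = product of the other modes' factor entries at this cell
            z = [1.0] * R
            for m, f in enumerate(factors):
                if m != mode:
                    z = [z[r] * f[idx[m]][r] for r in range(R)]
            for a in range(R):
                rhs[a] += z[a] * y
                for b in range(R):
                    A[a][b] += z[a] * z[b]
        new[i] = solve(A, rhs)
    return new


def mbi_fit(obs, shape, rank, lam=0.1, iters=50, seed=0):
    """MBI: try every block's best update, then keep only the one with the largest decrease."""
    rng = random.Random(seed)
    factors = [[[rng.gauss(0, 0.5) for _ in range(rank)] for _ in range(n)] for n in shape]
    history = [objective(obs, factors, lam)]
    for _ in range(iters):
        best_mode, best_val, best_block = None, history[-1], None
        for mode in range(len(shape)):
            cand = block_update(obs, factors, mode, lam)
            trial = factors[:mode] + [cand] + factors[mode + 1:]
            val = objective(obs, trial, lam)
            if val < best_val:
                best_mode, best_val, best_block = mode, val, cand
        if best_mode is None:  # no single block improves: block-wise stationary point
            break
        factors[best_mode] = best_block
        history.append(best_val)
    return factors, history


if __name__ == "__main__":
    rng = random.Random(1)
    shape, rank = (6, 5, 3), 2  # toy sizes: customers x products x contexts
    truth = [[[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n)] for n in shape]
    obs = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                if rng.random() < 0.6:  # observe roughly 60% of cells
                    obs[(i, j, k)] = predict(truth, (i, j, k))
    factors, history = mbi_fit(obs, shape, rank)
    print(f"objective: {history[0]:.3f} -> {history[-1]:.3f} after {len(history) - 1} block updates")
```

The ridge penalty drives the factor row of a customer, product or context with no observations to zero once that block is updated. In this plain factorization, such an entity therefore receives a prediction of zero. This is the cold-start gap that, according to the abstract, REM closes by borrowing subgroup information.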

Article information

Ann. Statist., Volume 46, Number 6B (2018), 3308–3333.

Received: July 2017
Revised: September 2017
First available in Project Euclid: 11 September 2018

Digital Object Identifier: doi:10.1214/17-AOS1659

Primary: 62M20: Prediction [See also 60G25]; filtering [See also 60G35, 93E10, 93E11]
Secondary: 90C26: Nonconvex programming, global optimization; 68T05: Learning and adaptive systems [See also 68Q32, 91E40]

Keywords: cold-start problem; context-aware recommender system; maximum block improvement; nonconvex optimization; tensor completion


Bi, Xuan; Qu, Annie; Shen, Xiaotong. Multilayer tensor factorization with applications to recommender systems. Ann. Statist. 46 (2018), no. 6B, 3308–3333. doi:10.1214/17-AOS1659.

References

  • [1] Adomavicius, G. and Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17 734–749.
  • [2] Adomavicius, G. and Tuzhilin, A. (2011). Context-aware recommender systems. In Recommender Systems Handbook 217–253. Springer, Berlin.
  • [3] Aswani, A. (2016). Low-rank approximation and completion of positive tensors. SIAM J. Matrix Anal. Appl. 37 1337–1364.
  • [4] Bhojanapalli, S. and Sanghavi, S. (2015). A new sampling technique for tensors. Preprint. Available at arXiv:1502.05023.
  • [5] Bi, X., Qu, A. and Shen, X. (2018). Supplement to “Multilayer tensor factorization with applications to recommender systems.” DOI:10.1214/17-AOS1659SUPP.
  • [6] Bi, X., Qu, A., Wang, J. and Shen, X. (2017). A group specific recommender system. J. Amer. Statist. Assoc. 112 1344–1353.
  • [7] Bobadilla, J., Ortega, F., Hernando, A. and Gutiérrez, A. (2013). Recommender systems survey. Knowl.-Based Syst. 46 109–132.
  • [8] Bronnenberg, B. J., Kruger, M. W. and Mela, C. F. (2008). Database paper—The IRI marketing data set. Mark. Sci. 27 745–748.
  • [9] Chen, B., He, S., Li, Z. and Zhang, S. (2012). Maximum block improvement and polynomial optimization. SIAM J. Optim. 22 87–107.
  • [10] Chi, E. C. and Kolda, T. G. (2012). On tensors, sparsity, and nonnegative factorizations. SIAM J. Matrix Anal. Appl. 33 1272–1299.
  • [11] Clausen, J. (1999). Branch and bound algorithms: Principles and examples. Technical Report, Univ. Copenhagen.
  • [12] Colombo-Mendoza, L. O., Valencia-García, R., Rodríguez-González, A., Alor-Hernández, G. and Samper-Zapater, J. J. (2015). RecomMetz: A context-aware knowledge-based mobile recommender system for movie showtimes. Expert Syst. Appl. 42 1202–1222.
  • [13] de Silva, V. and Lim, L.-H. (2008). Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix Anal. Appl. 30 1084–1127.
  • [14] DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation. Springer, Berlin.
  • [15] Fang, Y. and Wang, J. (2012). Selection of the number of clusters via the bootstrap method. Comput. Statist. Data Anal. 56 468–477.
  • [16] Feuerverger, A., He, Y. and Khatri, S. (2012). Statistical significance of the Netflix challenge. Statist. Sci. 27 202–231.
  • [17] Forbes, P. and Zhu, M. (2011). Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation. In Proceedings of the Fifth ACM Conference on Recommender Systems 261–264. ACM, New York.
  • [18] Goldberg, K., Roeder, T., Gupta, D. and Perkins, C. (2001). Eigentaste: A constant time collaborative filtering algorithm. Inf. Retr. 4 133–151.
  • [19] Karatzoglou, A., Amatriain, X., Baltrunas, L. and Oliver, N. (2010). Multiverse recommendation: N-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the Fourth ACM Conference on Recommender Systems 79–86. ACM, New York.
  • [20] Kolda, T. G. and Bader, B. W. (2009). Tensor decompositions and applications. SIAM Rev. 51 455–500.
  • [21] Koren, Y. (2010). Collaborative filtering with temporal dynamics. Commun. ACM 53 89–97.
  • [22] Kruskal, J. B. (1977). Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl. 18 95–138.
  • [23] Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics 38 963–974.
  • [24] Land, A. H. and Doig, A. G. (1960). An automatic method of solving discrete programming problems. Econometrica 28 497–520.
  • [25] Li, L. and Zhang, X. (2017). Parsimonious tensor response regression. J. Amer. Statist. Assoc. 112 1131–1146.
  • [26] Li, Z., Suk, H.-I., Shen, D. and Li, L. (2016). Sparse multi-response tensor regression for Alzheimer’s disease study with multivariate clinical assessments. IEEE Trans. Med. Imag. 35 1927–1936.
  • [27] Li, Z., Uschmajew, A. and Zhang, S. (2015). On convergence of the maximum block improvement method. SIAM J. Optim. 25 210–233.
  • [28] Lombardi, S., Anand, S. S. and Gorgoglione, M. (2009). Context and customer behaviour in recommendation. In Workshop on Context-Aware Recommender Systems.
  • [29] Miranda, M., Zhu, H. and Ibrahim, J. G. (2015). TPRM: Tensor partition regression models with applications in imaging biomarker detection. Preprint. Available at arXiv:1505.05482.
  • [30] Nguyen, J. and Zhu, M. (2013). Content-boosted matrix factorization techniques for recommender systems. Stat. Anal. Data Min. 6 286–301.
  • [31] Nguyen, T. V., Karatzoglou, A. and Baltrunas, L. (2014). Gaussian process factorization machines for context-aware recommendations. In Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval 63–72. ACM, New York.
  • [32] Paatero, P. (1999). The multilinear engine: A table-driven, least squares program for solving multilinear problems, including the n-way parallel factor analysis model. J. Comput. Graph. Statist. 8 854–888.
  • [33] Paatero, P. (2000). Construction and analysis of degenerate PARAFAC models. J. Chemom. 14 285–299.
  • [34] Palmisano, C., Tuzhilin, A. and Gorgoglione, M. (2008). Using context to improve predictive modeling of customers in personalization applications. IEEE Trans. Knowl. Data Eng. 20 1535–1549.
  • [35] Park, S.-T., Pennock, D., Madani, O., Good, N. and DeCoste, D. (2006). Naïve filterbots for robust cold-start recommendations. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 699–705. ACM, New York.
  • [36] Rendle, S. (2012). Factorization machines with libFM. ACM Trans. Intell. Syst. Technol. 3 57.
  • [37] Rendle, S., Gantner, Z., Freudenthaler, C. and Schmidt-Thieme, L. (2011). Fast context-aware recommendations with factorization machines. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval 635–644. ACM, New York.
  • [38] Salakhutdinov, R., Mnih, A. and Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning 791–798. ACM, New York.
  • [39] Shen, X. (1998). On the method of penalization. Statist. Sinica 8 337–357.
  • [40] Shen, X., Tseng, G. C., Zhang, X. and Wong, W. H. (2003). On ψ-learning. J. Amer. Statist. Assoc. 98 724–734.
  • [41] Shi, Y., Larson, M. and Hanjalic, A. (2014). Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Comput. Surv. 47 3.
  • [42] Srebro, N., Alon, N. and Jaakkola, T. S. (2005). Generalization error bounds for collaborative prediction with low-rank matrices. In Advances in Neural Information Processing Systems 17 5–27.
  • [43] Verbert, K., Manouselis, N., Ochoa, X., Wolpers, M., Drachsler, H., Bosnic, I. and Duval, E. (2012). Context-aware recommender systems for learning: A survey and future challenges. IEEE Trans. Learn. Technol. 5 318–335.
  • [44] Wang, J. (2010). Consistent selection of the number of clusters via crossvalidation. Biometrika 97 893–904.
  • [45] Wang, P., Tsai, G. and Qu, A. (2012). Conditional inference functions for mixed-effects models with unspecified random-effects distribution. J. Amer. Statist. Assoc. 107 725–736.
  • [46] Welling, M. and Weber, M. (2001). Positive tensor factorization. Pattern Recogn. Lett. 22 1255–1261.
  • [47] Xiong, L., Chen, X., Huang, T.-K., Schneider, J. and Carbonell, J. G. (2010). Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM, Philadelphia, PA.
  • [48] Yuan, M. and Zhang, C.-H. (2016). Incoherent tensor norms and their applications in higher order tensor completion. Preprint. Available at arXiv:1606.03504.
  • [49] Yuan, M. and Zhang, C.-H. (2016). On tensor completion via nuclear norm minimization. Found. Comput. Math. 16 1031–1068.
  • [50] Zhou, H., Li, L. and Zhu, H. (2013). Tensor regression with applications in neuroimaging data analysis. J. Amer. Statist. Assoc. 108 540–552.
  • [51] Zhu, Y., Shen, X. and Ye, C. (2016). Personalized prediction and sparsity pursuit in latent factor models. J. Amer. Statist. Assoc. 111 241–252.

Supplemental materials

  • Supplement to “Multilayer tensor factorization with applications to recommender systems.” Technical proofs of all lemmas, propositions and theorems are provided in the supplementary material [5].