The Annals of Applied Statistics

Modeling item–item similarities for personalized recommendations on Yahoo! front page

Deepak Agarwal, Liang Zhang, and Rahul Mazumder

Full-text: Open access

Abstract

We consider the problem of algorithmically recommending items to users on a Yahoo! front page module. Our approach is based on a novel multilevel hierarchical model that we refer to as a User Profile Model with Graphical Lasso (UPG). The UPG provides personalized recommendations to users by simultaneously incorporating both user covariates and historical user interactions with items in a model-based way. In particular, we build a per-item regression model based on a rich set of user covariates and estimate individual user affinity to items by introducing a latent random vector for each user. The vector random effects are assumed to be drawn from a prior with a precision matrix that measures residual partial associations among items. To obtain better estimates of the precision matrix in high dimensions, its elements are constrained through a Lasso penalty. Our model is fitted through a penalized quasi-likelihood procedure coupled with a scalable EM algorithm. We employ several computational strategies, such as multi-threading and conjugate gradients, and heavily exploit problem structure to scale our computations in the E-step. For the M-step we use a scalable variant of the Graphical Lasso algorithm for covariance selection.
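The E-step for a single user can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a logistic click model whose linear predictor is the per-item covariate term plus the user's random effect, linearized in the usual penalized quasi-likelihood way with working weights p(1 − p). It also assumes a small dense precision matrix and plain conjugate gradients, omitting the multi-threading and structure exploitation described above. All function names are hypothetical. The posterior covariance that the M-step also needs is not computed here. In a full EM loop, the averaged second moments of the user effects would be passed to a graphical lasso solver; scikit-learn's `sklearn.covariance.graphical_lasso` is one existing implementation.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b.copy()              # residual b - A x at the starting point x = 0
    p = r.copy()              # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        step = rs_old / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p   # next A-conjugate search direction
        rs_old = rs_new
    return x


def pql_working_data(y, eta_fixed, u, observed):
    """Hypothetical PQL linearization of a logistic model for one user.

    y: 0/1 clicks per item; eta_fixed: covariate part of the linear
    predictor per item; u: current estimate of the user's random effects;
    observed: 1 where the user was shown the item, else 0.
    Returns working weights w and working residuals r = z - eta_fixed.
    """
    eta = eta_fixed + u
    p = 1.0 / (1.0 + np.exp(-eta))
    var = p * (1.0 - p)
    z = eta + (y - p) / var          # working response
    w = var * observed               # items not shown carry zero weight
    return w, z - eta_fixed


def e_step_user(omega, w, r):
    """Posterior mean of u ~ N(0, omega^{-1}) given r_j = u_j + e_j with
    e_j ~ N(0, 1 / w_j): solve (omega + diag(w)) m = w * r by CG."""
    return conjugate_gradient(omega + np.diag(w), w * r)


# Usage on a hypothetical 5-item problem with a random SPD precision matrix.
rng = np.random.default_rng(0)
J = 5
L = rng.normal(size=(J, J))
omega = L @ L.T + J * np.eye(J)
y = rng.integers(0, 2, size=J).astype(float)
observed = np.array([1, 1, 0, 1, 0], dtype=float)
w, r = pql_working_data(y, rng.normal(size=J), np.zeros(J), observed)
m = e_step_user(omega, w, r)
```

Items the user never saw still receive a nonzero posterior mean through the off-diagonal entries of the precision matrix, which is how the item–item associations let observed interactions inform unobserved ones.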

Through extensive experiments on a new data set obtained from the Yahoo! front page and a benchmark data set from a movie recommender application, we show that our UPG model significantly improves performance compared to several state-of-the-art methods in the literature, especially those based on a bilinear random effects model (BIRE). In particular, we show that the gains of UPG over BIRE are significant when the number of users is large and the number of items to select from is small. For large item sets and relatively small user sets, the results of UPG and BIRE are comparable. The UPG also leads to faster model building and produces interpretable outputs.

Article information

Source
Ann. Appl. Stat., Volume 5, Number 3 (2011), 1839-1875.

Dates
First available in Project Euclid: 13 October 2011

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1318514287

Digital Object Identifier
doi:10.1214/11-AOAS475

Mathematical Reviews number (MathSciNet)
MR2884924

Zentralblatt MATH identifier
1231.62207

Keywords
Recommender systems, collaborative filtering, matrix factorization, item–item similarities, graphical lasso

Citation

Agarwal, Deepak; Zhang, Liang; Mazumder, Rahul. Modeling item–item similarities for personalized recommendations on Yahoo! front page. Ann. Appl. Stat. 5 (2011), no. 3, 1839–1875. doi:10.1214/11-AOAS475. https://projecteuclid.org/euclid.aoas/1318514287
