The Annals of Applied Statistics

Single stage prediction with embedded topic modeling of online reviews for mobile app management

Shawn Mankad, Shengli Hu, and Anandasivam Gopal

Full-text: Open access


Mobile apps are one of the building blocks of the mobile digital economy. A feature that differentiates mobile apps from traditional enterprise software is online reviews, which are available on app marketplaces and represent a valuable source of consumer feedback on the app. We develop a supervised topic modeling approach that lets app developers use mobile reviews as a source of quality and customer feedback, thereby complementing traditional software testing. The approach is based on a constrained matrix factorization that leverages the relationship between term frequency and a given response variable, in addition to co-occurrences between terms, to recover topics that are both predictive of consumer sentiment and useful for understanding the underlying textual themes. The factorization is combined with ordinal regression to provide guidance from online reviews on a single app’s performance and to systematically compare different apps over time for benchmarking of features and consumer sentiment. We apply our approach to a dataset of over 100,000 mobile reviews, collected over several years, for three of the most popular online travel agent apps from the iTunes and Google Play marketplaces.

Article information

Ann. Appl. Stat., Volume 12, Number 4 (2018), 2279–2311.

Received: July 2016
Revised: February 2018
First available in Project Euclid: 13 November 2018

Digital Object Identifier: doi:10.1214/18-AOAS1152

Keywords: Mobile apps; online reviews; text analysis; topic modeling; matrix factorization


Mankad, Shawn; Hu, Shengli; Gopal, Anandasivam. Single stage prediction with embedded topic modeling of online reviews for mobile app management. Ann. Appl. Stat. 12 (2018), no. 4, 2279–2311. doi:10.1214/18-AOAS1152.




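Supplemental materials

Mankad, S., Hu, S. and Gopal, A. (2018). Supplement to “Single stage prediction with embedded topic modeling of online reviews for mobile app management.” DOI:10.1214/18-AOAS1152SUPP.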