Annals of Applied Statistics

Active matrix factorization for surveys

Chelsea Zhang, Sean J. Taylor, Curtiss Cobb, and Jasjeet Sekhon

Full-text: Open access


Amid historically low response rates, survey researchers seek ways to reduce respondent burden while measuring desired concepts with precision. We propose to ask fewer questions of respondents and impute missing responses via probabilistic matrix factorization. A variance-minimizing active learning criterion chooses the most informative questions per respondent. In simulations of our matrix sampling procedure on real-world surveys as well as a Facebook survey experiment, we find active question selection achieves efficiency gains over baselines. The reduction in imputation error is heterogeneous across questions and depends on the latent concepts they capture. Modeling responses with the ordered logit likelihood improves imputations and yields an adaptive question order. We find for the Facebook survey that potential biases from order effects are likely to be small. With our method, survey researchers obtain principled suggestions of questions to retain and, if desired, can automate the design of shorter instruments.
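As a rough illustration of the variance-minimizing selection idea described above, the following sketch picks the next question for one respondent. It is a simplification under stated assumptions and not the authors' implementation: it uses a Gaussian likelihood instead of the ordered logit, treats the question factor matrix as already estimated and fixed, and uses hypothetical names (`V`, `posterior_cov`, `next_question`, `noise_var`, `prior_var`).

```python
import numpy as np


def posterior_cov(V_asked, noise_var, prior_var):
    """Posterior covariance of a respondent's latent vector u.

    Assumed model (illustrative only): u ~ N(0, prior_var * I) and each
    response y_j = v_j . u + e_j with e_j ~ N(0, noise_var).
    """
    k = V_asked.shape[1]
    precision = np.eye(k) / prior_var + V_asked.T @ V_asked / noise_var
    return np.linalg.inv(precision)


def next_question(V, asked, noise_var=1.0, prior_var=1.0):
    """Pick the unasked question that minimizes the summed predictive
    variance of the questions that would still be unasked afterwards."""
    asked = [int(j) for j in asked]
    candidates = [j for j in range(V.shape[0]) if j not in asked]
    best_j, best_score = None, np.inf
    for j in candidates:
        idx = np.array(asked + [j], dtype=int)
        cov = posterior_cov(V[idx], noise_var, prior_var)
        rest = np.array([i for i in candidates if i != j], dtype=int)
        # Sum over remaining questions i of v_i^T cov v_i.
        score = float(np.einsum("ik,kl,il->", V[rest], cov, V[rest]))
        if score < best_score:
            best_j, best_score = j, score
    return best_j, best_score


# Hypothetical question factors: questions 0 and 1 load on the same
# latent concept, question 2 loads on a different one.
V = np.array([[2.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
first, _ = next_question(V, asked=[])
second, _ = next_question(V, asked=[first])
print(first, second)
```

In this toy setup the second pick skips the question that is redundant with the first and moves to the other latent concept. Under the Gaussian simplification the posterior covariance does not depend on the answers given, so every respondent would get the same order; a non-Gaussian likelihood such as the ordered logit mentioned above is what makes the order depend on earlier responses.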

Article information

Ann. Appl. Stat., Volume 14, Number 3 (2020), 1182-1206.

Received: June 2019
Revised: December 2019
First available in Project Euclid: 18 September 2020

Keywords: Active learning; adaptive surveys; matrix factorization; multidimensional adaptive testing; optimal design; survey imputation


Zhang, Chelsea; Taylor, Sean J.; Cobb, Curtiss; Sekhon, Jasjeet. Active matrix factorization for surveys. Ann. Appl. Stat. 14 (2020), no. 3, 1182–1206. doi:10.1214/20-AOAS1322.



