The Annals of Applied Statistics

Weakly supervised clustering: Learning fine-grained signals from coarse labels

Stefan Wager, Alexander Blocker, and Niall Cardin


Abstract

Consider a classification problem where we do not have access to labels for individual training examples, but only have average labels over subpopulations. We give practical examples of this setup and show how such a classification task can usefully be analyzed as a weakly supervised clustering problem. We propose three approaches to solving the weakly supervised clustering problem, including a latent variables model that performs well in our experiments. We illustrate our methods on an analysis of aggregated elections data and an industry data set that was the original motivation for this research.
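The setup in the abstract is sometimes called learning from label proportions: individual labels are hidden, and only the average label of each subpopulation ("bag") is observed. As a concrete illustration of the problem, and not of the authors' latent variables model, the sketch below fits an individual-level logistic model by matching each bag's average predicted probability to its observed average label. All function and variable names are hypothetical, and this is a minimal baseline, assuming bags with differing feature compositions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_from_bag_proportions(X_bags, p_bags, n_iter=2000, lr=0.5):
    """Hypothetical baseline for learning from coarse labels: fit logistic
    weights w so that the average predicted probability within each bag
    matches that bag's observed average label p_b.

    X_bags : list of (n_b, d) feature matrices, one per subpopulation
    p_bags : observed average labels, one per subpopulation
    """
    d = X_bags[0].shape[1]
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = np.zeros(d)
        for X_b, p_b in zip(X_bags, p_bags):
            q = sigmoid(X_b @ w)        # individual-level predictions
            resid = q.mean() - p_b      # bag-level mismatch
            # gradient of 0.5 * resid**2, chained through the bag mean
            grad += resid * ((q * (1.0 - q)) @ X_b) / X_b.shape[0]
        w -= lr * grad
    return w

# Simulated check: individual labels are generated and then hidden, so the
# fit sees only 100 bag-level averages over 30 examples each.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(3000, 2))
y = (rng.random(3000) < sigmoid(X @ w_true)).astype(float)
X_bags = np.split(X, 100)
p_bags = np.array([b.mean() for b in np.split(y, 100)])
w_hat = fit_from_bag_proportions(X_bags, p_bags)  # approximately recovers w_true
```

When the bags differ enough in composition, matching bag-level averages approximately pins down the individual-level model, which is the sense in which a fine-grained signal can be learned from coarse labels.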

Article information

Source
Ann. Appl. Stat., Volume 9, Number 2 (2015), 801–820.

Dates
Received: June 2014
Revised: February 2015
First available in Project Euclid: 20 July 2015

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1437397112

Digital Object Identifier
doi:10.1214/15-AOAS812

Mathematical Reviews number (MathSciNet)
MR3371336

Zentralblatt MATH identifier
06499931

Keywords
Latent variables model; uncertain class label

Citation

Wager, Stefan; Blocker, Alexander; Cardin, Niall. Weakly supervised clustering: Learning fine-grained signals from coarse labels. Ann. Appl. Stat. 9 (2015), no. 2, 801–820. doi:10.1214/15-AOAS812. https://projecteuclid.org/euclid.aoas/1437397112

