The Annals of Applied Statistics

On multi-view learning with additive models

Mark Culp, George Michailidis, and Kjell Johnson

Full-text: Open access


In many scientific settings data can be naturally partitioned into variable groupings called views. Common examples include environmental (first view) and genetic information (second view) in ecological applications, and chemical (first view) and biological (second view) data in drug discovery. Multi-view data also occur in text analysis and proteomics applications, where one view consists of a graph with observations as the vertices and a weighted measure of pairwise similarity between observations as the edges. Further, in several of these applications the observations can be partitioned into two sets, one where the response is observed (labeled) and the other where the response is not (unlabeled). The problem of simultaneously modeling multi-view data and incorporating unlabeled observations in training is referred to as multi-view transductive learning. In this work we introduce and study a comprehensive generalized fixed point additive modeling framework for multi-view transductive learning, where each view is represented by a linear smoother. The problem of view selection is discussed using a generalized Akaike Information Criterion, which provides an approach for testing the contribution of each view. An efficient implementation is provided for fitting these models with both backfitting and local-scoring type algorithms adjusted to semi-supervised graph-based learning. The proposed technique is assessed on both synthetic and real data sets and is shown to be competitive with state-of-the-art co-training and graph-based techniques.
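The backfitting idea underlying the framework above can be illustrated with a minimal two-view sketch: each view is represented by a linear smoother, and the views are fit iteratively to each other's partial residuals. Everything below (the synthetic data, the ridge-regularized hat matrix, and the iteration count) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-view data: 100 labeled observations.
n = 100
X1 = rng.normal(size=(n, 3))   # view 1 (e.g., environmental features)
X2 = rng.normal(size=(n, 2))   # view 2 (e.g., genetic features)
y = X1[:, 0] + 0.5 * X2[:, 1] + 0.1 * rng.normal(size=n)

def hat_matrix(X, ridge=1e-6):
    """Linear smoother for one view: ridge-regularized least-squares hat matrix."""
    Xc = np.column_stack([np.ones(len(X)), X])
    return Xc @ np.linalg.solve(Xc.T @ Xc + ridge * np.eye(Xc.shape[1]), Xc.T)

S = [hat_matrix(X1), hat_matrix(X2)]   # one smoother per view
f = [np.zeros(n), np.zeros(n)]         # additive component of each view

# Backfitting: each view smooths the partial residual left by the other view,
# iterating until the fixed point of the additive model is reached.
for _ in range(50):
    for v in range(2):
        partial = y - y.mean() - sum(f[u] for u in range(2) if u != v)
        f[v] = S[v] @ partial
        f[v] -= f[v].mean()            # center components to keep the fit identifiable

fitted = y.mean() + f[0] + f[1]
print(float(np.mean((y - fitted) ** 2)))
```

In the transductive setting of the paper, graph-based views would use a smoother built from the similarity graph over both labeled and unlabeled vertices; this sketch covers only the labeled, fully supervised case.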

Article information

Ann. Appl. Stat., Volume 3, Number 1 (2009), 292-318.

First available in Project Euclid: 16 April 2009


Keywords: multi-view learning; generalized additive model; semi-supervised learning; smoothing; model selection


Culp, Mark; Michailidis, George; Johnson, Kjell. On multi-view learning with additive models. Ann. Appl. Stat. 3 (2009), no. 1, 292--318. doi:10.1214/08-AOAS202.

