The Annals of Applied Statistics

Customized training with an application to mass spectrometric imaging of cancer tissue

Scott Powers, Trevor Hastie, and Robert Tibshirani

Full-text: Open access


We introduce a simple, interpretable strategy for making predictions on test data when the features of the test data are available at the time of model fitting. Our proposal—customized training—clusters the data to find training points close to each test point and then fits an $\ell_{1}$-regularized model (lasso) separately in each training cluster. This approach combines the local adaptivity of $k$-nearest neighbors with the interpretability of the lasso. Although we use the lasso for the model fitting, any supervised learning method can be applied to the customized training sets. We apply the method to a mass-spectrometric imaging data set from an ongoing collaboration in gastric cancer detection, demonstrating the power and interpretability of the technique. Our idea is simple but potentially useful in situations where the data have some underlying structure.
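The abstract's procedure—cluster the pooled training and test features, then fit a lasso separately to the training points in each cluster—can be sketched as follows. This is a minimal illustration under stated assumptions (k-means for the clustering step, scikit-learn's `Lasso` for the per-cluster fits, and a global-fit fallback for tiny clusters); the paper's actual algorithm may differ in its clustering method and tuning.

```python
# Hedged sketch of "customized training" as described in the abstract:
# cluster training and test data jointly, then fit an L1-regularized
# (lasso) model separately within each cluster. KMeans, the alpha value,
# and the small-cluster fallback are illustrative choices, not the
# authors' exact procedure.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def customized_train_predict(X_train, y_train, X_test, n_clusters=3, alpha=0.1):
    # Cluster the pooled data so each test point is grouped with the
    # training points closest to it in feature space.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    km.fit(np.vstack([X_train, X_test]))
    train_labels = km.labels_[: len(X_train)]
    test_labels = km.labels_[len(X_train):]

    preds = np.empty(len(X_test))
    for c in np.unique(test_labels):
        in_cluster = train_labels == c
        if in_cluster.sum() < 2:
            # Too few training points in this cluster: fall back to a
            # global lasso fit (an assumption, not from the paper).
            model = Lasso(alpha=alpha).fit(X_train, y_train)
        else:
            model = Lasso(alpha=alpha).fit(X_train[in_cluster], y_train[in_cluster])
        preds[test_labels == c] = model.predict(X_test[test_labels == c])
    return preds
```

Because each cluster gets its own sparse model, the selected features can differ across clusters, which is the source of the local adaptivity and interpretability the abstract describes.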

Article information

Ann. Appl. Stat., Volume 9, Number 4 (2015), 1709-1725.

Received: April 2015
Revised: July 2015
First available in Project Euclid: 28 January 2016


Keywords: transductive learning, local regression, classification, clustering


Powers, Scott; Hastie, Trevor; Tibshirani, Robert. Customized training with an application to mass spectrometric imaging of cancer tissue. Ann. Appl. Stat. 9 (2015), no. 4, 1709--1725. doi:10.1214/15-AOAS866.


