Open Access
September 2011 Local kernel canonical correlation analysis with application to virtual drug screening
Daniel Samarov, J. S. Marron, Yufeng Liu, Christopher Grulke, Alexander Tropsha
Ann. Appl. Stat. 5(3): 2169-2196 (September 2011). DOI: 10.1214/11-AOAS472


Drug discovery is the process of identifying compounds that have potentially meaningful biological activity. A major challenge is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making experimental testing intractable. For this reason, computational methods are employed to filter out those compounds that do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space, allowing the remaining compounds to be experimentally tested.

In this paper we propose several novel approaches to the problem of virtual screening based on Canonical Correlation Analysis (CCA) and on a kernel-based extension. Spectral learning ideas motivate our proposed new method, called Indefinite Kernel CCA (IKCCA). We show the strong performance of this approach both on a toy problem and on real-world data, with dramatic improvements in the predictive accuracy of virtual screening over an existing methodology.




Published: September 2011
First available in Project Euclid: 13 October 2011

zbMATH: 1228.62072
MathSciNet: MR2884936
Digital Object Identifier: 10.1214/11-AOAS472

Keywords: canonical correlation analysis, drug discovery, indefinite kernels, kernel methods, virtual screening

Rights: Copyright © 2011 Institute of Mathematical Statistics
