Local kernel canonical correlation analysis with application to virtual drug screening

Daniel Samarov; J. S. Marron; Yufeng Liu; Christopher Grulke; Alexander Tropsha

doi:10.1214/11-AOAS472

Ann. Appl. Stat. 5(3): 2169-2196 (September 2011). DOI: 10.1214/11-AOAS472

Abstract

Drug discovery is the process of identifying compounds which have potentially meaningful biological activity. A major challenge is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making experimental testing intractable. For this reason, computational methods are employed to filter out those compounds which do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space, allowing the remaining compounds to be experimentally tested.

In this paper we propose several novel approaches to the problem of virtual screening based on Canonical Correlation Analysis (CCA) and on a kernel-based extension. Spectral learning ideas motivate our proposed new method called Indefinite Kernel CCA (IKCCA). We show the strong performance of this approach both on a toy problem and on real-world data, with dramatic improvements in the predictive accuracy of virtual screening over an existing methodology.

Citation

Daniel Samarov, J. S. Marron, Yufeng Liu, Christopher Grulke, Alexander Tropsha. "Local kernel canonical correlation analysis with application to virtual drug screening." Ann. Appl. Stat. 5(3): 2169-2196, September 2011. https://doi.org/10.1214/11-AOAS472