Drug discovery is the process of identifying compounds which have potentially meaningful biological activity. A major challenge that arises is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making experimental testing intractable. For this reason computational methods are employed to filter out those compounds which do not exhibit strong biological activity. This filtering step, also called virtual screening reduces the search space, allowing for the remaining compounds to be experimentally tested.
In this paper we propose several novel approaches to the problem of virtual screening based on Canonical Correlation Analysis (CCA) and on a kernel-based extension. Spectral learning ideas motivate our proposed new method called Indefinite Kernel CCA (IKCCA). We show the strong performance of this approach both for a toy problem as well as using real world data with dramatic improvements in predictive accuracy of virtual screening over an existing methodology.
"Local kernel canonical correlation analysis with application to virtual drug screening." Ann. Appl. Stat. 5 (3) 2169 - 2196, September 2011. https://doi.org/10.1214/11-AOAS472