Statistical methodology is proposed for comparing unlabeled marked point sets, with an application to aligning steroid molecules in chemoinformatics. Methods from statistical shape analysis are combined with techniques for predicting random fields in spatial statistics in order to define a suitable measure of similarity between two marked point sets. Bayesian modeling of the predicted field overlap between pairs of point sets is proposed, and posterior inference of the alignment is carried out using Markov chain Monte Carlo simulation. By representing the fields in reproducing kernel Hilbert spaces, the degree of overlap can be computed without expensive numerical integration. Superimposing entire fields rather than the configuration matrices of point coordinates thereby avoids the problem that there is usually no clear one-to-one correspondence between the points. In addition, mask parameters are introduced in the model, so that partial matching of the marked point sets can be carried out. We also propose an adaptation of the generalized Procrustes analysis algorithm for the simultaneous alignment of multiple point sets. The methodology is illustrated with a simulation study and then applied to a data set of 31 steroid molecules, where the relationship between shape and binding activity to the corticosteroid binding globulin receptor is explored.
"Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment." Ann. Appl. Stat. 5 (4) 2603 - 2629, December 2011. https://doi.org/10.1214/11-AOAS486