Open Access
Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment
Irina Czogiel, Ian L. Dryden, Christopher J. Brignell
Ann. Appl. Stat. 5(4): 2603-2629 (December 2011). DOI: 10.1214/11-AOAS486


Statistical methodology is proposed for comparing unlabeled marked point sets, with an application to aligning steroid molecules in chemoinformatics. Methods from statistical shape analysis are combined with techniques for predicting random fields in spatial statistics in order to define a suitable measure of similarity between two marked point sets. Bayesian modeling of the predicted field overlap between pairs of point sets is proposed, and posterior inference of the alignment is carried out using Markov chain Monte Carlo simulation. By representing the fields in reproducing kernel Hilbert spaces, the degree of overlap can be computed without expensive numerical integration. Superimposing entire fields rather than the configuration matrices of point coordinates thereby avoids the problem that there is usually no clear one-to-one correspondence between the points. In addition, mask parameters are introduced in the model, so that partial matching of the marked point sets can be carried out. We also propose an adaptation of the generalized Procrustes analysis algorithm for the simultaneous alignment of multiple point sets. The methodology is illustrated with a simulation study and then applied to a data set of 31 steroid molecules, where the relationship between shape and binding activity to the corticosteroid binding globulin receptor is explored.
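
To illustrate why a reproducing kernel Hilbert space representation removes the need for numerical integration, here is a minimal sketch. It is not the authors' similarity measure or their kriging predictor: the Gaussian kernel, the bandwidth, and all function names are assumptions made for illustration. It relies only on the reproducing property: for fields f = Σᵢ aᵢ k(·, xᵢ) and g = Σⱼ bⱼ k(·, yⱼ), the inner product is ⟨f, g⟩ = Σᵢ Σⱼ aᵢ bⱼ k(xᵢ, yⱼ), a finite sum over pairs of points.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    """Kernel matrix with entries exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an assumption of this sketch.
    """
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def rkhs_inner(X, a, Y, b, bandwidth=1.0):
    """<f, g> for f = sum_i a_i k(., x_i) and g = sum_j b_j k(., y_j)."""
    return float(a @ gaussian_kernel(X, Y, bandwidth) @ b)


def rkhs_sq_distance(X, a, Y, b, bandwidth=1.0):
    """Squared RKHS distance ||f - g||^2, computed from three finite sums."""
    return (rkhs_inner(X, a, X, a, bandwidth)
            - 2.0 * rkhs_inner(X, a, Y, b, bandwidth)
            + rkhs_inner(Y, b, Y, b, bandwidth))


def rotation_z(theta):
    """3x3 rotation matrix about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    # Hypothetical marked point set: four points in 3-D with one mark each.
    X = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                  [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])
    a = np.array([1.0, -0.5, 0.3, 0.8])

    # Relabeling the points leaves the field, and so the distance, unchanged.
    perm = [2, 0, 3, 1]
    print("relabeled:", rkhs_sq_distance(X, a, X[perm], a[perm]))

    # Rotating one copy misaligns the fields and the distance becomes positive.
    R = rotation_z(0.7)
    print("rotated:  ", rkhs_sq_distance(X, a, X @ R.T, a))
```

Because the comparison is made between fields rather than between rows of coordinate matrices, the two point sets need not have the same number of points or any known correspondence. In an alignment setting, a quantity of this kind could be evaluated for each candidate rotation and translation, for example inside an MCMC sampler.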


Irina Czogiel, Ian L. Dryden, Christopher J. Brignell. "Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment." Ann. Appl. Stat. 5 (4) 2603-2629, December 2011.


Published: December 2011
First available in Project Euclid: 20 December 2011

zbMATH: 1234.62141
MathSciNet: MR2907128
Digital Object Identifier: 10.1214/11-AOAS486

Keywords: bioinformatics, chemoinformatics, kriging, Markov chain Monte Carlo, Procrustes, reproducing kernel Hilbert space, shape, size, spatial, steroids

Rights: Copyright © 2011 Institute of Mathematical Statistics
