Statistical methodology is proposed for comparing unlabeled marked point sets, with an application to aligning steroid molecules in chemoinformatics. Methods from statistical shape analysis are combined with techniques for predicting random fields in spatial statistics in order to define a suitable measure of similarity between two marked point sets. Bayesian modeling of the predicted field overlap between pairs of point sets is proposed, and posterior inference of the alignment is carried out using Markov chain Monte Carlo simulation. By representing the fields in reproducing kernel Hilbert spaces, the degree of overlap can be computed without expensive numerical integration. Superimposing entire fields rather than the configuration matrices of point coordinates thereby avoids the problem that there is usually no clear one-to-one correspondence between the points. In addition, mask parameters are introduced in the model, so that partial matching of the marked point sets can be carried out. We also propose an adaptation of the generalized Procrustes analysis algorithm for the simultaneous alignment of multiple point sets. The methodology is illustrated with a simulation study and then applied to a data set of 31 steroid molecules, where the relationship between shape and binding activity to the corticosteroid binding globulin receptor is explored.
Ann. Appl. Stat. 5(4): 2603-2629 (December 2011). DOI: 10.1214/11-AOAS486
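The claim that field overlap can be computed without numerical integration can be illustrated with a minimal sketch. It assumes, purely for illustration, that each marked point set induces a field f(u) = sum_i m_i exp(-|u - x_i|^2 / (2 s^2)) built from an isotropic Gaussian kernel with a common scale s; the kernel choice, the function names `field_overlap` and `normalised_overlap`, and the cosine-type normalisation are assumptions of this sketch, not the model specified in the paper. Under that assumption the integral of the product of two fields reduces to a double sum over pairs of points, so no correspondence between the points and no quadrature grid is needed.

```python
import numpy as np


def field_overlap(x, mx, y, my, scale=1.0):
    """Closed-form integral of f*g for two Gaussian-kernel fields.

    x, y   : (n, d) and (k, d) arrays of point coordinates (n != k allowed)
    mx, my : length-n and length-k arrays of marks (e.g. partial charges)
    scale  : common kernel standard deviation s (an assumed tuning value)
    """
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    d = x.shape[1]
    # Squared distances between every point of x and every point of y.
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Integral of the product of two Gaussian bumps centred at a and b:
    # (pi s^2)^(d/2) * exp(-|a - b|^2 / (4 s^2)).
    gram = (np.pi * scale**2) ** (d / 2) * np.exp(-sq_dist / (4 * scale**2))
    return float(np.asarray(mx) @ gram @ np.asarray(my))


def normalised_overlap(x, mx, y, my, scale=1.0):
    """Cosine-type similarity in [-1, 1]; it is 1 when the fields coincide."""
    sxy = field_overlap(x, mx, y, my, scale)
    sxx = field_overlap(x, mx, x, mx, scale)
    syy = field_overlap(y, my, y, my, scale)
    return sxy / np.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical toy data: three marked points versus two marked points.
    x = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.2, 0.3]])
    mx = np.array([1.0, -0.5, 0.8])
    y = np.array([[0.1, 0.0, 0.0], [1.4, 0.2, 0.0]])
    my = np.array([0.9, -0.4])
    print(normalised_overlap(x, mx, y, my))
```

Because the similarity depends only on the fields, relabelling the points of either set leaves it unchanged, and the two sets may contain different numbers of points. In an alignment setting one would evaluate such a quantity after applying a candidate rotation and translation to one of the sets.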