Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment

Irina Czogiel; Ian L. Dryden; Christopher J. Brignell

doi:10.1214/11-AOAS486

Ann. Appl. Stat. 5(4): 2603-2629 (December 2011). DOI: 10.1214/11-AOAS486

Abstract

Statistical methodology is proposed for comparing unlabeled marked point sets, with an application to aligning steroid molecules in chemoinformatics. Methods from statistical shape analysis are combined with techniques for predicting random fields in spatial statistics in order to define a suitable measure of similarity between two marked point sets. Bayesian modeling of the predicted field overlap between pairs of point sets is proposed, and posterior inference of the alignment is carried out using Markov chain Monte Carlo simulation. By representing the fields in reproducing kernel Hilbert spaces, the degree of overlap can be computed without expensive numerical integration. Superimposing entire fields rather than the configuration matrices of point coordinates thereby avoids the problem that there is usually no clear one-to-one correspondence between the points. In addition, mask parameters are introduced in the model, so that partial matching of the marked point sets can be carried out. We also propose an adaptation of the generalized Procrustes analysis algorithm for the simultaneous alignment of multiple point sets. The methodology is illustrated with a simulation study and then applied to a data set of 31 steroid molecules, where the relationship between shape and binding activity to the corticosteroid binding globulin receptor is explored.
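The claim that the overlap needs no numerical integration can be illustrated with a small sketch. It assumes each field is a finite kernel expansion, f(x) = sum_i a_i k(x, x_i), so the reproducing property gives the inner product of two fields as a double sum of kernel evaluations. The Gaussian kernel, the `length_scale` parameter, the coefficient values, and all function names below are illustrative assumptions rather than the authors' actual choices; the normalized overlap is in the spirit of a Carbo-type similarity index.

```python
import math

def gaussian_kernel(x, y, length_scale=1.0):
    """Gaussian kernel between two points (an assumed, illustrative choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def rkhs_inner(points_a, coef_a, points_b, coef_b, length_scale=1.0):
    """Inner product of f = sum_i a_i k(., x_i) and g = sum_j b_j k(., y_j).

    By the reproducing property this equals sum_ij a_i b_j k(x_i, y_j),
    so no integration over space is required.
    """
    return sum(
        ca * cb * gaussian_kernel(pa, pb, length_scale)
        for pa, ca in zip(points_a, coef_a)
        for pb, cb in zip(points_b, coef_b)
    )

def overlap_similarity(points_a, coef_a, points_b, coef_b, length_scale=1.0):
    """Normalized field overlap; at most 1 by the Cauchy-Schwarz inequality."""
    cross = rkhs_inner(points_a, coef_a, points_b, coef_b, length_scale)
    norm_a = rkhs_inner(points_a, coef_a, points_a, coef_a, length_scale)
    norm_b = rkhs_inner(points_b, coef_b, points_b, coef_b, length_scale)
    return cross / math.sqrt(norm_a * norm_b)

# Hypothetical marked point set: 3D coordinates with one coefficient per point.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
coefs = [0.5, 0.2, 0.8]

# Reordering the points leaves the field, and hence the overlap, unchanged.
reordered_points = list(reversed(points))
reordered_coefs = list(reversed(coefs))

# Translating one copy lowers the overlap.
shifted_points = [(x + 0.7, y, z) for (x, y, z) in points]

self_similarity = overlap_similarity(points, coefs, points, coefs)
reordered_similarity = overlap_similarity(points, coefs, reordered_points, reordered_coefs)
shifted_similarity = overlap_similarity(points, coefs, shifted_points, coefs)
```

Because the similarity depends only on the fields, it is invariant to how the points are ordered or labeled, which is why no one-to-one correspondence between points is needed. In an alignment setting, a quantity of this kind would be evaluated repeatedly as the rotation and translation of one point set are varied.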


Irina Czogiel, Ian L. Dryden, and Christopher J. Brignell "Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment," The Annals of Applied Statistics 5(4), 2603-2629, (December 2011). https://doi.org/10.1214/11-AOAS486

Published: December 2011

JOURNAL ARTICLE
27 PAGES
