Institute of Mathematical Statistics Collections

Model selection and sensitivity analysis for sequence pattern models

Mayetri Gupta

Abstract

In this article we propose a maximum a posteriori (MAP) criterion for model selection in the motif discovery problem and investigate conditions under which the MAP criterion asymptotically gives a correct prediction of model size. We also investigate the robustness of the MAP criterion to prior specification and provide guidelines, based on sensitivity considerations, for choosing prior hyper-parameters for motif models.

Primary Subjects: 62F15, 62P10
Secondary Subjects: 62F12
Keywords: Bayes factor; MAP; model selection; motif discovery
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.imsc/1207058288
Digital Object Identifier: doi:10.1214/193940307000000301


2012 © Institute of Mathematical Statistics
