The Annals of Applied Statistics

A model for sequential evolution of ligands by exponential enrichment (SELEX) data

Juli Atherton, Nathan Boley, Ben Brown, Nobuo Ogawa, Stuart M. Davidson, Michael B. Eisen, Mark D. Biggin, and Peter Bickel

Full-text: Open access


A Systematic Evolution of Ligands by EXponential enrichment (SELEX) experiment begins in round one with a random pool of oligonucleotides in equilibrium solution with a target. Over a few rounds, oligonucleotides having a high affinity for the target are selected. Data from a high-throughput SELEX experiment consist of lists of thousands of oligonucleotides sampled after each round. Thus far, SELEX experiments have been very good at suggesting the highest-affinity oligonucleotide, but modeling lower-affinity recognition site variants has been difficult. Furthermore, an alignment step has always been used prior to analyzing SELEX data.

We present a novel model, based on a biochemical parametrization of SELEX, which allows us to use data from all rounds to estimate the affinities of the oligonucleotides. Most notably, our model also aligns the oligonucleotides. We use our model to analyze a SELEX experiment containing double-stranded DNA oligonucleotides and the transcription factor Bicoid as the target. Our SELEX model outperformed other published methods for predicting putative binding sites for Bicoid, as indicated by the results of an in vivo ChIP-chip experiment.
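The round-by-round enrichment described above can be illustrated with a toy simulation. The sketch below is not the model from the paper: it assumes a constant free target concentration, a simple two-state equilibrium in which an oligonucleotide with dissociation constant Kd is bound with probability [T] / ([T] + Kd), and perfect amplification of the bound fraction after each round. The function names, the Kd values, and the target concentration are all hypothetical and chosen only for illustration.

```python
def bound_fraction(kd, target_conc):
    """Equilibrium probability that an oligo with dissociation constant kd is bound.

    Assumes a simple two-state binding model with constant free target
    concentration (an illustrative simplification, not the paper's model).
    """
    return target_conc / (target_conc + kd)


def selex_round(pool, kds, target_conc):
    """One selection round: weight each species by its bound probability,
    then renormalize to mimic amplification of the selected oligos."""
    weights = [f * bound_fraction(kd, target_conc) for f, kd in zip(pool, kds)]
    total = sum(weights)
    return [w / total for w in weights]


def run_selex(pool, kds, target_conc, rounds):
    """Return the pool composition before selection and after each round."""
    history = [list(pool)]
    for _ in range(rounds):
        pool = selex_round(pool, kds, target_conc)
        history.append(pool)
    return history


if __name__ == "__main__":
    # Hypothetical dissociation constants in arbitrary units (smaller = tighter binding).
    kds = [1.0, 10.0, 100.0, 1000.0]
    # Start from a uniform random pool over the four species.
    start = [0.25, 0.25, 0.25, 0.25]
    for r, fractions in enumerate(run_selex(start, kds, target_conc=1.0, rounds=3)):
        print(r, [round(x, 4) for x in fractions])
```

Under these assumptions the tightest binder quickly dominates the pool, which is consistent with the observation that SELEX readily suggests the highest-affinity oligonucleotide while leaving few observations of lower-affinity variants in later rounds.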

Article information

Ann. Appl. Stat., Volume 6, Number 3 (2012), 928-949.

First available in Project Euclid: 31 August 2012

Keywords

SELEX; transcription factor binding


Atherton, Juli; Boley, Nathan; Brown, Ben; Ogawa, Nobuo; Davidson, Stuart M.; Eisen, Michael B.; Biggin, Mark D.; Bickel, Peter. A model for sequential evolution of ligands by exponential enrichment (SELEX) data. Ann. Appl. Stat. 6 (2012), no. 3, 928–949. doi:10.1214/12-AOAS537.

References


  • Atherton, J., Boley, N., Brown, B., Ogawa, N., Davidson, S. M., Eisen, M. B., Biggin, M. D. and Bickel, P. (2012). Supplement to “A model for sequential evolution of ligands by exponential enrichment (SELEX) data.” DOI:10.1214/12-AOAS537SUPP.
  • Atkins, P. (1998). Physical Chemistry. Freeman, New York.
  • Ay, A. and Arnosti, D. N. (2011). Mathematical modeling of gene expression: A guide for the perplexed biologist. Crit. Rev. Biochem. Mol. Biol. 46 137–151.
  • Bailey, T. L., Williams, N., Misleh, C. and Li, W. W. (2006). MEME: Discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 34 369–373.
  • Berman, B. P., Pfeiffer, B. D., Laverty, T. R., Salzberg, S. L., Rubin, G. M., Eisen, M. B. and Celniker, S. E. (2004). Computational identification of developmental enhancers: Conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol. 5 R61.
  • Biggin, M. D. (2011). Animal transcription networks as highly connected, quantitative continua. Dev. Cell 21 611–626.
  • Boyle, A. P., Song, L., Lee, B. K., London, D., Keefe, D., Birney, E., Iyer, V. R., Crawford, C. E. and Furey, T. S. (2010). High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Research 21 456–464.
  • Djordjevic, M. (2007). SELEX experiments: New prospects, applications and data analysis in inferring regulatory pathways. Biomol. Eng. 24 179–189.
  • Djordjevic, M. and Sengupta, A. M. (2006). Quantitative modelling and data analysis of SELEX experiments. Physical Biology 3 13–28.
  • Ellington, A. D. and Szostak, J. W. (1990). In vitro selection of RNA molecules that bind specific ligands. Nature 346 818–822.
  • Freede, P. and Brantl, S. (2004). Transcriptional repressor CopR: Use of SELEX to study the copR operator indicates that evolution was directed at maximal binding. Journal of Bacteriology 186 6254–6264.
  • Guo, K., Paul, A., Schichor, C., Ziemer, G. and Wendel, H. P. (2008). CELL-SELEX: Novel perspectives of aptamer-based therapeutics. International Journal of Molecular Sciences 9 668–678.
  • Kaplan, T., Li, X. Y., Sabo, P., Peter, J. S., Thomas, S., Stamatoyannopoulos, J. A., Biggin, M. D. and Eisen, M. B. (2011). Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genetics 7 e1001290.
  • Kim, S., Shi, H., Lee, D. and Lis, J. T. (2003). Specific SR protein-dependent splicing substrates identified through genomic SELEX. Nucleic Acids Research 31 1955–1961.
  • Li, X.-Y., MacArthur, S., Bourgon, R., Nix, D., Pollard, D. A., Iyer, V. N., Hechmer, A., Simirenko, L., Stapleton, M., Hendriks, C. L. L., Chu, H. C., Ogawa, N., Inwood, W., Sementchenko, V., Beaton, A., Weiszmann, R., Celniker, S. E., Knowles, D. W., Gingeras, G., Speed, T. P., Eisen, M. B. and Biggin, M. D. (2008). Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm. PLoS Biology 6 e27.
  • Li, X.-Y., Thomas, S., Sabo, P. J., Eisen, M. B., Stamatoyannopoulos, J. A. and Biggin, M. D. (2011). The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biol. 12 R34.
  • MacArthur, S., Li, X.-Y., Li, J., Brown, J. B., Chu, H. C., Zeng, L., Grondona, B. P., Hechmer, A., Simirenko, L., Keranen, S. V. E., Knowles, D. W., Stapleton, M., Bickel, P. J., Biggin, M. D. and Eisen, M. B. (2009). Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biology 10 R80.
  • Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal 7 308–313.
  • Ng, E. W. M., Shima, D. T., Calias, P., Cunningham, E. T. J. and Guyer, D. R. (2006). Pegaptanib, a targeted anti-VEGF aptamer for ocular vascular disease. Nature Reviews Drug Discovery 5 123–132.
  • Nocedal, J. and Wright, S. (2006). Numerical Optimization, 2nd ed. Springer, Berlin.
  • Ogawa, N. and Biggin, M. D. (2011). High-throughput SELEX determination of DNA sequences bound by transcription factors in vitro. In Gene Regulatory Networks: Methods and Protocols (B. Deplancke and N. Gheldof, eds.). Methods in Molecular Biology 786 51–63. Humana Press, Clifton, NJ.
  • Powell, M. J. D. (1964). An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. 7 155–162.
  • Ravasi, T., Suzuki, H., Cannistraci, C. V., Katayama, S., Bajic, V. B., Tan, K., Akalin, A., Schmeier, S., Kanamori-Katayama, M., Bertin, N., Carninci, P., Daub, C. O., Forrest, A. R. R., Gough, J., Grimmond, S., Han, J. H., Hashimoto, T., Hide, W., Hofmann, O., Kamburov, A., Kaur, M., Kawaji, H., Kubosaki, A., Lassmann, T., v. Nimwegen, E., MacPherson, C. R., Ogawa, C., Radovanovic, A., Schwartz, A., Teasdale, R. D., Tegnér, J., Lenhard, B., Teichmann, S. A., Arakawa, T., Ninomiya, N., Murakami, Tagami, M., Fukuda, S., Imamura, K., Kai, C., Ishihara, R., Kitazume, Y., Kawai, J., Hume, D. A., Ideker, T. and Hayashizaki, Y. (2010). An atlas of combinatorial transcription regulation in mouse. Cell 140 744–752.
  • Segal, E., Sadka, T., Schroeder, M., Unnerstall, U. and Gaul, U. (2008). Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451 535–540.
  • Sharon, E., Lubliner, S. and Segal, E. (2008). A feature-based approach to modeling protein-DNA interactions. PLoS Comput. Biol. 4 e1000154.
  • Tuerk, C. and Gold, L. (1990). Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249 505–510.
  • von Hippel, P. H. (2007). From ‘simple’ DNA–protein interactions to the macromolecular machines of gene expression. Annual Review of Biophysics 36 79–105.
  • Zhao, Y., Granas, D. and Stormo, G. D. (2009). Inferring binding energies from selected binding sites. PLoS Comput. Biol. 5 e1000590.

Supplemental materials

  • Supplementary material: Code for SELEX model. The code for the SELEX model used in the application in this paper is available at the above URL. Extra simulations, mentioned in Section 5.1, are also provided as supplementary material.