Electronic Journal of Statistics

Long range search for maximum likelihood in exponential families

Saisuke Okabayashi and Charles J. Geyer

Full-text: Open access


Exponential families are often used to model data sets with complex dependence. Maximum likelihood estimates (MLE) can be difficult to compute when the likelihood is expensive to evaluate. Markov chain Monte Carlo (MCMC) methods based on the MCMC-MLE algorithm in [17] are guaranteed in theory to converge from any starting value under certain conditions, but in practice such an algorithm may labor to converge when given a poor starting value. We present a simple line search algorithm to find the MLE of a regular exponential family when the MLE exists and is unique. The algorithm can be started from any initial value and avoids the trial-and-error experimentation associated with calibrating algorithms like stochastic approximation. Unlike many optimization algorithms, this approach uses first derivative information only, evaluating neither the likelihood function itself nor derivatives of order higher than first. We show convergence of the algorithm for the case where the gradient can be calculated exactly. When it cannot, the gradient has a particularly convenient form that is easily estimated with MCMC, so the algorithm remains useful to a practitioner.
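To make the idea of a gradient-only line search concrete, here is a minimal sketch in Python. It is not the authors' algorithm as published. It assumes a toy exponential family on a small finite sample space, so that the gradient of the log-likelihood (observed statistic minus its expectation) can be computed exactly by enumeration. It also assumes a steepest-ascent direction and a bracketing search that accepts a step once the directional derivative lies between 0 and a fraction c of its initial value. The function names (mean_value, gradient, line_search, fit), the constant c = 0.2, and the example data are all illustrative assumptions. Because the log-likelihood is concave, the directional derivative decreases along the search direction, which is what lets the search work without ever evaluating the likelihood.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_value(theta, stats):
    """E_theta[t(Y)], computed by enumerating a finite sample space.

    Each row of `stats` is the canonical statistic t(y) of one outcome.
    """
    eta = [dot(t, theta) for t in stats]
    m = max(eta)                        # subtract the max for numerical stability
    w = [math.exp(e - m) for e in eta]
    z = sum(w)
    return [sum(w_i * t[j] for w_i, t in zip(w, stats)) / z
            for j in range(len(theta))]

def gradient(theta, stats, t_obs):
    """Log-likelihood gradient: observed minus expected canonical statistic."""
    return [a - b for a, b in zip(t_obs, mean_value(theta, stats))]

def line_search(theta, p, grad_fn, c=0.2, max_iter=100):
    """Find a step s with 0 <= grad(theta + s p) . p <= c * grad(theta) . p.

    Only gradients are evaluated. The acceptance rule is an assumption
    made for this sketch, not a statement of the published method.
    """
    slope0 = dot(grad_fn(theta), p)
    lo, hi, s = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        trial = [th + s * d for th, d in zip(theta, p)]
        slope = dot(grad_fn(trial), p)
        if slope < 0.0:
            hi = s                      # stepped past the maximum along p
        elif slope > c * slope0:
            lo = s                      # step still too short
        else:
            return s                    # directional derivative is acceptable
        s = 2.0 * s if math.isinf(hi) else 0.5 * (lo + hi)
    return s                            # best effort if the budget runs out

def fit(stats, t_obs, theta0, tol=1e-8, max_iter=10000):
    """Steepest ascent with the gradient-only line search above."""
    theta = list(theta0)
    grad_fn = lambda th: gradient(th, stats, t_obs)
    for _ in range(max_iter):
        g = grad_fn(theta)
        if math.sqrt(dot(g, g)) < tol:
            break
        s = line_search(theta, g, grad_fn)
        theta = [th + s * d for th, d in zip(theta, g)]
    return theta

# Hypothetical example: two binary variables with statistic (x1, x2, x1*x2).
stats = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
t_obs = (0.5, 0.6, 0.4)                 # assumed observed mean statistic
theta_hat = fit(stats, t_obs, [0.0, 0.0, 0.0])
print(theta_hat)
```

In this toy model the MLE has the closed form (log(1/3), log(2/3), log 6), which gives a simple check on the result. When the expectation cannot be enumerated, mean_value would be replaced by an MCMC estimate, and the exact-gradient reasoning above would hold only approximately.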

Article information

Electron. J. Statist. Volume 6 (2012), 123-147.

First available in Project Euclid: 3 February 2012

Permanent link to this document: http://projecteuclid.org/euclid.ejs/1328280900

Digital Object Identifier: doi:10.1214/11-EJS664

Keywords: Markov chain Monte Carlo; exponential families; Potts; Ising; exponential random graph; stochastic approximation


Okabayashi, Saisuke; Geyer, Charles J. Long range search for maximum likelihood in exponential families. Electron. J. Statist. 6 (2012), 123–147. doi:10.1214/11-EJS664. http://projecteuclid.org/euclid.ejs/1328280900.


  • [1] Andrieu, C., Moulines, E. and Priouret, P. (2005). Stability of Stochastic Approximation under Verifiable Conditions. SIAM Journal on Control and Optimization 44 283–312.
  • [2] Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. John Wiley & Sons.
  • [3] Besag, J. (1974). Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal of the Royal Statistical Society, Series B 36 192–236.
  • [4] Besag, J. (1975). Statistical Analysis of Non-lattice Data. The Statistician 24 179–195.
  • [5] Brown, L. D. (1986). Fundamentals of Statistical Exponential Families: with Applications in Statistical Decision Theory. Institute of Mathematical Statistics, Hayward, CA.
  • [6] Chan, K. S. and Geyer, C. J. (1994). Discussion of the Paper by Tierney. Annals of Statistics 22 1747–1758.
  • [7] Chen, H.-F. (2002). Stochastic Approximation and Its Applications. Kluwer Academic Publishers, Dordrecht.
  • [8] Fletcher, R. (1987). Practical Methods of Optimization, Second ed. John Wiley & Sons.
  • [9] Geyer, C. J. (1990). Likelihood and Exponential Families. PhD thesis, University of Washington.
  • [10] Geyer, C. J. (1991). Markov chain Monte Carlo Maximum Likelihood. In Computing Science and Statistics: Proc. 23rd Symp. Interface (E. Keramidas, ed.) 156–163. Interface Foundation.
  • [11] Geyer, C. J. (1994). On the Convergence of Monte Carlo Maximum Likelihood Calculations. Journal of the Royal Statistical Society, Series B 56 261–274.
  • [12] Geyer, C. J. (2009a). Likelihood Inference in Exponential Families and Directions of Recession. Electronic Journal of Statistics 3 259–289.
  • [13] Geyer, C. J. (2009b). mcmc: Markov chain Monte Carlo. R package version 0.7-3.
  • [14] Geyer, C. J. (2010). aster2: Aster models. R package version 0.1.
  • [15] Geyer, C. J. (2011). Introduction to MCMC. In Handbook of Markov Chain Monte Carlo (S. P. Brooks, A. E. Gelman, G. L. Jones and X. L. Meng, eds.) Chapman & Hall/CRC, Boca Raton, FL.
  • [16] Geyer, C. J. and Johnson, L. T. (2010). potts: Markov chain Monte Carlo for Potts Models. R package version 0.4.
  • [17] Geyer, C. J. and Thompson, E. A. (1992). Constrained Monte Carlo Maximum Likelihood for Dependent Data. Journal of the Royal Statistical Society, Series B 54 657–699.
  • [18] Goodreau, S. M. (2007). Advances in Exponential Random Graph (p*) Models Applied to a Large Social Network. Social Networks 29 231–248.
  • [19] Goodreau, S. M., Kitts, J. A. and Morris, M. (2009). Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks. Demography 46 103–125.
  • [20] Gu, M. G. and Zhu, H.-T. (2001). Maximum Likelihood Estimation for Spatial Models by Markov Chain Monte Carlo Stochastic Approximation. Journal of the Royal Statistical Society, Series B 63 339–355.
  • [21] Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M. and Morris, M. (2003). statnet: Software Tools for the Statistical Modeling of Network Data. Version 2.0. Project home page at http://statnetproject.org.
  • [22] Hummel, R., Hunter, D. R. and Handcock, M. S. (2010). A Steplength Algorithm for Fitting ERGMs. Technical Report No. 10-03, Pennsylvania State University.
  • [23] Hunter, D. R., Handcock, M. S., Butts, C. T., Goodreau, S. M. and Morris, M. (2008). ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. Journal of Statistical Software 24.
  • [24] Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik A Hadrons and Nuclei 31 253–258.
  • [25] Jaynes, E. T. (1978). Where Do We Stand on Maximum Entropy? In The Maximum Entropy Formalism (R. D. Levine and M. Tribus, eds.) Cambridge: Massachusetts Institute of Technology Press.
  • [26] Jones, G. L. (2004). On the Markov Chain Central Limit Theorem. Probability Surveys 1 299–320.
  • [27] Kushner, H. J. and Yin, G. G. (1997). Stochastic Approximation Algorithms and Applications. Springer, New York.
  • [28] Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, Second ed. Springer.
  • [29] Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer.
  • [30] Liang, F. (2010). Trajectory Averaging for Stochastic Approximation MCMC Algorithms. The Annals of Statistics 38 2823–2856.
  • [31] Moyeed, R. A. and Baddeley, A. J. (1991). Stochastic Approximation of the MLE for a Spatial Point Pattern. Scandinavian Journal of Statistics 18 39–50.
  • [32] Nocedal, J. and Wright, S. J. (1999). Numerical Optimization, First ed. Springer.
  • [33] Okabayashi, S. (2011a). Parameter Estimation in Social Network Models. PhD thesis, University of Minnesota.
  • [34] Okabayashi, S. (2011b). Supporting Theory and Data Analysis for “Long range search for maximum likelihood in exponential families”. Technical Report No. 686, University of Minnesota.
  • [35] Okabayashi, S., Johnson, L. and Geyer, C. J. (2011). Extending Pseudo-likelihood for Potts Models. Statistica Sinica 21 331–347.
  • [36] Penttinen, A. (1984). Modelling Interactions in Spatial Point Patterns: Parameter Estimation by the Maximum Likelihood Method. Jyväskylä Studies in Computer Science, Economics and Statistics 7.
  • [37] Potts, R. B. (1952). Some Generalized Order-Disorder Transformations. Proceedings of the Cambridge Philosophical Society 48 106–109.
  • [38] Rinaldo, A., Fienberg, S. E. and Zhou, Y. (2009). On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models. Electronic Journal of Statistics 3 446–484.
  • [39] Robbins, H. and Monro, S. (1951). A Stochastic Approximation Method. Annals of Mathematical Statistics 22 400–407.
  • [40] Roberts, G. O. and Rosenthal, J. S. (1997). Geometric Ergodicity and Hybrid Markov Chains. Electronic Communications in Probability 2 13–25.
  • [41] Roberts, G. O. and Rosenthal, J. S. (2004). General State Space Markov Chains and MCMC Algorithms. Probability Surveys 1 20–71.
  • [42] Rockafellar, R. T. and Wets, R. J.-B. (2004). Variational Analysis, corrected second printing. Springer-Verlag, Berlin.
  • [43] Saul, Z. M. and Filkov, V. (2007). Exploring Biological Network Structure using Exponential Random Graph Models. Bioinformatics 23 2604–2611.
  • [44] Shaw, R. G., Geyer, C. J., Wagenius, S., Hangelbroek, H. H. and Etterson, J. R. (2008). Unifying Life-History Analyses for Inference of Fitness and Population Growth. The American Naturalist 172 E35–E47.
  • [45] Snijders, T. A. B. (2002). Markov Chain Monte Carlo Estimation of Exponential Random Graph Models. Journal of Social Structure 3.
  • [46] Strauss, D. and Ikeda, M. (1990). Pseudolikelihood Estimation for Social Networks. Journal of the American Statistical Association 85 204–212.
  • [47] Sun, W. and Yuan, Y.-X. (2006). Optimization Theory and Methods: Nonlinear Programming. Springer.
  • [48] Swendsen, R. H. and Wang, J.-S. (1987). Nonuniversal Critical Dynamics in Monte Carlo Simulations. Physical Review Letters 58 86–88.
  • [49] van Duijn, M. A. J., Gile, K. J. and Handcock, M. S. (2009). A Framework for the Comparison of Maximum Pseudo-likelihood and Maximum Likelihood Estimation of Exponential Family Random Graph Models. Social Networks 31 52–62.
  • [50] Wang, J. S. and Swendsen, R. H. (1990). Cluster Monte Carlo Algorithms. Physica A 167 565–579.
  • [51] Wasserman, S. and Pattison, P. (1996). Logit Models and Logistic Regression for Social Networks: I. An Introduction to Markov Graphs and p*. Psychometrika 61 401–425.
  • [52] Younes, L. (1988). Estimation and Annealing for Gibbsian Fields. Ann. Inst. Henri Poincare 24 269–294.
  • [53] Younes, L. (1989). Parametric Inference for Imperfectly Observed Gibbsian Fields. Probability Theory and Related Fields 82 625–645.