Statistical Science

Gibbs Sampling, Exponential Families and Orthogonal Polynomials

Persi Diaconis, Kshitij Khare, and Laurent Saloff-Coste


Abstract

We give families of examples where sharp rates of convergence to stationarity of the widely used Gibbs sampler are available. The examples involve standard exponential families and their conjugate priors. In each case, the transition operator is explicitly diagonalizable with classical orthogonal polynomials as eigenfunctions.
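As a small illustration of the diagonalization claim, the sketch below takes one standard exponential family with its conjugate prior: a binomial likelihood with a uniform Beta(1, 1) prior. This pair is chosen here for illustration and is not named in the abstract. The two-component Gibbs sampler alternates θ | x ~ Beta(x + 1, n − x + 1) and x' | θ ~ Binomial(n, θ). Integrating out θ gives a Markov chain on x alone, whose transition probabilities are beta-binomial. Since E[x' | x] = n(x + 1)/(n + 2), the degree-one polynomial x − n/2 should be an eigenfunction with eigenvalue n/(n + 2), and the code checks this numerically. The function names `log_beta`, `x_chain_kernel`, and `apply_kernel` are hypothetical helpers introduced for this example.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def x_chain_kernel(n):
    """Transition matrix of the marginal x-chain of the Gibbs sampler.

    One step: theta | x ~ Beta(x + 1, n - x + 1), then x' | theta ~ Binomial(n, theta).
    Integrating out theta gives the beta-binomial probability
        K[x][y] = C(n, y) * B(x + y + 1, 2n - x - y + 1) / B(x + 1, n - x + 1).
    Assumes a uniform Beta(1, 1) prior on theta (an illustrative choice).
    """
    K = []
    for x in range(n + 1):
        row = []
        for y in range(n + 1):
            log_p = (
                math.log(math.comb(n, y))
                + log_beta(x + y + 1, 2 * n - x - y + 1)
                - log_beta(x + 1, n - x + 1)
            )
            row.append(math.exp(log_p))
        K.append(row)
    return K


def apply_kernel(K, f):
    """Return the vector (Kf)(x) = sum over y of K[x][y] * f(y)."""
    return [sum(k * fy for k, fy in zip(row, f)) for row in K]


n = 10
K = x_chain_kernel(n)

# Degree-one polynomial centered at the stationary mean n/2.
f = [x - n / 2 for x in range(n + 1)]
Kf = apply_kernel(K, f)

# Eigenvalue predicted by the conditional-mean calculation in the lead-in.
ratio = n / (n + 2)

# Largest deviation between K f and ratio * f over all states.
max_error = max(abs(kf - ratio * fx) for kf, fx in zip(Kf, f))
print("predicted eigenvalue:", ratio)
print("max |Kf - ratio * f|:", max_error)
```

The printed deviation should be at the level of floating-point rounding. Higher-degree eigenfunctions could be checked the same way once explicit polynomials are supplied; the sketch stops at degree one because that case follows directly from the conditional mean.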

Article information

Source
Statist. Sci. Volume 23, Number 2 (2008), 151-178.

Dates
First available in Project Euclid: 21 August 2008

Permanent link to this document
http://projecteuclid.org/euclid.ss/1219339107

Digital Object Identifier
doi:10.1214/07-STS252

Mathematical Reviews number (MathSciNet)
MR2446500

Citation

Diaconis, Persi; Khare, Kshitij; Saloff-Coste, Laurent. Gibbs Sampling, Exponential Families and Orthogonal Polynomials. Statistical Science 23 (2008), no. 2, 151–178. doi:10.1214/07-STS252. http://projecteuclid.org/euclid.ss/1219339107.


