## Electronic Journal of Statistics

### Noise contrastive estimation: Asymptotic properties, formal comparison with MC-MLE

#### Abstract

A statistical model is said to be un-normalised when its likelihood function involves an intractable normalising constant. Two popular methods for parameter inference in such models are MC-MLE (Monte Carlo maximum likelihood estimation) and NCE (noise contrastive estimation); both rely on simulating artificial data-points to approximate the normalising constant. While the asymptotics of MC-MLE have been established under general hypotheses (Geyer, 1994), this is not so for NCE. We establish consistency and asymptotic normality of NCE estimators under mild assumptions. We compare NCE and MC-MLE under several asymptotic regimes. In particular, we show that, when $m\rightarrow \infty$ while $n$ is fixed ($m$ and $n$ being respectively the number of artificial data-points and of actual data-points), the two estimators are asymptotically equivalent. Conversely, we prove that, when the artificial data-points are IID, and when $n\rightarrow \infty$ while $m/n$ converges to a positive constant, the asymptotic variance of an NCE estimator is always smaller than the asymptotic variance of the corresponding MC-MLE estimator. We illustrate the variance reduction brought by NCE through a numerical study.
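To make the two estimators concrete, here is a minimal sketch on a toy model that is not taken from the paper: an un-normalised Gaussian $f_\theta(x)=\exp(-\theta x^2/2)$ with unknown precision $\theta$, and IID noise drawn from $N(0,\sigma^2)$. The function names, the golden-section optimiser, the search ranges and the sample sizes are all assumptions made for illustration. MC-MLE replaces the normalising constant by an importance-sampling average over the artificial points; NCE fits a logistic regression that separates actual from artificial points, with an extra parameter $\nu$ standing in for $-\log Z(\theta)$.

```python
import math
import random

def log_f(x, theta):
    """Un-normalised log-density of the toy model: -theta * x^2 / 2."""
    return -0.5 * theta * x * x

def log_noise(y, sigma):
    """Normalised log-density of the N(0, sigma^2) noise distribution."""
    return -0.5 * (y / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    return -math.log1p(math.exp(-t)) if t > 0 else t - math.log1p(math.exp(t))

def mcmle_objective(theta, data, noise, sigma):
    """Monte Carlo log-likelihood per data point: Z(theta) is replaced by
    an importance-sampling average over the noise sample."""
    z_hat = sum(math.exp(log_f(y, theta) - log_noise(y, sigma))
                for y in noise) / len(noise)
    return sum(log_f(x, theta) for x in data) / len(data) - math.log(z_hat)

def nce_objective(theta, nu, data, noise, sigma):
    """NCE objective: logistic log-likelihood for classifying data vs noise,
    with nu playing the role of -log Z(theta)."""
    offset = math.log(len(noise) / len(data))
    def logit(x):
        return log_f(x, theta) + nu - log_noise(x, sigma) - offset
    return (sum(log_sigmoid(logit(x)) for x in data)
            + sum(log_sigmoid(-logit(y)) for y in noise)) / len(data)

def argmax_1d(fn, lo, hi, iters=30):
    """Golden-section search for the maximiser of a unimodal function."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = fn(c), fn(d)
    for _ in range(iters):
        if fc > fd:                      # maximiser lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = fn(c)
        else:                            # maximiser lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = fn(d)
    return (a + b) / 2

def fit_mcmle(data, noise, sigma):
    """Maximise the Monte Carlo log-likelihood over theta (assumed range)."""
    return argmax_1d(lambda t: mcmle_objective(t, data, noise, sigma), 0.05, 20.0)

def fit_nce(data, noise, sigma):
    """Maximise the NCE objective over (theta, nu) by profiling out nu.
    Both objectives are concave for this exponential-family toy model,
    so nested one-dimensional searches suffice."""
    def best_nu(theta):
        return argmax_1d(lambda v: nce_objective(theta, v, data, noise, sigma),
                         -10.0, 10.0)
    def profile(theta):
        return nce_objective(theta, best_nu(theta), data, noise, sigma)
    theta = argmax_1d(profile, 0.05, 20.0)
    return theta, best_nu(theta)

# Toy experiment: n actual points, m artificial (noise) points.
random.seed(0)
theta_true, sigma, n, m = 2.0, 1.0, 400, 800
data = [random.gauss(0.0, 1.0 / math.sqrt(theta_true)) for _ in range(n)]
noise = [random.gauss(0.0, sigma) for _ in range(m)]

theta_mc = fit_mcmle(data, noise, sigma)
theta_nce, nu_nce = fit_nce(data, noise, sigma)
print("MC-MLE estimate of theta:", theta_mc)
print("NCE estimate of (theta, nu):", theta_nce, nu_nce)
```

Increasing `m` while keeping `n` fixed should bring the two estimates closer together, in line with the first regime described in the abstract. A single run of this sketch says nothing about the variance comparison in the second regime, which would need repeated simulations.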

#### Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 3473-3518.

Dates
First available in Project Euclid: 17 October 2018

https://projecteuclid.org/euclid.ejs/1539741651

Digital Object Identifier
doi:10.1214/18-EJS1485

Mathematical Reviews number (MathSciNet)
MR3865268

Zentralblatt MATH identifier
06970010

#### Citation

Riou-Durand, Lionel; Chopin, Nicolas. Noise contrastive estimation: Asymptotic properties, formal comparison with MC-MLE. Electron. J. Statist. 12 (2018), no. 2, 3473–3518. doi:10.1214/18-EJS1485. https://projecteuclid.org/euclid.ejs/1539741651

#### References

• Ando, T. (1979). Concavity of certain maps on positive definite matrices and applications to Hadamard products. Linear Algebra and its Applications, 26:203–241.
• Barthelmé, S. and Chopin, N. (2015). The Poisson transform for unnormalised statistical models. Stat. Comput., 25(4):767–780.
• Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
• Bradley, R. C. et al. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probability Surveys, 2:107–144.
• Caimo, A. and Friel, N. (2011). Bayesian inference for exponential random graph models. Social Networks, 33(1):41–55.
• Delyon, B. (2018). Estimation paramétrique. Unpublished lecture notes.
• Geyer, C. J. (1994). On the convergence of Monte Carlo maximum likelihood calculations. J. Roy. Statist. Soc. Ser. B, 56(1):261–274.
• Geyer, C. J. (2012). The Wald consistency theorem. Unpublished lecture notes.
• Gu, M. G. and Zhu, H.-T. (2001). Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation. J. R. Stat. Soc. Ser. B Stat. Methodol., 63(2):339–355.
• Gutmann, M. U. and Hyvärinen, A. (2012). Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J. Mach. Learn. Res., 13:307–361.
• Jones, G. L. (2004). On the Markov chain central limit theorem. Probab. Surv., 1:299–320.
• Meyn, S. P. and Tweedie, R. L. (2012). Markov chains and stochastic stability. Springer Science & Business Media.
• Mnih, A. and Teh, Y. W. (2012). A fast and simple algorithm for training neural probabilistic language models. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1751–1758.
• Roberts, G. O. and Rosenthal, J. S. (2004). General state space Markov chains and MCMC algorithms. Probab. Surv., 1:20–71.
• Rockafellar, R. T. and Wets, R. J.-B. (2009). Variational analysis, volume 317. Springer Science & Business Media.
• Salakhutdinov, R. and Hinton, G. (2009). Deep Boltzmann machines. In Artificial Intelligence and Statistics, pages 448–455.
• Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statistics, 20:595–601.
• Walker, S. G. (2011). Posterior sampling when the normalizing constant is unknown. Comm. Statist. Simulation Comput., 40(5):784–792.
• Wang, C., Komodakis, N., and Paragios, N. (2013). Markov random field modeling, inference & learning in computer vision & image understanding: A survey. Computer Vision and Image Understanding, 117(11):1610–1627.