The Annals of Applied Statistics

Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations

Jo Bovy, David W. Hogg, and Sam T. Roweis

Full-text: Open access


We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation–Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual d-dimensional uncertainty covariance and has unique missing data properties. This algorithm reconstructs the error-deconvolved or “underlying” distribution function common to all samples, even when the individual data points are samples from different distributions, obtained by convolving the underlying distribution with the heteroskedastic uncertainty distribution of the data point and projecting out the missing data directions. We show how this basic algorithm can be extended with conjugate priors on all of the model parameters and a “split-and-merge” procedure designed to avoid local maxima of the likelihood. We demonstrate the full method by applying it to the problem of inferring the three-dimensional velocity distribution of stars near the Sun from noisy two-dimensional, transverse velocity measurements from the Hipparcos satellite.
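The generalization described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it covers only a special case chosen to keep the code short: every observation is a single noisy scalar `w`, obtained by projecting an unobserved d-dimensional vector onto a direction `r` and adding Gaussian noise of variance `s`. Under that assumption each mixture component predicts `w` with mean `r·m` and variance `rᵀVr + s`, so no matrix inversion is needed. The function names (`responsibilities`, `log_likelihood`, `em_step`, `simulate`) and the `(w, r, s)` tuple layout are hypothetical, and priors and split-and-merge steps are omitted.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(V, r):
    return [dot(row, r) for row in V]


def responsibilities(point, alpha, means, covs):
    """Per component j: (alpha_j * N(w | r.m_j, T_j), V_j r, T_j, residual)."""
    w, r, s = point
    out = []
    for a, m, V in zip(alpha, means, covs):
        Vr = matvec(V, r)
        T = dot(r, Vr) + s  # projected model variance plus this point's noise variance
        resid = w - dot(r, m)
        dens = a * math.exp(-0.5 * resid * resid / T) / math.sqrt(2 * math.pi * T)
        out.append((dens, Vr, T, resid))
    return out


def log_likelihood(data, alpha, means, covs):
    """Log-likelihood of the noisy, projected observations under the mixture."""
    return sum(math.log(sum(c[0] for c in responsibilities(p, alpha, means, covs)))
               for p in data)


def em_step(data, alpha, means, covs):
    """One EM update of the underlying (deconvolved) mixture parameters."""
    K, d, N = len(alpha), len(means[0]), len(data)
    q = [0.0] * K
    sum_b = [[0.0] * d for _ in range(K)]
    sum_bb = [[[0.0] * d for _ in range(d)] for _ in range(K)]
    for p in data:
        comps = responsibilities(p, alpha, means, covs)
        norm = sum(c[0] for c in comps)
        for j, (dens, Vr, T, resid) in enumerate(comps):
            qij = dens / norm  # posterior probability that point i came from component j
            # Posterior mean b of the unobserved vector, given the observation.
            b = [means[j][k] + Vr[k] * resid / T for k in range(d)]
            q[j] += qij
            for k in range(d):
                sum_b[j][k] += qij * b[k]
                for l in range(d):
                    B_kl = covs[j][k][l] - Vr[k] * Vr[l] / T  # posterior covariance
                    sum_bb[j][k][l] += qij * (b[k] * b[l] + B_kl)
    new_alpha = [qj / N for qj in q]
    new_means = [[sum_b[j][k] / q[j] for k in range(d)] for j in range(K)]
    new_covs = [[[sum_bb[j][k][l] / q[j] - new_means[j][k] * new_means[j][l]
                  for l in range(d)] for k in range(d)] for j in range(K)]
    return new_alpha, new_means, new_covs


def simulate(n, seed=0):
    """Toy data: 2-D vectors from two clusters, each seen as one noisy projection."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        centre = (-2.0, 0.0) if rng.random() < 0.5 else (2.0, 1.0)
        v = [rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 0.5)]
        angle = rng.uniform(0.0, math.pi)
        r = [math.cos(angle), math.sin(angle)]
        s = rng.uniform(0.05, 0.5)
        data.append((dot(r, v) + rng.gauss(0.0, math.sqrt(s)), r, s))
    return data


if __name__ == "__main__":
    data = simulate(500, seed=1)
    alpha = [0.5, 0.5]
    means = [[-1.0, 0.0], [1.0, 0.0]]
    covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
    for _ in range(50):
        alpha, means, covs = em_step(data, alpha, means, covs)
    print(alpha, means, log_likelihood(data, alpha, means, covs))
```

Although every data point in this toy setup is one-dimensional, the fitted means and covariances live in the full two-dimensional space, because the projection directions differ from point to point. This loosely mirrors the situation in the abstract, where a three-dimensional velocity distribution is inferred from lower-dimensional measurements.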

Article information

Ann. Appl. Stat. Volume 5, Number 2B (2011), 1657-1677.

First available in Project Euclid: 13 July 2011

Keywords: Bayesian inference, density estimation, Expectation–Maximization, missing data, multivariate estimation, noise


Bovy, Jo; Hogg, David W.; Roweis, Sam T. Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations. Ann. Appl. Stat. 5 (2011), no. 2B, 1657–1677. doi:10.1214/10-AOAS439.


