The Annals of Statistics

Approximation of conditional densities by smooth mixtures of regressions

Andriy Norets



Abstract

This paper shows that large nonparametric classes of conditional multivariate densities can be approximated in the Kullback–Leibler distance by different specifications of finite mixtures of normal regressions, in which the normal means, the variances and the mixing probabilities can depend on variables in the conditioning set (covariates). These models are a special case of the models known as “mixtures of experts” in the statistics and computer science literatures. Flexible specifications include models in which only the mixing probabilities, modeled by a multinomial logit, depend on the covariates and, in the univariate case, models in which only the means of the mixed normals depend flexibly on the covariates. Modeling the variances of the mixed normals by flexible functions of the covariates can weaken the restrictions on the class of approximable densities. The obtained results can be generalized to mixtures of general location–scale densities. Rates of convergence and easy-to-interpret bounds are also obtained for different model specifications. These approximation results can be useful for proving consistency of Bayesian and maximum likelihood density estimators based on these models, and they also have interesting implications for applied researchers.
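As an illustration of the kind of model the abstract describes (not the paper's formal construction or proofs), the following sketch evaluates a finite mixture of normal regressions p(y | x) = Σⱼ αⱼ(x) N(y; μⱼ(x), σⱼ²), where the mixing probabilities αⱼ(x) follow a multinomial logit in the covariates and the component means are linear in the covariates. All parameter values below are made up for demonstration.

```python
import numpy as np

def softmax(z):
    """Multinomial-logit mixing probabilities from linear indices z."""
    z = z - np.max(z)                 # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y (vectorized over components)."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def smooth_mixture_density(y, x, gamma, beta, sigma):
    """p(y | x) for a smooth mixture of normal regressions.

    gamma : (J, d) logit coefficients for the mixing probabilities
    beta  : (J, d) regression coefficients for the component means
    sigma : (J,)   component standard deviations (held fixed in x here)
    """
    weights = softmax(gamma @ x)      # covariate-dependent mixing probabilities
    means = beta @ x                  # covariate-dependent normal means
    return float(np.sum(weights * normal_pdf(y, means, sigma)))

# Two-component example; x = (1, covariate) includes an intercept.
gamma = np.array([[0.0, 0.0], [1.0, -2.0]])
beta = np.array([[0.0, 1.0], [3.0, -1.0]])
sigma = np.array([0.5, 0.8])
x = np.array([1.0, 0.5])

print(smooth_mixture_density(0.7, x, gamma, beta, sigma))
```

Because the weights, means (and, in richer specifications, the variances) all vary smoothly with x, the conditional density can change shape across the covariate space, which is what drives the approximation results.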

Article information

Ann. Statist., Volume 38, Number 3 (2010), 1733–1766.

First available in Project Euclid: 24 March 2010


Primary: 62G07: Density estimation
Secondary: 41A30: Approximation by other special function classes

Keywords: finite mixtures of normal distributions; smoothly mixing regressions; mixtures of experts; Bayesian conditional density estimation


Norets, Andriy. Approximation of conditional densities by smooth mixtures of regressions. Ann. Statist. 38 (2010), no. 3, 1733–1766. doi:10.1214/09-AOS765.



References

  • Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88 669–679.
  • Geweke, J. and Keane, M. (2007). Smoothly mixing regressions. J. Econometrics 138 252–290.
  • Ghosh, J. and Ramamoorthi, R. (2003). Bayesian Nonparametrics, 1st ed. Springer, New York.
  • Hotz, J. and Miller, R. (1993). Conditional choice probabilities and the estimation of dynamic models. Rev. Econom. Stud. 60 497–530.
  • Jacobs, R. A., Jordan, M. I., Nowlan, S. J. and Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Comput. 3 79–87.
  • Jansen, R. C. (1993). Maximum likelihood in a generalized linear finite mixture model by using the EM algorithm. Biometrics 49 227–231.
  • Jiang, W. and Tanner, M. (1999). Hierarchical mixtures-of-experts for exponential family regression models: Approximation and maximum likelihood estimation. Ann. Statist. 27 987–1011.
  • Jones, P. and McLachlan, G. J. (1992). Fitting finite mixture models in a regression context. Aust. N. Z. J. Stat. 34 233–240.
  • Jordan, M. and Xu, L. (1995). Convergence results for the EM approach to mixtures of experts architectures. Neural Networks 8 1409–1431.
  • Jordan, M. I. and Jacobs, R. A. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6 181–214.
  • Kiefer, N. M. (1978). Discrete parameter variation: Efficient estimation of a switching regression model. Econometrica 46 427–434.
  • Li, J. Q. and Barron, A. R. (1999). Mixture density estimation. In Advances in Neural Information Processing Systems 12 279–285. MIT Press, Cambridge, MA.
  • Maiorov, V. and Meir, R. (1998). Approximation bounds for smooth functions in C(ℝ^d) by neural and mixture networks. IEEE Trans. Neural Networks 9 969–978.
  • McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.
  • Norets, A. and Pelenis, J. (2009). Bayesian modeling of joint and conditional distributions. Unpublished manuscript, Princeton Univ.
  • Peng, F., Jacobs, R. A. and Tanner, M. A. (1996). Bayesian inference in mixtures-of-experts and hierarchical mixtures-of-experts models with an application to speech recognition. J. Amer. Statist. Assoc. 91 953–960.
  • Quandt, R. E. and Ramsey, J. B. (1978). Estimating mixtures of normal distributions and switching regressions. J. Amer. Statist. Assoc. 73 730–738.
  • Roeder, K. and Wasserman, L. (1997). Practical Bayesian density estimation using mixtures of normals. J. Amer. Statist. Assoc. 92 894–902.
  • Rust, J. (1996). Numerical dynamic programming in economics. In Handbook of Computational Economics (H. Amman, D. Kendrick and J. Rust, eds.). North-Holland, Amsterdam.
  • Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. J. Amer. Statist. Assoc. 82 528–540.
  • Villani, M., Kohn, R. and Giordani, P. (2009). Regression density estimation using smooth adaptive Gaussian mixtures. J. Econometrics 153 155–173.
  • Wedel, M. and DeSarbo, W. (1995). A mixture likelihood approach for generalized linear models. J. Classification 12 21–55.
  • Wood, S., Jiang, W. and Tanner, M. (2002). Bayesian mixture of splines for spatially adaptive nonparametric regression. Biometrika 89 513–528.
  • Zeevi, A., Meir, R. and Maiorov, V. (1998). Error bounds for functional approximation and estimation using mixtures of experts. IEEE Trans. Inform. Theory 44 1010–1025.
  • Zeevi, A. J. and Meir, R. (1997). Density estimation through convex combinations of densities: Approximation and estimation bounds. Neural Networks 10 99–109.