Statistical Science

Shape Constrained Density Estimation Via Penalized Rényi Divergence

Roger Koenker and Ivan Mizera


Shape constraints play an increasingly prominent role in nonparametric function estimation. Log concavity has attracted considerable recent attention as a regularizing device in nonparametric density estimation; weaker forms of concavity constraints, which encompass larger classes of densities, have received less attention but offer additional flexibility. In particular, these weaker constraints better accommodate heavier tail behavior and sharper modal peaks. When paired with appropriate maximal entropy estimation criteria, they yield tractable, convex optimization problems that broaden the scope of shape constrained density estimation in a variety of applied subject areas.

In contrast to our prior work, Koenker and Mizera [Ann. Statist. 38 (2010) 2998–3027], which focused on the log concave ($\alpha=1$) and Hellinger ($\alpha=1/2$) constraints, here we describe methods that enable the imposition of even weaker constraints with $\alpha\leq0$. An alternative formulation of the concavity constraints for densities in dimension $d\geq2$ also significantly expands the applicability of our proposed methods for multivariate data. Finally, we illustrate the use of the Rényi divergence criterion for norm-constrained estimation of densities in the absence of a shape constraint.
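The remark that heavier tails call for weaker concavity constraints can be illustrated numerically. The sketch below is not the authors' method. It uses the standard notion of $s$-concavity (for $s<0$, a density $f$ is $s$-concave when $f^{s}$ is convex) and the known fact that a Student $t$ density with $\nu$ degrees of freedom is $s$-concave for $s=-1/(\nu+1)$ but not log concave. The function names, the grid, and the tolerance are assumptions made for illustration, and no mapping between $s$ and the $\alpha$ of the abstract is assumed.

```python
import numpy as np

def is_concave_on_grid(values, tol=1e-9):
    """True if all second differences on an equally spaced grid are <= tol."""
    second_diff = values[:-2] - 2.0 * values[1:-1] + values[2:]
    return bool(np.all(second_diff <= tol))

def is_convex_on_grid(values, tol=1e-9):
    """True if the negated values pass the concavity check."""
    return is_concave_on_grid(-values, tol)

def gaussian_unnormalized(x):
    """Standard normal density up to a constant factor."""
    return np.exp(-0.5 * x**2)

def student_t_unnormalized(x, nu):
    """Student t density with nu degrees of freedom, up to a constant factor."""
    return (1.0 + x**2 / nu) ** (-(nu + 1.0) / 2.0)

# Hypothetical evaluation grid and degrees of freedom, chosen for illustration.
x = np.linspace(-20.0, 20.0, 4001)
nu = 3.0
s = -1.0 / (nu + 1.0)  # s = -1/4 for nu = 3

gauss = gaussian_unnormalized(x)
t_dens = student_t_unnormalized(x, nu)

# Log concavity: log f must be concave.
gaussian_log_concave = is_concave_on_grid(np.log(gauss))
t_log_concave = is_concave_on_grid(np.log(t_dens))

# s-concavity with s < 0: f**s must be convex.
gaussian_s_concave = is_convex_on_grid(gauss ** s)
t_s_concave = is_convex_on_grid(t_dens ** s)

print("Gaussian  log concave:", gaussian_log_concave, " s-concave:", gaussian_s_concave)
print("Student t log concave:", t_log_concave, " s-concave:", t_s_concave)
```

The Gaussian passes both checks, while the Student $t$ density passes only the weaker one, which is the sense in which the weaker classes are larger. A grid check of this kind is only a numerical diagnostic on the chosen interval, not a proof.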

Article information

Statist. Sci., Volume 33, Number 4 (2018), 510–526.

First available in Project Euclid: 29 November 2018

Keywords: Density estimation, shape constraints, Rényi entropy, convex optimization


Koenker, Roger; Mizera, Ivan. Shape Constrained Density Estimation Via Penalized Rényi Divergence. Statist. Sci. 33 (2018), no. 4, 510–526. doi:10.1214/18-STS658.


  • Afriat, S. N. (1967). The construction of utility functions from expenditure data. Internat. Econom. Rev. 8 67–77.
  • Afriat, S. N. (1972). Efficiency estimation of production functions. Internat. Econom. Rev. 13 568–598.
  • Andersen, E. D. (2010). The Mosek Optimization Tools Manual, Version 6.0. Available at
  • Avriel, M. (1972). $r$-convex functions. Math. Program. 2 309–323.
  • Basu, A., Harris, I. R., Hjort, N. L. and Jones, M. C. (1998). Robust and efficient estimation by minimising a density power divergence. Biometrika 85 549–559.
  • Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
  • Birgé, L. (1997). Estimation of unimodal densities without smoothness assumptions. Ann. Statist. 25 970–981.
  • Broniatowski, M. and Keziou, A. (2006). Minimization of $\phi$-divergences on sets of signed measures. Studia Sci. Math. Hungar. 43 403–442.
  • Broniatowski, M. and Keziou, A. (2009). Parametric estimation and tests through divergences and the duality technique. J. Multivariate Anal. 100 16–36.
  • Broniatowski, M. and Vajda, I. (2012). Several applications of divergence criteria in continuous families. Kybernetika (Prague) 48 600–636.
  • Burg, J. (1967). Maximum entropy spectral analysis. In Proceedings of 37th Annual Meeting of the Society of Exploration Geophysicists. SEG, Oklahoma City, OK.
  • Cichocki, A. and Amari, S. (2010). Families of alpha- beta- and gamma-divergences: Flexible and robust measures of similarities. Entropy 12 1532–1568.
  • Cox, D. R. (1966). Notes on the analysis of mixed frequency distributions. Br. J. Math. Stat. Psychol. 19 39–47.
  • Cule, M., Samworth, R. and Stewart, M. (2010). Maximum likelihood estimation of a multi-dimensional log-concave density. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 545–607.
  • Davies, P. L. and Kovac, A. (2001). Local extremes, runs, strings and multiresolution. Ann. Statist. 29 1–65.
  • Doss, C. R. and Wellner, J. A. (2016). Global rates of convergence of the MLEs of log-concave and $s$-concave densities. Ann. Statist. 44 954–981.
  • Dümbgen, L. and Rufibach, K. (2009). Maximum likelihood estimation of a log-concave density and its distribution function: Basic properties and uniform consistency. Bernoulli 15 40–68.
  • Eggermont, P. P. B. and LaRiccia, V. N. (2001). Maximum Penalized Likelihood Estimation. Vol. I: Density Estimation. Springer, New York.
  • Friberg, H. A. (2012). Users Guide to the R-to-Mosek Interface. Available at
  • Ghosh, A. (2015). Influence function analysis of the restricted minimum divergence estimators: A general form. Electron. J. Stat. 9 1017–1040.
  • Good, I. J. (1971). A nonparametric roughness penalty for probability densities. Nature 229 29–30.
  • Grenander, U. (1956). On the theory of mortality measurement. II. Skand. Aktuarietidskr. 39 125–153.
  • Groeneboom, P., Jongbloed, G. and Wellner, J. A. (2001). Estimation of a convex function: Characterizations and asymptotic theory. Ann. Statist. 29 1653–1698.
  • Guvenen, F., Karahan, F., Ozkan, S. and Song, J. (2016). What do data on millions of U.S. Workers reveal about life-cycle earnings dynamics? Federal Reserve Bank of New York Staff Reports.
  • Han, Q. and Wellner, J. A. (2016). Approximation and estimation of $s$-concave densities via Rényi divergences. Ann. Statist. 44 1332–1359.
  • Hardy, G. H., Littlewood, J. E. and Pólya, G. (1934). Inequalities. Cambridge Univ. Press, London.
  • Hartigan, J. A. and Hartigan, P. M. (1985). The dip test of unimodality. Ann. Statist. 13 70–84.
  • Havrda, J. and Charvát, F. (1967). Quantification method of classification processes. Concept of structural $a$-entropy. Kybernetika (Prague) 3 30–35.
  • Hjort, N. L. and Pollard, D. (2011). Asymptotics for minimisers of convex processes. Preprint. Available at arXiv:1107.3806.
  • Hoffleit, D. and Warren, W. H. (1991). The Bright Star Catalog, 5th ed. Yale Univ. Observatory, New Haven.
  • Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, CA, 1965/66), Vol. I: Statistics 221–233. Univ. California Press, Berkeley, CA.
  • Kim, A. K. H., Guntuboyina, A. and Samworth, R. J. (2016). Adaptation in log-concave density estimation. Preprint. Available at
  • Kim, A. K. H. and Samworth, R. J. (2016). Global rates of convergence in log-concave density estimation. Ann. Statist. 44 2756–2779.
  • Koenker, R. and Mizera, I. (2006). The alter egos of the regularized maximum likelihood density estimators: Deregularized maximum-entropy, Shannon, Rényi, Simpson, Gini, and stretched strings. In Prague Stochastics 2006, Proceedings of the Joint Session of 7th Prague Symposium on Asymptotic Statistics and 15th Prague Conference on Information Theory, Statistical Decision Functions and Random Processes (M. Hušková and M. Janžura, eds.) 145–157. Matfyzpress, Prague.
  • Koenker, R. and Mizera, I. (2007). Density estimation by total variation regularization, Essays in honor of Kjell A. Doksum. In Advances in Statistical Modeling and Inference. Ser. Biostat. 3 613–633. World Sci. Publ., Hackensack, NJ.
  • Koenker, R. and Mizera, I. (2008). Primal and dual formulations relevant for the numerical estimation of a probability density via regularization. In Tatra Mountains Mathematical Publications (A. Pázman, J. Volaufová and V. Witkovský, eds.). Proceedings of the Conference ProbaStat ’06 Held in Smolenice, Slovakia, June 5–9, 2006, 39 255–264. Slovak Academy of Sciences.
  • Koenker, R. and Mizera, I. (2010). Quasi-concave density estimation. Ann. Statist. 38 2998–3027.
  • Koenker, R. and Mizera, I. (2017). “MeddeR: Maximum Entropy Deregularized Density Estimation in R.” R package version 0.51. Available at
  • Kooperberg, C. and Stone, C. J. (1991). A study of logspline density estimation. Comput. Statist. Data Anal. 12 327–347.
  • Laha, N. and Wellner, J. (2017). Bi-$s^{*}$-concave distributions. Available at arXiv:1705.00252.
  • Liese, F. and Vajda, I. (2006). On divergences and informations in statistics and information theory. IEEE Trans. Inform. Theory 52 4394–4412.
  • MacDonell, W. (1902). On criminal anthropometry and the identification of criminals. Biometrika 1 177–227.
  • Pal, J. K., Woodroofe, M. and Meyer, M. (2007). Estimating a Polya frequency function. In Complex Datasets and Inverse Problems. Institute of Mathematical Statistics Lecture Notes—Monograph Series 54 239–249. IMS, Beachwood, OH.
  • Perez, A. (1967). Information-theoretic risk estimates in statistical decision. Kybernetika (Prague) 3 1–21.
  • Pollard, D. (1991). Asymptotics for least absolute deviation regression estimators. Econometric Theory 7 186–199.
  • Prakasa Rao, B. L. S. (1969). Estimation of a unimodal density. Sankhyā Ser. A 31 23–36.
  • R Core Team (2017). R: A Language and Environment for Statistical Computing. Available at
  • Rockafellar, R. T. (1970). Convex Analysis. Princeton Mathematical Series 28. Princeton Univ. Press, Princeton, NJ.
  • Seijo, E. and Sen, B. (2011). Nonparametric least squares estimation of a multivariate convex regression function. Ann. Statist. 39 1633–1657.
  • Silverman, B. W. (1981). Using kernel density estimates to investigate multimodality. J. Roy. Statist. Soc. Ser. B 43 97–99.
  • Silverman, B. W. (1982). On the estimation of a probability density function by the maximum penalized likelihood method. Ann. Statist. 10 795–810.
  • “Student” (1908). The probable error of the mean. Biometrika 6 1–23.
  • Tsallis, C. (1988). Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 52 479–487.
  • van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
  • Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Stat. 20 595–601.
  • Walther, G. (2002). Detecting the presence of mixing with multiscale maximum likelihood. J. Amer. Statist. Assoc. 97 508–513.
  • Walther, G. (2009). Inference and modeling with log-concave distributions. Statist. Sci. 24 319–327.