## Electronic Journal of Statistics

### A recursive procedure for density estimation on the binary hypercube

#### Abstract

This paper describes a recursive estimation procedure for multivariate binary densities (probability distributions of vectors of Bernoulli random variables) using orthogonal expansions. For $d$ covariates, there are $2^{d}$ basis coefficients to estimate, which renders conventional approaches computationally prohibitive when $d$ is large. However, for a wide class of densities that satisfy a certain sparsity condition, our estimator runs in probabilistic polynomial time and adapts to the unknown sparsity of the underlying density in two key ways: (1) it attains near-minimax mean-squared error for moderate sample sizes, and (2) the computational complexity is lower for sparser densities. Our method also allows for flexible control of the trade-off between mean-squared error and computational complexity.
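To make the $2^{d}$ count concrete, here is a minimal sketch of the brute-force baseline that the abstract calls prohibitive. It is not the authors' recursive procedure. It assumes the orthonormal Walsh (parity) basis $\varphi_s(x) = 2^{-d/2}(-1)^{\langle s, x\rangle}$ for $s, x \in \{0,1\}^d$. Under that basis, $p(x) = \sum_s \theta_s \varphi_s(x)$ with $\theta_s = \mathbb{E}[\varphi_s(X)]$. Each $\theta_s$ is estimated by a sample mean, and coefficients with $|\hat\theta_s| < \tau$ are zeroed by simple hard thresholding. The symbols $\varphi_s$, $\theta_s$, $\tau$ and all function names are illustrative assumptions.

```python
import itertools

import numpy as np


def hypercube(d):
    """All 2^d binary vectors of length d, one per row (row 0 is the zero vector)."""
    return np.array(list(itertools.product([0, 1], repeat=d)), dtype=int)


def walsh_matrix(X, S):
    """Assumed basis: phi_s(x) = 2^{-d/2} (-1)^{<s, x>} for each row x of X and row s of S."""
    d = S.shape[1]
    parity = (X @ S.T) % 2
    return (1 - 2 * parity) / 2 ** (d / 2)


def empirical_coefficients(X):
    """Brute-force estimate of theta_s = E[phi_s(X)] for every one of the 2^d indices s.

    Builds an n-by-2^d matrix, so time and memory grow exponentially in d.
    """
    S = hypercube(X.shape[1])
    return S, walsh_matrix(X, S).mean(axis=0)


def reconstruct(S, theta, tau=0.0):
    """Evaluate p_hat(x) = sum_s theta_s phi_s(x) on the whole cube.

    Coefficients with |theta_s| < tau are set to zero (hard thresholding).
    """
    kept = np.where(np.abs(theta) >= tau, theta, 0.0)
    return walsh_matrix(S, S) @ kept


# Usage: 500 samples of 4 independent Bernoulli(0.3) coordinates.
rng = np.random.default_rng(0)
X = (rng.random((500, 4)) < 0.3).astype(int)
S, theta = empirical_coefficients(X)
p_full = reconstruct(S, theta)               # all 2^d = 16 coefficients kept
p_sparse = reconstruct(S, theta, tau=0.05)   # small coefficients dropped
```

With every coefficient kept, the expansion reproduces the empirical frequencies exactly, because the Walsh transform is invertible. Raising `tau` keeps fewer terms. The thresholded estimate still sums to one while $\hat\theta_0 = 2^{-d/2}$ survives, but it may take small negative values at some points.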

#### Article information

Source
Electron. J. Statist. Volume 7 (2013), 820–858.

Dates
First available in Project Euclid: 25 March 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1364220672

Digital Object Identifier
doi:10.1214/13-EJS787

Mathematical Reviews number (MathSciNet)
MR3040561

Zentralblatt MATH identifier
1337.62070

Subjects
Primary: 62G07: Density estimation
Secondary: 62G20: Asymptotic properties; 62C20: Minimax procedures

#### Citation

Raginsky, Maxim; Silva, Jorge G.; Lazebnik, Svetlana; Willett, Rebecca. A recursive procedure for density estimation on the binary hypercube. Electron. J. Statist. 7 (2013), 820–858. doi:10.1214/13-EJS787. https://projecteuclid.org/euclid.ejs/1364220672
