Electronic Journal of Statistics

Converting high-dimensional regression to high-dimensional conditional density estimation

Rafael Izbicki and Ann B. Lee

Full-text: Open access

Abstract

There is a growing demand for nonparametric conditional density estimators (CDEs) in fields such as astronomy and economics. In astronomy, for example, one can dramatically improve estimates of the parameters that dictate the evolution of the Universe by working with full conditional densities instead of regression (i.e., conditional mean) estimates. More generally, standard regression falls short in any prediction problem where the distribution of the response is complex, exhibiting multimodality, asymmetry, or heteroscedastic noise. Nevertheless, much of the work on high-dimensional inference concerns regression and classification only, and research on density estimation has lagged behind. Here we propose FlexCode, a fully nonparametric approach to conditional density estimation that reformulates CDE as an orthogonal series problem in which the expansion coefficients are estimated by regression. With this approach, one can draw on the successes of high-dimensional regression to efficiently estimate conditional densities, and not just conditional means, in high dimensions. Depending on the choice of regression procedure, our method can adapt to a variety of challenging high-dimensional settings with different structures in the data (e.g., a large number of irrelevant components, or nonlinear manifold structure) as well as different data types (e.g., functional data, mixed data types, and sample sets). We study the theoretical and empirical performance of the proposed method, and we compare it with traditional conditional density estimators on simulated as well as real-world data, such as photometric galaxy data, Twitter data, and line-of-sight velocities in a galaxy cluster.
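
To make the reformulation concrete, below is a minimal sketch of the series idea: expand f(z|x) ≈ Σ_j β_j(x) φ_j(z) in an orthonormal basis {φ_j}, and note that each coefficient β_j(x) = E[φ_j(Z) | X = x] is a conditional expectation, hence estimable by any regression method. The sketch assumes the response has been rescaled to [0, 1], uses a cosine (Fourier) basis, and fits the coefficients with off-the-shelf k-nearest-neighbor regression from scikit-learn; the class name FlexCodeSketch and all tuning values are illustrative, not the paper's implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def cosine_basis(z, n_basis):
    """Orthonormal cosine basis on [0, 1]:
    phi_0(z) = 1 and phi_j(z) = sqrt(2) * cos(pi * j * z) for j >= 1."""
    j = np.arange(n_basis)
    basis = np.sqrt(2.0) * np.cos(np.pi * j * np.asarray(z)[:, None])  # (n, J)
    basis[:, 0] = 1.0
    return basis


class FlexCodeSketch:
    """Toy orthogonal-series CDE: f(z|x) ~ sum_j beta_j(x) phi_j(z), where each
    coefficient beta_j(x) = E[phi_j(Z) | X = x] is fit by regression."""

    def __init__(self, n_basis=30, n_neighbors=50):
        self.n_basis = n_basis
        # Any (multi-output) regression method could be plugged in here.
        self.reg = KNeighborsRegressor(n_neighbors=n_neighbors)

    def fit(self, x_train, z_train):
        # Regress phi_j(z_i) on x_i for all j at once (KNN is multi-output).
        self.reg.fit(x_train, cosine_basis(z_train, self.n_basis))
        return self

    def predict(self, x_new, z_grid):
        beta = self.reg.predict(x_new)                      # (m, J) coefficients
        dens = beta @ cosine_basis(z_grid, self.n_basis).T  # (m, len(z_grid))
        dens = np.clip(dens, 0.0, None)                     # truncate negatives
        dz = z_grid[1] - z_grid[0]                          # uniform grid assumed
        return dens / (dens.sum(axis=1, keepdims=True) * dz)  # renormalize to 1


# Toy check: a bimodal response that plain regression would average away.
rng = np.random.default_rng(0)
x = rng.uniform(size=(2000, 5))                        # only x[:, 0] matters
lower = rng.uniform(size=2000) < 0.5
means = np.where(lower, 0.2 + 0.2 * x[:, 0], 0.6 + 0.2 * x[:, 0])
z = np.clip(rng.normal(means, 0.05), 0.0, 1.0)         # response on [0, 1]

cde = FlexCodeSketch(n_basis=25, n_neighbors=100).fit(x, z)
z_grid = np.linspace(0.0, 1.0, 201)
f_hat = cde.predict(x[:3], z_grid)                     # three estimated densities
print(f_hat.shape)                                     # -> (3, 201)
```

In practice the number of basis functions would be tuned by minimizing a cross-validated estimate of the density estimation loss, and the k-nearest-neighbor step could be swapped for any regression method suited to the data (random forests, sparse additive models, functional regression), which is what gives the approach its adaptivity.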

Article information

Source
Electron. J. Statist. Volume 11, Number 2 (2017), 2800–2831.

Dates
Received: July 2016
First available in Project Euclid: 4 July 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1499133755

Digital Object Identifier
doi:10.1214/17-EJS1302

Subjects
Primary: 62G07 (density estimation); 62G15 (tolerance and confidence regions)
Secondary: 62G08 (nonparametric regression)

Keywords
Nonparametric inference; conditional density; high-dimensional data; prediction intervals; functional conditional density estimation

Rights
Creative Commons Attribution 4.0 International License.

Citation

Izbicki, Rafael; Lee, Ann B. Converting high-dimensional regression to high-dimensional conditional density estimation. Electron. J. Statist. 11 (2017), no. 2, 2800–2831. doi:10.1214/17-EJS1302. https://projecteuclid.org/euclid.ejs/1499133755.


