## Electronic Journal of Statistics

### Converting high-dimensional regression to high-dimensional conditional density estimation

#### Abstract

There is a growing demand for nonparametric conditional density estimators (CDEs) in fields such as astronomy and economics. In astronomy, for example, one can dramatically improve estimates of the parameters that dictate the evolution of the Universe by working with full conditional densities instead of regression (i.e., conditional mean) estimates. More generally, standard regression falls short in any prediction problem where the distribution of the response is more complex, exhibiting multi-modality, asymmetry, or heteroscedastic noise. Nevertheless, much of the work on high-dimensional inference concerns regression and classification only, whereas research on density estimation has lagged behind. Here we propose FlexCode, a fully nonparametric approach to conditional density estimation that reformulates CDE as a nonparametric orthogonal series problem in which the expansion coefficients are estimated by regression. By taking such an approach, one can efficiently estimate conditional densities, and not just expectations, in high dimensions by drawing upon the success of high-dimensional regression. Depending on the choice of regression procedure, our method can adapt to a variety of challenging high-dimensional settings with different structures in the data (e.g., a large number of irrelevant components and nonlinear manifold structure) as well as different data types (e.g., functional data, mixed data types and sample sets). We study the theoretical and empirical performance of our proposed method, and we compare our approach with traditional conditional density estimators on simulated as well as real-world data, such as photometric galaxy data, Twitter data, and line-of-sight velocities in a galaxy cluster.
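The reformulation described in the abstract can be illustrated with a minimal sketch: expand the conditional density as $f(z \mid x) \approx \sum_j \beta_j(x)\,\phi_j(z)$ for an orthonormal basis $\{\phi_j\}$, and note that $\beta_j(x) = \mathbb{E}[\phi_j(Z) \mid X = x]$ is a conditional expectation, i.e., a regression of $\phi_j(z_i)$ on $x_i$. The sketch below assumes a cosine basis on $[0,1]$, a scalar covariate, and a plain k-nearest-neighbor regressor for simplicity; the function names (`cosine_basis`, `flexcode_fit_predict`, etc.) are illustrative and do not reflect the authors' actual software, which supports many regression methods and high-dimensional covariates.

```python
import numpy as np

def cosine_basis(z, n_basis):
    # Orthonormal cosine basis on [0, 1]: phi_0(z) = 1, phi_j(z) = sqrt(2) cos(pi j z).
    Phi = np.ones((len(z), n_basis))
    for j in range(1, n_basis):
        Phi[:, j] = np.sqrt(2.0) * np.cos(np.pi * j * z)
    return Phi

def knn_regress(x_train, y_train, x_new, k):
    # Plain k-NN regression with a scalar covariate (for brevity only);
    # any high-dimensional regression method could be substituted here.
    dist = np.abs(x_train[None, :] - x_new[:, None])   # (m, n) pairwise distances
    idx = np.argsort(dist, axis=1)[:, :k]              # k nearest training points
    return y_train[idx].mean(axis=1)

def flexcode_fit_predict(x_train, z_train, x_new, z_grid, n_basis=10, k=20):
    # Coefficients beta_j(x) = E[phi_j(Z) | X = x] are estimated by running
    # one regression per basis function, with phi_j(z_i) as the response.
    Phi_train = cosine_basis(z_train, n_basis)          # (n, J)
    betas = np.column_stack([
        knn_regress(x_train, Phi_train[:, j], x_new, k)
        for j in range(n_basis)
    ])                                                  # (m, J)
    Phi_grid = cosine_basis(z_grid, n_basis)            # (G, J)
    f_hat = betas @ Phi_grid.T                          # (m, G) density estimates
    return np.clip(f_hat, 0.0, None)                    # crude positivity correction
```

In practice one would also tune the number of basis functions by cross-validation and renormalize each estimated density to integrate to one; those refinements are omitted here to keep the series-plus-regression structure visible.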

#### Article information

Source
Electron. J. Statist. Volume 11, Number 2 (2017), 2800-2831.

Dates
First available in Project Euclid: 4 July 2017

https://projecteuclid.org/euclid.ejs/1499133755

Digital Object Identifier
doi:10.1214/17-EJS1302

Zentralblatt MATH identifier
1366.62078

Subjects
Primary: 62G07: Density estimation; 62G15: Tolerance and confidence regions
Secondary: 62G08: Nonparametric regression

#### Citation

Izbicki, Rafael; Lee, Ann B. Converting high-dimensional regression to high-dimensional conditional density estimation. Electron. J. Statist. 11 (2017), no. 2, 2800–2831. doi:10.1214/17-EJS1302. https://projecteuclid.org/euclid.ejs/1499133755

#### References

• [1] A. Baíllo and A. Grané. Local linear regression for functional predictor and scalar response. Journal of Multivariate Analysis, 100(1):102–111, 2009.
• [2] K. Bertin and G. Lecué. Selection of variables and dimension reduction in high-dimensional non-parametric regression. Electronic Journal of Statistics, 2:1224–1241, 2008.
• [3] K. Bertin, C. Lacour, and V. Rivoirard. Adaptive pointwise estimation of conditional density function. In Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, volume 52, pages 939–980. Institut Henri Poincaré, 2016.
• [4] P. J. Bickel and B. Li. Local polynomial regression on unknown manifolds. In IMS Lecture Notes–Monograph Series, Complex Datasets and Inverse Problems, volume 54, pages 177–186. Institute of Mathematical Statistics, 2007.
• [5] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
• [6] B. Dai, B. Xie, N. He, Y. Liang, A. Raj, M.-F. F. Balcan, and L. Song. Scalable kernel methods via doubly stochastic gradients. In Advances in Neural Information Processing Systems, pages 3041–3049, 2014.
• [7] A. Desai, H. Singh, and V. Pudi. Gear: Generic, efficient, accurate kNN-based regression. In Proc. Int. Conf. KDIR, pages 1–13, 2010.
• [8] M. Di Marzio, S. Fensore, A. Panzera, and C. C. Taylor. A note on nonparametric estimation of circular conditional densities. Journal of Statistical Computation and Simulation, pages 1–10, 2016.
• [9] S. Efromovich. Nonparametric Curve Estimation: Methods, Theory and Application. Springer, 1999.
• [10] S. Efromovich. Dimension reduction and adaptation in conditional density estimation. Journal of the American Statistical Association, 105(490):761–774, 2010.
• [11] A. E. Evrard, J. Bialek, M. Busha, M. White, S. Habib, et al. Virial scaling of massive dark matter halos: why clusters prefer a high normalization cosmology. The Astrophysical Journal, 672(1):122, 2008.
• [12] J. Fan, Q. Yao, and H. Tong. Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83(1):189–206, 1996.
• [13] J. Fan, L. Peng, Q. Yao, and W. Zhang. Approximating conditional density functions using dimension reduction. Acta Mathematicae Applicatae Sinica, 25(3):445–456, 2009.
• [14] Y. Fan, D. J. Nott, and S. A. Sisson. Approximate Bayesian computation via regression density estimation. Stat, 2(1):34–48, 2013.
• [15] A. Fernández-Soto, K. M. Lanzetta, H. W. Chen, B. Levine, and N. Yahata. Error analysis of the photometric redshift technique. Monthly Notices of the Royal Astronomical Society, 330:889–894, 2001.
• [16] F. Ferraty and P. Vieu. Nonparametric Functional Data Analysis: Theory and Practice. Springer Science & Business Media, 2006.
• [17] F. Ferraty, A. Mas, and P. Vieu. Nonparametric regression on functional data: inference and practical aspects. Australian & New Zealand Journal of Statistics, 49(3):267–286, 2007.
• [18] P. E. Freeman, R. Izbicki, and A. B. Lee. A unified framework for constructing, tuning and assessing photometric redshift density estimates in a selection bias setting. Monthly Notices of the Royal Astronomical Society, 468(4):4556–4565, 2017.
• [19] A. G. Gray and A. W. Moore. Nonparametric density estimation: Toward computational tractability. In SIAM Data Mining, pages 203–211, 2003.
• [20] P. Hall, J. S. Racine, and Q. Li. Cross-validation and the estimation of conditional probability densities. Journal of the American Statistical Association, 99:1015–1026, 2004.
• [21] T. Hastie and R. Tibshirani. Varying-coefficient models. Journal of the Royal Statistical Society. Series B (Methodological), pages 757–796, 1993.
• [22] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer-Verlag, 2001.
• [23] T. Hayfield and J. S. Racine. Nonparametric econometrics: The np package. Journal of Statistical Software, 27(5), 2008.
• [24] N. J. Higham. Computing the nearest correlation matrix – a problem from finance. IMA Journal of Numerical Analysis, 22(3):329–343, 2002.
• [25] M. P. Holmes, A. G. Gray, and C. L. Isbell Jr. Fast nonparametric conditional density estimation, 2007.
• [26] R. J. Hyndman, D. M. Bashtannyk, and G. K. Grunwald. Estimating and visualizing conditional densities. Journal of Computational & Graphical Statistics, 5:315–336, 1996.
• [27] T. Ichimura and D. Fukuda. A fast algorithm for computing least-squares cross-validations for nonparametric conditional kernel density functions. Computational Statistics & Data Analysis, 54(12):3404–3410, 2010.
• [28] R. Izbicki and A. B. Lee. Nonparametric conditional density estimation in a high-dimensional regression setting. Journal of Computational and Graphical Statistics, 25(4):1297–1316, 2016.
• [29] R. Izbicki, A. B. Lee, and C. M. Schafer. High-dimensional density ratio estimation with extensions to approximate likelihood computation. Journal of Machine Learning Research (AISTATS Track), pages 420–429, 2014.
• [30] R. Izbicki, A. B. Lee, and P. E. Freeman. Photo-z estimation: An example of nonparametric density estimation under selection bias for multivariate data. The Annals of Applied Statistics, to appear, 2016.
• [31] A. Kalda and S. Siddiqui. Nonparametric conditional density estimation of short-term interest rate movements: procedures, results and risk management implications. Applied Financial Economics, 23(8):671–684, 2013.
• [32] M. C. Kind and R. J. Brunner. TPZ: photometric redshift PDFs and ancillary information by using prediction trees and random forests. Monthly Notices of the Royal Astronomical Society, 432(2):1483–1501, 2013.
• [33] S. Kpotufe. k-NN regression adapts to local intrinsic dimension. In Advances in Neural Information Processing Systems, pages 729–737, 2011.
• [34] S. Kpotufe and S. Dasgupta. A tree-based regressor that adapts to intrinsic dimension. Journal of Computer and System Sciences, 78(5):1496–1515, 2012.
• [35] J. Lafferty and L. Wasserman. Rodeo: sparse, greedy nonparametric regression. The Annals of Statistics, 36(1):28–63, 2008.
• [36] A. B. Lee and R. Izbicki. A spectral series approach to high-dimensional nonparametric regression. Electronic Journal of Statistics, 10(1):423–463, 2016.
• [37] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1999.
• [38] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval, volume 1. Cambridge University Press, Cambridge, 2008.
• [39] L. Meier, S. Van de Geer, and P. Bühlmann. High-dimensional additive modeling. The Annals of Statistics, 37(6B):3779–3821, 2009.
• [40] M. Ntampaka, H. Trac, D. J. Sutherland, N. Battaglia, B. Poczos, and J. Schneider. A machine learning approach for dynamical mass measurements of galaxy clusters. The Astrophysical Journal, 803(2):50, 2015a.
• [41] M. Ntampaka, H. Trac, D. J. Sutherland, S. Fromenteau, B. Póczos, and J. Schneider. Dynamical mass measurements of contaminated galaxy clusters using machine learning. arXiv preprint arXiv:1509.05409, 2015b.
• [42] G. Papamakarios and I. Murray. Fast $\epsilon$-free inference of simulation models with Bayesian conditional density estimation. arXiv preprint arXiv:1605.06376, 2016.
• [43] A. Quintela-del-Río, F. Ferraty, and P. Vieu. Nonparametric conditional density estimation for functional data: Econometric applications. In Recent Advances in Functional Data Analysis and Related Topics, pages 263–268. Springer, 2011.
• [44] M. M. Rau, S. Seitz, F. Brimioulle, E. Frank, O. Friedrich, D. Gruen, and B. Hoyle. Accurate photometric redshift probability density estimation — method comparison and application. Monthly Notices of the Royal Astronomical Society, 452(4):3710–3725, 2015.
• [45] P. Ravikumar, J. Lafferty, H. Liu, and L. Wasserman. Sparse additive models. Journal of the Royal Statistical Society, Series B, 71(5):1009–1030, 2009.
• [46] V. C. Raykar. Scalable machine learning for massive datasets: Fast summation algorithms, 2007.
• [47] E. Rodrigues, R. Assunção, G. L. Pappa, D. Renno, and W. Meira Jr. Exploring multiple evidence to infer users' location in Twitter. Neurocomputing, 2015. URL http://www.sciencedirect.com/science/article/pii/S092523121500764X.
• [48] M. Rosenblatt. Conditional probability density and regression estimators. In P. R. Krishnaiah, editor, Multivariate Analysis II. 1969.
• [49] E. S. Sheldon, C. E. Cunha, R. Mandelbaum, J. Brinkmann, and B. A. Weaver. Photometric redshift probability distributions for galaxies in the SDSS DR8. The Astrophysical Journal Supplement Series, 201(2), 2012.
• [50] M. Shiga, V. Tangkaratt, and M. Sugiyama. Direct conditional probability density estimation with sparse feature selection. Machine Learning, 100(2–3):161–182, 2015.
• [51] M. Sugiyama, I. Takeuchi, T. Suzuki, T. Kanamori, H. Hachiya, and D. Okanohara. Conditional density estimation via least-squares density ratio estimation. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 781–788, 2010.
• [52] D. J. Sutherland, L. Xiong, B. Póczos, and J. Schneider. Kernels on sample sets via nonparametric divergence estimates. arXiv preprint arXiv:1202.0302, 2012.
• [53] I. Takeuchi, K. Nomura, and T. Kanamori. Nonparametric conditional density estimation using piecewise-linear solution path of kernel quantile regression. Neural Computation, 21(2):533–559, 2009.
• [54] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
• [55] Q. Wang, S. R. Kulkarni, and S. Verdú. A nearest-neighbor approach to estimating divergence between continuous random vectors. In 2006 IEEE International Symposium on Information Theory, pages 242–246, 2006.
• [56] L. Wasserman. All of Nonparametric Statistics. Springer-Verlag New York, Inc., 2006.
• [57] Y. Yang and S. T. Tokdar. Minimax-optimal nonparametric regression in high dimensions. The Annals of Statistics, 43(2):652–674, 2015.
• [58] Y. Zhang, J. Duchi, and M. Wainwright. Divide and conquer kernel ridge regression. In Conference on Learning Theory, pages 592–617, 2013.
• [59] L. Zhao and Z. Liu. Strong consistency of the kernel estimators of conditional density function. Acta Mathematica Sinica, 1(4):314–318, 1985.