Electronic Journal of Statistics

Divide and conquer local average regression

Xiangyu Chang, Shao-Bo Lin, and Yao Wang


Abstract

The divide and conquer strategy, which breaks a massive data set into a series of manageable data blocks and then combines the independent results of the data blocks into a final decision, has been recognized as a state-of-the-art method for overcoming the challenges of massive data analysis. In this paper, we equip classical local average regression with divide and conquer strategies to infer the regression relationship between input-output pairs from a massive data set. When the average mixture, a widely used divide and conquer approach, is adopted, we prove that the optimal learning rate can be achieved under some restrictive conditions on the number of data blocks. We then propose two variants that relax (or remove) these conditions and derive the same optimal learning rates as those for the average mixture local average regression. Our theoretical assertions are verified by a series of experimental studies.
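The abstract describes the average-mixture strategy: each data block produces its own local average estimate (e.g. a Nadaraya-Watson estimate), and the final prediction is the average of the per-block estimates. The following Python sketch illustrates this idea on synthetic data; the function names, the Gaussian kernel, the bandwidth, and the number of blocks are illustrative assumptions and are not taken from the paper.

import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    # Local average (Nadaraya-Watson) estimate with a Gaussian kernel.
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    weights = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Weighted average of the responses; guard against empty neighborhoods.
    denom = weights.sum(axis=1)
    denom[denom == 0.0] = 1.0
    return weights @ y_train / denom

def average_mixture_nw(x, y, x_query, bandwidth, n_blocks):
    # Average mixture: randomly split the data into blocks, run the local
    # average estimator on each block, then average the per-block estimates.
    blocks = np.array_split(np.random.permutation(len(x)), n_blocks)
    estimates = [nadaraya_watson(x[i], y[i], x_query, bandwidth) for i in blocks]
    return np.mean(estimates, axis=0)

# Toy usage: noisy sine regression with 8 data blocks (illustrative values).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 10_000)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(len(x))
x_query = np.linspace(0.0, 1.0, 50)
y_hat = average_mixture_nw(x, y, x_query, bandwidth=0.05, n_blocks=8)

The sketch is only meant to show the communication pattern: each block's estimate can be computed independently (and in parallel), and the combination step is a simple average at the query points.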

Article information

Source
Electron. J. Statist., Volume 11, Number 1 (2017), 1326-1350.

Dates
Received: March 2016
First available in Project Euclid: 14 April 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1492135236

Digital Object Identifier
doi:10.1214/17-EJS1265

Mathematical Reviews number (MathSciNet)
MR3635915

Zentralblatt MATH identifier
1362.62085

Subjects
Primary: 62G08: Nonparametric regression

Keywords
Divide and conquer strategy; local average regression; Nadaraya-Watson estimate; $k$ nearest neighbor estimate

Rights
Creative Commons Attribution 4.0 International License.

Citation

Chang, Xiangyu; Lin, Shao-Bo; Wang, Yao. Divide and conquer local average regression. Electron. J. Statist. 11 (2017), no. 1, 1326--1350. doi:10.1214/17-EJS1265. https://projecteuclid.org/euclid.ejs/1492135236


