Open Access
Isotonic regression meets LASSO
Matey Neykov
Electron. J. Statist. 13(1): 710-746 (2019). DOI: 10.1214/19-EJS1537
Abstract

This paper studies a two-step procedure for monotone increasing additive single index models with Gaussian designs. The proposed procedure is simple, easy to implement with existing software, and consists of consecutively applying LASSO and isotonic regression. Aside from formalizing this procedure, we provide theoretical guarantees regarding its performance: 1) we show that our procedure controls the in-sample squared error; 2) we demonstrate that one can use the procedure for predicting new observations, by showing that the absolute prediction error can be controlled with high probability. Our bounds exhibit a tradeoff between two rates: the minimax rate for estimating high dimensional quadratic loss, and the minimax nonparametric rate for estimating a monotone increasing function.
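The two-step procedure described in the abstract can be sketched with off-the-shelf tools. The following is a minimal illustration, not the paper's exact formulation: the link function, tuning parameter, and dimensions below are illustrative assumptions, and the paper's theory concerns the population index under Gaussian designs rather than this particular simulation.

```python
# Sketch of the two-step procedure: LASSO, then isotonic regression.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n, p, s = 200, 50, 3
X = rng.standard_normal((n, p))           # Gaussian design
beta = np.zeros(p)
beta[:s] = 1.0                            # sparse index vector
f = np.tanh                               # monotone increasing link (illustrative)
y = f(X @ beta) + 0.1 * rng.standard_normal(n)

# Step 1: LASSO recovers the index direction (up to scaling, for Gaussian designs).
lasso = Lasso(alpha=0.05).fit(X, y)
index = X @ lasso.coef_

# Step 2: isotonic regression fits a monotone link on the estimated index.
iso = IsotonicRegression(out_of_bounds="clip").fit(index, y)
y_hat = iso.predict(index)

print("in-sample MSE:", np.mean((y_hat - y) ** 2))
```

New observations can then be predicted by `iso.predict(X_new @ lasso.coef_)`, mirroring the out-of-sample guarantee discussed in the abstract.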

References

[1] Radosław Adamczak and Paweł Wolff. Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order. Probability Theory and Related Fields, 162(3-4):531–586, 2015.

[2] Pierre Alquier and Gérard Biau. Sparse single-index model. Journal of Machine Learning Research, 14(Jan):243–280, 2013.

[3] Fadoua Balabdaoui, Cécile Durot, and Hanna Jankowski. Least squares estimation in the monotone single index model. arXiv preprint arXiv:1610.06026, 2016.

[4] Pierre C. Bellec. Sharp oracle inequalities for least squares estimators in shape restricted regression. arXiv preprint arXiv:1510.08029, 2015.

[5] Peter J. Bickel, Ya'acov Ritov, and Alexandre B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics, 37(4):1705–1732, 2009.

[6] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

[7] Sabyasachi Chatterjee, Adityanand Guntuboyina, and Bodhisattva Sen. On risk bounds in isotonic and other shape restricted regression problems. The Annals of Statistics, 43(4):1774–1800, 2015.

[8] Sourav Chatterjee. A new perspective on least squares under convex constraint. The Annals of Statistics, 42(6):2340–2381, 2014.

[9] Yining Chen and Richard J. Samworth. Generalized additive and index models with shape constraints. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(4):729–754, 2016.

[10] R. Dennis Cook and Liqiang Ni. Sufficient dimension reduction via inverse regression. Journal of the American Statistical Association, 100(470), 2005.

[11] Anirban DasGupta. Finite sample theory of order statistics and extremes. In Probability for Statistics and Machine Learning, pages 221–248. Springer, 2011.

[12] Cécile Durot. Sharp asymptotics for isotonic regression. Probability Theory and Related Fields, 122(2):222–240, 2002.

[13] Cécile Durot. On the L_p-error of monotonicity constrained estimators. The Annals of Statistics, 35(3):1080–1104, 2007.

[14] Jared C. Foster, Jeremy M. G. Taylor, and Bin Nan. Variable selection in monotone single-index models via the adaptive lasso. Statistics in Medicine, 32(22):3944–3954, 2013.

[15] Janos Galambos. Extreme value theory for applications. In Extreme Value Theory and Applications, pages 1–14. Springer, 1994.

[16] Larry Goldstein, Stanislav Minsker, and Xiaohan Wei. Structured signal recovery from non-linear and heavy-tailed measurements. arXiv preprint arXiv:1609.01025, 2016.

[17] Fang Han, Hongkai Ji, Zhicheng Ji, and Honglang Wang. A provable smoothing approach for high dimensional generalized regression with applications in genomics. Electronic Journal of Statistics, 11(2):4347–4403, 2017.

[18] Joel L. Horowitz. A smoothed maximum score estimator for the binary response model. Econometrica, 60(3):505–531, 1992.

[19] Joel L. Horowitz. Optimal rates of convergence of parameter estimators in the binary response model with weak distributional assumptions. Econometric Theory, 9(1):1–18, 1993.

[20] Joel L. Horowitz. Semiparametric and Nonparametric Methods in Econometrics. Springer, 2009.

[21] Peter J. Huber. Robust statistics. In International Encyclopedia of Statistical Science, pages 1248–1251. Springer, 2011.

[22] Sham M. Kakade, Varun Kanade, Ohad Shamir, and Adam Kalai. Efficient learning of generalized linear and single index models with isotonic regression. In Advances in Neural Information Processing Systems, pages 927–935, 2011.

[23] Adam Tauman Kalai and Ravi Sastry. The Isotron algorithm: High-dimensional isotonic regression. In COLT, 2009.

[24] B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302–1338, 2000.

[25] Ker-Chau Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327, 1991.

[26] Ker-Chau Li and Naihua Duan. Regression analysis under link violation. The Annals of Statistics, 17(3):1009–1052, 1989.

[27] Charles F. Manski. Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics, 3(3):205–228, 1975.

[28] P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman & Hall/CRC, 1989.

[29] Prasad A. Naik and Chih-Ling Tsai. Isotonic single-index model for high-dimensional database marketing. Computational Statistics & Data Analysis, 47(4):775–790, 2004.

[30] Matey Neykov, Jun S. Liu, and Tianxi Cai. L1-regularized least squares for support recovery of high dimensional single index models with Gaussian designs. Journal of Machine Learning Research, 17(87):1–37, 2016.

[31] Heng Peng and Tao Huang. Penalized least squares for single index models. Journal of Statistical Planning and Inference, 141(4):1362–1379, 2011.

[32] Yaniv Plan and Roman Vershynin. The generalized Lasso with non-linear observations. IEEE Transactions on Information Theory, 62(3):1528–1537, 2016.

[33] Peter Radchenko. High dimensional single index models. Journal of Multivariate Analysis, 139:266–282, 2015.

[34] Philippe Rigollet and Alexandre Tsybakov. Exponential screening and optimal rates of sparse estimation. The Annals of Statistics, 39(2):731–771, 2011.

[35] Mark Rudelson and Roman Vershynin. Hanson-Wright inequality and sub-Gaussian concentration. Electronic Communications in Probability, 18, 2013.

[36] David Ruppert, Matt P. Wand, and Raymond J. Carroll. Semiparametric Regression. Number 12. Cambridge University Press, 2003.

[37] Robert P. Sherman. Maximum score methods. In Microeconometrics, pages 122–128. Springer, 2010.

[38] Christos Thrampoulidis, Ehsan Abbasi, and Babak Hassibi. LASSO with non-linear measurements is equivalent to one with linear measurements. In Advances in Neural Information Processing Systems, pages 3420–3428, 2015.

[39] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.

[40] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, 2009.

[41] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.

[42] Yingcun Xia and W. K. Li. On single-index coefficient regression models. Journal of the American Statistical Association, 94(448):1275–1285, 1999.

[43] Zhuoran Yang, Krishnakumar Balasubramanian, and Han Liu. High-dimensional non-Gaussian single index models via thresholded score function estimation. In International Conference on Machine Learning, pages 3851–3860, 2017.

[44] Zhuoran Yang, Zhaoran Wang, Han Liu, Yonina C. Eldar, and Tong Zhang. Sparse nonlinear regression: Parameter estimation and asymptotic inference. arXiv preprint arXiv:1511.04514, 2015.

[45] Cun-Hui Zhang. Risk bounds in isotonic regression. The Annals of Statistics, 30(2):528–555, 2002.
Received: 1 February 2018; Published: 2019