Electronic Journal of Statistics

Quantile universal threshold

Caroline Giacobino, Sylvain Sardy, Jairo Diaz-Rodriguez, and Nick Hengartner

Full-text: Open access

Abstract

Efficient recovery of a low-dimensional structure from high-dimensional data has been pursued in various settings, including wavelet denoising, generalized linear models and low-rank matrix estimation. By thresholding some parameters to zero, estimators such as the lasso, the elastic net and subset selection perform variable selection. One crucial step challenges all these estimators: choosing the amount of thresholding, which is governed by a threshold parameter $\lambda $. If $\lambda $ is too large, important features are missed; if too small, spurious features are included. Within a unified framework, we propose a selection of $\lambda $ at the detection edge. To that end, we introduce the concepts of a zero-thresholding function and a null-thresholding statistic, which we derive explicitly for a large class of estimators. The new approach has the great advantage of transforming the selection of $\lambda $ from an unknown scale to a probabilistic scale. Numerical results show the effectiveness of our approach in terms of model selection and prediction.
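
For the lasso, for instance, the smallest $\lambda $ that forces the estimate to be identically zero is $\lambda_0(y)=\|X^\top y\|_\infty$; evaluating $\lambda_0$ on pure noise and taking an upper quantile then places $\lambda $ at the detection edge. Below is a minimal Monte Carlo sketch of that idea, assuming Gaussian noise with known standard deviation $\sigma$; the function name qut_lasso, the level alpha = 0.05 and the number of draws are illustrative choices, not prescriptions from the paper.

    import numpy as np

    def qut_lasso(X, sigma=1.0, alpha=0.05, n_mc=10000, seed=None):
        """Upper-alpha Monte Carlo quantile of ||X^T eps||_inf under the null model y = eps."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        eps = rng.normal(scale=sigma, size=(n_mc, n))  # pure-noise responses
        stats = np.abs(eps @ X).max(axis=1)            # zero-thresholding statistic per draw
        return np.quantile(stats, 1.0 - alpha)

    # Example with a random design; seeds fixed for reproducibility.
    X = np.random.default_rng(0).normal(size=(100, 200)) / np.sqrt(100)
    print(qut_lasso(X, seed=1))

When the columns of $X$ are orthonormal, the resulting quantile is of the same order as the classical universal threshold $\sigma\sqrt{2\log p}$ of Donoho and Johnstone [15], which suggests how the quantile construction extends the universal threshold to general designs.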

Article information

Source
Electron. J. Statist. Volume 11, Number 2 (2017), 4701–4722.

Dates
Received: March 2017
First available in Project Euclid: 24 November 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1511492459

Digital Object Identifier
doi:10.1214/17-EJS1366

Zentralblatt MATH identifier
06816630

Keywords
Convex optimization; high-dimensionality; sparsity; regularization; thresholding

Rights
Creative Commons Attribution 4.0 International License.

Citation

Giacobino, Caroline; Sardy, Sylvain; Diaz-Rodriguez, Jairo; Hengartner, Nick. Quantile universal threshold. Electron. J. Statist. 11 (2017), no. 2, 4701–4722. doi:10.1214/17-EJS1366. https://projecteuclid.org/euclid.ejs/1511492459


References

  • [1] H. Akaike. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pages 199–213. Springer, 1998.
  • [2] A. Belloni and V. Chernozhukov. Least squares after model selection in high-dimensional sparse models. Bernoulli, 19(2):521–547, 2013.
  • [3] A. Belloni, V. Chernozhukov, and L. Wang. Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4):791–806, 2011.
  • [4] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth and Brooks/Cole Advanced Books & Software, Monterey, CA, 1984.
  • [5] P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg, 2011.
  • [6] P. Bühlmann, M. Kalisch, and L. Meier. High-dimensional statistics with a view toward applications in biology. Annual Review of Statistics and Its Application, 1:255–278, 2014.
  • [7] F. Bunea, J. Lederer, and Y. She. The group square-root lasso: theoretical properties and fast algorithms. IEEE Transactions on Information Theory, 60(2):1313–1325, 2014.
  • [8] J.-F. Cai, E. J. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.
  • [9] E. Candès and J. Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems, 23(3):969–985, 2007.
  • [10] E. Candès and T. Tao. The Dantzig selector: statistical estimation when $p$ is much larger than $n$. The Annals of Statistics, 35(6):2313–2351, 2007.
  • [11] E. J. Candès, C. A. Sing-Long, and J. D. Trzasko. Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Transactions on Signal Processing, 61(19):4643–4657, 2013.
  • [12] J. Chen and Z. Chen. Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3):759–771, 2008.
  • [13] D. L. Donoho. Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Applied and Computational Harmonic Analysis, 2(2):101–126, 1995.
  • [14] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
  • [15] D. L. Donoho and I. M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.
  • [16] D. L. Donoho and J. Tanner. Precise undersampling theorems. Proceedings of the IEEE, 98(6):913–924, 2010.
  • [17] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. Wavelet shrinkage: asymptopia? Journal of the Royal Statistical Society: Series B, 57(2):301–369, 1995.
  • [18] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. Density estimation by wavelet thresholding. The Annals of Statistics, 24(2):508–539, 1996.
  • [19] J. Fan and H. Peng. Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32(3):928–961, 2004.
  • [20] J. Fan, S. Guo, and N. Hao. Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B, 74(1):37–65, 2012.
  • [21] Y. Fan and C. Y. Tang. Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society: Series B, 75(3):531–552, 2013.
  • [22] W. J. Fu. Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3):397–416, 1998.
  • [23] M. Gavish and D. L. Donoho. Optimal shrinkage of singular values. arXiv:1405.7511v2, 2014.
  • [24] C. Giacobino. Thresholding estimators for high-dimensional data: model selection, testing and existence. PhD thesis, University of Geneva, 2017.
  • [25] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439):531–537, 1999.
  • [26] A. E. Hoerl and R. W. Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
  • [27] W. James and C. Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, pages 361–379, Berkeley, California, 1961. University of California Press.
  • [28] J. Josse and F. Husson. Selecting the number of components in PCA using cross-validation approximations. Computational Statistics and Data Analysis, 56(6):1869–1879, 2012.
  • [29] J. Josse and S. Sardy. Adaptive shrinkage of singular values. Statistics and Computing, 26(3):715–724, 2016.
  • [30] N. Kushmerick. Learning to remove internet advertisements. In Proceedings of the Third International Conference on Autonomous Agents, pages 175–181. ACM, 1999.
  • [31] C. Leng, Y. Lin, and G. Wahba. A note on the lasso and related procedures in model selection. Statistica Sinica, 16(4):1273–1284, 2006.
  • [32] R. Mazumder, T. Hastie, and R. Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11:2287–2322, 2010.
  • [33] L. Meier, S. van de Geer, and P. Bühlmann. The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B, 70(1):53–71, 2008.
  • [34] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34:1436–1462, 2006.
  • [35] N. Meinshausen and P. Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B, 72(4):417–473, 2010.
  • [36] A. Mukherjee, K. Chen, N. Wang, and J. Zhu. On the degrees of freedom of reduced-rank estimators in multivariate regression. Biometrika, 102(2):457–477, 2015.
  • [37] J. A. Nelder and R. W. M. Wedderburn. Generalized linear models. Journal of the Royal Statistical Society: Series A, 135(3):370–384, 1972.
  • [38] D. Neto, S. Sardy, and P. Tseng. $\ell_1$-penalized likelihood smoothing and segmentation of volatility processes allowing for abrupt changes. Journal of Computational and Graphical Statistics, 21(1):217–233, 2012.
  • [39] A. B. Owen and P. O. Perry. Bi-cross-validation of the SVD and the nonnegative matrix factorization. Annals of Applied Statistics, 3(2):564–594, 2009.
  • [40] M. Y. Park and T. Hastie. $L_1$-regularization-path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B, 69(4):659–677, 2007.
  • [41] J. Pitman and M. Yor. The law of the maximum of a Bessel bridge. Electronic Journal of Probability, 4:1–35, 1999.
  • [42] S. Reid, R. Tibshirani, and J. Friedman. A study of error variance estimation in lasso regression. arXiv:1311.5274v2, 2014.
  • [43] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.
  • [44] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
  • [45] S. Sardy. On the practice of rescaling covariates. International Statistical Review, 76(2):285–297, 2008.
  • [46] S. Sardy. Adaptive posterior mode estimation of a sparse sequence for model selection. Scandinavian Journal of Statistics, 36(4):577–601, 2009.
  • [47] S. Sardy. Smooth blockwise iterative thresholding: a smooth fixed point estimator based on the likelihood’s block gradient. Journal of the American Statistical Association, 107(498):800–813, 2012.
  • [48] S. Sardy and P. Tseng. On the statistical analysis of smoothing by maximizing dirty Markov random field posterior distributions. Journal of the American Statistical Association, 99(465):191–204, 2004.
  • [49] S. Sardy and P. Tseng. Density estimation by total variation penalized likelihood driven by the sparsity $\ell_1$ information criterion. Scandinavian Journal of Statistics, 37(2):321–337, 2010.
  • [50] S. Sardy, A. Antoniadis, and P. Tseng. Automatic smoothing with wavelets for a wide class of distributions. Journal of Computational and Graphical Statistics, 13(2):399–421, 2004.
  • [51] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464, 1978.
  • [52] N. Simon, J. Friedman, T. Hastie, and R. Tibshirani. A sparse-group lasso. Journal of Computational and Graphical Statistics, 22(2):231–245, 2013.
  • [53] C. M. Stein. Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 9(6):1135–1151, 1981.
  • [54] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1):267–288, 1996.
  • [55] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B, 67(1):91–108, 2005.
  • [56] R. J. Tibshirani and J. Taylor. The solution path of the generalized lasso. The Annals of Statistics, 39(3):1335–1371, 2011.
  • [57] R. J. Tibshirani and J. Taylor. Degrees of freedom in lasso problems. The Annals of Statistics, 40(2):1198–1232, 2012.
  • [58] A. N. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Mathematics Doklady, 4(4):1035–1038, 1963.
  • [59] G. Wahba. Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia, 1990.
  • [60] H. Wang, G. Li, and G. Jiang. Robust regression shrinkage and consistent variable selection through the LAD-lasso. Journal of Business & Economic Statistics, 25(3):347–355, 2007.
  • [61] Y. Yang. Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika, 92(4):937–950, 2005.
  • [62] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1):49–67, 2006.
  • [63] C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2):894–942, 2010.
  • [64] H. Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429, 2006.
  • [65] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2):301–320, 2005.
  • [66] H. Zou, T. Hastie, and R. Tibshirani. On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5):2173–2192, 2007.