## Bernoulli

• Volume 24, Number 1 (2018), 465-492.

### Power of the spacing test for least-angle regression

#### Abstract

Recent advances in Post-Selection Inference have shown that conditional testing is relevant and tractable in high dimensions. In the Gaussian linear model, further works have derived unconditional test statistics such as the Kac–Rice Pivot for general penalized problems. For testing the global null, a prominent offspring of this breakthrough is the Spacing test, which accounts for the relative separation between the first two knots of the celebrated least-angle regression (LARS) algorithm. However, no results have been shown regarding the distribution of these test statistics under the alternative. For the first time, this paper addresses this important issue for the Spacing test and shows that it is unconditionally unbiased. Furthermore, we provide the first extension of the Spacing test to the setting of unknown noise variance.
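As an illustration of how the first two LARS knots enter the test, here is a minimal Python sketch. It assumes unit-norm predictors, a known noise level `sigma`, no two collinear predictors, and the commonly stated form of the statistic, $S = \bar\Phi(\lambda_1/\sigma)/\bar\Phi(\lambda_2/\sigma)$ with $\bar\Phi$ the standard Gaussian tail, which is uniform on $[0,1]$ under the global null. The helper names (`first_two_knots`, `spacing_p_value`, and so on) are illustrative and are not taken from the paper.

```python
import math


def gaussian_tail(x):
    """Upper tail 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize_columns(columns):
    """Rescale each predictor (a list of n floats) to unit Euclidean norm."""
    return [[v / math.sqrt(dot(col, col)) for v in col] for col in columns]


def first_two_knots(columns, y):
    """First two LARS knots, assuming unit-norm, pairwise non-collinear predictors."""
    corr = [dot(col, y) for col in columns]
    # First knot: largest absolute correlation with the response.
    j1 = max(range(len(columns)), key=lambda j: abs(corr[j]))
    lam1 = abs(corr[j1])
    s1 = 1.0 if corr[j1] >= 0 else -1.0
    # Second knot: largest value of lambda at which another predictor,
    # with sign s, reaches the same absolute correlation with the residual.
    lam2 = 0.0
    for j, col in enumerate(columns):
        if j == j1:
            continue
        rho = dot(col, columns[j1])
        for s in (1.0, -1.0):
            lam = s * (corr[j] - rho * corr[j1]) / (1.0 - s * s1 * rho)
            lam2 = max(lam2, lam)
    return lam1, lam2


def spacing_p_value(lam1, lam2, sigma=1.0):
    """Ratio of Gaussian tails at the two knots; small values reject the global null."""
    return gaussian_tail(lam1 / sigma) / gaussian_tail(lam2 / sigma)


# Toy example with two predictors in dimension two.
columns = normalize_columns([[1.0, 0.0], [3.0, 4.0]])
y = [2.0, 0.5]
lam1, lam2 = first_two_knots(columns, y)
print(lam1, lam2, spacing_p_value(lam1, lam2))
```

When the predictors are orthonormal the loop reduces to picking the second largest absolute correlation, so the statistic compares the two largest values of $|X_j^\top y|$.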

More precisely, we investigate the power of the Spacing test for LARS and prove that it is unbiased: its power is always greater than or equal to the significance level $\alpha$. In particular, we describe the power of this test under various scenarios: we prove that its rejection region is optimal when the predictors are orthogonal; as the level $\alpha$ goes to zero, we show that the probability of getting a true positive is much greater than $\alpha$; and we give a detailed description of its power in the case of two predictors. Moreover, we numerically compare the Spacing test for LARS, Pearson’s chi-squared test (goodness of fit) and a numerical testing procedure based on the maximal correlation.
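The unbiasedness claim can be checked empirically in the simplest setting. The Monte Carlo sketch below is not the authors' experiment: it assumes orthonormal predictors and unit noise variance, so that the correlations $z = X^\top y$ are independent Gaussians and the two knots are the two largest values of $|z_j|$, and it assumes the statistic is the ratio of standard Gaussian tails at those knots. The function names, the number of predictors, the signal strength and the seed are all illustrative choices.

```python
import math
import random


def gaussian_tail(x):
    """Upper tail 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def spacing_statistic(z, sigma=1.0):
    """Spacing statistic for orthonormal predictors: knots are the two largest |z_j|."""
    lam2, lam1 = sorted(abs(v) for v in z)[-2:]
    return gaussian_tail(lam1 / sigma) / gaussian_tail(lam2 / sigma)


def rejection_rate(mean, alpha=0.1, n_sim=20000, seed=0):
    """Fraction of simulated data sets on which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        z = [rng.gauss(m, 1.0) for m in mean]
        if spacing_statistic(z) <= alpha:
            rejections += 1
    return rejections / n_sim


# Global null: no active predictor among ten.
null_rate = rejection_rate([0.0] * 10)
# Alternative: a single active predictor with mean 3.
power = rejection_rate([3.0] + [0.0] * 9)
print(f"rejection rate under the null: {null_rate:.3f}")
print(f"rejection rate with one active predictor: {power:.3f}")
```

Under the null the rejection rate should sit close to the nominal level, while under the alternative it should exceed it, which is what unbiasedness requires.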

When the noise variance is unknown, our analysis yields a new test statistic that can be computed in cubic time in the population size and which we refer to as the $t$-Spacing test for LARS. The $t$-Spacing test involves the first two knots of the LARS algorithm and we give its distribution under the null hypothesis. Interestingly, numerical experiments indicate that the $t$-Spacing test for LARS enjoys the same aforementioned properties as the Spacing test.

#### Article information

Source
Bernoulli, Volume 24, Number 1 (2018), 465-492.

Dates
Revised: May 2016
First available in Project Euclid: 27 July 2017

https://projecteuclid.org/euclid.bj/1501142452

Digital Object Identifier
doi:10.3150/16-BEJ885

Mathematical Reviews number (MathSciNet)
MR3706766

Zentralblatt MATH identifier
1381.62110

#### Citation

Azaïs, Jean-Marc; De Castro, Yohann; Mourareau, Stéphane. Power of the spacing test for least-angle regression. Bernoulli 24 (2018), no. 1, 465–492. doi:10.3150/16-BEJ885. https://projecteuclid.org/euclid.bj/1501142452
