## The Annals of Statistics

### Overcoming the limitations of phase transition by higher order analysis of regularization techniques

#### Abstract

We study the problem of estimating a sparse vector $\beta\in\mathbb{R}^{p}$ from the response variables $y=X\beta+w$, where $w\sim N(0,\sigma_{w}^{2}I_{n\times n})$, under the following high-dimensional asymptotic regime: for a fixed number $\delta$, $p\rightarrow\infty$ while $n/p\rightarrow\delta$. We consider the popular class of $\ell_{q}$-regularized least squares (LQLS), a.k.a. bridge estimators, given by the optimization problem \begin{equation*}\hat{\beta}(\lambda,q)\in\arg\min_{\beta}\frac{1}{2}\|y-X\beta\|_{2}^{2}+\lambda\|\beta\|_{q}^{q},\end{equation*} and characterize the almost sure limit of $\frac{1}{p}\|\hat{\beta}(\lambda,q)-\beta\|_{2}^{2}$, which we call the asymptotic mean square error (AMSE). The expression we derive for this limit does not have an explicit form, and hence is not directly useful for comparing LQLS across different values of $q$, or for evaluating the effect of $\delta$ or of the sparsity level of $\beta$. To simplify the expression, researchers have considered the ideal “error-free” regime, that is, $w=0$, and have characterized the values of $\delta$ for which AMSE is zero. This is known as the phase transition analysis.

In this paper, we first perform the phase transition analysis of LQLS. Our results reveal some of the limitations and misleading features of the phase transition analysis. To overcome these limitations, we propose the small error analysis of LQLS. Our new analysis framework not only sheds light on the results of the phase transition analysis, but also describes when phase transition analysis is reliable, and presents a more accurate comparison among different regularizers.
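As a rough finite-sample illustration of the quantities in the abstract, the sketch below simulates $y=X\beta+w$ and evaluates $\frac{1}{p}\|\hat{\beta}(\lambda,q)-\beta\|_{2}^{2}$ for $q=1$ and $q=2$. It is not the paper's analysis or code. The design scaling (i.i.d. $N(0,1/n)$ entries), the Gaussian nonzero coefficients, the sparsity fraction `eps`, the ISTA solver for $q=1$, and all function names are assumptions made for this example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lqls_q1(X, y, lam, n_iter=2000):
    """ISTA for 0.5 * ||y - X b||_2^2 + lam * ||b||_1 (the q = 1 case)."""
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = soft_threshold(b + step * X.T @ (y - X @ b), step * lam)
    return b

def lqls_q2(X, y, lam):
    """Closed form for 0.5 * ||y - X b||_2^2 + lam * ||b||_2^2 (the q = 2 case)."""
    p = X.shape[1]
    # Stationarity: X^T (X b - y) + 2 * lam * b = 0.
    return np.linalg.solve(X.T @ X + 2.0 * lam * np.eye(p), X.T @ y)

def simulate(p=400, delta=0.8, eps=0.1, sigma_w=0.1, seed=0):
    """Draw (X, y, beta) with n = delta * p rows and an eps fraction of nonzeros.

    Assumed setup for illustration only: X has i.i.d. N(0, 1/n) entries and
    the nonzero coefficients of beta are standard normal.
    """
    rng = np.random.default_rng(seed)
    n = int(delta * p)
    X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
    beta = np.zeros(p)
    k = int(eps * p)
    support = rng.choice(p, size=k, replace=False)
    beta[support] = rng.normal(size=k)
    y = X @ beta + sigma_w * rng.normal(size=n)
    return X, y, beta

def mse(b_hat, beta):
    """Finite-p counterpart of the AMSE: (1/p) * ||b_hat - beta||_2^2."""
    return np.mean((b_hat - beta) ** 2)

X, y, beta = simulate()
for lam in (0.01, 0.05, 0.2):
    print(lam, mse(lqls_q1(X, y, lam), beta), mse(lqls_q2(X, y, lam), beta))
```

Setting `sigma_w=0.0` in `simulate` gives the error-free regime considered by the phase transition analysis, while small positive values of `sigma_w` correspond to the small error regime. A single finite-$p$ draw only approximates the almost sure limit.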

#### Article information

Source
Ann. Statist., Volume 46, Number 6A (2018), 3099-3129.

Dates
Revised: September 2017
First available in Project Euclid: 7 September 2018

https://projecteuclid.org/euclid.aos/1536307244

Digital Object Identifier
doi:10.1214/17-AOS1651

Mathematical Reviews number (MathSciNet)
MR3851766

Zentralblatt MATH identifier
06968610

#### Citation

Weng, Haolei; Maleki, Arian; Zheng, Le. Overcoming the limitations of phase transition by higher order analysis of regularization techniques. Ann. Statist. 46 (2018), no. 6A, 3099–3129. doi:10.1214/17-AOS1651. https://projecteuclid.org/euclid.aos/1536307244

#### References

• [1] Amelunxen, D., Lotz, M., McCoy, M. B. and Tropp, J. A. (2014). Living on the edge: Phase transitions in convex programs with random data. Inf. Inference 3 224–294.
• [2] Bayati, M. and Montanari, A. (2011). The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inform. Theory 57 764–785.
• [3] Bayati, M. and Montanari, A. (2012). The LASSO risk for Gaussian matrices. IEEE Trans. Inform. Theory 58 1997–2017.
• [4] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [5] Bogdan, M., van den Berg, E., Sabatti, C., Su, W. and Candès, E. J. (2015). SLOPE—adaptive variable selection via convex optimization. Ann. Appl. Stat. 9 1103–1140.
• [6] Bradic, J. (2016). Robustness in sparse high-dimensional linear models: Relative efficiency and robust approximate message passing. Electron. J. Stat. 10 3894–3944.
• [7] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
• [8] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
• [9] Candès, E. J. (2008). The restricted isometry property and its implications for compressed sensing. C. R. Math. Acad. Sci. Paris 346 589–592.
• [10] Coolen, A. C. C. (2005). The Mathematical Theory of Minority Games: Statistical Mechanics of Interacting Agents. Oxford Univ. Press, Oxford.
• [11] Donoho, D. and Montanari, A. (2015). Variance breakdown of Huber (M)-estimators: $n/p\rightarrow m$. Preprint.
• [12] Donoho, D. and Montanari, A. (2016). High dimensional robust M-estimation: Asymptotic variance via approximate message passing. Probab. Theory Related Fields 166 935–969.
• [13] Donoho, D. L. (2006). High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom. 35 617–652.
• [14] Donoho, D. L. (2006). For most large underdetermined systems of equations, the minimal $l_{1}$-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 907–934.
• [15] Donoho, D. L., Gavish, M. and Montanari, A. (2013). The phase transition of matrix recovery from Gaussian measurements matches the minimax MSE of matrix denoising. Proc. Natl. Acad. Sci. USA 110 8405–8410.
• [16] Donoho, D. L., Maleki, A. and Montanari, A. (2009). Message-passing algorithms for compressed sensing. Proc. Natl. Acad. Sci. USA 106 18914–18919.
• [17] Donoho, D. L., Maleki, A. and Montanari, A. (2011). The noise-sensitivity phase transition in compressed sensing. IEEE Trans. Inform. Theory 57 6920–6941.
• [18] Donoho, D. L., Maleki, A. and Montanari, A. (2011). The noise-sensitivity phase transition in compressed sensing. IEEE Trans. Inform. Theory 57 6920–6941.
• [19] Donoho, D. L. and Tanner, J. (2005). Sparse nonnegative solution of underdetermined linear equations by linear programming. Proc. Natl. Acad. Sci. USA 102 9446–9451.
• [20] Donoho, D. L. and Tanner, J. (2005). Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA 102 9452–9457.
• [21] El Karoui, N. (2013). Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: Rigorous results. Preprint. Available at arXiv:1311.2445.
• [22] El Karoui, N., Bean, D., Bickel, P., Lim, C. and Yu, B. (2013). On robust regression with high-dimensional predictors. Proc. Natl. Acad. Sci. USA 110 14557–14562.
• [23] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• [24] Frank, L. and Friedman, J. (1993). A statistical view of some chemometrics regression tools. Technometrics 35 109–135.
• [25] Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397–416.
• [26] Guo, D. and Verdú, S. (2005). Randomly spread CDMA: Asymptotics via statistical physics. IEEE Trans. Inform. Theory 51 1983–2010.
• [27] Hoerl, A. and Kennard, R. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55–67.
• [28] Huang, J., Horowitz, J. L. and Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Statist. 36 587–613.
• [29] Kabashima, Y., Wadayama, T. and Tanaka, T. (2009). A typical reconstruction limit for compressed sensing based on $L_{p}$-norm minimization. J. Stat. Mech. Theory Exp. 2009 L09003.
• [30] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
• [31] Koltchinskii, V. (2009). Sparsity in penalized empirical risk minimization. Ann. Inst. Henri Poincaré Probab. Stat. 45 7–57.
• [32] Krzakala, F., Mézard, M., Sausset, F., Sun, Y. and Zdeborová, L. (2012). Statistical-physics-based reconstruction in compressed sensing. Phys. Rev. X 2 021005.
• [33] Maleki, A. (2010). Approximate message passing algorithms for compressed sensing. Ph.D. thesis, Stanford Univ., Stanford, CA.
• [34] Mousavi, A., Maleki, A. and Baraniuk, R. G. (2017). Consistent parameter estimation for LASSO and approximate message passing. Ann. Statist. 45 2427–2454.
• [35] Oymak, S. and Hassibi, B. (2016). Sharp MSE bounds for proximal denoising. Found. Comput. Math. 16 965–1029.
• [36] Oymak, S., Thrampoulidis, C. and Hassibi, B. (2013). The squared-error of generalized lasso: A precise analysis. In 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton) 1002–1009. IEEE, New York.
• [37] Rangan, S., Goyal, V. and Fletcher, A. (2009). Asymptotic analysis of MAP estimation via the replica method and compressed sensing. In Advances in Neural Information Processing Systems 1545–1553.
• [38] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over $\ell_{q}$-balls. IEEE Trans. Inform. Theory 57 6976–6994.
• [39] Stojnic, M. (2009). Various thresholds for $\ell_{1}$-optimization in compressed sensing. Preprint. Available at arXiv:0907.3666.
• [40] Su, W., Bogdan, M. and Candès, E. (2015). False discoveries occur early on the lasso path. Preprint. Available at arXiv:1511.01957.
• [41] Tanaka, T. (2002). A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors. IEEE Trans. Inform. Theory 48 2888–2910.
• [42] Thrampoulidis, C., Abbasi, E. and Hassibi, B. (2016). Precise error analysis of regularized M-estimators in high-dimensions. Preprint. Available at arXiv:1601.06233.
• [43] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• [44] Weng, H. and Maleki, A. (2017). Low noise sensitivity analysis of $\ell_{q}$-minimization in oversampled systems. Preprint. Available at arXiv:1705.03533.
• [45] Weng, H., Maleki, A. and Zheng, L. (2018). Supplement to “Overcoming the limitations of phase transition by higher order analysis of regularization techniques.” DOI:10.1214/17-AOS1651SUPP.
• [46] Zheng, L., Maleki, A., Weng, H., Wang, X. and Long, T. (2016). Does $\ell_{p}$-minimization outperform $\ell_{1}$-minimization? Preprint. Available at arXiv:1501.03704v2.
• [47] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301–320.

#### Supplemental materials

• Supplement to “Overcoming the limitations of phase transition by higher order analysis of regularization techniques”. Due to space constraints, additional simulations and technical proofs are relegated to a supplementary document [45], which contains Sections A–J.