Electronic Journal of Statistics

Estimation from nonlinear observations via convex programming with application to bilinear regression

Sohail Bahmani

Full-text: Open access


We propose a computationally efficient estimator, formulated as a convex program, for a broad class of nonlinear regression problems that involve difference of convex(DC) nonlinearities. The proposed method can be viewed as a significant extension of the “anchored regression” method formulated and analyzed in [10] for regression with convex nonlinearities. Our main assumption, in addition to other mild statistical and computational assumptions, is availability of a certain approximation oracle for the average of the gradients of the observation functions at a ground truth. Under this assumption and using a PAC-Bayesian analysis we show that the proposed estimator produces an accurate estimate with high probability. As a concrete example, we study the proposed framework in the bilinear regression problem with Gaussian factors and quantify a sufficient sample complexity for exact recovery. Furthermore, we describe a computationally tractable scheme that provably produces the required approximation oracle in the considered bilinear regression problem.

Article information

Electron. J. Statist., Volume 13, Number 1 (2019), 1978-2011.

Received: June 2018
First available in Project Euclid: 19 June 2019

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

Primary: 62F10: Point estimation 90C25: Convex programming
Secondary: 62P30: Applications in engineering and industry

Nonlinear regression convex programming PAC-Bayesian analysis bilinear regression

Creative Commons Attribution 4.0 International License.


Bahmani, Sohail. Estimation from nonlinear observations via convex programming with application to bilinear regression. Electron. J. Statist. 13 (2019), no. 1, 1978--2011. doi:10.1214/19-EJS1567. https://projecteuclid.org/euclid.ejs/1560909647

Export citation


  • [1] A. Aghasi, A. Ahmed, and P. Hand. Branchhull: Convex bilinear inversion from the entrywise product of signals with known signs. preprint, arXiv:1702.04342 [cs.IT], 2017.
  • [2] A. A. Ahmadi and G. Hall. DC decomposition of nonconvex polynomials with algebraic techniques., Mathematical Programming - Series B, 2017.
  • [3] A. Ahmed, B. Recht, and J. Romberg. Blind deconvolution using convex programming., IEEE Transactions on Information Theory, 60(3) :1711–1732, March 2014.
  • [4] P. Alquier. PAC-Bayesian bounds for randomized empirical risk minimizers., Mathematical Methods of Statistics, 17(4):279–304, Dec 2008.
  • [5] P. Alquier and K. Lounici. PAC-Bayesian bounds for sparse regression estimation with exponential weights., Electronic Journal of Statistics, 5:127–145, 2011.
  • [6] J.-Y. Audibert and O. Catoni. Robust linear least squares regression., the Annals of Statistics, 39(5) :2766–2794, 10 2011.
  • [7] S. Bahmani and J. Romberg. Lifting for blind deconvolution in random mask imaging: Identifiability and convex relaxation., SIAM Journal on Imaging Sciences, 8(4) :2203–2238, 2015.
  • [8] S. Bahmani and J. Romberg. Phase retrieval meets statistical learning theory: A flexible convex relaxation. In A. Singh and J. Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 252–260, Fort Lauderdale, FL, USA, 20–22 Apr 2017a. PMLR.
  • [9] S. Bahmani and J. Romberg. A flexible convex relaxation for phase retrieval., Electronic Journal of Statistics, 11(2) :5254–5281, 2017b.
  • [10] S. Bahmani and J. Romberg. Solving equations of random convex functions via anchored regression., J. Found. Comp. Math., 2018. In press; preprint arXiv:1702.05327 [cs.LG].
  • [11] O. Bousquet, V. Koltchinskii, and D. Panchenko. Some local measures of complexity of convex hulls and generalization bounds. In J. Kivinen and R. H. Sloan, editors, Computational Learning Theory, pages 59–73, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg.
  • [12] E. J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms., Information Theory, IEEE Transactions on, 61(4) :1985–2007, Apr. 2015.
  • [13] O. Catoni., PAC-Bayesian supervised classification: the thermodynamics of statistical learning, volume 56 of Lecture Notes–Monograph Series. Institute of Mathematical Statistics, Beachwood, OH, USA, 2007.
  • [14] O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression., arXiv preprint, Dec. 2017. arXiv:1712.02747 [math, stat].
  • [15] R. Y. Chen, A. Gittens, and J. A. Tropp. The masked sample covariance estimator: An analysis using matrix concentration inequalities., Information and Inference: A Journal of the IMA, 1(1):2–20, 2012.
  • [16] Y. Chen and E. Candés. Solving random quadratic systems of equations is nearly as easy as solving linear systems. In, Advances in Neural Information Processing Systems 28, pages 739–747. Curran Associates, Inc., 2015.
  • [17] P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In, Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 353–360, New York, NY, USA, 2009. ACM.
  • [18] E. Giné and V. Koltchinskii. Concentration inequalities and asymptotic results for ratio type empirical processes., Annals of Probability, 34(3) :1143–1216, May 2006.
  • [19] E. Giné, V. Koltchinskii, and J. A. Wellner. Ratio limit theorems for empirical processes. In E. Giné, C. Houdré, and D. Nualart, editors, Stochastic Inequalities and Applications, pages 249–278, Basel, 2003. Birkhäuser Basel.
  • [20] T. Goldstein and C. Studer. Convex phase retrieval without lifting via PhaseMax. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1273–1281, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
  • [21] T. Goldstein and C. Studer. Phasemax: Convex phase retrieval via basis pursuit., IEEE Transactions on Information Theory, 64(4) :2675–2689, April 2018.
  • [22] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1, Mar. 2014. URL, http://cvxr.com/cvx.
  • [23] Gurobi Optimization, Inc. Gurobi optimizer reference manual, 2016. URL, http://www.gurobi.com.
  • [24] P. Hartman. On functions representable as a difference of convex functions., Pacific J. Math., 9(3):707–713, 1959.
  • [25] M. Junge and Q. Zeng. Noncommutative Bennett and Rosenthal inequalities., Annals of Probability, 41(6) :4287–4316, Nov. 2013.
  • [26] V. Koltchinskii. Rademacher penalties and structural risk minimization., IEEE Transactions on Information Theory, 47(5) :1902–1914, 2001.
  • [27] V. Koltchinskii., Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Mathematics: École d’Été de Probabilités de Saint-Flour XXXVIII -2008. Springer-Verlag Berlin Heidelberg, 2011.
  • [28] V. Koltchinskii and S. Mendelson. Bounding the smallest singular value of a random matrix without concentration., International Mathematics Research Notices, 2015(23):12991–13008, 2015.
  • [29] V. Koltchinskii and D. Panchenko. Rademacher processes and bounding the risk of function learning. In E. Giné, D. M. Mason, and J. A. Wellner, editors, High Dimensional Probability II, pages 443–457, Boston, MA, 2000. Birkhäuser Boston.
  • [30] J. Langford and J. Shawe-Taylor. PAC-Bayes & margins. In, Advances in Neural Information Processing Systems, pages 439–446, 2003.
  • [31] M. Ledoux and M. Talagrand., Probability in Banach Spaces: Isoperimetry and processes. Springer Science & Business Media, 2013.
  • [32] X. Li, S. Ling, T. Strohmer, and K. Wei. Rapid, robust, and reliable blind deconvolution via nonconvex optimization., Applied and Computational Harmonic Analysis, 2018.. in press.
  • [33] S. Ling and T. Strohmer. Regularized gradient descent: A non-convex recipe for fast joint blind deconvolution and demixing., Information and Inference: A Journal of the IMA, 2018.. in press.
  • [34] W. Luo, W. Alghamdi, and Y. M. Lu. Optimal spectral initialization for signal recovery with applications to phase retrieval. Preprint, arXiv:1811.04420 [cs.IT], 2018.
  • [35] C. Ma, K. Wang, Y. Chi, and Y. Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution. Preprint, arXiv:1711.10467 [cs.LG], 2017.
  • [36] L. Mackey, M. I. Jordan, R. Y. Chen, B. Farrell, and J. A. Tropp. Matrix concentration inequalities via the method of exchangeable pairs., Annals of Probability, 42(3):906–945, May 2014.
  • [37] D. McAllester and T. Akinbiyi., PAC-Bayesian Theory, pages 95–103. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
  • [38] D. A. McAllester. Some PAC-Bayesian theorems., Machine Learning, 37(3):355–363, Dec 1999.
  • [39] S. Mendelson. Learning without concentration. In, Proceedings of the 27th Conference on Learning Theory (COLT), volume 35 of JMLR W&CP, pages 25–39, 2014.
  • [40] S. Mendelson. Learning without concentration., Journal of the ACM, 62(3):21:1–21:25, June 2015. ISSN 0004-5411.
  • [41] M. Mondelli and A. Montanari. Fundamental limits of weak recovery with applications to phase retrieval. In, Proceedings of the 31st Conference On Learning Theory (COLT), volume 75 of Proceedings of Machine Learning Research, pages 1445–1450. PMLR, 2018.
  • [42] P. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternating minimization. In, Advances in Neural Information Processing Systems 26, pages 2796–2804. Curran Associates, Inc., 2013.
  • [43] R. I. Oliveira. The lower tail of random quadratic forms with applications to ordinary least squares., Probability Theory and Related Fields, 166(3) :1175–1194, Dec 2016.
  • [44] Y. Plan and R. Vershynin. Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach., IEEE Transactions on Information Theory, 59(1):482–494, Jan 2013.
  • [45] Y. Plan and R. Vershynin. The generalized lasso with non-linear observations., IEEE Transactions on Information Theory, 62(3) :1528–1537, March 2016.
  • [46] A. W. van Der Vaart and J. A. Wellner., Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer, 1996.
  • [47] V. N. Vapnik., Statistical learning theory. Wiley, 1998.
  • [48] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities., Theory of Probability & Its Applications, 16(2):264–280, 1971.
  • [49] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In G. Kutyniok and Y. Eldar, editors, Compressed Sensing, Theory and Applications, pages 210–268. Cambridge University Press, 2012.
  • [50] Y. Yu, T. Wang, and R. J. Samworth. A useful variant of the Davis-Kahan theorem for statisticians., Biometrika, 102(2):315–323, 2015.