## The Annals of Applied Probability

### A random matrix approach to neural networks

#### Abstract

This article studies the Gram random matrix model $G=\frac{1}{T}\Sigma^{{\mathsf{T}}}\Sigma$, $\Sigma=\sigma(WX)$, classically found in the analysis of random feature maps and random neural networks, where $X=[x_{1},\ldots,x_{T}]\in\mathbb{R}^{p\times T}$ is a (data) matrix of bounded norm, $W\in\mathbb{R}^{n\times p}$ is a matrix of independent zero-mean unit-variance entries, and $\sigma:\mathbb{R}\to\mathbb{R}$ is a Lipschitz continuous (activation) function, with $\sigma(WX)$ understood entry-wise. By means of a key concentration-of-measure lemma arising from nonasymptotic random matrix arguments, we prove that, as $n,p,T$ grow large at the same rate, the resolvent $Q=(G+\gamma I_{T})^{-1}$, for $\gamma>0$, behaves similarly to its counterpart in sample covariance matrix models, involving notably the moment $\Phi=\frac{T}{n}{\mathrm{E}}[G]$; this provides, in passing, a deterministic equivalent for the empirical spectral measure of $G$. Application-wise, this result enables the estimation of the asymptotic performance of single-layer random neural networks. This in turn provides practical insight into the mechanisms at play in random neural networks, entailing several unexpected consequences, as well as a fast practical means to tune the network hyperparameters.
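The model in the abstract can be instantiated directly. The following is a minimal numerical sketch, not part of the article: the dimensions $n, p, T$, the Gaussian data, and the choice $\sigma = \tanh$ (one example of a Lipschitz activation) are illustrative assumptions.

```python
import numpy as np

# Illustrative dimensions: n, p, T grow large "at the same rate" in the article;
# the specific values here are arbitrary.
n, p, T = 200, 100, 300
gamma = 1.0  # regularization parameter gamma > 0

rng = np.random.default_rng(0)
X = rng.standard_normal((p, T)) / np.sqrt(p)  # data matrix of bounded (operator) norm
W = rng.standard_normal((n, p))               # i.i.d. zero-mean unit-variance entries
sigma = np.tanh                               # a Lipschitz activation, applied entry-wise

Sigma = sigma(W @ X)                          # random-feature matrix, n x T
G = (Sigma.T @ Sigma) / T                     # Gram matrix G = (1/T) Sigma^T Sigma, T x T
Q = np.linalg.inv(G + gamma * np.eye(T))      # resolvent Q = (G + gamma I_T)^{-1}
```

The empirical spectral measure of `G` (the histogram of `np.linalg.eigvalsh(G)`) is the object whose deterministic equivalent the article characterizes.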

#### Article information

**Source:** Ann. Appl. Probab., Volume 28, Number 2 (2018), 1190–1248.

**Dates:** Revised: June 2017. First available in Project Euclid: 11 April 2018.

https://projecteuclid.org/euclid.aoap/1523433634

**Digital Object Identifier:** doi:10.1214/17-AAP1328

**Mathematical Reviews number (MathSciNet):** MR3784498

**Zentralblatt MATH identifier:** 06897953

#### Citation

Louart, Cosme; Liao, Zhenyu; Couillet, Romain. A random matrix approach to neural networks. Ann. Appl. Probab. 28 (2018), no. 2, 1190--1248. doi:10.1214/17-AAP1328. https://projecteuclid.org/euclid.aoap/1523433634
