## The Annals of Statistics

### Eigenvalue distributions of variance components estimators in high-dimensional random effects models

#### Abstract

We study the spectra of MANOVA estimators for variance component covariance matrices in multivariate random effects models. When the dimensionality of the observations is large and comparable to the number of realizations of each random effect, we show that the empirical spectra of such estimators are well approximated by deterministic laws. The Stieltjes transforms of these laws are characterized by systems of fixed-point equations, which are numerically solvable by a simple iterative procedure. Our proof uses operator-valued free probability theory, and we establish a general asymptotic freeness result for families of rectangular orthogonally invariant random matrices, which is of independent interest. Our work is motivated in part by the estimation of components of covariance between multiple phenotypic traits in quantitative genetics, and we specialize our results to common experimental designs that arise in this application.
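As a rough illustration of the kind of iterative procedure the abstract mentions, the sketch below solves a single fixed-point equation for a Stieltjes transform. It is not the system of equations derived in the paper. It assumes the classical Marchenko–Pastur setting (a sample covariance matrix with identity population covariance and aspect ratio `gamma`), and the function name `mp_stieltjes` and its parameters are hypothetical.

```python
def mp_stieltjes(z, gamma, tol=1e-12, max_iter=10_000):
    """Stieltjes transform m(z) of the Marchenko-Pastur law with ratio gamma.

    Illustrative assumptions: identity population covariance and Im(z) > 0.
    Iterates the companion equation v = -1 / (z - gamma / (1 + v)).
    """
    z = complex(z)
    if z.imag <= 0:
        raise ValueError("z must lie in the open upper half-plane")
    v = 1j  # any starting point with positive imaginary part
    for _ in range(max_iter):
        v_next = -1.0 / (z - gamma / (1.0 + v))
        converged = abs(v_next - v) < tol
        v = v_next
        if converged:
            break
    # Convert the companion transform back to m via v = gamma*m - (1 - gamma)/z.
    return (v + (1.0 - gamma) / z) / gamma


z, gamma = complex(1.0, 1.0), 0.5
m = mp_stieltjes(z, gamma)
# Residual of the Marchenko-Pastur equation m = 1 / (1 - gamma - z - gamma*z*m).
residual = abs(m - 1.0 / (1.0 - gamma - z - gamma * z * m))
```

The update maps the upper half-plane into itself, which is why a plain iteration from any starting point with positive imaginary part is a reasonable choice here. Convergence slows as `z` approaches the real axis.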

#### Article information

Source
Ann. Statist., Volume 47, Number 5 (2019), 2855–2886.

Dates
Revised: August 2018
First available in Project Euclid: 3 August 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1564797866

Digital Object Identifier
doi:10.1214/18-AOS1767

Mathematical Reviews number (MathSciNet)
MR3988775

Subjects
Primary: 62E20: Asymptotic distribution theory

#### Citation

Fan, Zhou; Johnstone, Iain M. Eigenvalue distributions of variance components estimators in high-dimensional random effects models. Ann. Statist. 47 (2019), no. 5, 2855–2886. doi:10.1214/18-AOS1767. https://projecteuclid.org/euclid.aos/1564797866

#### Supplemental materials

• Supplementary Appendices (doi:10.1214/18-AOS1767SUPP). The appendices contain a discussion of more general classification designs; proofs of Theorem 3.10 and Corollary 3.11; the proof of Lemma 4.3; the conclusion of the proof of Theorem 4.1; and a separate exposition of the proof in Section 4 for the simpler setting of Theorem 1.1.