Large deviations and stochastic calculus for large random matrices

Large random matrices appear in different fields of mathematics and physics, such as combinatorics, probability theory, statistics, operator theory, number theory, quantum field theory, string theory, etc. In the last ten years, they have attracted a lot of interest, in particular due to a series of mathematical breakthroughs allowing, for instance, a better understanding of local properties of their spectrum, answering universality questions, and connecting these issues with growth processes. In this survey, we shall discuss the problem of the large deviations of the empirical measure of Gaussian random matrices and, more generally, of the trace of words of independent Gaussian random matrices. We shall describe how such issues are motivated either in physics/combinatorics by the study of the so-called matrix models, or in free probability by the definition of a non-commutative entropy. We shall show how classical large deviations techniques can be used in this context. These lecture notes are supposed to be accessible to non-probabilists and non-free-probabilists.


Introduction
Large random matrices have been studied since the thirties, when Wishart [132] considered them to analyze some statistics problems. Since then, random matrices have appeared in various fields of mathematics. Let us briefly summarize some of them and the mathematical questions they raised.

1. Large random matrices and statistics : A typical question is to study the spectrum of the covariance matrix Y^{N,M} = X^{N,M}(X^{N,M})^* of an N × M matrix X^{N,M} with random entries. Typically, the matrix X^{N,M} is made of independent equidistributed vectors {X_1, · · · , X_N} in C^M with covariance matrix Σ. Such random vectors naturally appear in the multivariate analysis context, where X^{N,M} is a data matrix, the column vectors of which represent an observation of a vector in C^M. In such a setup, one would like to find the effective dimension of the system, that is, the smallest dimension with which one can encode all the variations of the data. Such a principal component analysis is based on the study of the eigenvalues and eigenvectors of the covariance matrix X^{N,M}(X^{N,M})^*. When one assumes that the column vectors have i.i.d. Gaussian entries, Y^{N,M} is called a standard Gaussian Wishart matrix. In statistics, it used to be reasonable to assume that N/M was large. However, the case where N/M is of order one is nowadays commonly considered; it corresponds to situations where either the number of observations is rather small or the dimension of the observation is very large. Such cases appear for instance in problems related to telecommunications, and more precisely in the analysis of cellular phone data, where a very large number of customers have to be treated simultaneously (see [70,116,120] and references therein). Other examples are provided in [80]. In this setting, the main questions concern local properties of the spectrum (such as the large N, M behavior of the spectral radius of Y^{N,M}, see [80], the asymptotic behaviour of the k largest eigenvalues, etc.), or the form of the eigenvectors of Y^{N,M} (see [120] and references therein).
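To make this concrete, here is a minimal numerical sketch (the dimensions, seed and normalization are illustrative choices, not from the text) of the regime where N/M is of order one: the spectrum of the sample covariance matrix spreads over a whole interval, whose edges are predicted by the Marchenko–Pastur law.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 200, 400                      # illustrative sizes with c = N/M = 1/2
X = rng.standard_normal((N, M))      # data matrix with i.i.d. N(0,1) entries
Y = X @ X.T / M                      # normalized sample covariance (Wishart) matrix
eigs = np.linalg.eigvalsh(Y)

c = N / M
# Marchenko-Pastur prediction for the edges of the spectrum
lower, upper = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(eigs.min(), eigs.max(), (lower, upper))
```

Even though each coordinate has unit variance, the eigenvalues do not concentrate around 1: they fill the interval [(1 − √c)², (1 + √c)²], which is why classical fixed-dimension asymptotics fail in this regime.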
2. Large random matrices and quantum mechanics : Wigner, in 1951 [131], suggested approximating the Hamiltonians of highly excited nuclei by large random matrices. The basic idea is that there are so many phenomena going on in such systems that they cannot be analyzed exactly, and only a statistical approach becomes reasonable. The random matrices should be chosen as randomly as possible within the known physical restrictions of the model. For instance, he considered what we shall later on call Wigner's matrices, that is, Hermitian (since the Hamiltonian has to be Hermitian) matrices with i.i.d. entries (modulo the symmetry constraint).
In the case where the system is invariant under time reversal, one can consider real symmetric matrices, etc. As Dyson pointed out, the general idea is to choose the most random model within the imposed symmetries and to check whether the theoretical predictions agree with the experiments, a disagreement pointing out that an important symmetry of the problem has been neglected. It turned out that experiments agreed exceptionally well with these models; for instance, it was shown that the energy states of the hydrogen atom subjected to a strong magnetic field can be compared with the eigenvalues of a Hermitian matrix with i.i.d. Gaussian entries.
The book [59] summarizes a few similar experiments as well as the history of random matrices in quantum mechanics. In quantum mechanics, the eigenvalues of the Hamiltonian represent the energy states of the system. It is therefore important to study, following Wigner, the spectral distribution of the random matrix under study; but even more important are its spacing distribution, which represents the energy gaps, and its extremal eigenvalues, which are related to the ground states. Such questions were addressed in the reference book of M. L. Mehta [93], but became even more popular in mathematics since the work of C. Tracy and H. Widom [117]. It is also important to make sure that the results obtained do not depend on the details of the large random matrix model, such as the law of the entries; this important field of investigation is often referred to as universality. An important effort of investigation was made in this direction in the last ten years, for instance in [23], [54], [76], [89], [110], [112], [118], [102].

3. Large random matrices and the Riemann zeta function : The Riemann zeta function is given by

ζ(s) = Σ_{n≥1} n^{−s}

for Re(s) > 1, and can be analytically continued to the complex plane. The study of the zeroes of this function in the strip 0 ≤ Re(s) < 1 furnishes one of the most famous open problems. It is well known that ζ has trivial zeroes at −2, −4, −6, . . . and that its zeroes are distributed symmetrically with respect to the line Re(s) = 2^{−1}. The Riemann conjecture is that all the non-trivial zeroes are located on this line. It was suggested by Hilbert and Polya that these zeroes might be related to the eigenvalues of a Hermitian operator, which would immediately imply that they are aligned. To investigate this idea, H. Montgomery (1972), assuming the Riemann conjecture, studied the zeroes of the zeta function on Re(s) = 2^{−1} up to a distance T of the real axis. His result suggests a striking similarity with the corresponding statistics of the eigenvalues of random Hermitian or unitary matrices when T is large. Since then, an extensive literature has been devoted to understanding this relation. Let us only point out that the statistical evidence for this link can only be tested thanks to enormous numerical work, in particular due to A. Odlyzko [99,100], who could determine a few hundred million zeroes of the Riemann zeta function around the 10^{20}-th zero on the line Re(s) = 2^{−1}.
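For the reader's convenience, let us record the statistic behind this similarity: Montgomery's pair correlation conjecture asserts that, after the appropriate rescaling, the two-point correlation function of the zeroes coincides with that of the eigenvalues of large GUE matrices, namely

```latex
% Conjectured pair correlation of the rescaled zeta zeroes,
% identical to the GUE eigenvalue pair correlation:
R_2(u) \;=\; 1-\left(\frac{\sin \pi u}{\pi u}\right)^{2}.
```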
In somewhat the same direction, there is numerical evidence that the eigenvalue distribution of large Wigner matrices also describes the large eigenvalues of the Laplacian in some bounded domains such as the cardioid. This is related to quantum chaos, since these eigenvalues describe the long time behavior of the classical ray dynamics in such a domain (i.e. the billiard dynamics).

4. Large random matrices and free probability : Free probability is a probability theory in a non-commutative framework. Probability measures are replaced by tracial states on von Neumann algebras. Free probability also contains the central notion of freeness, which can be seen as a non-commutative analogue of the notion of independence. At the algebraic level, it can be related to the usual notion of freeness. This is why free probability could be well suited to solve important questions in von Neumann algebras, such as the question of isomorphism between free group factors. Even though this goal is not yet achieved, let us quote a few results on von Neumann algebras which were proved thanks to the free probability machinery [56], [57], [124].
In the 1990's, Voiculescu [121] proved that large random matrices are asymptotically free as their size goes to infinity. Hence, large random matrices became a source for constructing many non-commutative laws with nice properties with respect to freeness. Thus, free probability can be considered as the natural asymptotic framework for large random matrices. Conversely, it is a well-known question due to A. Connes whether any tracial state can be approximated by the non-commutative empirical distribution of large matrices (which we shall define more precisely later); a positive answer would show that any tracial state can be obtained as such a limit.
In this context, one often studies the asymptotic behavior of traces of polynomial functions of several random matrices with size going to infinity, trying to deduce from this limit either intuition or results concerning tracial states. For instance, free probability and large random matrices can be used to construct counter examples to some operator algebra questions.

5. Combinatorics, enumeration of maps and matrix models :
It is well known that the evaluation of the expectation of traces of random matrices possesses a combinatorial nature. For instance, if one considers an N × N symmetric or Hermitian matrix X_N with i.i.d. centered entries with covariance N^{−1}, it is well known that E[N^{−1}Tr(X_N^p)] converges toward 0 if p is odd and toward the Catalan number C_{p/2} if p is even. C_p is the number of non-crossing pair partitions of {1, · · · , 2p} and arises very often in combinatorics. This idea was pushed forward by J. Harer and D. Zagier [68], who computed the moments of Tr(X_N^p) exactly in order to enumerate maps with a given number of vertices and genus. This combinatorial aspect of large random matrices was developed in the free probability context by R. Speicher [113]. This strategy was considerably generalized by 't Hooft, who saw that matrix integrals such as

Z_N(P) = E[e^{N Tr(P(X_N^1, · · · , X_N^k))}],

with a polynomial function P and independent copies X_N^i of X_N, can be seen as generating functions for the enumeration of maps of various types. The formal proof follows from the Feynman diagram expansion. This relation is nicely summarized in an article by A. Zvonkin [136], and we shall describe it more precisely in Chapter 5. One-matrix integrals can be used to enumerate various maps of arbitrary genus (maps with a given genus g appearing in the N^{−2g} correction terms in the expansion of Z_N(P)), and several-matrix integrals can serve to consider the case where the vertices of these maps are colored, i.e. can take different states. For example, two-matrix integrals can serve to define an Ising model on random graphs. Matrix models were also used in physics to construct string theory models. Since string theory concerns maps with arbitrary genus, matrix models have to be considered at criticality and with temperature parameters well tuned with the dimension in order to have any relevance in this domain.
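As a quick numerical illustration of the moment convergence (a sketch; matrix size and tolerances are arbitrary choices of mine), one can check the first even moments against the Catalan numbers C_2 = 2 and C_3 = 5:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400
A = rng.standard_normal((N, N))
X = (A + A.T) / np.sqrt(2 * N)       # symmetric Wigner matrix, entry covariance ~ 1/N

X2 = X @ X
m4 = np.trace(X2 @ X2) / N           # should approach C_2 = 2
m6 = np.trace(X2 @ X2 @ X2) / N      # should approach C_3 = 5
print(m4, m6)
```

The odd moments vanish by symmetry, and the 1/N corrections to the even moments are exactly the higher-genus terms of the Harer–Zagier expansion mentioned above.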
This subject seems to have had a great revival in the last few years, but it still seems far from mathematical (or at least my) understanding. Haar-distributed unitary matrices can also be used to enumerate combinatorial objects, due to their relation with representations of the symmetric group (cf. [34] for instance). Nice applications to the enumeration of magic squares can be found in [38]. In this domain, one tries to estimate integrals such as Z_N(P), and in particular to obtain the full expansion of log Z_N(P) in terms of the dimension N. This could be done rigorously so far only for one-matrix models, by use of Riemann-Hilbert problem techniques, by K. McLaughlin and N. Ercolani [46]. First order asymptotics for a few several-matrix models could be obtained by orthogonal polynomial methods by M. L. Mehta [93,90,32] and by large deviations techniques in [61]. The physics literature on the subject is much more extensive, as can be seen on the arXiv (see work by V. Kazakov, I. Kostov, M. Staudacher, B. Eynard, P. Zinn-Justin, etc.).

6. Large random matrices, random partitions and determinantal laws : It is well known [93] that Gaussian matrices have a determinantal form: the law of the eigenvalues (λ_1, · · · , λ_N) of a Wigner matrix with complex Gaussian entries (also called the GUE) is given by

Q_N(dλ_1, · · · , dλ_N) = Z_N^{−1} ∆(λ)² e^{−N Σ_{i=1}^N λ_i²/2} Π_{i=1}^N dλ_i,

with Z_N the normalizing constant and ∆(λ) = Π_{i<j}(λ_i − λ_j) the Vandermonde determinant. Because ∆ is a determinant, specific techniques can be used to study, for instance, the law of the top eigenvalue or the spacing distribution in the bulk or next to the top (cf. [117]). Such laws actually appear in different contexts, such as random partitions, as illustrated in the work of K. Johansson [77], or tiling problems [78]. For more general remarks on the relation between random matrices and random partitions, see [101]. In fact, determinantal laws appear naturally when non-intersecting paths are involved.
Indeed, following [83], if k_T is the transition probability of a homogeneous continuous Markov process and P_T^N is the distribution of N independent copies X_t^N = (x_1(t), · · · , x_N(t)) of this process, then for any X = (x_1, · · · , x_N) with x_1 < x_2 < · · · < x_N and any Y = (y_1, · · · , y_N) with y_1 < y_2 < · · · < y_N, the reflection principle shows that

P(X^N(0) = X, X^N(T) = Y | ∀t ≥ 0, x_1(t) ≤ x_2(t) ≤ · · · ≤ x_N(t)) = C(X) det(k_T(x_i, y_j))_{1≤i,j≤N}   (1.0.1)

with C(X)^{−1} = ∫ det(k_T(x_i, y_j))_{1≤i,j≤N} dy.
This might provide an additional motivation to study determinantal laws. Even more striking is the occurrence of the laws of large Gaussian matrices in the problem of the longest increasing subsequence [8], in directed polymers and in the totally asymmetric simple exclusion process [75]. These relations are based on bijections with pairs of Young tableaux. In fact, the law of the hitting time of the totally asymmetric simple exclusion process (TASEP) starting from the Heaviside initial condition can be related to the law of the largest eigenvalue of a Wishart matrix. Let us remind the reader that the TASEP is a process with values in {0, 1}^Z, 0 representing the fact that a site is empty and 1 that it is occupied, the dynamics of which are described as follows. Each site of Z is equipped with a clock which rings at exponentially distributed times. When the clock rings at site i, nothing happens if there is no particle at i or if there is one at i + 1. Otherwise, the particle jumps from i to i + 1. Once this clock has rung, it is replaced by a brand new independent clock. K. Johansson [75] considered these dynamics starting from the initial condition where there are no particles on Z_+ but one particle on each site of Z_−. The paths of the particles do not intersect by construction, and therefore one can expect the law of the configurations to be determinantal. The main question is to understand where the particle which was at site −N, N ∈ N, at time zero will be at time T. In other words, one wants to study the time H(N, M) that the particle initially at −N needs to get to M − N. K. Johansson [75] has shown that H(N, M) has the same law as the largest eigenvalue of a complex Wishart matrix with i.i.d. centered Gaussian entries with covariance 2^{−1}. This remark allowed him to complete the law of large numbers result of Rost [106] by the study of the fluctuations, which are of order N^{1/3}.
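The dynamics just described are easy to simulate; the sketch below (window size, time horizon and seed are arbitrary choices of mine, not from [75]) runs the TASEP from the Heaviside initial condition and counts the particles that have crossed the origin, which the hydrodynamic (law of large numbers) limit predicts to be about T/4:

```python
import numpy as np

rng = np.random.default_rng(2)
L, T = 80, 40.0
# sites -L, ..., L; step (Heaviside) initial condition: negative sites occupied
occ = np.zeros(2 * L + 1, dtype=int)
occ[:L] = 1

t = 0.0
while True:
    # particles whose right neighbour is empty; each carries a rate-1 clock
    movable = np.where((occ[:-1] == 1) & (occ[1:] == 0))[0]
    t += rng.exponential(1.0 / len(movable))   # waiting time to the next jump
    if t > T:
        break
    i = rng.choice(movable)                    # the clock that rang first
    occ[i], occ[i + 1] = 0, 1

crossed = occ[L:].sum()                        # particles now at sites >= 0
print(crossed)                                 # hydrodynamics predicts about T/4
```

Since the exponential clocks are memoryless, resampling the jumping particle uniformly among the movable ones after an exponential waiting time of total rate len(movable) is an exact (Gillespie-type) simulation of the dynamics.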
This paper opened up the investigation of diverse growth processes (cf. Forrester [53]), of the problem of generalizing this result to different initial conditions, and of other problems such as tiling models [78]. In this last context, one of the main results is the description of the fluctuations of the boundary of the tiling in terms of the Airy process (cf. M. Prähofer and H. Spohn [114] and K. Johansson [79]). In this set of problems, one usually meets the problem of analyzing the largest eigenvalue of a large matrix, which is a highly non-trivial analysis since the eigenvalues interact by a Coulomb gas potential.
In short, large random matrices became extremely fashionable during the last ten years. It is somewhat a pity that there is no good introductory book to the field. Having seen the six aspects of the topic I tried to describe above and imagining all those I forgot, the task looks like a challenge.
These notes are devoted to a very particular aspect of the study of large random matrices, namely the deviations of the laws of macroscopic quantities of large random matrices, such as their spectral measures. It is only connected to points 4 and 5 listed above. Since large deviations results are refinements of law of large numbers theorems, let us briefly summarize these last results here.
It has been known since Wigner that the spectral measure of Wigner matrices converges toward the semicircle law almost surely. More precisely, let us consider a Wigner matrix, that is, an N × N self-adjoint matrix X_N with independent (modulo the symmetry constraint) equidistributed centered entries with covariance N^{−1}. Let (λ_1, · · · , λ_N) be the eigenvalues of X_N. Then it was shown by Wigner [131], under appropriate assumptions on the moments of the entries, that the spectral measure μ̂_N = N^{−1} Σ_{i=1}^N δ_{λ_i} converges almost surely toward the semicircle distribution

σ(dx) = (2π)^{−1} √(4 − x²) 1_{|x|≤2} dx.

This result was originally proved by estimating the moments {N^{−1}Tr((X_N)^p), p ∈ N}, which is a common strategy to study the spectral measure of self-adjoint random matrices.
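This convergence is easy to observe numerically; the following sketch (matrix size and tolerance are arbitrary choices of mine) samples one Gaussian Wigner matrix and compares the mass that μ̂_N puts on [−1, 1] with the semicircle value ∫_{−1}^{1} σ(dx) = 1/3 + √3/(2π) ≈ 0.609:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
A = rng.standard_normal((N, N))
X = (A + A.T) / np.sqrt(2 * N)        # Wigner matrix with entry covariance ~ 1/N
lam = np.linalg.eigvalsh(X)

frac = np.mean(np.abs(lam) <= 1.0)    # mass of the spectral measure on [-1, 1]
semicircle = 1 / 3 + np.sqrt(3) / (2 * np.pi)   # exact semicircle mass on [-1, 1]
print(frac, semicircle, lam.min(), lam.max())
```

The extreme eigenvalues also stick to the edges ±2 of the support, consistently with the discussion of extremal eigenvalues above.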
This convergence can also be proved by considering the Stieltjes transform of the spectral measure, following Z. Bai [4], which requires fewer hypotheses on the moments of the entries of X_N. In the case of Gaussian entries, this result can be easily deduced from the large deviation principle of Section 3. The convergence of the spectral measure was generalized to Wishart matrices (matrices of the form X_N R_N(X_N)^* with a matrix X_N with independent entries and a diagonal matrix R_N) by Pastur and Marchenko [103]. Another interesting question is, given two arbitrary large matrices (A, B) with given spectra, how the spectrum of the sum of these two matrices behaves. Of course, this depends a lot on their eigenvectors. If one assumes that A and B have the same eigenvectors and i.i.d. eigenvalues with laws µ and ν respectively, the law of the eigenvalues of A + B is the standard convolution µ ∗ ν. On the contrary, if the eigenvectors of A and B are a priori not related, it is natural to consider A + U BU^* with U following the Haar measure on the unitary group. It was proved by D. Voiculescu [122] that, if the spectral measure of A (resp. B) converges toward µ_A (resp. µ_B) as the size of the matrices goes to infinity, then the spectral measure of this sum converges toward the free convolution µ_A ⊞ µ_B. More generally, if one considers the normalized trace of a word in two independent Wigner matrices, then Voiculescu [122] proved that it converges in expectation (but actually also almost surely) toward a limit which is described by the trace of this word evaluated at two free semicircular variables. We shall describe the notion of freeness in Chapter 6.
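The free convolution can be observed numerically. In the sketch below (all sizes and tolerances are mine), A and B both have spectral measure (δ_{−1} + δ_1)/2; the spectrum of A + UBU* with Haar-distributed U approaches the free convolution of these two Bernoulli laws, namely the arcsine law on (−2, 2), rather than the classical convolution (which would keep an atom of mass 1/2 at 0):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000
a = np.repeat([1.0, -1.0], N // 2)    # spectrum (+1, -1) with equal weights
A = np.diag(a)
B = np.diag(a)

# Haar-distributed unitary via QR decomposition of a complex Ginibre matrix
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Q, R = np.linalg.qr(G)
U = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))   # fix phases to get Haar measure

lam = np.linalg.eigvalsh(A + U @ B @ U.conj().T)
# arcsine law on (-2, 2): second moment 2, mass 1/3 on [-1, 1], no atom at 0
print(np.mean(lam ** 2), np.mean(np.abs(lam) <= 1.0))
```

The phase correction after the QR step is the standard recipe to turn the QR factor of a Ginibre matrix into a Haar-distributed unitary.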
The question of the fluctuations of the spectral measure of random matrices was initiated in 1982 by D. Jonsson [81] for Wishart matrices, by using the moment method. This approach was applied and improved by A. Soshnikov and Y. Sinai [110], who considered Wigner matrices with non-Gaussian entries but sufficient bounds on their moments, and who obtained precise estimates on the moments {N^{−1}Tr((X_N)^p), p ∈ N}. Such results were generalized to the non-commutative setting, where one considers polynomial functions of several independent random matrices, by T. Cabanal-Duvillard [28] and myself [60]. Recently, J. Mingo and R. Speicher [96] gave a combinatorial interpretation of the limiting covariance via a notion of second order freeness, which places the problem of fluctuations in its natural non-commutative framework. They applied it with P. Śniady [97] to unitary matrices, generalizing to a non-commutative framework the results of P. Diaconis and M. Shahshahani [37] showing that traces of moments of unitary matrices converge towards Gaussian variables. In [60], I used the non-commutative framework to study fluctuations of the spectral measure of Gaussian band matrices, following an idea of D. Shlyakhtenko [109]. On the other hand, A. Khorunzhy, B. Khoruzhenko and L. Pastur [89], and more recently Z. Bai and J. F. Yao [6], developed Stieltjes transform technology to study central limit theorems for entries with possibly only the first four moments bounded. Such techniques apply at best to prove central limit theorems for nice analytic functions of the matrix under study. K. Johansson [73] considered Gaussian entries in order to take advantage of the fact that, in this case, the eigenvalues have a simple joint law, given by a Coulomb gas type interaction. In this case, he could describe the optimal set of functions for which a central limit theorem can hold.
Note here that in [60], the covariance is described in terms of a positive symmetric operator and therefore such an optimal set should be described as the domain of this operator. However, because this operator acts on non-commutative functions, its domain remains rather mysterious. A general combinatorial approach for understanding the fluctuations of band matrices with entries satisfying for instance Poincaré inequalities and rather general test functions has recently been undertaken by G. Anderson and O. Zeitouni [2].
In these notes, we shall study the deviations from the typical behavior, in terms of large deviations, in the cases listed above, with the restriction to Gaussian entries. The notes rely on a series of papers I have written on this subject with different coauthors [10,18,29,30,42,60,62,64,65], and try to give a complete, accessible overview of this work to uninitiated readers. Some statements are improved or corrected, and global introductions to free probability and to hydrodynamics/large deviations techniques are given. While full proofs are given in Chapter 3 and rather detailed in Chapter 4, Chapter 7 only outlines how to adapt the ideas of Chapter 4 to the non-commutative setting. Chapter 5 uses the results of Chapter 1 and Chapter 4 to study matrix models. These notes are supposed to be accessible to non-probabilists, provided they take for granted some facts concerning Itô's calculus.
First, we shall consider the case of Gaussian Wigner matrices (see Chapter 3); the case of non-Gaussian entries is still an open problem. We generalize our approach to non-centered Gaussian entries in Chapter 4, which corresponds to the deviations of the law of the spectral measure of A + X with a deterministic diagonal matrix A and a Wigner matrix X. This result in turn gives the first order asymptotics of spherical integrals. The asymptotics of spherical integrals allow us to estimate matrix integrals in the case of quadratic (also called AB) interaction. Such a study puts on firm mathematical ground some physics papers of Matytsin, for instance. It is related to the enumeration of colored planar maps. We finally present the natural generalization of these results to several matrices, which is connected with the so-called free micro-states entropy.

Frequently used notations
For N ∈ N, M_N will denote the set of N × N matrices with complex entries, and H_N (resp. S_N) will denote the set of N × N Hermitian (resp. symmetric) matrices. U(N) (resp. O(N), resp. S(N)) will denote the unitary (resp. orthogonal, resp. symplectic) group. We denote by Tr the trace on M_N, Tr(A) = Σ_{i=1}^N A_{ii}, and by tr the normalized trace tr(A) = N^{−1}Tr(A).
C[X_1, · · · , X_n] (resp. C⟨X_1, · · · , X_n⟩) denotes the space of commutative (resp. non-commutative) polynomials in the n variables X_1, · · · , X_n, that is, the linear span of the monomials X_{j_1} X_{j_2} · · · X_{j_p} for all choices of indices {j_i, 1 ≤ i ≤ p, p ∈ N}; in the commutative case, the monomials X_{j_1} · · · X_{j_p} and X_{j_σ(1)} · · · X_{j_σ(p)} are identified for every permutation σ.
For a Polish space X, P(X) shall denote the set of probability measures on X. P(X) will be equipped with the usual weak topology, i.e. a sequence µ_n ∈ P(X) converges toward µ iff for any bounded continuous function f on X, µ_n(f) converges toward µ(f). Here, we denote in short µ(f) = ∫ f(x) dµ(x). For two Polish spaces X, Y and a measurable function φ : X → Y, for any µ ∈ P(X) we denote by φ_#µ ∈ P(Y) the push-forward of µ by φ, that is, the probability measure on Y such that for any bounded continuous f : Y → R, φ_#µ(f) = ∫ f(φ(x)) dµ(x). For a given self-adjoint N × N matrix A, we denote by (λ_1(A), · · · , λ_N(A)) its N (real) eigenvalues and by μ̂_N^A its spectral measure μ̂_N^A = N^{−1} Σ_{i=1}^N δ_{λ_i(A)}. For two Polish spaces X, Y we denote by C_b^0(X, Y) (or C(X, Y) when no ambiguity is possible) the space of bounded continuous functions from X to Y. For instance, we shall denote by C([0, 1], P(R)) the set of continuous processes on [0, 1] with values in the set P(R) of probability measures on R, endowed with its usual weak topology. For a measurable set Ω of R × [0, 1], C_b^{p,q}(Ω) denotes the set of real-valued functions on Ω which are p times continuously differentiable with respect to the (first) space variable and q times continuously differentiable with respect to the (second) time variable, with bounded derivatives. C_c^{p,q}(Ω) will denote the functions of C_b^{p,q}(Ω) with compact support in the interior of the measurable set Ω. For a probability measure µ on a Polish space X, L^p(dµ) denotes the space of measurable functions with finite p-th moment under µ. We shall say that an equality holds in the sense of distributions on a measurable set Ω if it holds once integrated against any C_c^{∞,∞}(Ω) function.

Basic notions of large deviations
Since these notes are devoted to the proof of large deviation principles, let us remind the reader what a large deviation principle is, and recall the few main ideas which are commonly used to prove one. We refer the reader to [41] and [43] for further developments. In what follows, X will be a Polish space (that is, a complete separable metric space). We then have Definition 2.1.
• I : X → R + ∪ {+∞} is a rate function, iff it is lower semi-continuous, i.e. its level sets {x ∈ X : I(x) ≤ M } are closed for any M ≥ 0. It is a good rate function if its level sets {x ∈ X : I(x) ≤ M } are compact for any M ≥ 0.
• A sequence (µ_N)_{N∈N} of probability measures on X satisfies a large deviation principle with speed (or in the scale) a_N (going to infinity with N) and rate function I iff

a) For any closed subset F of X, limsup_{N→∞} (1/a_N) log µ_N(F) ≤ −inf_F I.

b) For any open subset O of X, liminf_{N→∞} (1/a_N) log µ_N(O) ≥ −inf_O I.

The proof of a large deviation principle often proceeds first by the proof of a weak large deviation principle (which is defined as in Definition 2.1, except that the upper bound a) is only required to hold for compact sets), together with the so-called exponential tightness property: for every L > 0, there exists a compact set K_L ⊂ X such that limsup_{N→∞} (1/a_N) log µ_N(K_L^c) ≤ −L.

Theorem 2.4 (see [41], Theorem 4.1.11). Let A be a base of open sets for the topology of X. Suppose that for all x ∈ X,

I(x) = sup_{A∈A : x∈A} ( − limsup_{N→∞} (1/a_N) log µ_N(A) ) = sup_{A∈A : x∈A} ( − liminf_{N→∞} (1/a_N) log µ_N(A) ).

Then, µ_N satisfies a weak large deviation principle with rate function I.
As an immediate corollary, we find that if d is a distance on X compatible with the topology and B(x, δ) = {y ∈ X : d(y, x) < δ}, and if for all x ∈ X,

I(x) = − lim_{δ→0} limsup_{N→∞} (1/a_N) log µ_N(B(x, δ)) = − lim_{δ→0} liminf_{N→∞} (1/a_N) log µ_N(B(x, δ)),

then µ_N satisfies a weak large deviation principle with rate function I.
From a given large deviation principle, one can deduce a large deviation principle for other sequences of probability measures by using either the so-called contraction principle or Laplace's method. Namely, let us recall the contraction principle (cf. Theorem 4.2.1 in [41]):

Theorem 2.5. Assume that (µ_N)_{N∈N} satisfies a large deviation principle with good rate function I and speed a_N. Then, for any continuous function F : X → Y with values in a Polish space Y, the image (F_#µ_N)_{N∈N} also satisfies a large deviation principle with the same speed and with good rate function given, for any y ∈ Y, by

J(y) = inf{I(x) : x ∈ X, F(x) = y}.

Laplace's method (or Varadhan's lemma) says the following (cf. Theorem 4.3.1 in [41]):

Theorem 2.6. Assume that (µ_N)_{N∈N} satisfies a large deviation principle with good rate function I and speed a_N. Let F : X → R be a bounded continuous function. Then,

lim_{N→∞} (1/a_N) log ∫ e^{a_N F(x)} dµ_N(x) = sup_{x∈X} {F(x) − I(x)}.

Moreover, the sequence

ν_N(dx) = ( ∫ e^{a_N F(y)} dµ_N(y) )^{−1} e^{a_N F(x)} µ_N(dx)

satisfies a large deviation principle with good rate function

I_F(x) = I(x) − F(x) + sup_{y∈X} {F(y) − I(y)}.

Bryc's theorem ([41], Section 4.4) gives an inverse statement to Laplace's theorem. Namely, assume that we know that for any bounded continuous function F : X → R, the limit

Λ(F) = lim_{N→∞} (1/a_N) log ∫ e^{a_N F(x)} dµ_N(x)   (2.0.1)

exists. Then, Bryc's theorem says that µ_N satisfies a weak large deviation principle with rate function

I(x) = sup_{F ∈ C_b(X,R)} {F(x) − Λ(F)}.   (2.0.2)

This actually provides another approach to proving large deviation principles: we see that we need to compute the asymptotics (2.0.1) for as many bounded continuous functions as possible. In general, this can easily be done only for some family of functions (for instance, if µ_N is the law of N^{−1} Σ_{i=1}^N x_i for independent equidistributed bounded random variables x_i, and a_N = N, such quantities are easy to compute for linear functions F). This will always give a weak large deviation upper bound with rate function given as in (2.0.2), but where the supremum is only taken over this family of functions. The point is then to show that this family is in fact sufficient, in particular that this restricted supremum is equal to the supremum over all bounded continuous functions.
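As an elementary illustration of these definitions (a toy sketch of mine, unrelated to matrices; the example and tolerances are arbitrary), take µ_N to be the law of the empirical mean of N fair Bernoulli variables with speed a_N = N: Cramér's theorem gives the rate function I(x) = x log(2x) + (1 − x) log(2(1 − x)), which can be compared with the exact tail probabilities:

```python
import math

def rate(x):
    # Cramer rate function for the mean of Bernoulli(1/2) variables
    return x * math.log(2 * x) + (1 - x) * math.log(2 * (1 - x))

def empirical_rate(N, x):
    # exact computation of -(1/N) log P(S_N >= x N) for S_N ~ Binomial(N, 1/2)
    k0 = math.ceil(x * N)
    tail = sum(math.comb(N, k) for k in range(k0, N + 1)) / 2 ** N
    return -math.log(tail) / N

for N in (100, 1000):
    print(N, empirical_rate(N, 0.7), rate(0.7))   # the two numbers approach each other
```

The finite-N quantity always dominates the rate function here (this is Chernoff's bound), and the gap is of order (log N)/N, consistent with the fact that the large deviation principle only captures the exponential scale.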
Large deviations for the spectral measure of large random matrices

Large deviations for the spectral measure of Wigner Gaussian matrices
Let X^{N,β} = (X^{N,β}_{ij})_{1≤i,j≤N} be the N × N real (resp. complex) Gaussian Wigner matrices for β = 1 (resp. β = 2), defined as follows. They are N × N self-adjoint random matrices with entries

X^{N,β}_{kl} = (βN)^{−1/2} Σ_{i=1}^β g^i_{kl} e^i_β for k < l,   X^{N,β}_{kk} = (2/(βN))^{1/2} g^1_{kk} e^1_β,

where (e^i_β)_{1≤i≤β} is a basis of R^β, that is e^1_1 = 1, e^1_2 = 1, e^2_2 = i, and where the (g^i_{kl}, k ≤ l, 1 ≤ i ≤ β) are independent equidistributed centered Gaussian variables with variance 1. This definition can be extended to the case β = 4 when N is even, by choosing X^{N,β} = (X^{N,β}_{kl})_{1≤k,l≤N/2} with X^{N,β}_{kl} a 2 × 2 matrix defined as above, but with (e^k_4)_{1≤k≤4} the Pauli matrices. (X^{N,2}, N ∈ N) is commonly referred to as the Gaussian Unitary Ensemble (GUE), (X^{N,1}, N ∈ N) as the Gaussian Orthogonal Ensemble (GOE) and (X^{N,4}, N ∈ N) as the Gaussian Symplectic Ensemble (GSE), since their laws are invariant under the action of the unitary, orthogonal and symplectic group, respectively (see [93]). X^{N,β} has N real eigenvalues (λ_1, λ_2, · · · , λ_N). Moreover, by invariance of the distribution of X^{N,1} (resp. X^{N,2}, resp. X^{N,4}) under the action of the orthogonal group O(N) (resp. the unitary group U(N), resp. the symplectic group S(N)), it is not hard to check that its eigenvectors follow the Haar measure m^β_N on O(N) (resp. U(N), resp. S(N)) in the case β = 1 (resp. β = 2, resp. β = 4). More precisely, a change of variables shows that for any Borel subset A ⊂ M_{N×N}(R) (resp. M_{N×N}(C)),

P(X^{N,β} ∈ A) = ∫ 1_{U D(λ) U^* ∈ A} dm^β_N(U) dQ^β_N(λ_1, · · · , λ_N),

with D(λ) = diag(λ_1, λ_2, · · · , λ_N) the diagonal matrix with entries λ_1 ≤ λ_2 ≤ · · · ≤ λ_N, and with Q^β_N the joint law of the eigenvalues, given by

Q^β_N(dλ_1, · · · , dλ_N) = (Z^β_N)^{−1} Π_{1≤i<j≤N} |λ_i − λ_j|^β e^{−(βN/4) Σ_{i=1}^N λ_i²} Π_{i=1}^N dλ_i.

Such changes of variables are explained in detail in the book in preparation by P. Forrester [53]. Using this representation, it was proved in [10] that the law of the spectral measure μ̂_N = N^{−1} Σ_{i=1}^N δ_{λ_i}, as a probability measure on R, satisfies a large deviation principle.
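As a sanity check on the normalizations (a sketch of mine: the 1/√(βN) off-diagonal and √(2/(βN)) diagonal scalings are my reading of the construction described above), one can sample X^{N,β} for β = 1, 2 and verify self-adjointness together with the off-diagonal covariance N^{−1}:

```python
import numpy as np

def gaussian_wigner(N, beta, rng):
    """Sample X^{N,beta} for beta = 1 (GOE) or beta = 2 (GUE)."""
    if beta == 1:
        g = np.triu(rng.standard_normal((N, N)), 1)
        X = (g + g.T) / np.sqrt(N)                    # off-diagonal variance 1/N
        X[np.diag_indices(N)] = rng.standard_normal(N) * np.sqrt(2.0 / N)
        return X
    if beta == 2:
        g = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
        g = np.triu(g, 1) / np.sqrt(2 * N)
        X = g + g.conj().T                            # off-diagonal E|X_kl|^2 = 1/N
        X = X + np.diag(rng.standard_normal(N) * np.sqrt(1.0 / N))
        return X
    raise ValueError("only beta = 1 or 2 in this sketch")

rng = np.random.default_rng(5)
N = 500
X = gaussian_wigner(N, 2, rng)
offdiag = X[np.triu_indices(N, 1)]
print(np.allclose(X, X.conj().T), N * np.mean(np.abs(offdiag) ** 2))
```

With this normalization the eigenvalues stay of order one, which is the convention under which the joint eigenvalue density and the scale N² of the large deviation principle are stated.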
In the following, we will denote by P(R) the space of probability measures on R, and will endow P(R) with its usual weak topology. We can now state the main result of [10].
Theorem 3.1 ([10]). 1) Let f(x, y) = x²/4 + y²/4 − log |x − y| and, for µ ∈ P(R),

I_β(µ) = (β/2) ∫∫ f(x, y) dµ(x) dµ(y) − (3/8)β.

Then I_β is a good rate function, which achieves its minimal value zero at the semicircle law only.
2) The law of the spectral measure μ̂_N = N^{−1} Σ_{i=1}^N δ_{λ_i} on P(R) satisfies a full large deviation principle with good rate function I_β in the scale N².
Proof : We here skip the proofs of 1.b.2 and 1.d and refer to [10] for these two points. The strategy of the proof of the large deviation principle is rather clear; one writes the density Q^β_N of the eigenvalues as

Q^β_N(dλ_1, · · · , dλ_N) = (Z^β_N)^{−1} exp{ −(β/2) N² ∫∫_{x≠y} f(x, y) dμ̂_N(x) dμ̂_N(y) − (β/4) Σ_{i=1}^N λ_i² } Π_{i=1}^N dλ_i.

If the function (x, y) → 1_{x≠y} f(x, y) were bounded continuous, the large deviation principle would result directly from a standard Laplace method (cf. Theorem 2.6), where the entropic term coming from the underlying Lebesgue measure can be neglected since the scale of the large deviation principle we are considering is N² ≫ N. In fact, the main point is to deal with the fact that the logarithmic function blows up at the origin, and to control what happens near the diagonal ∆ := {x = y}. In the sequel, we let Q̄^β_N be the non-normalized positive measure Q̄^β_N = Z^β_N Q^β_N, and prove upper and lower large deviation estimates with rate function J_β(µ) = (β/2) ∫∫ f(x, y) dµ(x) dµ(y). This is of course enough to prove the second point of the theorem, since it implies that N^{−2} log Z^β_N converges toward −inf_{ν∈P(R)} (β/2) ∫∫ f(x, y) dν(x) dν(y).
To obtain that this limit equals $\frac{3}{8}\beta$, one needs to show that the infimum is attained at the semicircle law and then compute its value. Alternatively, Selberg's formula (see [93]) allows one to compute $Z^N_\beta$ explicitly, from which its asymptotics are easy to get. We refer to [10] for this point. The upper bound is obtained as follows. Noticing that $\hat\mu^N\otimes\hat\mu^N(\Delta) = N^{-1}$, $Q^N_\beta$-almost surely (since the eigenvalues are almost surely distinct), we see that truncating the logarithmic singularity at level M yields, for any $M\in\mathbb{R}_+$, $Q^N_\beta$-a.s., an upper bound on the density. Therefore, for any Borel subset $A\subset P(IR)$ and any $M\in\mathbb{R}_+$, we obtain a corresponding upper estimate. We now show that if A is closed, M can be taken equal to infinity in the above right-hand side. We first observe that $I_\beta(\mu) = \frac{\beta}{2}\int\!\!\int f(x,y)\,d\mu(x)\,d\mu(y) - \frac{3}{8}\beta$ is a good rate function. Indeed, since f is the supremum of bounded continuous functions, $I_\beta$ is lower semi-continuous, i.e. its level sets are closed. Moreover, because f blows up when x or y goes to infinity, its level sets are compact.
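For the reader's convenience, the rate function alluded to above can be written out explicitly. The following reconstruction uses the normalization of [10], for which the semicircle law is supported on [−2, 2]; it is consistent with the constant $\frac{3}{8}\beta$ appearing in the text:

```latex
f(x,y) = \frac{x^2}{4} + \frac{y^2}{4} - \log|x-y|, \qquad
I_\beta(\mu) = \frac{\beta}{2}\int\!\!\int f(x,y)\,d\mu(x)\,d\mu(y) - \frac{3}{8}\beta .
```

At the semicircle law $\sigma(dx) = \frac{1}{2\pi}\sqrt{4-x^2}\,1_{|x|\le 2}\,dx$ one has $\int \frac{x^2}{4}\,d\sigma = \frac14$ and $\int\!\!\int \log|x-y|\,d\sigma\,d\sigma = -\frac14$, so that $\frac{\beta}{2}\int\!\!\int f\,d\sigma\,d\sigma = \frac{3}{8}\beta$ and $I_\beta(\sigma) = 0$, identifying $\sigma$ as the minimizer.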
where $K_M$ is the set ensuring, together with the uniform boundedness of $N^{-2}\log Z^N_\beta$, the exponential tightness of $Q^N_\beta$. Hence, we may assume that A is compact, and actually a ball surrounding any given probability measure with arbitrarily small radius (see Chapter 2). Let $B(\mu,\delta)$ be a ball centered at $\mu\in P(IR)$ with radius δ for a distance compatible with the weak topology. Since $\mu\mapsto \int\!\!\int f(x,y)\wedge M\,d\mu(x)\,d\mu(y)$ is continuous on P(IR), (3.1.2) yields the desired upper estimate for any probability measure $\mu\in P(IR)$. We can finally let M go to infinity and use the monotone convergence theorem, which asserts that $\lim_{M\to\infty}\int\!\!\int f\wedge M\,d\mu\,d\mu = \int\!\!\int f\,d\mu\,d\mu$, finishing the proof of the upper bound.
To prove the lower bound, we can also proceed quite roughly by constraining the eigenvalues to belong to very small sets. This will not affect the lower bound, again because of the fast speed $N^2 \gg N\log N$ of the large deviation principle we are proving. Again, the difficulty lies in the singularity of the logarithm. The proof goes as follows. Let $\nu\in P(IR)$. Since $I_\beta(\nu) = +\infty$ if ν has an atom, we can assume without loss of generality that it does not when proving the lower bound. We construct a discrete approximation to ν by choosing points $x^{1,N} < x^{2,N} < \dots < x^{N,N}$ at the $i/(N+1)$-quantiles of ν and setting $\nu^N = \frac{1}{N}\sum_{i=1}^N \delta_{x^{i,N}}$ (note here that the choice of the length $(N+1)^{-1}$ of the intervals rather than $N^{-1}$ is only made to ensure that $x^{N,N}$ is finite). Then $\nu^N$ converges toward ν as N goes to infinity. Thus, for any δ > 0, for N large enough, we may restrict the eigenvalues to a set $\Delta_N$ where each $\lambda_i$ is close to $x^{i,N}$. But, when $\lambda_1 < \lambda_2 < \dots < \lambda_N$, and since we have constructed the $x^{i,N}$'s such that $x^{1,N} < x^{2,N} < \dots < x^{N,N}$, we have, for any integers (i, j), a lower bound on $|\lambda_i - \lambda_j|$ in terms of $|x^{i,N} - x^{j,N}|$. As a consequence, (3.1.4) gives a lower estimate of the density on $\Delta_N$. Moreover, one can easily bound from below the last term on the right-hand side of (3.1.5). Hence (3.1.5) implies the claimed lower bound: indeed, since $x\mapsto\log(x)$ increases on $\mathbb{R}_+$, the logarithmic terms can be compared with their discretized counterparts. The same argument holds for $\int x^2\,d\nu(x)$ and $\frac{1}{N}\sum_{i=1}^N (x^{i,N})^2$. We can conclude by letting δ go to zero. Note here that we have made extensive use of monotonicity arguments to show that our approximation scheme converges toward the right limit. Such arguments cannot be used in the setup considered by G. Ben Arous and O. Zeitouni [11], where the eigenvalues are complex; this is why they first need to perform a regularization by convolution.
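The discretization step above can be illustrated numerically. The sketch below (the exponential target measure is just an example with unbounded support) places $x^{i,N}$ at the $i/(N+1)$-quantiles of ν, so that $x^{N,N}$ stays finite, and checks that the first two moments of $\nu^N$ converge to those of ν.

```python
import numpy as np

def quantile_discretization(inv_cdf, n):
    """x^{i,n} = F^{-1}(i/(n+1)); the (n+1)^{-1} spacing keeps x^{n,n} finite
    even when the support of nu is unbounded."""
    return inv_cdf(np.arange(1, n + 1) / (n + 1))

# Example: standard exponential law, F^{-1}(u) = -log(1-u); mean 1, second moment 2.
inv_cdf_exp = lambda u: -np.log1p(-u)
xs = quantile_discretization(inv_cdf_exp, 20000)
mean_n = xs.mean()          # approximates the mean of nu
second_n = np.mean(xs**2)   # approximates the second moment of nu
```

The approximation error is governed by the Riemann sum and the truncated tail beyond the $(N/(N+1))$-quantile, both of which vanish as N grows.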

Discussion and open problems
There are many ways in which one would like to generalize the previous large deviation estimates; the first would be to remove the assumption that the entries are Gaussian. It happens that such a generalization is still an open problem. In fact, when the entries of the matrix are no longer Gaussian, the law of the eigenvalues is not independent of that of the eigenvectors and becomes rather complicated. With O. Zeitouni [64], I proved however that concentration of measure results hold. In fact, consider the self-adjoint random matrix $X^A(\omega)$ with entries proportional to $A_{ij}\omega_{ij}$, where A is a self-adjoint non-random complex matrix with entries $\{A_{ij}, 1\le i\le j\le N\}$ uniformly bounded by, say, a, and the $(\omega_{ij})$ are independent. Our main result is that if the laws of the entries satisfy the logarithmic Sobolev inequality with uniform constant c, then for any Lipschitz function f and any δ > 0, the probability that $\mathrm{tr}(f(X^A))$ deviates from its mean by more than δN decays at rate $e^{-N^2\delta^2/C}$, with a constant C depending only on a, c and the Lipschitz constant of f. This result is a direct consequence of standard results about concentration of measure due to Talagrand and Herbst, together with the observations that if $f:\mathbb{R}\to\mathbb{R}$ is a Lipschitz function, then $\omega\mapsto \mathrm{tr}(f(X^A(\omega)))$ is also Lipschitz and its Lipschitz constant can be evaluated, and that if f is convex, $\omega\mapsto \mathrm{tr}(f(X^A(\omega)))$ is also convex.
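This concentration phenomenon is easy to observe numerically. The sketch below uses bounded (uniform) entries, which satisfy a logarithmic Sobolev inequality; the entry distribution, matrix size and sample count are illustrative choices. It shows that the 1-Lipschitz linear statistic $N^{-1}\mathrm{tr}|X|$ fluctuates on a much smaller scale than its typical value.

```python
import numpy as np

def lipschitz_trace_stat(n, rng):
    # Wigner-type matrix with non-Gaussian (uniform, variance-1) entries,
    # corresponding to A_ij = 1 in the discussion above.
    w = rng.uniform(-np.sqrt(3), np.sqrt(3), (n, n))
    x = (w + w.T) / np.sqrt(2 * n)
    eigs = np.linalg.eigvalsh(x)
    return np.mean(np.abs(eigs))  # f(x) = |x| is 1-Lipschitz

rng = np.random.default_rng(1)
samples = [lipschitz_trace_stat(200, rng) for _ in range(200)]
mean_stat = np.mean(samples)      # close to int |x| dsigma = 8 / (3 pi) ~ 0.849
std_stat = np.std(samples)        # fluctuations of order 1/N
```

The standard deviation across samples is smaller than the statistic itself by roughly a factor N, consistent with the $e^{-N^2\delta^2/C}$ concentration rate.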
Note here that the matrix A can be taken equal to $\{A_{ij} = 1, 1\le i,j\le N\}$ to recover results for Wigner's matrices. However, the generalization is here costless and allows one to include at the same time more general types of matrices, such as band matrices or Wishart matrices. See the discussion in [64].
Even though this estimate is on the right large deviation scale, it does not specify the rate of deviation toward a given spectral distribution. This problem seems to be very difficult in general. The deviations of empirical moments of matrices with possibly non-centered entries of order $N^{-1}$ are studied in [39]; in this case, deviations are typically produced by the shift of all the entries, and the scaling allows one to see the random matrix as a continuous operator. This should not be the case for Wigner matrices.
Another possible generalization is to consider another common model of Gaussian large random matrices, namely Wishart matrices. Sample covariance matrices (or Wishart matrices) are matrices of the form $Y^{N,M} = X^{N,M} T_M (X^{N,M})^*$. Here, $X^{N,M}$ is an N × M matrix with centered real or complex i.i.d. entries of covariance $N^{-1}$ and $T_M$ is an M × M Hermitian (or symmetric) matrix.
These matrices are often considered in the limit where M/N goes to a constant α > 0. Let us assume that M ≤ N, and hence α ∈ (0, 1], to fix the notations. Then, $Y^{N,M}$ has N − M null eigenvalues. Let $(\lambda_1,\dots,\lambda_M)$ be the M nontrivial remaining eigenvalues and denote $\hat\mu^M = \frac{1}{M}\sum_{i=1}^M \delta_{\lambda_i}$. In the case where $T_M = I$ and the entries of $X^{N,M}$ are Gaussian, Hiai and Petz [71] proved that the law of $\hat\mu^M$ satisfies a large deviation principle. In this case, the joint law of the eigenvalues is given by a Coulomb-gas density, so that the analysis performed in the previous section can be generalized to this model. In the case where $T_M$ is a positive definite matrix, we have a formula for the joint law $\sigma^\beta_M$ of the eigenvalues in which the normalizing constant is chosen so that $\sigma^\beta_M$ has mass one. This formula can be found in [72, (58) and (95)]. Hence, we see that the study of the deviations of the spectral measure $\hat\mu^M = \frac{1}{M}\sum_{i=1}^M \delta_{\lambda_i}$, when the spectral measure of $T_M$ converges, is equivalent to the study of the asymptotics of the spherical integral $I^{(\beta)}_N$. The spherical integral also appears when one considers Gaussian Wigner matrices with non-centered entries. Indeed, if we let $Y^{N,\beta} = M_N + X^{N,\beta}$ with a self-adjoint deterministic matrix $M_N$ (which can be taken diagonal without loss of generality) and a Gaussian Wigner matrix $X^{N,\beta}$ as considered in the previous section, then the law of the eigenvalues of $Y^{N,\beta}$ involves the same spherical integral. We shall in the next section study the asymptotics of the spherical integral by studying the deviations of the spectral measure of $Y^{N,\beta}$. This in turn provides the large deviation principle for the law of $\hat\mu^M$ under $\sigma^\beta_M$ by Laplace's method (see [65], Theorem 1.2).
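As a sanity check on the count of null eigenvalues and on the normalization, one may simulate a standard Wishart matrix with $T_M = I$; the dimensions below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 1000, 500                               # alpha = m / n = 1/2
x = rng.standard_normal((n, m)) / np.sqrt(n)   # entries of covariance 1/n
y = x @ x.T                                    # n x n, rank m
eigs = np.sort(np.linalg.eigvalsh(y))

n_null = int(np.sum(eigs < 1e-8))   # structurally, n - m null eigenvalues
nontrivial = eigs[n - m:]
# The nontrivial eigenvalues follow Marchenko-Pastur asymptotics; with this
# normalization their mean is (1/m) tr(X^T X) -> 1.
mean_nontrivial = nontrivial.mean()
```

The nontrivial eigenvalues of $Y = XX^*$ coincide with those of the smaller M × M matrix $X^*X$, which is often the more convenient object in practice.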
Let us finish this section by mentioning two other natural generalizations in the context of large random matrices. The first consists in allowing the covariances of the entries of the matrix to depend on their position; more precisely, one considers a matrix whose entry (i, j) has covariance $\phi(i/N, j/N)/N$. When $\phi(x,y) = 1_{|x-y|\le c}$, the matrix is called a band matrix, but the case where φ is smooth can as well be studied as a natural generalization. The large deviation properties of the spectral measure of such a matrix were studied in [60], but only a large deviation upper bound could be obtained so far. Indeed, the non-commutativity of the matrices here plays a much more dramatic role, as we shall discuss later. In fact, it can be seen by the so-called linear trick (see [69] for instance) that studying the deviations of words in several matrices can be related to the deviations of the spectral measure of matrices with complicated covariances, and hence this last problem is intimately related to the study of the so-called microstates entropy discussed in Chapter 7. The second generalization is to consider a matrix where all the entries are independent, and therefore with a priori complex spectrum. Such a generalization was considered by G. Ben Arous and O. Zeitouni [11] in the case of the so-called Ginibre ensemble, where the entries are identically distributed standard Gaussian variables, and a large deviation principle for the law of the spectral measure was derived. Let us further note that the method developed in the last section can as well be used if one considers random partitions following the Plancherel measure. In the unitary case, this law can be interpreted as a discretization of the GUE (see S. Kerov [85] or K. Johansson [74]) because the dimension of a representation is given in terms of a Coulomb gas potential.
Using this similarity, one can prove a large deviation principle for the empirical distribution of these random partitions (see [62]); the rate function is quite similar to that of the GUE except that, because of the discrete nature of the partitions, deviations can only occur toward probability measures which are absolutely continuous with respect to Lebesgue measure and with density bounded by one. More general large deviation techniques have been developed to study random partitions, for instance in [40] for the uniform distribution.
Let us finally mention that large deviations can as well be obtained for the law of the largest eigenvalue of the Gaussian ensembles (cf. e.g. [9], Theorem 6.2).

Chapter 4

Asymptotics of spherical integrals
In this chapter, we shall consider the spherical integral $I^{(\beta)}_N(D_N, E_N) = \int \exp\{N\,\mathrm{tr}(D_N U E_N U^*)\}\,dm^\beta_N(U)$, where $m^\beta_N$ denotes the Haar measure on the orthogonal (resp. unitary, resp. symplectic) group when β = 1 (resp. β = 2, resp. β = 4). This object is actually central to many questions; as we have seen, it describes the law of Gaussian Wishart matrices and of non-centered Gaussian Wigner matrices. It also appears in many matrix models described in physics; we shall describe this point in the next chapter. It is related to the characters of the symmetric group and Schur functions (cf. [107]) because of the determinantal formula below. We shall discuss this point in the next paragraph.
A formula for $I^{(2)}_N$ was obtained by Itzykson and Zuber (and more generally by Harish-Chandra), see [93, Appendix 5]: whenever the eigenvalues $(d_i)$ of $D_N$ and $(e_j)$ of $E_N$ are distinct,
$$I^{(2)}_N(D_N, E_N) = c_N\,\frac{\det\big(e^{N d_i e_j}\big)_{1\le i,j\le N}}{\Delta(D_N)\,\Delta(E_N)},$$
with $\Delta(D_N), \Delta(E_N)$ the Vandermonde determinants associated with $D_N, E_N$ and $c_N$ a constant depending only on N. Although this formula seems to solve the problem, it is far from doing so, due to the possible cancellations appearing in the determinant, so that it is indeed completely unclear how to estimate the logarithmic asymptotics of such a quantity.
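The determinantal formula can be tested numerically in the smallest nontrivial case. For the convention $\int_{U(N)} e^{\mathrm{tr}(A U B U^*)}\,dU$ with N = 2, the prefactor equals 1; the Monte Carlo sketch below (sample size and the diagonal matrices are arbitrary choices) compares a Haar average over U(2) with the determinant expression.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed unitary via QR of a complex Ginibre matrix (with phase fix)."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(3)
a, b = np.array([0.6, -0.2]), np.array([0.5, 0.1])
A, B = np.diag(a).astype(complex), np.diag(b).astype(complex)

vals = []
for _ in range(40000):
    u = haar_unitary(2, rng)
    vals.append(np.exp(np.trace(A @ u @ B @ u.conj().T).real))
mc = np.mean(vals)

# Harish-Chandra / Itzykson-Zuber for N = 2 (prefactor 1 in this convention):
hciz = (np.exp(a[0]*b[0] + a[1]*b[1]) - np.exp(a[0]*b[1] + a[1]*b[0])) \
       / ((a[0] - a[1]) * (b[0] - b[1]))
```

The cancellations mentioned above are already visible here: the two exponentials in the numerator are of comparable size, and for large N such alternating sums are what makes the logarithmic asymptotics delicate.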
To evaluate this integral, we noticed with O. Zeitouni [65] that it was enough to derive a large deviation principle for the law of the spectral measure of non-centered Wigner matrices. We shall detail this point in Section 4.1. To prove a large deviation principle for such a law, we improved a strategy initiated with T. Cabanal-Duvillard in [29, 30], which consists in considering the matrix $Y^{N,\beta} = D_N + X^{N,\beta}$ as the value at time one of the matrix-valued process $Y^{N,\beta}(t) = D_N + H^{N,\beta}(t)$, where $H^{N,2}$ (resp. $H^{N,1}$, resp. $H^{N,4}$) is a Hermitian (resp. symmetric, resp. symplectic) Brownian motion, that is a Wigner matrix with Brownian motion entries. More explicitly, $H^{N,\beta}$ is a process with values in the set of N × N self-adjoint matrices whose entries are Brownian motions built on the basis $(e^i_\beta)_{1\le i\le\beta}$ of $\mathbb{R}^\beta$ described in the previous chapter. The advantage of considering the whole process $Y^{N,\beta}$ is that we can then use stochastic differential calculus and standard techniques to study deviations of processes by martingale properties, as initiated by Kipnis, Olla and Varadhan [88]. The idea of studying properties of Gaussian variables through their characterization as time marginals of Brownian motions is not new. At the level of deviations, it is very natural since we shall construct infinitesimally the paths to follow to create a given deviation. Actually, it seems to be the right way to consider $I^{(2)}_N$ when one realizes that it has a determinantal form according to (1.0.1) and so is by nature related to non-intersecting paths. There is still no more direct study of these asymptotics of the spherical integrals; even though B. Collins [34] tried to do it by expanding the exponential into moments and using cumulant calculus, obtaining such asymptotics would still require being able to control the convergence of infinite signed series. In the physics literature, A. Matytsin [91] derived the same asymptotics for the spherical integrals as those we shall describe.
His methods are quite different and a priori only apply in the unitary case. I think they might be made rigorous if one could a priori prove sufficiently strong convergence of the spherical integral as N goes to infinity. As a matter of fact, I do not think there is any other formula for the limiting spherical integral in the physics literature, but mostly saddle point studies of this a priori converging quantity. I do however mention recent works of B. Eynard et al. [50] and M. Bertola [14], who produced a formula for the free energy of the model of matrices coupled in a chain by means of residue techniques. However, this corresponds to the case where the matrices $E_N, D_N$ of the spherical integral have a random spectrum subject to a smooth polynomial potential, and it is not clear how to apply such technology to the hard-constraint case where the spectral measures of $D_N, E_N$ converge to a prescribed limit.

Asymptotics of spherical integrals and deviations of the spectral measure of non centered Gaussian Wigner matrices
Let $Y^{N,\beta} = D_N + X^{N,\beta}$ with a deterministic diagonal matrix $D_N$ and $X^{N,\beta}$ a Gaussian Wigner matrix. We now show how the deviations of the spectral measure of $Y^{N,\beta}$ are related to the asymptotics of the spherical integrals. To this end, we shall make the following hypothesis.
We define $\bar I$;
2) For any probability measure µ ∈ P(IR),
3) We let, for any µ ∈ P(IR),
Before going any further, let us point out that these results give interesting asymptotics for Schur functions, which are defined as follows.
• A Young shape λ is a finite sequence of non-negative integers $(\lambda_1, \lambda_2, \dots, \lambda_l)$ written in non-increasing order. One should think of it as a diagram whose i-th line is made of $\lambda_i$ empty boxes. We denote by $|\lambda| = \sum_i \lambda_i$ the total number of boxes of the shape λ.
In the sequel, when we have a shape λ = (λ 1 , λ 2 , . . .) and an integer N greater than the number of lines of λ having a strictly positive length, we will define a sequence l associated to λ and N , which is an N -tuple of integers l i = λ i + N − i. In particular we have that l 1 > l 2 > . . . > l N ≥ 0 and l i − l i+1 ≥ 1. • for some fixed N ∈ IN, a Young tableau will be any filling of the Young shape above with integers from 1 to N which is non-decreasing on each line and (strictly) increasing on each column. For each such filling, we define the content of a Young tableau as the N -tuple (µ 1 , . . . , µ N ) where µ i is the number of i's written in the tableau.
Notice that, for N ∈ IN, a Young shape can be filled with integers from 1 to N if and only if $\lambda_i = 0$ for i > N. • For a Young shape λ and an integer N, the Schur polynomial $s_\lambda$ is the polynomial $s_\lambda(x_1,\dots,x_N) = \sum_T x_1^{\mu_1}\cdots x_N^{\mu_N}$, where the sum is taken over all Young tableaux T of fixed shape λ and $(\mu_1,\dots,\mu_N)$ is the content of T. From a statistical point of view, one can think of the filling as the heights of a surface sitting on the tableau λ, $\mu_i$ being the area at height i. $s_\lambda$ is then a generating function for these heights when one considers the surfaces uniformly distributed under the constraints prescribed for the filling. Note that $s_\lambda$ is positive whenever the $x_i$'s are and, although it is not obvious from this definition (cf. for example [107] for a proof), $s_\lambda$ is a symmetric function of the $x_i$'s; in fact the $(s_\lambda)_\lambda$ form a basis of the symmetric functions and hence play a key role in the representation theory of the symmetric group.
For a Hermitian matrix A, we set $s_\lambda(A) = s_\lambda(A_1,\dots,A_N)$, where the $A_i$'s are the eigenvalues of A. Then, by the Weyl formula (cf. Theorem 7.5.B of [130]), $s_\lambda$ satisfies an integral identity over the unitary group for any matrices V, W. Then, because $s_\lambda$ also has a determinantal formula, we can see (cf. [107] and [62]) that $s_\lambda$ can be expressed through a spherical integral evaluated at $l_N$, where $l_N$ denotes the diagonal matrix with entries $N^{-1}(\lambda_i - i + N)$ and Δ the Vandermonde determinant. Therefore, we have the following immediate corollary to Theorem 4.2: we pick a sequence of Hermitian matrices $E_N$ and assume that $(D_N, E_N)_{N\in\mathbb{N}}$ satisfies Hypothesis 4.1 and that $\Sigma(\mu_D) > -\infty$. Then the asymptotics of the Schur functions follow. Proof of Theorem 4.2: To simplify, let us assume that $E_N$ and $\hat E_N$ are uniformly bounded by a constant M. Let δ′ > 0 and let $\{A_j\}_{j\in J}$ be a partition of [−M, M] such that $|A_j| \in [\delta', 2\delta']$ and the endpoints of $A_j$ are continuity points of $\mu_E$. Denote by $\hat I$ the corresponding spherical integral. We construct a permutation $\sigma_N$ so that $|\hat E(i,i) - \bar E(\sigma_N(i), \sigma_N(i))| < 2\delta$, except possibly for very few i's, as follows.
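The determinantal (bialternant) formula for $s_\lambda$ invoked above is easy to test numerically. The sketch below evaluates $\det(x_i^{l_j})/\Delta(x)$ with exponents $l_j = \lambda_j + N - j$ for λ = (2, 1) and N = 3, and compares it with the tableau expansion $s_{(2,1)} = \sum_{i\neq j} x_i^2 x_j + 2x_1x_2x_3$; the evaluation point is arbitrary.

```python
import numpy as np
from itertools import permutations

def schur_bialternant(lam, xs):
    """s_lambda via det(x_i^{l_j}) / Vandermonde, with l_j = lambda_j + n - j."""
    n = len(xs)
    lam = list(lam) + [0] * (n - len(lam))
    l = [lam[j] + n - 1 - j for j in range(n)]   # strictly decreasing exponents
    num = np.linalg.det(np.array([[x**e for e in l] for x in xs], dtype=float))
    den = np.linalg.det(np.array([[x**(n - 1 - j) for j in range(n)] for x in xs],
                                 dtype=float))
    return num / den

xs = [0.7, 1.3, 2.1]
val = schur_bialternant((2, 1), xs)

# Combinatorial check from the tableau definition of s_(2,1):
x1, x2, x3 = xs
expected = sum(a**2 * b for a, b in permutations(xs, 2)) + 2 * x1 * x2 * x3
```

Both determinants use the same row ordering, so the sign ambiguity of the Vandermonde cancels in the ratio.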
Then, choose and fix such a permutation $\sigma_N$. Hence, letting N go to infinity, we obtain an upper bound involving $(D_N, \hat E_N)$, and the reverse inequality by symmetry. This proves the first point of the theorem when $(\bar E_N, \hat E_N)$ are uniformly bounded. The general case (which is not much more complicated) is proved in [65] and follows from first approximating $\bar E_N$ and $\hat E_N$ by bounded operators using (4.1.3).
The second and the third points are proved simultaneously: decomposing the spherical integral accordingly, we see that the first point gives the result, since $I^{N,\delta}$ goes to zero as N goes to infinity first and then δ goes to zero.
An equivalent way to obtain this relation is to use (1.0.1) together with the continuity of the spherical integrals in order to replace the fixed value at time one of the Brownian motion, $B_1 = Y$, by an average over a small ball. The large deviation principle proved in the third chapter of these notes then shows 2) and 3).
Note for 3) that if $I_\beta(\mu_E) = +\infty$, then $J(\mu_D, \mu_E) = +\infty$, so that in this case the statement is empty since it leads to an indeterminate form. Still, if $I_\beta(\mu_D) < \infty$, by symmetry of $I^{(\beta)}$, we obtain a formula by exchanging $\mu_D$ and $\mu_E$. If both $I_\beta(\mu_D)$ and $I_\beta(\mu_E)$ are infinite, we can only argue, by continuity of $I^{(\beta)}$, that the limit can be approached through any sequence $(\mu^\epsilon_E)_{\epsilon>0}$ of probability measures with uniformly bounded variance and finite entropy $I_\beta$ converging toward $\mu_E$. A more explicit formula is not yet available. Note here that the convergence of the spherical integral is in fact not obvious and is here given by the fact that we have convergence of the probability of deviation toward a given probability measure for the law of the spectral measure of non-centered Wigner matrices.

Large deviation principle for the law of the spectral measure of non-centered Wigner matrices
The goal of this section is to prove the following theorem. By Bryc's theorem (2.0.2), it is clear that the above large deviation principle statement is equivalent to the convergence, for any bounded continuous function f on P(IR), of the corresponding free energy. It is not clear how one could a priori study such limits, except for very trivial functions f. However, if we consider the matrix-valued process $Y^{N,\beta}(t) = D_N + H^{N,\beta}(t)$, with Brownian motion $H^{N,\beta}$ described in (4.0.1), and its spectral measure process $\hat\mu^N_t = \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i(t)}$, we may construct martingales by use of Itô's calculus. Continuous martingales lead to exponential martingales, which have constant expectation and therefore allow one to compute the exponential moments of a whole family of functionals of $\hat\mu^N$. This idea easily gives a large deviation upper bound for the law of $(\hat\mu^N_t, t\in[0,1])$, and therefore for the law of $\hat\mu^N_Y$, which is the law of $\hat\mu^N_1$. The difficult point here is to check that it is enough to compute the exponential moments of this family of functionals in order to obtain the large deviation lower bound.
Let us now state our result more precisely. We shall consider $\{\hat\mu^N(t), t\in[0,1]\}$ as an element of the set C([0,1], P(IR)) of continuous processes with values in P(IR). The rate function for these deviations shall be given as follows. For any $f, g \in C^{2,1}_b(\mathbb{R}\times[0,1])$, any $s\le t\in[0,1]$ and any $\nu_\cdot\in C([0,1], P(IR))$, we define the relevant functionals, and we set, for any probability measure µ ∈ P(IR), the rate function $S_\mu$. Then the main theorem of this section is the following.
Remark 4.6: In [65], the large deviation principle was only obtained for marginals; it was proved at the level of processes in [66].
The main point in proving Theorem 4.5 is to observe that the evolution of $\hat\mu^N$ is described, thanks to Itô's calculus, by an autonomous differential equation. This is easily seen from the fact observed by Dyson [45] (see also [93], Theorem 8.2.1) that the eigenvalues $(\lambda^i_t, 1\le i\le N, 0\le t\le 1)$ of $(Y^{N,\beta}(t), 0\le t\le 1)$ are described as the strong solution of the interacting particle system
$$d\lambda^i_t = \sqrt{\frac{2}{\beta N}}\,dB^i_t + \frac{1}{N}\sum_{j\neq i}\frac{dt}{\lambda^i_t - \lambda^j_t},$$
with $\mathrm{diag}(\lambda^1_0,\dots,\lambda^N_0) = D_N$ and β = 1, 2 or 4. This is the starting point for using the ideas of the papers of Kipnis, Olla and Varadhan [88, 87]. These papers concern the case where the diffusive term does not vanish (βN is of order one). The large deviations for the law of the empirical measure of the particles following (4.2.11) in such a scaling have recently been studied by Fontbona [52] in the context of McKean-Vlasov diffusions with singular interaction. We shall first recall these techniques for the reader in the case of the empirical measures of independent Brownian motions, as presented in [87]. We will then describe the changes needed to adapt this strategy to our setting.
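One can check numerically that the time marginals of $Y^{N,\beta}(t) = D_N + H^{N,\beta}(t)$ behave as expected. The sketch below (the choice $D_N = \mathrm{diag}(\pm 1)$, β = 2 and the matrix size are illustrative) verifies that the second moment of the spectral measure grows linearly in t, as the autonomous evolution of $\hat\mu^N_t$ predicts.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
d = np.diag(np.concatenate([np.ones(n // 2), -np.ones(n // 2)])).astype(complex)

def hermitian_bm(n, t, rng):
    """Hermitian Brownian motion at time t: self-adjoint, entries of variance t/n."""
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return np.sqrt(t / n) * (g + g.conj().T) / np.sqrt(2)

m2 = {}
for t in (0.5, 1.0):
    eigs = np.linalg.eigvalsh(d + hermitian_bm(n, t, rng))
    m2[t] = np.mean(eigs**2)
# Second moment grows linearly: m2(t) ~ m2(0) + t = 1 + t for this D_N.
```

Since the cross term $\mathrm{tr}(D_N H^{N}(t))$ is centered, $\mathbb{E}\,\hat\mu^N_t(x^2) = \hat\mu^N_{D_N}(x^2) + t$ exactly, so the check is sharp up to fluctuations of order $1/N$.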

Large deviations from the hydrodynamical limit for a system of independent Brownian particles
Note that the deviations of the law of the empirical measure of independent Brownian motions on path space are well known by Sanov's theorem (cf. [41], Section 6.2). By the contraction principle, the law of $(\hat\mu^N_t, t\in[0,1])$ under $W^{\otimes N}$ then satisfies a large deviation principle with rate function given, for $p\in C([0,1], P(\mathbb{R}))$, as a contraction of the relative entropy. Here, $(x_t)_\#\mu$ denotes the law of $x_t$ under µ. A more explicit form of this rate function was given by Föllmer [51]. Kipnis and Olla [87] proposed a direct approach to obtain this result based on exponential martingales. Its advantage is that it is much more robust and adapts to many complicated settings encountered in hydrodynamics (cf. [86]). Let us now summarize it. It follows the scheme below.
• Exponential tightness and study of the rate function S. Since the rate function S is the contraction of the relative entropy I(·|W), it is clearly a good rate function. This can also be proved directly from formula (4.2.13), as we shall detail in the context of the eigenvalues of large random matrices. Similarly, we shall not detail here the proof that $\hat\mu^N_\# W^{\otimes N}$ is exponentially tight, which reduces the proof of the large deviation principle to the proof of a weak large deviation principle and thus to estimating the probability of deviations into small open balls (cf. Chapter 2). We will now concentrate on this last point.
• Itô's calculus: Itô's calculus (cf. [82], Theorem 3.3 p. 149) gives the semimartingale decomposition of $\int f\,d\hat\mu^N_t$ for smooth f. The last ingredient of stochastic calculus we want to use is that (cf. [82], Problem 2.28, p. 147), for any bounded continuous martingale $m_t$ with bracket $\langle m\rangle_t$ and any $\lambda\in\mathbb{R}$, $\exp\{\lambda m_t - \frac{\lambda^2}{2}\langle m\rangle_t\}$ is a martingale. In particular, it has constant expectation. Thus, we deduce (4.2.14) for all smooth f.
• Weak large deviation upper bound: We equip C([0,1], P(IR)) with the weak topology on P(IR) and the uniform topology on the time variable. It is then a Polish space. A distance compatible with such a topology is for instance given, for any $\mu, \nu\in C([0,1], P(IR))$, by $D(\mu,\nu) = \sup_{t\in[0,1]} d(\mu_t,\nu_t)$, with a distance d on P(IR) compatible with the weak topology, such as $d(\mu,\nu) = \sup_{|f|_L\le 1}\big|\int f\,d\mu - \int f\,d\nu\big|$, where $|f|_L$ involves the Lipschitz constant of f. We prove here the weak large deviation upper bound. To simplify, let us assume that $p_0 = \delta_0$. We set $B(p,\delta) = \{\mu\in C([0,1], P(\mathbb{R})) : D(\mu, p)\le\delta\}$.
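The exponential martingale property underlying (4.2.14) can be illustrated in the simplest possible case, a one-dimensional Brownian motion, for which $\langle B\rangle_t = t$; the parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, t, n_paths = 0.5, 1.0, 200000

# B_t ~ N(0, t) and <B>_t = t, so exp(lam*B_t - lam^2/2 * <B>_t) is the
# exponential martingale; its expectation is constant, equal to 1.
b_t = rng.standard_normal(n_paths) * np.sqrt(t)
mart = np.exp(lam * b_t - 0.5 * lam**2 * t)
mean = mart.mean()
```

Replacing $B$ by the trace martingale $M^N_t(f)$, whose bracket is of order $N^{-2}$, and λ by a quantity of order $N^2$, is exactly what produces exponential moment bounds at the large deviation scale $N^2$.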
• Large deviation lower bound: The derivation of the large deviation upper bound was thus fairly easy. The lower bound is a bit more sophisticated and relies on the proof of the following points:
(a) The solutions to the heat equation with a smooth drift are unique.
(b) The set described by these solutions is dense in C([0, 1], P(IR)).
(c) The entropy behaves continuously with respect to the approximations by elements of this dense set.
We now describe these ideas more precisely. In the previous section (see (4.2.15)), we merely obtained the large deviation upper bound from the observation that the exponential-martingale bound holds for all $\nu\in C([0,1], P(IR))$, all δ > 0 and any $f\in C^{2,1}_b(\mathbb{R}\times[0,1])$. To make sure that this upper bound is sharp, we need to check that for any $\nu\in C([0,1], P(IR))$ and δ > 0, this inequality is almost an equality for some k, i.e. there exists $k\in C^{2,1}_b(\mathbb{R}\times[0,1])$ such that the probability that $\hat\mu^N_\cdot$ belongs to a small neighborhood of ν under the shifted probability measure is not too small. In fact, we shall prove that for good processes ν we can find k such that this probability goes to one, by the following argument.
Under the shifted probability measure $P^{N,k}$, it is not hard to see that $\hat\mu^N_\cdot$ is exponentially tight (indeed, for $k\in C^{2,1}_b(\mathbb{R}\times[0,1])$, the density of $P^{N,k}$ with respect to P is uniformly bounded by $e^{C(k)N}$ with a finite constant C(k), so that $P^{N,k}\circ(\hat\mu^N_\cdot)^{-1}$ is exponentially tight since $P\circ(\hat\mu^N_\cdot)^{-1}$ is). As a consequence, $\hat\mu^N_\cdot$ is almost surely tight. We let $\mu_\cdot$ be a limit point. Now, by Itô's calculus, for any $f\in C^{2,1}_b(\mathbb{R}\times[0,1])$ and any $0\le t\le 1$, we obtain a semimartingale decomposition of $\int f\,d\hat\mu^N_t$. Since the bracket of $M^N_t(f)$ goes to zero, the martingale $(M^N_t(f), t\in[0,1])$ goes to zero uniformly almost surely. Hence, any limit point $\mu_\cdot$ must satisfy the weak equation (4.2.17); we say that k is the field associated with µ. Therefore, if we can prove that there exists a unique solution $\nu_\cdot$ to (4.2.17), we see that $\hat\mu^N_\cdot$ converges almost surely under $P^{N,k}$ to this solution. This proves the lower bound at any measure-valued path $\nu_\cdot$ which is the unique solution of (4.2.17), namely for any $k\in C^{2,1}_b(\mathbb{R}\times[0,1])$ such that there exists a unique solution $\nu^k$ to (4.2.17), where we used in the second line the continuity of $\mu\mapsto T(\mu,k)$, due to our assumption that $k\in C^{2,1}_b(\mathbb{R}\times[0,1])$, and in the third line the fact that $P^{N,k}\big(\sup_{t\in[0,1]} d(\hat\mu^N_t, \nu^k_t) < \delta\big)$ goes to one. Hence, the question boils down to uniqueness of the weak solutions of the heat equation with a drift. This problem is not too difficult to solve here, and one can see that, for instance for fields k which are analytic in a neighborhood of the real line, there is at most one solution to this equation. To generalize (4.2.18) to any $\nu\in\{S<\infty\}$, it is not hard to see that it is enough to find, for any such ν, a sequence $\nu^{k_n}$ for which (4.2.19) holds. Now, observe that S is a convex function, so that for any probability measure $p_\epsilon$, the entropy of the convolution $\mu * p_\epsilon$ is bounded by that of µ, where in the last inequality we neglected the condition at the initial time to say that $S((\cdot - x)_\#\mu) = S(\mu)$ for all x.
Hence, since S is also lower semicontinuous, one sees that $S(\mu * p_\epsilon)$ converges toward S(µ) for any µ with finite entropy S. Performing also a regularization with respect to time, and taking care of the initial conditions, allows one to construct a sequence $\nu^n$ with analytic fields satisfying (4.2.19). This point is quite technical but still manageable in this context. Since it will be done quite explicitly in the case we are interested in, we shall not detail it here.

Large deviations for the law of the spectral measure of a non-centered large dimensional matrix-valued Brownian motion
To prove a large deviation principle for the law of the spectral measure of Hermitian Brownian motions, the first natural idea would be, following (4.2.11), to start from a large deviation principle for the empirical measure of independent Brownian motions, to use the Girsanov theorem to show that the law we are considering is absolutely continuous with respect to the law of the independent Brownian motions with a density which only depends on the empirical measure, and to conclude by Laplace's method (cf. Chapter 2). However, this approach presents difficulties due to the singularity of the interacting potential, and thus of the density. Here, the techniques developed in [87] will however be very efficient, because they only rely on smooth functions of the empirical measure: the empirical measures are considered as distributions, so that the interacting potential is smoothed by the test functions (note however that this strategy would not have worked with more singular potentials). According to (4.2.11), we can in fact follow the very same approach. We here mainly develop the points which are different.

Itô's calculus
With the notations of (4.2.7) and (4.2.8), we have the following Itô formula for $\hat\mu^N$. Proof: It is easy to derive this result from Itô's calculus and (4.2.11). Let us however point out how to derive it directly in the case where $f(x) = x^k$ with an integer k ∈ N and β = 2. Then, for any $(i,j)\in\{1,\dots,N\}^2$, Itô's calculus gives the evolution of the matrix entries. Let us finally compute the martingale bracket of the normalized trace of the above martingale. Similar computations give the bracket for more general polynomial functions.
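The key output of this computation is that the bracket of the normalized-trace martingale is of order $N^{-2}$, which is the mechanism behind the speed $N^2$ of the large deviation principle. This can be observed numerically: the sketch below (matrix sizes and sample counts are arbitrary choices) checks that the fluctuations of $N^{-1}\mathrm{tr}\,H(1)^2$ decay like $1/N$.

```python
import numpy as np

rng = np.random.default_rng(6)

def m2_sample(n, rng):
    # Hermitian Brownian motion at time 1 (entries of variance 1/n);
    # (1/n) tr H^2 = (1/n) * sum_{ij} |H_ij|^2.
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    h = (g + g.conj().T) / np.sqrt(2 * n)
    return np.sum(np.abs(h) ** 2) / n

stds = {n: np.std([m2_sample(n, rng) for _ in range(400)]) for n in (40, 160)}
ratio = stds[40] / stds[160]  # bracket of order N^{-2}: std ~ 1/N, so ratio ~ 4
```

Quadrupling N divides the standard deviation by roughly 4, i.e. the variance by 16, consistent with a bracket of order $N^{-2}$.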
Remark 4.11: Observe that if the entries were not Brownian motions but diffusions, described for instance as solutions of an SDE, then the evolution of the spectral measure of the matrix would not be autonomous anymore. In fact, our strategy is strongly based on the fact that the variations of the spectral measure under small variations of time only depend on the spectral measure itself, allowing us to construct exponential martingales which are functions of the process of the spectral measure only. It is easy to see that if the entries of the matrix are not Gaussian, the variations of the spectral measure will depend on much more general functions of the entries than those of the spectral measure.
However, this strategy can also be used to study the spectral measure of other Gaussian matrices as emphasized in [29,60].
From now on, we shall consider the case β = 2 and drop the subscript 2 in $H^{N,2}$; this case is slightly easier to write down since there are no error terms in Itô's formula, but everything extends readily to the cases β = 1 or 4. The only point to notice is the scaling relation between the rate functions, where the last equality is obtained by changing f into $2^{-1}\beta f$.

Large deviation upper bound
From the previous Itô formula, one can deduce, by following the ideas of [88] (see Section 4.2.1), a large deviation upper bound for the measure-valued process $\hat\mu^N_\cdot \in C([0,1], P(IR))$. To this end, we shall make the following assumption (H) on the initial condition $D_N$, implying that $(\hat\mu^N_{D_N}, N\in\mathbb{N})$ is tight; moreover, $\hat\mu^N_{D_N}$ converges weakly, as N goes to infinity, toward a probability measure $\mu_D$. Then, we shall prove, with the notations of (4.2.7)-(4.2.9), the following.
Theorem 4.12. Assume (H). Then (1) $S_{\mu_D}$ is a good rate function on C([0, 1], P(IR)).
(2) For any closed set F of C([0, 1], P(IR)), the large deviation upper bound holds.
Proof: We first prove that $S_{\mu_D}$ is a good rate function. Then we show that exponential tightness holds and obtain a weak large deviation upper bound, these two arguments yielding (2) (cf. Chapter 2).
(a) Let us first observe that $S_{\mu_D}(\nu)$ is also given, when $\nu_0 = \mu_D$, by a supremum of continuous functionals. Consequently, $S_{\mu_D}$ is non-negative. Moreover, $S_{\mu_D}$ is obviously lower semicontinuous as a supremum of continuous functions. Hence, we merely need to check that its level sets are contained in relatively compact sets. For K and C compact subsets of P(IR) and C([0, 1], R), respectively, we set the corresponding compact set of paths. To prove (4.2.22), we consider, for δ > 0, a logarithmic test function $f_\delta$. We observe that its supremum norm is finite and that its derivatives are uniformly bounded for δ ∈ (0, 1]. Hence, (4.2.21) implies, by taking $f = f_\delta$ in the supremum, a uniform tail estimate for any δ ∈ (0, 1], any t ∈ [0, 1] and any $\mu_\cdot\in\{S_{\mu_D}\le M\}$. Consequently, we deduce, by the monotone convergence theorem and letting δ decrease to zero, that the estimate holds uniformly on $\{S_{\mu_D}\le M\}$, which finishes the proof of (4.2.22).
The proof of (4.2.23) again relies on (4.2.21), which implies a uniform control of the time increments of $\int f\,d\mu_t$ for any $f\in C^2_b(\mathbb{R})$, any $\mu_\cdot\in\{S_{\mu_D}\le M\}$ and any $0\le s\le t\le 1$. The proof is rather classical (it uses Doob's inequality but otherwise is closely related to the proof that $S_{\mu_D}$ is a good rate function); we shall omit it here (see the first section of [29] for details).
The arguments are exactly the same as in Section 4.2.1.

Large deviation lower bound
We shall prove at the end of this section that Note that h belongs to MF ∞ iff it can be extended analytically to {z : |ℑ(z)| < ǫ}.
As a consequence of Lemma 4.15, we find that for any open subset O ⊂ C([0, 1], P(IR)), any ν ∈ O ∩ MC([0, 1], P(IR)), there exists δ > 0 small enough so that with a function g going to zero at zero. Hence, for any ν ∈ O ∩ MC([0, 1], P(IR)) To complete the lower bound, it is therefore sufficient to prove that for any ν ∈ C([0, 1], P(IR)), there exists a sequence ν n ∈ MC([0, 1], P(IR)) such that lim n→∞ ν n = ν and lim n→∞ S µD (ν n ) = S µD (ν). The rate function S µD is not convex a priori since it is the supremum of quadratic functions of the measure-valued path ν, so that there is no reason why it should be reduced by standard convolution as in the classical setting (cf. Section 4.2.1). Thus, it is unclear how to construct the sequence of (4.2.27). Moreover, we have to deal with a degenerate rate function, which is infinite unless ν 0 = µ D .
To overcome the lack of convexity, we shall remember the origin of the problem; in fact, we have been considering the spectral measure of matrices and should not forget the special features of the operators due to the matrix structure. By definition, the differential equation satisfied by a Hermitian Brownian motion should be invariant if we translate the entries, that is, translate the Hermitian Brownian motion by a self-adjoint matrix. The natural limiting framework of large random matrices is free probability, and the limiting spectral measure of the sum of a Hermitian Brownian motion and a deterministic self-adjoint matrix converges toward the free convolution of their respective limiting spectral measures. Intuitively, we shall therefore expect (and in fact we will show) that the rate function S 0,1 decreases by free convolution, generalizing the fact that standard convolution decreases the Brownian motion rate function (cf. (4.2.20)). However, because free convolution by a Cauchy law is equal to the standard convolution by a Cauchy law, we shall regularize our laws by convolution by Cauchy laws. Free probability shall be developed in Chapter 6. Let us here outline the main steps of the proof of (4.2.27): 1. We find that convolution by Cauchy laws (P ǫ ) ǫ>0 decreases the entropy and prove (this is very technical and proved in [65]) that for any ν ∈ {S µD < ∞}, any given partition 0 = t 1 < t 2 < . . . < t n = 1 with t i = (i − 1)∆, the measure-valued path given, for t ∈ [t k , t k+1 [, by Here, the choice of Cauchy laws is not innocent; it is a good choice because free convolution by Cauchy laws is, exceptionally, the same as standard convolution. Hence, it is easier to work with. Moreover, to prove the above result, it is convenient that ν ǫ,∆ t is absolutely continuous with respect to Lebesgue measure with non-vanishing density (see (4.2.31)).
But it is not hard to see that for any measure µ with compact support, µ ⊞ p, even when it has a density with respect to Lebesgue measure, is likely for many choices of laws p to have holes in its density unless ∫ |x| dp(x) = +∞. For these two reasons, the Cauchy law is a wise choice. However, we have to pay for it, as we shall see below.
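The identity between free and classical convolution by a Cauchy law, used repeatedly above, can be checked on Stieltjes transforms: writing G µ (z) = ∫ (z − x) −1 dµ(x) for ℑz > 0, the Cauchy law P ǫ has G Pǫ (z) = (z + iǫ) −1 and constant R-transform −iǫ, so that

```latex
G_{P_\epsilon \boxplus \mu}(z) \;=\; G_\mu(z+i\epsilon) \;=\; G_{P_\epsilon * \mu}(z),
\qquad \Im z > 0,
```

whence P ǫ ⊞ µ = P ǫ * µ. In particular the density of P ǫ ⊞ µ is everywhere positive, as required in the regularization step above.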
3. Everything looks nice except that we modified the initial condition from µ D into µ D * P ǫ , so that in fact S µD (ν ǫ,∆ ) = +∞! Moreover, the empirical measure-valued process cannot deviate toward processes of the form ν ǫ,∆ even after some time, because these processes do not have finite second moment. To overcome this problem, we first note that this result will still give us a large deviation lower bound if we change the initial data of our matrices. Namely, let, for ǫ > 0, C N ǫ be an N × N diagonal matrix with spectral measure converging toward the Cauchy law P ǫ , and consider the matrix-valued process with U N an N × N unitary matrix following the Haar measure m N 2 on U (N ). Then, it is well known (see Voiculescu [122]) that the spectral distribution of U N C N ǫ U * N + D N converges toward the free convolution P ǫ ⊞ µ D = P ǫ * µ D . Hence, we can proceed as before to obtain the following large deviation estimates on the law of the spectral measure μ̂ N,ǫ t = μ̂ N X N,ǫ t . Corollary 4.16. For any ǫ > 0, for any closed subset F of C([0, 1], P(R)),

Further, for any open set
4. To deduce our result for the case ǫ = 0, we proceed by exponential approximation. In fact, we have the following lemma, whose proof is fairly classical and omitted here (cf. [65], proof of Lemma 2.11).
In fact, we show in Chapter 6 that S 0,1 (ν ǫ,∆ ) converges toward S 0,1 (ν). The fact that the field h ǫ,∆ associated with ν ǫ,∆ satisfies the necessary conditions so that ν ǫ,∆ ∈ MC([0, 1], P(IR)) is proved in [65]. We shall not detail it here, but let us just point out the basic idea, which is to observe that (4.2.25) is equivalent to writing that, if µ t (dx) = ρ t (x)dx with a smooth density ρ . , where Hν is the Hilbert transform In other words, Hence, we see that ∂ x k . is smooth as soon as ρ is, that its Hilbert transform behaves well and that ρ does not vanish. To study the Fourier transform of ∂ x k t , we need it to belong to L 1 (dx), which we can only show when the original process ν possesses at least a finite fifth moment. More details are given in [65].
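Throughout, the Hilbert transform of a probability measure ν is taken in the convention without the 1/π factor (presumably the convention of [65]), that is, as the principal value

```latex
H\nu(x) \;=\; \mathrm{p.v.}\!\int \frac{d\nu(y)}{x-y}
\;=\; \lim_{\delta\downarrow 0}\int_{|x-y|>\delta}\frac{d\nu(y)}{x-y},
```

so that H 2 = −π 2 on L 2 (dx).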

Discussion and open problems
We have seen in this section how the non-intersecting paths description (or equivalently the Dyson equations (4.2.11)) of spherical integrals can be used to study their first order asymptotic behaviour and understand where the integral concentrates.
A related question is to study the law of the spectral measure of the random matrix M with law In the case where V (M ) = 1 2 M 2 , M = W + A with W a Wigner matrix, the asymptotic distribution of the eigenvalues is given by the free convolution of the semi-circular distribution and the limiting spectral distribution of A. A more detailed study based on Riemann-Hilbert techniques gives the limiting eigenvalue distribution correlations when μ̂ N A = α N δ a + (1 − α N )δ −a (cf. [19,3]). It is natural to wonder whether such a result could be derived from the Brownian paths description of the matrix. Our result allows us (as a mild generalization of the next chapter) to describe the limiting spectral measure for general potentials V . However, the study of correlations requires a different set of techniques.
It would be very interesting to understand the relation between this limit and the expansion obtained by B. Collins [34], who proved that with some complicated functions a p (µ D , µ E ) of interest. The physicists' answer to such questions is that which would validate the whole approach of using the asymptotics of I (β) N to compute and study the a p (µ D , µ E )'s. However, it is not yet known whether such an interchange of limit and differentiation is rigorous. A related topic is to understand whether the asymptotics we obtained extend to non-Hermitian matrices. This is not true in general, following a counterexample of S. Zelditch [133], but it could still hold when the spectral norm of the matrices is small enough.
In the case where one matrix has rank one, I have shown with M. Maida [63] that such an analytic extension was true.
Other models such as domino tilings can be represented by non-intersecting paths (cf. [77] for instance). It is rather tempting to hope that similar techniques could be used in this setting and give a second approach to [84]. This however seems slightly more involved because time and space are discrete and have the same scaling (making the approximation by Brownian motions irrelevant), so that the whole machinery borrowed from hydrodynamics does not seem to be well adapted.
It would be as well very interesting to obtain second order correction terms for the spherical integrals, a problem related to the enumeration of maps with genus g ≥ 1, as we shall see in the next section.
Finally, it would also be nice to get a better understanding of the limiting value of the spherical integrals, namely of J β (µ D , µ E ). We shall give some elements in this direction in the next section but wish to emphasize already that this quantity remains rather mysterious. We have not yet been able, for instance, to obtain a simple formula in the case of Bernoulli measures (which are somewhat degenerate cases in this context).

Matrix models and enumeration of maps
It has been known since the work of 't Hooft that matrix integrals can be seen, via Feynman diagram expansions, as generating functions for enumerating maps (or triangulated surfaces). We refer here to the very nice survey of A. Zvonkin [136]. One-matrix integrals are used to enumerate maps with a given genus and given vertex degree distribution, whereas several-matrix integrals can be used to consider the case where the vertices can additionally be colored (i.e. can take different states).
Matrix integrals are usually of the form with some polynomial function P of d non-commutative variables and the Lebesgue measure dA on some well chosen ensemble of N × N matrices, such as the set H N (resp. S N , resp. Symp N ) of N × N Hermitian (resp. symmetric, resp. symplectic) matrices. We shall describe in the next section how such integrals are related with the enumeration of maps. Then, we shall apply the results of the previous section to compute the first order asymptotics of such integrals in some cases where the polynomial function P has a quadratic interaction. In the simplest, quartic one-matrix case, the expansion takes the form

Conjecture 5.1. \[ \frac{1}{N^2}\,\log \int e^{-Nt\,\mathrm{Tr}(A^4)}\, d\mu_N(A) \;=\; \sum_{g\ge 0}\frac{1}{N^{2g}} \sum_{n\ge 1}\frac{(-t)^n}{n!}\, C(n, g), \]

where C(n, g) denotes the number of maps of genus g with n vertices of degree 4. Here, a map of genus g is an oriented connected graph drawn on a surface of genus g modulo equivalence classes.

Relation with the enumeration of maps
To be more precise, a surface is a compact oriented two-dimensional manifold without boundary. Surfaces are classified by their genus (the number of 'handles', see figure 5.1): there is only one surface with a given genus, up to homeomorphism. Following [136], Definition 4.1, a map is then a graph which is 'drawn' on (or embedded into) a surface in such a way that the edges do not intersect and, if we cut the surface along the edges, we get a disjoint union of sets which are homeomorphic to an open disk (these sets are the faces of the map). Note that in Conjecture 5.1, the maps are counted up to homeomorphism (that is, modulo equivalence classes).
The formal proof of Conjecture 5.1 goes as follows. One begins by expanding all the non-quadratic terms in the exponential; taking the expectation under the law of Wigner's Hermitian matrices, given by and using Wick's formula to compute the expectation over the Gaussian variables, then gives the result. One can alternatively use the graphical representation introduced by Feynman to compute such expectations. It goes as shown on figure 5.2.
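Recall that Wick's formula computes every Gaussian expectation from covariances alone: if (g 1 , . . . , g 2n ) are centered, jointly Gaussian (real or complex) variables, then

```latex
\mathbb{E}\big[g_1 g_2 \cdots g_{2n}\big]
\;=\; \sum_{\pi\ \text{pair partition of}\ \{1,\dots,2n\}}\ \prod_{\{i,j\}\in\pi}\mathbb{E}[g_i g_j],
```

and expectations of products of an odd number of variables vanish. Applied to the entries A ij , each pair {i, j} of the partition becomes an edge gluing two half-edges of the Feynman diagram.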
The last equivalence in figure 5.2 results from the observation that µ N (A ij ) = 0 for all i, j and that µ N (A ij A kl ) = N −1 δ il δ jk , so that each end of the crosses has to be connected with another one. As a consequence of this construction, one can see that only oriented fat graphs will contribute to the sum. Moreover, since on each face of the graph the indices are constant, we see that each given graph will appear N ♯faces times. Hence, if we denote G(n, F ) = {oriented graphs with n vertices of degree 4 and F faces} Taking the logarithm, it is well known that we get only contributions from connected graphs : Finally, since the genus of a map is related to its number of faces by n − F = 2(g − 1), the result is proved. When considering several matrices, we see that we additionally have to decide, at each vertex, which matrix contributed, which corresponds to giving different states to the vertices. Of course, this derivation is formal and it is not clear at all that such an expansion of the free energy exists. Indeed, the series here might well not be summable (this is clear when t < 0 since the integral diverges). In the one-matrix case, this result was proved very recently by K. McLaughlin and N. Ercolani [46], who have shown that such an expansion is valid, by means of Riemann-Hilbert techniques, under natural assumptions on the potential. The first step in using Riemann-Hilbert techniques is to understand the limiting behavior of the spectral measures of the matrices following the corresponding Gibbs measure This is clearly needed to localize the integral. The understanding of such asymptotics is the subject of the next section. Let us remark before embarking on this line of attack that a more direct combinatorial strategy can be developed. For instance, G. Schaeffer and M.
Bousquet-Mélou [22] studied the Ising model on planar random graphs as a generating function for the enumeration of colored planar maps, generalizing Tutte's approach. The results are then more explicitly described by an algebraic equation for this generating function. However, the matrix model approach is a priori more general, since it allows one to consider many different models and arbitrary genus, even though the mathematical understanding of these points is still far from being achieved. Finally, let us notice that the relation between the enumeration of maps and the matrix models should a priori hold only when the weights of the non-quadratic terms are small (the expansion being in fact a formal analytic expansion around the origin), but matrix integrals are of interest in other regimes, for instance in free probability.

Fig 5.2. Computing Gaussian integrals

A. Guionnet/Large deviations for random matrices
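The face counting in the proof above can be made completely explicit in the one-matrix case: each Wick pairing of the 2k half-edges around a single vertex of degree 2k yields a one-vertex map, whose faces are the cycles of the permutation σ∘α, where σ is the cyclic order of half-edges around the vertex and α the pairing. The sketch below (hypothetical code, not from the text) recovers in this way the genus decomposition of the Gaussian moment N −1 E[Tr M 2k ]; for k = 2 it gives 2 + N −2 and for k = 3 it gives 5 + 10 N −2 , the leading coefficients being the Catalan numbers counting planar one-vertex maps.

```python
def pairings(elts):
    """Enumerate all pair partitions of the list elts (Wick pairings)."""
    if not elts:
        yield []
        return
    first = elts[0]
    for i in range(1, len(elts)):
        rest = elts[1:i] + elts[i + 1:]
        for p in pairings(rest):
            yield [(first, elts[i])] + p

def genus_counts(k):
    """Count Wick pairings of one 2k-valent vertex by genus g, so that
    N^{-1} E[Tr M^{2k}] = sum_g counts[g] * N^{-2g} for a GUE matrix M."""
    n = 2 * k
    counts = {}
    for p in pairings(list(range(n))):
        alpha = {}
        for a, b in p:
            alpha[a], alpha[b] = b, a
        # Faces are the cycles of sigma o alpha, with sigma(i) = i+1 mod n.
        seen, faces = set(), 0
        for start in range(n):
            if start in seen:
                continue
            faces += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = (alpha[i] + 1) % n
        # Euler's formula with one vertex and k edges: faces = k + 1 - 2g.
        g = (k + 1 - faces) // 2
        counts[g] = counts.get(g, 0) + 1
    return counts
```

For instance, genus_counts(2) returns {0: 2, 1: 1} and genus_counts(3) returns {0: 5, 1: 10}, in agreement with the genus-face relation quoted in the proof.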

Asymptotics of some matrix integrals
We would like to consider integrals of more than one matrix. The simplest interaction one can think of is the quadratic one. Such an interaction already describes several classical models in random matrix theory; we refer here to the works of M. Mehta, A. Matytsin, A. Migdal, V. Kazakov, P. Zinn-Justin and B. Eynard for instance.
• The random Ising model on random graphs is described by the Gibbs measure and two polynomial functions P 1 , P 2 . The limiting free energy for this model was calculated by M. Mehta [93] in the case P 1 (x) = P 2 (x) = x 2 + gx 4 , with integration over H N . However, the limiting spectral measures of A and B under µ N Ising were not considered in that paper. A discussion of this problem can be found in P. Zinn-Justin [135].
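Hedging on signs and normalizations, which vary between references, this Gibbs measure is of the schematic form

```latex
\mu^N_{\mathrm{Ising}}(dA,\, dB) \;=\; \frac{1}{Z_N}\,
\exp\Big\{-N\,\mathrm{Tr}\big(P_1(A) + P_2(B) - c\, AB\big)\Big\}\, dA\, dB,
```

with a coupling constant c and dA, dB the Lebesgue measure on the chosen matrix ensemble; the quadratic interaction Tr(AB) is precisely what the spherical integrals of the previous chapter control.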
• One can also define the q − 1 Potts model on random graphs described by the Gibbs measure The limiting spectral measures of (A 1 , · · · , A q ) are discussed in [135] when P i = gx 3 − x 2 (!).
• As a straightforward generalization, one can consider matrices coupled in a chain, following S. Chadha, G. Mahoux and M. Mehta [94], given by The number q of matrices may possibly go to infinity, as in [92].
The first order asymptotics of these models can be studied thanks to the control of spherical integrals obtained in the last chapter.
Theorem 5.2. Assume that P i (x) ≥ c i x 4 + d i with c i > 0 and some finite constants d i . Hereafter, β = 1 (resp. β = 2, resp. β = 4) when dA denotes the Lebesgue measure on S N (resp. H N , resp. H N with N even). Then, with c = inf ν∈P(R) I β (ν), Remark 5.3: The above theorem actually extends to polynomial functions going to infinity like x 2 . However, the case of quadratic polynomials is trivial, since it boils down to the Gaussian case, and therefore the next interesting case is the quartic polynomial as above. Moreover, Theorem 5.4 fails in the case where P, Q go to infinity only like x 2 . However, all our proofs would extend easily to any continuous functions P i such that P i (x) ≥ a|x| 2+ǫ + b with some a > 0 and ǫ > 0. In particular, we do not need any analyticity assumptions, which can be required for instance to obtain the so-called Master loop equations (see Eynard et al. [50,49]). Proof of Theorem 5.2 : It is enough to notice that, when diagonalizing the matrices A i , the interaction is expressed in terms of spherical integrals by (3.1.1). Laplace's (or saddle point) method then gives the result (up to the boundedness of the matrices A i in the spherical integrals, which can be obtained by approximation). We shall not detail it here and refer the reader to [61].
We shall then study the variational problems for the above energies; indeed, by standard large deviation considerations, it is clear that the spectral measures of the matrices (A i ) 1≤i≤d will concentrate on the set of the minimizers defining the free energies, and in particular converge to these minimizers when they are unique. We prove the following for the Ising model.
3) There exists a couple (ρ A→B , u A→B ) of measurable functions on R × (0, 1) such that ρ A→B t (x)dx is a probability measure on R for all t ∈ (0, 1) and (µ A , µ B , ρ A→B , u A→B ) are characterized uniquely as the minimizer of a strictly convex function under a linear constraint.
In particular, (ρ A→B , u A→B ) is a solution of the Euler equation for isentropic flow with negative pressure p(ρ) = −(π 2 /3)ρ 3 such that, for all (x, t) in the interior with the probability measure ρ A→B t (x)dx weakly converging toward µ A (dx) (resp. µ B (dx)) as t goes to zero (resp. one). Moreover, we have For the other models, uniqueness of the minimizers is not always clear. For instance, we obtain uniqueness of the minimizers for the q-Potts models only for q ≤ 2, whereas it is also expected for q = 3. For the description of these minimizers, I refer the reader to [61].
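Written out, the Euler equation for isentropic flow with the negative pressure p(ρ) = −(π 2 /3)ρ 3 reads (cf. [92] and [61], up to normalization), on the interior of {ρ > 0},

```latex
\partial_t \rho + \partial_x(\rho u) = 0,
\qquad
\partial_t u + u\,\partial_x u \;=\; \pi^2 \rho\,\partial_x \rho
\;\Big(= -\tfrac{1}{\rho}\,\partial_x p(\rho),\quad p(\rho) = -\tfrac{\pi^2}{3}\rho^3\Big),
```

with ρ = ρ A→B and u = u A→B .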
To prove Theorem 5.4, we shall study more carefully the entropy and in particular understand where the infimum is taken. The main observation is that by (4.2.21), for any f ∈ C 2,1 b (R × [0, 1]),

so that the linear form f → S 0,1 (ν, f ) is bounded, and Riesz's theorem asserts We then say that (ν, k) satisfies (5.2.5). Then Property 5.5. Let µ 0 ∈ {µ ∈ P(R) : Σ(µ) > −∞} and ν . ∈ {S µ0 < ∞}. If k is the field such that (ν, k) satisfies (5.2.5), we set u t := ∂ x k t (x) + Hν t (x). Then, ν t (dx) ≪ dx for almost all t and Proof : We shall here only prove formula (5.2.6), assuming the first part of the property, which is actually proved by showing that formula (5.2.6) holds for smooth approximations ν ǫ of the measure-valued path ν such that S µD (ν ǫ ) approximates S µD (ν), yielding a uniform control on the L 3 norm of their densities in terms of S µD (ν). This uniform control allows us to show that any path ν ∈ {S µD < ∞} is absolutely continuous with respect to Lebesgue measure, with density in L 3 (dxdt).
Let us denote by k the field associated with ν, i.e such that for any f ∈ with ∂ x k ∈ L 2 (dν t (x) × dt). Observe that by [119], p. 170, for any s ∈ [0, 1] such that ν s is absolutely continuous with respect to Lebesgue measure with density ρ s ∈ L 3 (dx), for any compactly supported measurable function ∂ x f (., s), Hence, since we assumed that ν s (dx) = ρ s (x)dx for almost all s with ρ ∈ L 3 (dxdt), (5.2.7) shows that for any i.e. that in the sense of distributions on R × [0, 1], Moreover, since Hν . belongs to L 2 (dν s × ds) when ρ ∈ L 3 (dxdt), we can write We shall now see that the last term in the above right hand side only depends on (µ 0 , µ 1 ). To simplify, let us assume that ν is a smooth path such that which gives the result. The general case is obtained by smoothing by free convolution, as can be seen in [61], p. 537-538.
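The way the cubic power of the density enters, and hence the π 2 /3 constant in the pressure p(ρ) above, can be traced to Tricomi's identity for the Hilbert transform (in the convention with kernel 1/(x − y), so that H 2 = −π 2 ): for ρ ∈ L 3 (dx),

```latex
2\, H(\rho\, H\rho) \;=\; (H\rho)^2 - \pi^2 \rho^2,
\qquad\text{whence}\qquad
\int (H\rho)^2(x)\,\rho(x)\, dx \;=\; \frac{\pi^2}{3}\int \rho(x)^3\, dx,
```

the second identity following by integrating the first against ρ and using the antisymmetry ∫ f Hg dx = −∫ g Hf dx.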
To prove Theorem 5.4, one should therefore write It is easy to see that is strictly convex. Therefore, since F Ising is its infimum under a linear constraint, this infimum is attained at a unique point (µ A , µ B , ρ A→B , u A→B ). To describe such a minimizer, one classically performs variations. The only point to take care of here is that perturbations should be constructed so that the constraint remains satisfied. This type of problem has been dealt with before. In the case above, I encountered three possible strategies.
The first is to use a target-type perturbation, which is a standard perturbation on the space of probability measures, viewed as a subspace of the vector space of measures. The second is to make a perturbation with respect to the source. This strategy was followed by D. Serre in [108]. The idea is basically to set a t (x) = a(t, x) = (ρ * t (x), ρ * t (x)u * t (x)), so that the constraint reads div(a t (x)) = 0, and to perturb a by considering a family Note here that div(a g t (x)) = 0. Such an approach yields Euler's equation The last way is to use convex analysis, following for instance Y. Brenier (see [26], Section 2). The last two strategies can only be applied when we know a priori that (µ A , µ B ) are compactly supported (this indeed guarantees some a priori bounds on (ρ * , u * ρ * ) for instance). When the minimizers are smooth, all these strategies should give the same result. It is therefore important to study these regularity properties.
It is quite hard to obtain good smoothness properties of the minimizers directly from the fact that they minimize L. An alternative possibility is to come back to the matrix integrals and find some of these properties directly there (cf. [62]). What I did in [61] was to obtain directly the following information. Property 5.6. 1) If P (x) ≥ a|x| 4 + b, Q(x) ≥ a|x| 4 + b for some a > 0, there exists a finite constant C such that for any N ∈ N, there exists k(N ), k(N ) going to infinity with N , such that The first result tells us that the polynomial potentials P, Q force the limiting laws (µ A , µ B ) to be compactly supported. The second point is also quite natural when we recall that we are looking at matrices with Gaussian entries with given values at times zero and one; the Brownian bridge is the path which takes the lowest energy to do that, and therefore it is natural that the minimizer of S µD should be the limiting distribution of a matrix-valued Brownian bridge. As we shall see in Chapter 6, such a distribution has a limit in free probability as soon as μ̂ N tBN +(1−t)AN converges for all t ∈ [0, 1]; it is nothing but the distribution of a free Brownian bridge tB + (1 − t)A + √(t(1 − t)) S between A and B, with S a semi-circular variable, free from (A, B). Note however that unless the joint law of (A, B) is given, this result does not describe the law µ * t entirely. It shows however, because the limiting law is a free convolution by a semicircular law, that we have the following Corollary 5.7. 1) µ A and µ B are compactly supported.
2) a) There exists a compact set K ⊂ R so that for all t ∈ [0, 1], µ * t (K c ) = 0. For all t ∈ (0, 1), the support of µ * t is the closure of its interior (in particular it does not put mass on points). All these properties are due to the free convolution by the semi-circular law, and are direct consequences of [15]. Once Corollary 5.7 is given, the variational study of F Ising is fairly standard and gives Theorem 5.4. We do not detail this proof here. Hence, free probability arises naturally when we deal with the study of the rate function S µD and, more generally, when we consider traces of matrices with size going to infinity. We describe the basics of this rapidly developing field in the next chapter.

Discussion and open problems
Matrix models have been much more studied in physics than in mathematics and raise numerous open questions.
As in the last chapter, it would be extremely interesting to understand the relation between the asymptotics of the free energy and the asymptotics of its derivatives at the origin, which are the objects of primary interest. Once this question is settled, one should understand how to retrieve information on this series of numbers from the limiting free energy. In fact, it would be tempting, for instance, to describe the radius of convergence of this series via the phase transition of the model, since both should coincide with a failure of analyticity of the free energy/the generating function of this series. However, for instance for the Ising model, one should realize that criticality is reached at negative temperature, where the integral actually diverges. Even though the free energy can still be defined in this domain, its description as a generating function becomes even more unclear. There are, however, some challenging works on this subject in physics (cf. [21] for instance).
There are many other matrix models to be understood, with nice combinatorial interpretations. A few were solved in the physics literature by means of character expansions, for instance in the work of V. Kazakov, M. Staudacher, I. Kostov or P. Zinn-Justin. However, such expansions are signed in general, and the saddle point methods used are rarely justified. In one case, M. Maida and myself [62] could give a mathematical understanding of this method. In general, even the question of the existence of the free energies for most matrix models (that is, the convergence of N −2 log Z N (P )) is open and would actually be of great interest in free probability (see the discussion at the end of Chapter 7).
Related to string theory is the question of understanding the full expansion of the partition function in terms of the dimension N of the matrices, and the definition of critical exponents. This is still far from being understood on rigorous grounds, even though the present section showed that, at least for AB interaction models, a good strategy could be to understand better non-intersecting Brownian paths. However, it is not yet clear how to combine such an approach with the technology currently used for the one-matrix model in [46], which is based on orthogonal polynomials and Riemann-Hilbert techniques. It would be very tempting to try to generalize the precise Laplace methods which are commonly used to understand the second order corrections to the free energy of mean field interacting particle systems [20]. However, such an approach has until now failed even in the one-matrix case, due to the singularity of the logarithmic interaction potential (cf. [33]). Another approach to this problem has recently been proposed by B. Eynard et al. [50,49] and M. Bertola [14].
From a more analytic point of view, it would be interesting to understand better the properties of complex Burgers equations; we have here deduced most of the smoothness properties of the solution by recalling its realization in free probability terms. A direct analysis should be doable. Moreover, it would be nice to understand how holes in the initial density propagate along time; this might also be related to the phase transition phenomena, according to A. Matytsin and P. Zaugg [92].

Large random matrices and free probability
Free probability is a probability theory for non-commutative variables. In this field, random variables are operators, which we shall hereafter assume to be self-adjoint. For the sake of completeness, though it is not actually needed for our purpose, we shall recall some notions of operator algebras. We shall then describe free probability as a probability theory on non-commutative functionals, a point of view which forgets the space of realizations of the laws. Then, we will see that free probability is the right framework to consider large random matrices. Finally, we will sketch the proofs of some results we needed in the previous chapter.

A few notions about von Neumann algebras
Definition 6.1 (Definition 37, [16]). A C * -algebra (A, * ) is an algebra equipped with an involution * and a norm ||.|| A which furnishes it with a Banach space structure, and such that for any X, Y ∈ A, ||XY || A ≤ ||X|| A ||Y || A and ||XX * || A = ||X|| A 2 . X ∈ A is self-adjoint iff X * = X. A sa denotes the set of self-adjoint elements of A. A C * -algebra (A, * ) is said to be unital if it contains a neutral element, denoted I.
A can always be realized as a sub-C * -algebra of the space B(H) of bounded linear operators on a Hilbert space H. For instance, if A is a unital C * -algebra furnished with a positive linear form τ , one can always construct such a Hilbert space H by completing and separating L 2 (τ ) (this is the Gelfand-Naimark-Segal (GNS) construction, see [115], Theorem 2.2.1).
We shall restrict ourselves to this case in the sequel and denote by H a Hilbert space equipped with a scalar product < ., . > H such that A ⊂ B(H).

Definition 6.2. If A is a sub-C * -algebra of B(H), A is a von Neumann algebra iff it is closed for the weak topology, generated by the semi-norms family
Let us notice that, by definition, a von Neumann algebra contains only bounded operators. The theory nevertheless allows us to consider unbounded operators thanks to the notion of affiliated operators. An operator X on H is said to be affiliated to A iff for any bounded Borel function f on the spectrum of X, f (X) ∈ A (see [104], p. 164). Here, f (X) is well defined for any operator X as the operator with the same eigenvectors as X and eigenvalues given by the image of those of X by the map f . Note also that if X and Y are affiliated with A, aX + bY is also affiliated with A for any a, b ∈ R. The pair (A, τ ) of a von Neumann algebra equipped with a state τ is called a W * -probability space. Example 6.3.
1. Let n ∈ N, and consider A = M n (C) as the set of bounded linear operators on C n . For any v ∈ C n with ||v|| C n = 1, τ v (M ) = < v, M v > C n defines a state on A. 2. Let A = L ∞ (X, Σ, dµ), acting by multiplication on H = L 2 (X, Σ, dµ). Observe that it is weakly closed for the semi-norms (< f, .g > H , f, g ∈ L 2 (µ)), as L ∞ (X, Σ, dµ) is the dual of L 1 (X, Σ, dµ). 3. Let G be a discrete group, and (e h ) h∈G be a basis of ℓ 2 (G). Let λ(h)e g = e hg . Then, we take A to be the von Neumann algebra generated by the linear span of λ(G). The (tracial) state is the linear form such that τ (λ(g)) = 1 g=e (e = neutral element). We refer to [128] for further examples and details.

Space of laws of m non-commutative self-adjoint variables
Following the above description, laws of m non-commutative self-adjoint variables can be seen as elements of the set M (m) of linear forms τ on the set C⟨X 1 , . . . , X m ⟩ of polynomial functions of m non-commutative variables, furnished with the involution and such that 1. Positivity: τ (P P * ) ≥ 0 for any P ∈ C⟨X 1 , . . . , X m ⟩. 2. Traciality: τ (P Q) = τ (QP ) for any P, Q ∈ C⟨X 1 , . . . , X m ⟩. 3. Total mass: τ (I) = 1. This point of view is identical to the previous one. Indeed, by the Gelfand-Naimark-Segal construction, given µ ∈ M (m) , we can construct a W * -probability space (A, τ ) and operators (X 1 , · · · , X m ) such that µ = τ X1,...,Xm . (6.2.1) This construction can be summarized as follows. Consider the bilinear form on C⟨X 1 , . . . , X m ⟩ 2 given by < P, Q > τ = τ (P Q * ).
Then H = L 2 (τ )/L µ is a Hilbert space with scalar product < ., . > τ . The non-commutative polynomials C⟨X 1 , . . . , X m ⟩ act by left multiplication on L 2 (τ ), and we can consider the completion of these multiplication operators for the semi-norms {< P, .Q > H , P, Q ∈ L 2 (τ )}, which forms a von Neumann algebra A equipped with a tracial state τ satisfying (6.2.1). In this sense, we can think of A as the set of bounded measurable functions L ∞ (τ ).
Such a topology is reasonable when one deals with uniformly bounded non-commutative variables. In fact, if we consider, for R ∈ R + , the subset M (m) R of laws of variables bounded by R, then M (m) R , equipped with this weak-* topology, is a Polish space (i.e. a complete separable metric space). A distance can for instance be given by d(τ, τ ′ ) = Σ n≥1 2 −n |τ (P n ) − τ ′ (P n )|, where (P n ) is a dense sequence of polynomials with operator norm bounded by one when evaluated at any set of self-adjoint operators with operator norms bounded by R.
This notion is the generalization of laws of m real-valued variables bounded, say, by a given finite constant R, in which case the weak-* topology generated by polynomial functions is the same as the standard weak topology. Actually, it is not hard to check that M (1) R = P([−R, R]). However, it can be useful to consider more general topologies, compatible with the existence of unbounded operators, as might be encountered for instance when considering the deviations of large random matrices. Then, the only point is to change the set of test functions. In [29], we considered for instance the complex vector space CC m st (C) generated by the Stieltjes functionals is complete. It is actually a long-standing question posed by A. Connes to know whether all τ ∈ M (m) can be approximated in such a way.
In the case m = 1, the question amounts to asking whether for all µ ∈ P([−R, R]) there exists a sequence (λ^N_1, …, λ^N_N)_{N∈ℕ} such that the empirical measures (1/N) Σ_{i=1}^N δ_{λ^N_i} converge weakly to µ. This is well known to be true by Birkhoff's theorem (which is based on Krein-Milman's theorem), but the analogous question is still open when m ≥ 2.
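For the m = 1 case, the approximation can be checked numerically: placing the λ^N_i at the quantiles of µ makes the empirical measure converge weakly. A minimal sketch (our own illustration, not from the text), taking for µ the semicircle law on [−2, 2], whose second and fourth moments are the Catalan numbers 1 and 2:

```python
import numpy as np

def semicircle_quantiles(N):
    # numerically invert the CDF of the semicircle law on [-2, 2]
    x = np.linspace(-2.0, 2.0, 20001)
    cdf = np.cumsum(np.sqrt(np.maximum(4.0 - x**2, 0.0)))
    cdf /= cdf[-1]
    p = (np.arange(N) + 0.5) / N          # mid-point quantile levels
    return np.interp(p, cdf, x)

# empirical moments of N = 2000 deterministic points approximate the
# semicircle moments (even moments are Catalan numbers: m2 = 1, m4 = 2)
lam = semicircle_quantiles(2000)
m2 = float(np.mean(lam**2))
m4 = float(np.mean(lam**4))
```

The same quantile construction works for any µ ∈ P([−R, R]); only the CDF changes.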

Freeness
Free probability is not only a theory of probability for non-commutative variables; it also contains the central notion of freeness, which is the analogue of independence in standard probability.
Definition 6.5. The variables (X_1, …, X_m) and (Y_1, …, Y_n) are said to be free iff for any polynomials P_1, …, P_k ∈ C⟨X_1, …, X_m⟩ and Q_1, …, Q_k ∈ C⟨Y_1, …, Y_n⟩, τ(P_1(X)Q_1(Y)P_2(X) ⋯ P_k(X)Q_k(Y)) = 0 as soon as τ(P_i(X)) = 0 and τ(Q_i(Y)) = 0 for all i. Remark 6.6: 1) The notion of freeness uniquely defines the law of {X_1, …, X_m, Y_1, …, Y_n} once the laws of (X_1, …, X_m) and (Y_1, …, Y_n) are given (in fact, check by induction over the degree that the expectation of any polynomial is determined uniquely).
3) The above notion of freeness is related to the usual notion of freeness in groups as follows. Let (x_1, …, x_m, y_1, …, y_n) be elements of a group. Then, (x_1, …, x_m) is said to be free from (y_1, …, y_n) if any non-trivial word in these elements is not the neutral element of the group, that is, for every monomials P_1, …, P_k ∈ C⟨X_1, …, X_m⟩ and Q_1, …, Q_k ∈ C⟨X_1, …, X_n⟩, P_1(x)Q_1(y)P_2(x) ⋯ Q_k(y) is not the neutral element as soon as none of the P_i(x) and Q_i(y) is the neutral element. If we consider, following example 6.3.3), the map which is one on trivial words and zero otherwise, and extend it by linearity to polynomials, we see that this defines a tracial state on the operators of left multiplication by the elements of the group, and that the two notions of freeness coincide. 4) We shall see below that examples of free variables naturally show up when considering random matrices with size going to infinity.

Large random matrices and free probability
We have already seen in example 6. The fact that free probability is particularly well suited to the study of large random matrices is due to an observation of Voiculescu [121], who proved that if (A_N, B_N)_{N∈ℕ} is a sequence of uniformly bounded diagonal matrices with converging spectral distribution, and U_N is a unitary matrix following the Haar measure m_N^2 on the unitary group, then the empirical distribution of (A_N, U_N B_N U*_N) converges toward the law of (A, B), A and B being free and each of their laws being given by their limiting spectral distribution. This convergence holds in expectation with respect to the unitary matrices U_N [121] and then almost surely (as can be checked by Borel-Cantelli's lemma, by controlling the variance of the traces).

As a consequence, if one considers the joint distribution of m independent Wigner matrices (X^N_1, …, X^N_m), their empirical distribution converges almost surely toward that of (S_1, …, S_m), m free variables each distributed according to the semi-circular law. Hence, freeness appears very naturally in the context of large random matrices.

The semi-circular law σ, which we have seen to be the asymptotic spectral distribution of Gaussian Wigner matrices, is in fact also deeply related to the notion of freeness; it plays the role that the Gaussian law plays with respect to independence, in the sense that it gives the limit law in the analogue of the central limit theorem. Indeed, let (A, τ) be a W*-probability space and {X_i, i ∈ ℕ} ⊂ A be free random variables which are centered (τ(X_i) = 0) and with covariance 1 (τ(X_i²) = 1). Then, the sum n^{-1/2}(X_1 + ⋯ + X_n) converges in distribution toward a semi-circular variable.

We shall be interested in the next section in the free Itô calculus which appeared naturally in our problems. Before that, let us introduce the notation for free convolution: if X (resp. Y) is a random variable with law µ (resp. ν) and X and Y are free, we denote by µ ⊞ ν the law of X + Y.
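Asymptotic freeness can be observed numerically. The sketch below (our own illustration, with our normalization) compares mixed moments of two independent GUE matrices with the values forced by freeness for two standard semicircular variables: τ(S_1 S_2 S_1 S_2) = 0, while τ(S_1² S_2²) = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def gue(N, rng):
    # Hermitian matrix whose spectral law tends to the semicircle, variance 1
    G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (G + G.conj().T) / (2.0 * np.sqrt(N))

N = 400
X1, X2 = gue(N, rng), gue(N, rng)
tau = lambda A: float(np.trace(A).real) / N   # normalized trace

m_alt = tau(X1 @ X2 @ X1 @ X2)   # freeness forces tau(S1 S2 S1 S2) = 0
m_sq  = tau(X1 @ X1 @ X2 @ X2)   # freeness forces tau(S1^2 S2^2) = 1
```

Independence of the entries alone would not predict these values; they are exactly the mixed moments of two free semicircular variables.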
There is a general formula describing the law µ ⊞ ν; in fact, analytic functions R_µ were introduced by Voiculescu as an analogue of the logarithm of the Fourier transform, in the sense that R_{µ⊞ν} = R_µ + R_ν and that R_µ defines µ uniquely. I refer the reader to [128] for more details. Convolution by a semicircular variable was studied precisely by Biane [15].
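The defining properties of the R-transform can be recorded as follows (standard facts, see [128]; G_µ denotes the Cauchy transform):

```latex
G_\mu(z) = \int \frac{d\mu(x)}{z-x}, \qquad
G_\mu\!\left(R_\mu(z) + \frac{1}{z}\right) = z, \qquad
R_{\mu \boxplus \nu} = R_\mu + R_\nu .
% Example: for the semicircular law \sigma_t of variance t one has
% R_{\sigma_t}(z) = t z, hence \sigma_s \boxplus \sigma_t = \sigma_{s+t}:
% the semicircular laws form a semigroup under free convolution.
```

The additivity of R under ⊞ is the exact analogue of the additivity of the log-Fourier transform under classical convolution.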

Free processes
The notion of freeness allows us to construct a free Brownian motion: a process (S_t)_{t≥0} with free increments such that, for s ≤ t, S_t − S_s is a semi-circular variable with variance t − s. The Hermitian Brownian motion in particular converges toward the free Brownian motion according to the previous section (using convergence on cylinder functions).
As for the Brownian motion, one can make sense of the free differential equation dX_t = dS_t + b_t(X_t)dt and show existence and uniqueness of its solution when b_t is a Lipschitz operator, in the sense that ||b_t(X) − b_t(Y)|| ≤ C||X − Y|| with a finite constant C, where ||·|| denotes the operator norm on the algebra A (||A|| = lim_{n→∞} τ(A^{2n})^{1/2n}) on which the free Brownian motion {S_t, t ≥ 0} lives (one just uses a Picard argument).
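To get a feeling for such free SDEs, one can run the matrix-valued analogue, replacing S by a Hermitian Brownian motion. The sketch below (our own illustration; the Lipschitz drift b(X) = −X/2 is our choice, for convenience) is an Euler scheme for a matrix Ornstein-Uhlenbeck process, whose invariant law is the GUE, so the normalized trace of X² should settle near 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def hermitian_increment(N, dt, rng):
    # increment of a Hermitian Brownian motion, normalized so that
    # the spectral law of H_1 is close to the semicircle of variance 1
    G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return np.sqrt(dt) * (G + G.conj().T) / (2.0 * np.sqrt(N))

# Euler scheme for dX_t = dH_t - (1/2) X_t dt  (drift chosen by us)
N, dt, T = 150, 0.02, 8.0
X = np.zeros((N, N), dtype=complex)
for _ in range(int(T / dt)):
    X = X + hermitian_increment(N, dt, rng) - 0.5 * dt * X

m2 = float(np.trace(X @ X).real) / N   # should be near tau(X^2) = 1
```

A Picard iteration for the free equation works exactly as in this matrix picture, with the operator norm replacing the matrix norm.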
With the intuition given by stochastic calculus, we shall give some outline of the proof of some results needed in Chapter 4.

Continuity of the rate function under free convolution
In the classical case, the entropy S of the deviations of the law of the empirical measure of independent Brownian motions decreases by convolution (see (4.2.20)). We want here to generalize this result to our eigenvalues setting. The intuition coming from the classical case, adapted to the free probability setting, helps to show the following result: Lemma 6.8. For any p ∈ P(ℝ) and any ν ∈ C([0, 1], P(ℝ)), S_{0,1}(ν ⊞ p) ≤ S_{0,1}(ν).
Proof: We shall give the philosophy of the proof via the formula (5.2.5). Namely, let us assume that ν_t can be represented as the law at time t of the free stochastic differential equation (FSDE) dX_t = dS_t + ∂_x k_t(X_t)dt. It can be checked by free Itô calculus that this FSDE satisfies the same free Fokker-Planck equation (5.2.5) (to get the intuition, replace S by the Hermitian Brownian motion and X_. by a matrix-valued process). However, until uniqueness of the solutions of (5.2.5) is proved, it is not clear that ν is indeed the law of this FSDE. Now, let C be a random variable with law p, free with S and X_0. Then Y = X + C satisfies the same FSDE dY_t = dS_t + ∂_x k_t(X_t)dt, and therefore its law µ_t = ν_t ⊞ p satisfies, for any test function in C^{2,1}_b(ℝ × [0, 1]), the analogous weak equation with drift τ(∂_x k_t(X_t)|X_t + C), where τ(·|X_t + C) is the orthogonal projection in L²(τ) onto the algebra generated by X_t + C (recall the definition of L²(τ) given in Section 6.2). From this, and since the conditional expectation contracts the L²(τ)-norm, we deduce S_{0,1}(ν ⊞ p) ≤ S_{0,1}(ν). Of course, such an inequality has nothing to do with the existence and uniqueness of a strong solution of our free Fokker-Planck equation, and we can indeed prove this result by means of R-transform theory for any ν ∈ {S_{0,1} < ∞} (see [30]).
The proof of Theorem 6.9 is rather technical and goes back to the large random matrices origin of J_β. Let us consider the case β = 2. By definition, if X^N_t = X^N_0 + H^N_t with a real diagonal matrix X^N_0 with spectral measure μ̂^N_0 and a Hermitian Brownian motion H^N, and if we denote by μ̂^N_t the spectral measure of X^N_t, then, if μ̂^N_0 converges toward a compactly supported probability measure µ_0, a large deviation estimate holds for any µ_1 ∈ P(ℝ). Let us now reconsider the above limit and show that the infimum must be taken at a free Brownian bridge. More precisely, we shall see that, if τ denotes the joint law of (X_0, X_1) and µ^τ the law of the free Brownian bridge (6.7.3) associated with the distribution τ of (X_0, X_1), then the marginals converge for any family {t_1, …, t_n} of times in [0, 1]. Therefore, Theorem 4.5.2).a) yields the upper bound, and the lower bound estimate obtained in Theorem 4.5.2).b) therefore guarantees that inf{S(ν), ν_0 = µ_0, ν_1 = µ_1} ≥ inf{S(µ^τ), τ ∘ X_0^{-1} = µ_0, τ ∘ X_1^{-1} = µ_1}, and therefore the equality, since the other bound is trivial.
Let us now be more precise. We consider the empirical distribution μ̂^N_{0,1} = µ^N_{X^N_0, X^N_1} of the couple of initial and final matrices of our process as an element of M^(2)_1, equipped with the topology of the Stieltjes functionals. Now, conditionally on X^N_1, it is not hard to see that, when μ̂^N_{0,1} converges toward τ, μ̂^N_{X^N_t} converges toward µ^τ_t for all t ∈ [0, 1]. Therefore, for any κ > 0 and any t_1, …, t_n ∈ [0, 1], there exists ǫ > 0 such that for any (X^N_0, X^N_1) ∈ {D(μ̂^N_{0,1}, τ) < ǫ}, the marginals at the times t_i are within κ of those of µ^τ. Hence, for any η, when ǫ is small enough and N large enough, taking κ = 1/2, we arrive at a bound valid for ǫ small enough and any τ ∈ M_{0,1}. Using the large deviation upper bound for the law of (μ̂^N_t, t ∈ [0, 1]) of Theorem 4.5.2), we deduce the corresponding estimate. We can now let ǫ go to zero, then δ go to zero, and then n go to infinity, to conclude (since S is a good rate function). It is not hard to see that FBB(µ_0, µ_1) is closed (see pp. 565-566 in [61]).
To try to disprove the isomorphism, one would like to construct a function δ : M^(m) → ℝ which is an invariant, in the sense that for any τ, τ' ∈ M^(m), τ ≡ τ' ⇒ δ(τ) = δ(τ'), (7.0.1) and such that δ(σ_k) = k (where σ_k, k ≤ m, can be embedded into M^(m) by taking m − k null operators). D. Voiculescu proposed as a candidate the so-called entropy dimension δ, constructed in the spirit of the Minkowski dimension (see (7.1.3) for a definition). It is currently under study whether δ is an invariant of the von Neumann algebra, that is, whether it satisfies (7.0.1). We note however that the converse implication is false, since N. Brown [27] just produced an example of a von Neumann algebra which is not isomorphic to the free group factor but has the same entropy dimension. Even though this problem has not yet been solved, some other important questions concerning von Neumann algebras could already be answered by using this approach (see [124] for instance). In fact, free entropy theory is still far from being complete, and the understanding of these objects is still too narrow for it to be applied to its full extent. In this last chapter, we will try to complement this theory by means of large deviations techniques. We begin with the definitions of the two main entropies introduced by D. Voiculescu, namely the microstates entropy χ and the microstates-free entropy χ*. The entropy dimension δ is defined via the microstates entropy χ, but we shall not study it here.

Definitions
We define here a version of Voiculescu's microstates entropy with respect to the Gaussian measure (rather than the Lebesgue measure, as he initially did): these two points of view are equivalent (see [30]), but the Gaussian point of view is more natural after the previous chapters. I underlined the entropies to mark this difference, but otherwise kept the same notations as Voiculescu.
For µ ∈ M^(m)_1, n ∈ ℕ, N ∈ ℕ, ǫ > 0, Voiculescu [122] defines a neighborhood Γ_R(µ, n, N, ǫ) of the state µ as the set of matrices A_1, …, A_m of H^m_N such that, for any 1 ≤ p ≤ n and any i_1, …, i_p ∈ {1, …, m}, |N^{-1} Tr(A_{i_1} ⋯ A_{i_p}) − µ(X_{i_1} ⋯ X_{i_p})| ≤ ǫ and ||A_j||_∞ ≤ R. Then, the microstates entropy w.r.t. the Gaussian measure is defined accordingly. This definition of the entropy is done in the spirit of Boltzmann and Shannon. In the classical case, where M^(m)_1 is replaced by P(ℝ), the entropy with respect to the Gaussian measure γ is the relative entropy, where the last equality holds if (f_p)_{p∈ℕ} is a family of uniformly continuous functions, dense in C^0_b(ℝ). This last way of defining the entropy is very close to Voiculescu's definition. In the commutative case, we know by Sanov's theorem that 1. The limsup in the definition of S can be replaced by a liminf, i.e.

S(µ)
2. For any µ ∈ P(ℝ) we have the formula with S* the relative entropy, which is infinite if µ is not absolutely continuous w.r.t. γ and otherwise given by the usual density formula. These two fundamental results are still lacking in the non-commutative theory, except in the case m = 1, where Voiculescu [123] (see also [10]) has shown that the two limits coincide. In the case where m ≥ 2, Voiculescu [122,125] proposed an analogue of S*, denoted χ*, which does not depend at all on the definition of microstates and is called for that reason the microstates-free entropy. The definition of χ* is based on the notion of free Fisher information. To generalize the definition of Fisher information to the non-commutative setting, D. Voiculescu noticed that the standard Fisher information can be defined via ∂*_x, the adjoint of the derivative ∂_x for the scalar product in L²(µ), that is, for every f ∈ L²(µ), ⟨∂_x f, 1⟩ = ⟨f, ∂*_x 1⟩. When µ has a density ρ with respect to the Lebesgue measure, we simply have ∂*_x 1 = −ρ'/ρ. The entropy S* is related to the Fisher information by an integral formula, with µ^b_t the law at time t of the Brownian bridge between δ_0 and µ. Such a definition can be naturally extended to the free probability setting as follows. To this end, we begin by describing the derivative in this setting: let P ∈ C⟨X_1, …, X_m⟩ be for instance a monomial function, P = ∏_{1≤k≤r} X_{i_k}. Then one can expand P at any operators X_1, …, X_m in the directions Y_1, …, Y_m. In order to keep track of the place where the operator Y has to be inserted, the derivative is defined as follows: the derivative D_{X_i} with respect to the i-th variable is a linear map from C⟨X_1, …, X_m⟩ into C⟨X_1, …, X_m⟩ ⊗ C⟨X_1, …, X_m⟩ satisfying the non-commutative Leibniz rule D_{X_i}(PQ) = D_{X_i}P · (1 ⊗ Q) + (P ⊗ 1) · D_{X_i}Q for any P, Q ∈ C⟨X_1, …, X_m⟩, and D_{X_i}X_l = 1_{l=i} 1 ⊗ 1.
One then denotes by ♯ the map from (C⟨X_1, …, X_m⟩ ⊗ C⟨X_1, …, X_m⟩) × C⟨X_1, …, X_m⟩ into C⟨X_1, …, X_m⟩ such that (A ⊗ B)♯C = ACB. It is then not hard to see that (7.1.2) reduces to an expansion in terms of ♯. The cyclic derivative 𝒟_{X_i} with respect to the i-th variable is given by 𝒟_{X_i} = m ∘ D_{X_i}, where m : C⟨X_1, …, X_m⟩ ⊗ C⟨X_1, …, X_m⟩ → C⟨X_1, …, X_m⟩ is such that m(P ⊗ Q) = QP. In the case m = 1, using the bijection between C⟨X⟩ ⊗ C⟨X⟩ and C⟨X, Y⟩, we find an explicit expression.
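The non-commutative derivative is purely combinatorial and can be checked symbolically. In the toy sketch below (encoding and names are ours), monomials are tuples of variable indices, D_i deletes one occurrence of X_i and records the prefix ⊗ suffix pair, and we verify the Leibniz rule D_i(PQ) = D_iP·(1⊗Q) + (P⊗1)·D_iQ on an example.

```python
from collections import Counter

def D(i, word):
    # D_i on a monomial: one tensor term (prefix, suffix) per occurrence of i
    return Counter((word[:k], word[k + 1:]) for k, x in enumerate(word) if x == i)

def leibniz_rhs(i, p, q):
    # D_i(P) . (1 (x) Q)  +  (P (x) 1) . D_i(Q)
    out = Counter()
    for (a, b), c in D(i, p).items():
        out[(a, b + q)] += c
    for (a, b), c in D(i, q).items():
        out[(p + a, b)] += c
    return out

p, q, i = (1, 2, 1), (2, 1, 1), 1     # P = X1 X2 X1, Q = X2 X1 X1
lhs = D(i, p + q)                     # derivative of the product PQ
rhs = leibniz_rhs(i, p, q)
ok = (lhs == rhs)
```

The cyclic derivative would be obtained by concatenating each pair as suffix-then-prefix, implementing m(P ⊗ Q) = QP.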

The analogue of ∂*_x 1 in L²(µ) is given, for any τ ∈ M^(m), as the element J^τ_i of L²(τ) such that for any P ∈ C⟨X_1, …, X_m⟩, τ(J^τ_i P) = τ ⊗ τ(D_{X_i}P). In the case m = 1, so that τ ∈ P(ℝ), it is not hard to check (see [123]) that J^τ is given by the Hilbert transform of τ. The free Fisher information is thus given, for τ ∈ M^(m), by Σ_i τ((J^τ_i)²), and χ* is then defined by integrating the free Fisher information, in analogy with the classical formula. The conjecture (named the 'unification problem' by D. Voiculescu [129]) is Conjecture 7.1: For any µ such that χ(µ) > −∞, χ(µ) = χ*(µ). Remark 7.2: If the conjecture held for any µ ∈ M^(m), and not only for µ with finite microstates entropy, it would provide an affirmative answer to Connes' question. Indeed, it is known that if µ is the law of X_1, …, X_m and S_1, …, S_m are free semicircular variables, free with X_1, …, X_m, then for any ǫ > 0 the distribution µ ⊞ σ_ǫ of (X_1 + ǫS_1, …, X_m + ǫS_m) satisfies χ*(µ ⊞ σ_ǫ) > −∞; hence the above equality would imply χ(µ ⊞ σ_ǫ) > −∞, so that one could find matrices whose empirical distribution approximates µ ⊞ σ_ǫ, and thus µ, since ǫ can be chosen arbitrarily small.
In [18], we proved comparison results between these entropies; moreover, we can define another entropy χ** such that χ** ≤ χ ≤ χ*. Typically, χ* is obtained as an infimum of a rate function over laws of non-commutative processes with given terminal data, whereas χ** is the infimum of the same rate function but over an a priori smaller set.
From this result, we also obtain bounds on the entropy dimension, where τ ⊞ σ_ǫ stands for the free convolution with m free semi-circular variables of parameter ǫ > 0. Define accordingly δ*, δ**; then δ** ≤ δ ≤ δ*. In a recent work, A. Connes and D. Shlyakhtenko [35] defined another quantity ∆, a candidate to be an invariant for von Neumann algebras, by generalizing the notions of L²-homology and L²-Betti numbers to a tracial von Neumann algebra. Such a definition is in particular motivated by the work of D. Gaboriau [55]. They could compare ∆ with δ*, and therefore, thanks to the above corollary, with δ.
Even though δ*, δ** are not simple objects, they can be computed in some cases, such as for the law of the DT-operators (see [1]) or in the case of a finitely generated group, where I. Mineyev and D. Shlyakhtenko [95] proved that δ*(τ) = β_1(G) − β_0(G) + 1, with the group L²-Betti numbers β.
DT-operators have long been an interesting candidate to try to disprove the invariance of δ. A DT-operator T can be constructed as the limit in distribution of upper triangular matrices with i.i.d. Gaussian entries (which amounts to considering the law of the two self-adjoint non-commutative variables T + T* and i(T − T*)). If C is a circular operator, which is the limit in distribution of the (non-Hermitian) matrix with i.i.d. Gaussian entries, then C can be written as T + T̃*, where T, T̃ are free copies of T. Hence, since δ(C) ≤ 2, one could hope that δ(T) < 2. However, based on a heavy computation of moments of these DT-operators due to P. Sniady [111], K. Dykema and U. Haagerup [44] could prove that T generates L(F_2). Hence invariance would be disproved if δ(T) < 2. But in fact, L. Aagaard [1] recently proved that δ*(T) = 2, which shows at least that T is not a counter-example for the invariance of δ* (and also settles the case of δ if one believes Conjecture 7.1).
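The matrix-model description of T can be probed directly. In the sketch below (normalization ours: entries of variance 1/N, so that the full matrix approximates a circular element with τ(CC*) = 1), a strictly upper triangular Gaussian matrix carries half of that mass, τ(TT*) ≈ 1/2.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
# i.i.d. complex Gaussian entries with E|G_jk|^2 = 1/N
G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)
T = np.triu(G, k=1)     # keep only the strictly upper triangular entries

# normalized trace of T T*: (N - 1)/(2N), which tends to 1/2
tau_TTstar = float(np.trace(T @ T.conj().T).real) / N
```

The splitting τ(TT*) = τ(T̃T̃*) = 1/2 is consistent with the decomposition C = T + T̃* into two free DT pieces.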
We now give the main ideas of the proof of Theorem 7.3.

Large deviation upper bound for the law of the process of the empirical distribution of Hermitian Brownian motions
In [29] and [30], we established with T. Cabanal-Duvillard the inequality, and we shall here study the deviations with respect to this typical behavior.
Since the U^{N,l} are uniformly bounded, polynomial test functions provide a good topology. This amounts to restricting ourselves to a few Stieltjes functionals of the Hermitian Brownian motions. However, this is already enough to study Voiculescu's entropy when one considers the deviations toward laws of bounded operators, since then the polynomial functions of (ψ(X_1), …, ψ(X_m)) generate the set of polynomial functions of (X_1, …, X_m) and vice versa (see Lemma 7.7).
Then, it is not hard to derive the upper bound. The above upper bound can be improved by realizing that the infimum has to be achieved at a free Brownian bridge (generalizing the ideas of Chapter 6, Section 6.7). In fact, if µ is the distribution of m self-adjoint operators {X_1, …, X_m} and {S_1, …, S_m} is a free Brownian motion, free with {X_1, …, X_m}, we denote by τ^b_µ the distribution of ψ(tX_l + (1 − t)S_l). We do not prove here that I is a good rate function. The idea of the proof of the large deviations estimates is again based on the construction of exponential martingales. In fact, for P ∈ F^m_{[0,1]}, it is clear that M^N_P(t) = E[σ^N(P)|F_t] − E[σ^N(P)] is a martingale for the filtration F_t of the Hermitian Brownian motions, and the Clark-Ocone formula gives us the bracket of this martingale. Then, we can apply exactly the same techniques as in Chapter 4 to prove the upper bound.
To obtain the lower bound, we have to obtain uniqueness criteria for equations of the form τ_t(P) − σ(P) = ∫_0^t τ(τ_s(∇_s P|B_s) τ_s(∇_s K|B_s)) ds, with fields K as general as possible. We proved in [18], Theorem 6.1, that if K ∈ F^m_{[0,1]}, the solutions to this equation are strong solutions, in the sense that there exists a free Brownian motion S such that τ is the law of the operator X satisfying dX_t = dS_t + τ_t(∇_t K|B_t)(X)dt.
But if K ∈ F^m_{[0,1]}, it is not hard to see that τ_t(∇_t K|B_t)(X) is a Lipschitz operator, so that there exists a unique such operator X, implying the uniqueness of the solution of our free differential equation, and hence the large deviation lower bound.

Discussion and open problems
Note that we have the following heuristic description of χ* and χ**: χ* is an infimum taken over all laws µ of non-commutative processes which are null operators at time 0, operators with law τ at time one, and which are the distributions of 'weak solutions' of dX_t = dS_t + K_t(X)dt.
χ** is defined similarly, but the infimum is restricted to processes with smooth fields K (actually K ∈ F^m_{[0,1]}). We then have proved in Theorem 7.8 that χ** ≤ χ ≤ χ*, and it is legitimate to ask when χ** = χ*. Such a result would show χ = χ*. Note that in the classical case, the relative entropy can actually be described by the above formula by replacing the free Brownian motion by a standard Brownian motion, and then all the inequalities become equalities. This raises numerous questions: 1. First, inequalities (7.4.6) and (7.4.8) become equalities if τ^b_µ ∈ M^{c,∞}_b(F^m_{[0,1]}), that is, if there exist n, times (t_i, 1 ≤ i ≤ n+1) ∈ [0, 1]^{n+1}, and polynomial functions (Q_i, 1 ≤ i ≤ n) and P such that the corresponding representation holds. Can we find non-trivial µ ∈ M^(m) such that this is true? 2. If we follow the ideas of Chapter 4, to improve the lower bound we would like to regularize the laws by free convolution with free Cauchy variables C^ǫ = (C^ǫ_1, …, C^ǫ_m) of parameter ǫ. If X = (X_1, …, X_m) is a process satisfying dX_t = dS_t + K_t(X_t)dt for some non-commutative function K_t, it is easy to see that X^ǫ = X + C^ǫ satisfies the same free Fokker-Planck equation with K^ǫ_t(X_t + C^ǫ) = τ(K_t(X_t)|X_t + C^ǫ). Is K^ǫ then smooth with respect to the operator norm? This is what we proved for one operator in [65]. If this is true in higher dimension, then Connes' question is answered positively, since by a Picard argument dX^ǫ_t = dS_t + K^ǫ_t(X_t)dt has a unique strong solution, and there exists a smooth function F^ǫ such that for any t > 0, X^ǫ_t = F^ǫ_t(S_s, s ≤ t). In particular, for any polynomial function P ∈ C⟨X_1, …, X_m⟩, µ(P(X + C^ǫ)) = σ(P ∘ F^ǫ_1(S_s, s ≤ 1)) = lim of the corresponding matrix quantities, where we used in the last line the smoothness of F^ǫ_1 as well as the convergence of the Hermitian Brownian motion towards the free Brownian motion.
Hence, since ǫ is arbitrary, we can approximate µ by the empirical distribution of the matrices F^ǫ_1(H^N_s, s ≤ 1), which would answer Connes' question positively. As in Remark 7.2, the only way to complete the argument without dealing with Connes' question would be to prove such a regularization property only for laws with finite entropy, but it is rather unclear how such a condition could enter into the game. This could only be true if the hyperfinite factor had specific analytical properties.
3. If we think that the only point of interest is what happens at time one, then we can restrict the preceding discussion by showing that if X = (X_1, …, X_m) are non-commutative variables with law µ, (J^µ_i, 1 ≤ i ≤ m) is the Hilbert transform of µ, and µ^ǫ is the law of X + C^ǫ, then we would like to show that (J^{µ^ǫ}_i, 1 ≤ i ≤ m) is smooth for ǫ > 0. In the case m = 1, J^{µ^ǫ} is analytic in {|ℑ(z)| < ǫ}. The generalization to higher dimension is wide open. 4. A related open question, posed by D. Voiculescu [129] (in a paragraph entitled Technical problems), is to try to show that free convolution acts smoothly on the Fisher information, in the sense that t ∈ ℝ^+ → τ_{X+tS}(|J^{τ_{X+tS}}_i|²) is continuous. 5. A different approach to the microstates entropy could be to study the generating functions Λ(P) given, for P ∈ C⟨X_1, …, X_m⟩ ⊗ C⟨X_1, …, X_m⟩, by the limits of the free energies of the corresponding matrix models. It is easy to see (and written down in [67]) that χ and Λ are in duality; conversely, Λ(P) = sup_{τ∈M^(m)} {χ(τ) + τ ⊗ τ(P)}.
Therefore, we see that understanding the first order of all matrix models is equivalent to understanding χ. In particular, the convergence of all of their free energies would allow one to replace the limsup in the definition of the microstates entropy by a liminf, which would already be a great achievement in free entropy theory. Note also that in the usual proof of Cramér's theorem for commutative variables, the main point is to show that one can restrict the supremum over polynomial functions P ∈ C⟨X_1, …, X_m⟩^⊗2 to polynomial functions in C⟨X_1, …, X_m⟩ (i.e. take linear functions of the empirical distribution). This cannot be the case here, since it would entail that the microstates entropy is convex, which it cannot be according to D. Voiculescu [124], who proved that if τ ≠ τ' ∈ M^(m) with m ≥ 2, τ and τ' having finite microstates entropy, then ατ + (1 − α)τ' has infinite entropy for α ∈ (0, 1).
Acknowledgments: I would like to take this opportunity to thank all my coauthors; I had a great time developing this research program with them. I am particularly indebted to O. Zeitouni for a careful reading of preliminary versions of this manuscript. I am also extremely grateful to many people who very kindly helped and encouraged me in my struggle to understand points in fields that I used to ignore entirely, among whom D. Voiculescu, D. Shlyakhtenko, N. Brown, P. Sniady, C. Villani, D. Serre, Y. Brenier, A. Okounkov, S. Zelditch, V. Kazakov, I. Kostov, B. Eynard. I also wish to thank the scientific committee and the organizers of the XXIX Conference on Stochastic Processes and their Applications for giving me the opportunity to write these notes, as well as to discover the amazingly enjoyable style of Brazilian conferences.