The Annals of Mathematical Statistics

On a Characterization of the Normal Distribution from Properties of Suitable Linear Statistics

R. G. Laha

Abstract

In recent years, problems related to the characterization of the normal distribution from the property of stochastic independence of linear functions of independent random variables have been investigated by various authors. The most general result in this direction is that obtained independently by Darmois [4] and Skitovich [14], who proved that if there exist two stochastically independent linear functions $$X = \sum^n_{j = 1} a_jx_j \quad\text{and}\quad Y = \sum^n_{j = 1} b_jx_j \quad\text{with } a_jb_j \neq 0$$ $(j = 1, 2, \cdots, n)$, where $x_1, x_2, \cdots, x_n$ are $n$ independently (but not necessarily identically) distributed proper random variables, then each $x_j$ is normally distributed. Their methods of proof are similar in nature, both being based on the use of characteristic functions, without any assumption about the existence of moments. The same theorem was proved independently by Basu [1], under the assumption that the random variables are identically and independently distributed and have finite moments of all orders. This result was also obtained independently by Lukacs and King [11], under the assumption that the random variables are independently (but not necessarily identically) distributed, each having finite moments up to order $n$, and by Linnik [10], under the assumption that the random variables have only finite variances. The special case of this theorem for $n = 2$ was proved earlier independently by Kac [6], Gnedenko [5], and Darmois [3], without any assumption about the existence of moments. Thus we see that the problems on the consequences of stochastic independence of linear statistics have been exhaustively studied. Hence the question that naturally arises is whether similar investigations into the distribution laws of the random variables are possible under the assumption of a suitable type of stochastic dependence of the linear statistics. 
In this direction, the author [8] has derived a characterization of the stable law with finite expectation from the property of linearity of regression of one linear statistic on the other for the case $n = 2$. The author [7] has also obtained some characterizations of the normal distribution from the consequence of the linearity of the multiple regression of one random variable on several others, when the variables have the linear structural setup as in the case of the bi-factor theory of Spearman. For the formulation of the problem investigated in the present paper, we require a precise definition of the terms conditional distribution, linearity of regression, and homoscedasticity. Let $F(x, y)$ and $F_0(x)$ denote respectively the distribution function of the two-dimensional random variable $(x, y)$ and the marginal distribution function of $x$. Then we define the conditional distribution function of $y$ for fixed $x$ by $F_x(y)$ so that it satisfies the relation \begin{equation*}\tag{1.1}\int^x_{-\infty} F_\xi(y) dF_0(\xi) = F(x, y).\end{equation*} In the present investigation, the following assumptions on the distributions of the random variables will be made:
Assumption 1. The conditional distribution of $y$ for fixed $x$ as defined in (1.1) is assumed to exist, wherever needed.
Assumption 2. Each of the random variables concerned has a finite second moment. This assumption allows us to take derivatives of the characteristic functions of the corresponding random variables up to and including the second order.
Assumption 3. Without any loss of generality in the proof, it is assumed that the expectation of each of the random variables concerned is equal to zero.
The role of these assumptions is to ensure the existence of the expectation and the variance of the conditional distribution of $y$ for fixed $x$, which may be defined respectively as \begin{align*}\tag{1.2}E_x(y) &= \int^\infty_{-\infty} y dF_x(y), \\ \tag{1.3}V_x(y) &= \int^\infty_{-\infty} \{y - E_x(y)\}^2 dF_x(y) \\ &= S_x(y) - \{E_x(y)\}^2, \end{align*} where $$S_x(y) = \int^\infty_{-\infty} y^2 dF_x(y).$$ In this case, the regression of $y$ on $x$ is said to be linear, if the relation \begin{equation*}\tag{1.4}E_x(y) = \beta x\end{equation*} is satisfied for all $x$, except for a set of probability measure zero; no constant term appears, since the expectations of both the random variables $x$ and $y$ are already assumed to be zero. The constant $\beta$ in equation (1.4) is called the coefficient of regression of $y$ on $x$. Similarly the conditional distribution of $y$ for fixed $x$ is said to be homoscedastic, if the conditional variance $V_x(y)$ as defined in (1.3) is a constant $\sigma^2_0$ not involving $x$. Thus if the regression of $y$ on $x$ is linear and given by $\beta x$ and the conditional variance of $y$ for fixed $x$ is a constant $\sigma^2_0$, we have the relation \begin{equation*}\tag{1.5}S_x(y) = \sigma^2_0 + \beta^2x^2\end{equation*} to be satisfied for all $x$, in addition to the condition (1.4). For simplicity of notation, throughout the present paper we shall say that the conditional distribution of $y$ for fixed $x$ is L.R.H. $(\beta, \sigma^2_0)$, meaning that the regression of $y$ on $x$ is linear and given by $\beta x$ and that the conditional variance of $y$ for fixed $x$ is $\sigma^2_0$, free of $x$; that is, both the relations (1.4) and (1.5) are simultaneously satisfied. In the following sections we shall derive some characterizations of the normal distribution from the property of linearity of regression and homoscedasticity of suitable linear statistics. The main theorem (Theorem 3.2) is given in Section 3. 
The proof of this theorem uses as a starting point a very simple set of necessary and sufficient conditions for the linearity of regression and homoscedasticity (Lemma 2.1) and finally a theorem of Linnik (Lemma 2.3) on an analytical extension of Cramér's theorem [2] on the normal law. Several important corollaries are deduced in the subsequent section.
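As an illustration of the L.R.H. property, consider the familiar bivariate normal case. The symbols $\sigma_x^2$, $\sigma_y^2$, and $\rho$ are notation assumed for this example only and do not appear in the paper. Suppose $(x, y)$ is bivariate normal with zero means, variances $\sigma_x^2 > 0$ and $\sigma_y^2 > 0$, and correlation coefficient $\rho$ with $|\rho| < 1$. The conditional distribution of $y$ for fixed $x$ is then normal, with \begin{equation*}E_x(y) = \rho\frac{\sigma_y}{\sigma_x}x, \qquad V_x(y) = \sigma_y^2(1 - \rho^2),\end{equation*} so that it is L.R.H. $(\beta, \sigma^2_0)$ with $\beta = \rho\sigma_y/\sigma_x$ and $\sigma^2_0 = \sigma_y^2(1 - \rho^2)$. Relation (1.5) can be checked directly from (1.3): \begin{equation*}S_x(y) = V_x(y) + \{E_x(y)\}^2 = \sigma_y^2(1 - \rho^2) + \rho^2\frac{\sigma_y^2}{\sigma_x^2}x^2 = \sigma^2_0 + \beta^2x^2.\end{equation*} As a consistency check, integrating $S_x(y)$ with respect to $F_0(x)$ gives $\sigma_y^2(1 - \rho^2) + \rho^2\sigma_y^2 = \sigma_y^2$, the unconditional second moment of $y$.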

Article information

Source
Ann. Math. Statist., Volume 28, Number 1 (1957), 126-139.

Dates
First available in Project Euclid: 27 April 2007

https://projecteuclid.org/euclid.aoms/1177707041

Digital Object Identifier
doi:10.1214/aoms/1177707041

Mathematical Reviews number (MathSciNet)
MR83833

Zentralblatt MATH identifier
0080.13304
