## The Annals of Statistics

### Parameter priors for directed acyclic graphical models and the characterization of several probability distributions

#### Abstract

We develop simple methods for constructing parameter priors for model choice among directed acyclic graphical (DAG) models. In particular, we introduce several assumptions that permit the construction of parameter priors for a large number of DAG models from a small set of assessments. We then present a method for directly computing the marginal likelihood of every DAG model given a random sample with no missing observations. We apply this methodology to Gaussian DAG models, which consist of a recursive set of linear regression models. We show that the only parameter prior for complete Gaussian DAG models that satisfies our assumptions is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let $W$ be an $n \times n$, $n \ge 3$, positive definite symmetric matrix of random variables and $f(W)$ be a pdf of $W$. Then $f(W)$ is a Wishart distribution if and only if $W_{11} - W_{12} W_{22}^{-1} W'_{12}$ is independent of $\{W_{12}, W_{22}\}$ for every block partitioning $W_{11}, W_{12}, W'_{12}, W_{22}$ of $W$. Similar characterizations of the normal and normal-Wishart distributions are provided as well.
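The characterization above turns on the quantity $W_{11} - W_{12} W_{22}^{-1} W'_{12}$ (the Schur complement of $W_{22}$ in $W$). The following is a minimal pure-Python sketch of how that quantity is computed for each block partitioning, assuming the partitions are the contiguous ones in which $W_{11}$ is the leading $k \times k$ block, $k = 1, \dots, n-1$. The helper names `solve`, `schur_complement` and `all_schur_complements` are hypothetical and are not taken from the paper. The sketch only evaluates the algebraic quantity for a fixed matrix; it does not test independence and does not check whether any pdf is Wishart.

```python
from fractions import Fraction


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    `a` is a nonsingular n x n matrix and `b` an n x m matrix, both given
    as lists of rows. Returns X as an n x m list of rows.
    """
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [A | B]
    for c in range(n):
        # Move the row with the largest |entry| in column c into pivot position.
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        # Eliminate column c from every other row.
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def schur_complement(w, k):
    """Return W11 - W12 W22^{-1} W12' where W11 is the leading k x k block of W.

    Assumes W is symmetric, so the lower-left block equals W12'.
    """
    n = len(w)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    w11 = [row[:k] for row in w[:k]]   # k x k
    w12 = [row[k:] for row in w[:k]]   # k x (n - k)
    w22 = [row[k:] for row in w[k:]]   # (n - k) x (n - k)
    w12t = [row[:k] for row in w[k:]]  # W12', read off the lower-left block
    x = solve(w22, w12t)               # W22^{-1} W12', shape (n - k) x k
    return [[w11[i][j] - sum(w12[i][t] * x[t][j] for t in range(n - k))
             for j in range(k)]
            for i in range(k)]


def all_schur_complements(w):
    """Schur complements for every leading-block partition k = 1, ..., n - 1."""
    return {k: schur_complement(w, k) for k in range(1, len(w))}


# Example: a 3 x 3 positive definite symmetric matrix with exact rational entries.
W = [[Fraction(v) for v in row] for row in [[4, 2, 0], [2, 3, 1], [0, 1, 2]]]
S = all_schur_complements(W)
print(S[1])  # -> [[Fraction(12, 5)]]
```

Exact `Fraction` arithmetic is used so the result can be checked by hand: for $k = 1$, $\det W = 12$ and $\det W_{22} = 5$, consistent with the identity $\det W = \det W_{22} \cdot \det(W_{11} - W_{12} W_{22}^{-1} W'_{12})$. For floating-point work a library solver would normally replace the hand-written `solve`.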

#### Article information

Source
Ann. Statist. Volume 30, Number 5 (2002), 1412-1440.

Dates
First available in Project Euclid: 28 October 2002

http://projecteuclid.org/euclid.aos/1035844981

Digital Object Identifier
doi:10.1214/aos/1035844981

Mathematical Reviews number (MathSciNet)
MR1936324

Zentralblatt MATH identifier
1016.62064

#### Citation

Geiger, Dan; Heckerman, David. Parameter priors for directed acyclic graphical models and the characterization of several probability distributions. Ann. Statist. 30 (2002), no. 5, 1412--1440. doi:10.1214/aos/1035844981. http://projecteuclid.org/euclid.aos/1035844981.
