Open Access
Consistent model selection of discrete Bayesian networks from incomplete data
Nikolay Balov
Electron. J. Statist. 7: 1047-1077 (2013). DOI: 10.1214/13-EJS802

Abstract

Maximum likelihood based model selection of discrete Bayesian networks is considered. Structure learning is performed with a scoring function $S$ which, for a given network $G$ and $n$-sample $D_{n}$, is defined as the maximum marginal log-likelihood $l$ minus a penalization term $\lambda_{n}h$ proportional to the network complexity $h(G)$, $$S(G|D_{n})=l(G|D_{n})-\lambda_{n}h(G).$$ An available case analysis is developed in which the standard log-likelihood is replaced by the sum of sample average node log-likelihoods. The approach utilizes partially missing data records and allows for comparison of models fitted to different samples.
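
The scoring idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the data layout (a list of dicts with `None` marking a missing value), the function names, and the choice of $h(G)$ as the number of free parameters are all assumptions made for illustration. Each node's log-likelihood is maximized and averaged over only those records in which the node and its parents are observed, so different nodes may draw on different subsamples.

```python
import math
from collections import Counter


def node_avg_loglik(data, node, parents):
    """Sample-average maximum log-likelihood of one node.

    Only records where the node and all its parents are observed are used
    (None marks a missing value). Returns (average log-likelihood, records used).
    """
    cols = [node] + list(parents)
    rows = [r for r in data if all(r[c] is not None for c in cols)]
    if not rows:
        return 0.0, 0
    # Counts of (node value, parent values) and of parent values alone.
    joint = Counter(tuple(r[c] for c in cols) for r in rows)
    par = Counter(tuple(r[c] for c in parents) for r in rows)
    # Plug in the maximum likelihood estimates count / parent_count.
    ll = sum(cnt * math.log(cnt / par[key[1:]]) for key, cnt in joint.items())
    return ll / len(rows), len(rows)


def complexity(graph, arity):
    """Assumed h(G): number of free parameters of the discrete network."""
    return sum(
        (arity[v] - 1) * math.prod(arity[p] for p in pa)
        for v, pa in graph.items()
    )


def score(graph, data, arity, lam):
    """S(G|D_n) = sum of node-average log-likelihoods - lam * h(G)."""
    loglik = sum(node_avg_loglik(data, v, pa)[0] for v, pa in graph.items())
    return loglik - lam * complexity(graph, arity)


# Hypothetical toy data: two binary variables with some values missing.
data = [
    {"X": 0, "Y": 0},
    {"X": 0, "Y": 0},
    {"X": 1, "Y": 1},
    {"X": 1, "Y": None},
    {"X": None, "Y": 1},
    {"X": 1, "Y": 1},
]
arity = {"X": 2, "Y": 2}
empty_graph = {"X": (), "Y": ()}      # no edges
edge_graph = {"X": (), "Y": ("X",)}   # X -> Y

lam = 0.1  # illustrative penalty weight, not a recommended value
print(score(empty_graph, data, arity, lam))
print(score(edge_graph, data, arity, lam))
```

In this toy sample the marginal terms for `X` and `Y` each use five records, while the conditional term for `Y` given `X` uses only the four records in which both are observed.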

In missing completely at random settings, the estimation is shown to be consistent if and only if the sequence $\lambda_{n}$ converges to zero more slowly than $n^{-1/2}$. In particular, BIC model selection ($\lambda_{n}=0.5\log(n)/n$) applied to the node-average log-likelihood is shown to be inconsistent in general. This contrasts with the complete data case, in which BIC is known to be consistent. The conclusions are confirmed by numerical experiments.
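
A short check shows why the BIC penalty falls outside the stated rate condition. It assumes that "more slowly than $n^{-1/2}$" is read as $\sqrt{n}\,\lambda_{n}\to\infty$; the comparison sequence $n^{-1/4}$ is an illustrative choice that does not come from the paper. For the BIC penalty, $$\sqrt{n}\,\lambda_{n}=\sqrt{n}\cdot\frac{0.5\log(n)}{n}=\frac{0.5\log(n)}{\sqrt{n}}\to 0\quad(n\to\infty),$$ so $\lambda_{n}$ vanishes faster than $n^{-1/2}$ and the condition fails. By contrast, $\lambda_{n}=n^{-1/4}$ gives $$\lambda_{n}\to 0\quad\text{and}\quad\sqrt{n}\,\lambda_{n}=n^{1/4}\to\infty,$$ which satisfies the condition under this reading.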

Citation


Nikolay Balov. "Consistent model selection of discrete Bayesian networks from incomplete data." Electron. J. Statist. 7 1047 - 1077, 2013. https://doi.org/10.1214/13-EJS802

Information

Published: 2013
First available in Project Euclid: 15 April 2013

zbMATH: 1336.62087
MathSciNet: MR3044509
Digital Object Identifier: 10.1214/13-EJS802

Subjects:
Primary: 62F12
Secondary: 62H12

Keywords: Bayesian networks, categorical data, missing completely at random, model selection, penalized maximum likelihood

Rights: Copyright © 2013 The Institute of Mathematical Statistics and the Bernoulli Society
