Advances in Applied Probability

Error bounds on multivariate Normal approximations for word count statistics

Haiyan Huang

Source: Adv. in Appl. Probab. Volume 34, Number 3 (2002), 559-586.

Abstract

Given a sequence S and a collection Ω of d words, it is of interest in many applications to characterize the multivariate distribution of the vector of counts U = (N(S,w_1), ..., N(S,w_d)), where N(S,w) is the number of times a word w ∈ Ω appears in the sequence S. We obtain an explicit bound on the error made when approximating the multivariate distribution of U by the normal distribution, when the underlying sequence is i.i.d. or first-order stationary Markov over a finite alphabet. When the limiting covariance matrix of U is nonsingular, the error bounds decay at rate O((log n) / √n) in the i.i.d. case and O((log n)^3 / √n) in the Markov case. In order for U to have a nondegenerate covariance matrix, it is necessary and sufficient that the counted word set Ω is not full, that is, that Ω is not the collection of all possible words of some length k over the given finite alphabet. To supply the bounds on the error, we use a version of Stein's method.
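
As a small illustration of the count vector U, the following Python sketch computes N(S,w) for each word in a set. It is not taken from the paper: the function names `count_word` and `count_vector`, the example sequence, and the choice to count overlapping occurrences are assumptions made for illustration. The last lines show why a full word set forces a degenerate covariance: when Ω contains every word of length k, each of the n - k + 1 windows of S matches exactly one word, so the counts always sum to n - k + 1.

```python
from itertools import product


def count_word(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps (an assumption)."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def count_vector(seq, words):
    """Return the vector U = (N(S, w_1), ..., N(S, w_d)) as a list."""
    return [count_word(seq, w) for w in words]


# Hypothetical example sequence over the alphabet {A, C, G, T}.
seq = "ACGTACGTAA"

# A word set that is not full: only three of the 16 words of length 2.
U = count_vector(seq, ["AC", "GT", "AA"])

# A full word set: all words of length k = 2 over the alphabet.
k = 2
full_set = ["".join(p) for p in product("ACGT", repeat=k)]
U_full = count_vector(seq, full_set)

# The components of U_full are linearly dependent: their sum is fixed.
total_full = sum(U_full)  # always equals len(seq) - k + 1
```

Because `total_full` is the same for every sequence of a given length, the components of the full-set count vector satisfy an exact linear constraint, which is one direction of the nondegeneracy condition stated in the abstract.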

Primary Subjects: 62E17, 62E20, 60J22, 92D20
Keywords: First-order stationary Markov chain; full word set; Stein's method; word counts

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aap/1033662166
Digital Object Identifier: doi:10.1239/aap/1033662166
Mathematical Reviews number (MathSciNet): MR1929598
Zentralblatt MATH identifier: 1021.62008


2010 © Applied Probability Trust