Error bounds on multivariate Normal approximations for word count statistics
Haiyan Huang
Source: Adv. in Appl. Probab. Volume 34, Number 3 (2002), 559-586.
Abstract
Given a sequence S and a collection Ω of d words, it is of interest in many applications to characterize the multivariate distribution of the vector of counts U = (N(S,w_1), ..., N(S,w_d)), where N(S,w) is the number of times a word w ∈ Ω appears in the sequence S. We obtain an explicit bound on the error made when approximating the multivariate distribution of U by the normal distribution, when the underlying sequence is i.i.d. or first-order stationary Markov over a finite alphabet. When the limiting covariance matrix of U is nonsingular, the error bounds decay at rate O((log n) / √n) in the i.i.d. case and O((log n)^3 / √n) in the Markov case. In order for U to have a nondegenerate covariance matrix, it is necessary and sufficient that the counted word set Ω is not full, that is, that Ω is not the collection of all possible words of some length k over the given finite alphabet. To supply the bounds on the error, we use a version of Stein's method.
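As an illustration of the objects in the abstract, the following minimal Python sketch computes the count vector U and checks whether a word set is full. It is not taken from the paper: the function names (count_word, count_vector, is_full) are hypothetical, and it assumes that N(S,w) counts overlapping occurrences, which the abstract does not state explicitly.

```python
from itertools import product


def count_word(seq: str, word: str) -> int:
    """N(S, w): occurrences of `word` in `seq`, overlaps included (assumption)."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def count_vector(seq: str, words: list[str]) -> list[int]:
    """U = (N(S, w_1), ..., N(S, w_d)) for the given list of words."""
    return [count_word(seq, w) for w in words]


def is_full(words: list[str], alphabet: str) -> bool:
    """True if `words` is exactly the set of all words of one length k over `alphabet`."""
    word_set = set(words)
    letters = set(alphabet)
    lengths = {len(w) for w in word_set}
    if len(lengths) != 1:
        return False  # empty set or mixed lengths: not full
    k = lengths.pop()
    all_in_alphabet = all(set(w) <= letters for w in word_set)
    return all_in_alphabet and len(word_set) == len(letters) ** k


if __name__ == "__main__":
    seq = "ACCAACAC"
    full_set = ["".join(p) for p in product("AC", repeat=2)]  # AA, AC, CA, CC
    print(count_vector(seq, ["AC", "CA"]))
    print(is_full(full_set, "AC"), is_full(["AC", "CA"], "AC"))
    # For a full word set the counts always sum to n - k + 1.
    print(sum(count_vector(seq, full_set)) == len(seq) - 2 + 1)
```

The last line of the usage example hints at why a full word set gives a degenerate covariance matrix: when every word of length k is counted, the entries of U sum to the fixed number n - k + 1 of length-k windows, so U is confined to a hyperplane.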
Permanent link to this document: http://projecteuclid.org/euclid.aap/1033662166
Digital Object Identifier: doi:10.1239/aap/1033662166
Mathematical Reviews number (MathSciNet): MR1929598
Zentralblatt MATH identifier: 1021.62008
Advances in Applied Probability