Internet Mathematics

Estimating entropy and entropy norm on data streams

Amit Chakrabarti, Khanh Do Ba, and S. Muthukrishnan

Source: Internet Math. Volume 3, Number 1 (2006), 63-78.

Abstract

We consider the problem of computing information-theoretic functions, such as entropy, on a data stream, using sublinear space.

Our first result deals with a measure we call the entropy norm of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic-space, one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold.

Our second group of results is for estimating the empirical entropy of an input stream. We first present a sublinear-space, one-pass algorithm for this problem. For a stream of $m$ items and a given real parameter $\alpha$, our algorithm uses space $\widetilde{O}(m^{2\alpha})$ and provides an approximation of $1/\alpha$ in the worst case and $(1+\varepsilon)$ in "most" cases. We then present a two-pass, polylogarithmic-space, $(1+\varepsilon)$-approximation algorithm. All our algorithms are quite simple.
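
For orientation, the two quantities named in the abstract can be written out as follows. The abstract itself gives no formulas, so the notation here is an assumption: $m_i$ denotes the number of occurrences of item $i$ in the stream, $m = \sum_i m_i$ is the stream length, $H$ is the usual empirical entropy, and $F_H$ is the entropy norm as we understand it to be defined in the full paper.

$$
H = \sum_{i:\, m_i > 0} \frac{m_i}{m} \lg \frac{m}{m_i},
\qquad
F_H = \sum_{i:\, m_i > 0} m_i \lg m_i,
\qquad
H = \lg m - \frac{F_H}{m}.
$$

Under these assumed definitions, the third identity follows by expanding $\lg(m/m_i) = \lg m - \lg m_i$ in the first sum. It also illustrates the abstract's remark that the entropy norm resembles a frequency moment: $F_H$ has the form $\sum_i f(m_i)$, just as the $k$-th frequency moment is $\sum_i m_i^k$.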

Primary Subjects: 68P99
Secondary Subjects: 94A15

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.im/1175266368
Mathematical Reviews number (MathSciNet): MR2283884


2010 © A K Peters, Ltd.