We are given a set of $n$ points that might be uniformly distributed in the unit square $[0,1]^2$. We wish to test whether the set, although mostly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve with $C^{\alpha}$-norm bounded by $\beta$. An asymptotic detection threshold exists in this problem; for a constant $T_{-}(\alpha,\beta)>0$, if the number of points sampled from the curve is smaller than $T_{-}(\alpha,\beta)n^{1/(1+\alpha)}$, reliable detection is not possible for large $n$. We describe a multiscale significant-runs algorithm that can reliably detect concentration of data near a smooth curve, without knowing the smoothness information $\alpha$ or $\beta$ in advance, provided that the number of points on the curve exceeds $T_{*}(\alpha,\beta)n^{1/(1+\alpha)}$. This algorithm therefore has an optimal detection threshold, up to a factor $T_{*}/T_{-}$.
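To make the setting concrete, the following is a minimal Python sketch of the two-component point cloud and of the scale $n^{1/(1+\alpha)}$ that governs the threshold. The function names `threshold_scale` and `sample_mixture`, the choice of a curve given as a graph $y=f(x)$, the sine curve and the multiplier 5 are all illustrative assumptions; the constants $T_{-}$ and $T_{*}$ are not specified here and are left out.

```python
import numpy as np

def threshold_scale(n, alpha):
    """Order of magnitude n^(1/(1+alpha)); the constants T_- and T_* are omitted."""
    return n ** (1.0 / (1.0 + alpha))

def sample_mixture(n, n_curve, curve, rng):
    """Return n points: n - n_curve uniform in the unit square, n_curve on a curve.

    Assumption: the curve is the graph y = curve(x) and stays inside [0, 1].
    """
    background = rng.uniform(0.0, 1.0, size=(n - n_curve, 2))
    x = rng.uniform(0.0, 1.0, size=n_curve)
    on_curve = np.column_stack([x, curve(x)])
    return np.vstack([background, on_curve])

rng = np.random.default_rng(0)
n, alpha = 10_000, 2.0
scale = threshold_scale(n, alpha)  # 10000 ** (1/3), roughly 21.5

# Hypothetical alternative: about 5 * n^(1/(1+alpha)) points on a smooth curve.
points = sample_mixture(
    n,
    n_curve=int(5 * scale),
    curve=lambda x: 0.5 + 0.2 * np.sin(2.0 * x),
    rng=rng,
)
```

For $\alpha=1$ the scale is $\sqrt{n}$ and for $\alpha=2$ it is $n^{1/3}$, so smoother curves can be detected with fewer points on them.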
At the heart of our approach is an analysis of the data by counting membership in multiscale multianisotropic strips. The strips will have area $2/n$ and exhibit a variety of lengths, orientations and anisotropies. The strips are partitioned into anisotropy classes; each class is organized as a directed graph whose vertices are all strips of the same anisotropy and whose edges link such strips to their “good continuations.” The point-cloud data are reduced to counts that measure membership in strips. Each anisotropy graph is reduced to a subgraph that consists of strips with significant counts. The algorithm rejects $H_0$ whenever some such subgraph contains a path that connects many consecutive significant counts.
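The count-then-chain idea can be sketched for a single anisotropy class as follows. This is a simplified illustration and not the construction from the paper: strips are taken over equal x-intervals, their endpoints sit on a regular grid of heights, a "good continuation" is assumed to mean only that the next strip starts at the height where the previous one ended, and the significance `threshold` is a hypothetical parameter. Choosing `n_levels = n / (2 * n_intervals)` would give each strip area $2/n$.

```python
import numpy as np

def strip_counts(points, n_intervals, n_levels, max_step):
    """Count points in strips of one anisotropy class (simplified geometry).

    Strip (k, i, j) lies over the x-interval [k/n_intervals, (k+1)/n_intervals]
    and is centred on the segment from height i/n_levels (left end) to height
    j/n_levels (right end), with |i - j| <= max_step. Its vertical thickness
    is 1/n_levels.
    """
    length, width = 1.0 / n_intervals, 1.0 / n_levels
    k_of = np.minimum((points[:, 0] / length).astype(int), n_intervals - 1)
    counts = {}
    for k in range(n_intervals):
        x, y = points[k_of == k].T
        s = (x - k * length) / length  # relative position along the strip
        for i in range(n_levels + 1):
            lo, hi = max(0, i - max_step), min(n_levels, i + max_step)
            for j in range(lo, hi + 1):
                centre = ((1.0 - s) * i + s * j) * width
                counts[(k, i, j)] = int(np.sum(np.abs(y - centre) <= width / 2))
    return counts

def longest_significant_run(counts, n_intervals, threshold):
    """Longest chain of significant strips in which each strip starts at the
    height where the previous strip ended (dynamic programming over k)."""
    best_ending_at = {}  # (k, j) -> longest run ending at height j in interval k
    longest = 0
    for k in range(n_intervals):
        current = {}
        for (kk, i, j), c in counts.items():
            if kk != k or c < threshold:
                continue  # wrong interval, or count not significant
            run = best_ending_at.get((k - 1, i), 0) + 1
            current[(k, j)] = max(current.get((k, j), 0), run)
            longest = max(longest, run)
        best_ending_at.update(current)
    return longest

# Toy input: 80 points on the horizontal line y = 0.5.
line = np.column_stack([np.linspace(0.01, 0.99, 80), np.full(80, 0.5)])
counts = strip_counts(line, n_intervals=4, n_levels=10, max_step=1)
run = longest_significant_run(counts, n_intervals=4, threshold=5)  # 4
```

A test in this spirit would reject $H_0$ when `run` exceeds a cutoff calibrated under the uniform model; the cutoff, like the threshold, is left unspecified in this sketch. A full version would repeat the computation over several strip lengths and anisotropy classes.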
"Adaptive multiscale detection of filamentary structures in a background of uniform random points." Ann. Statist. 34 (1) 326 - 349, February 2006. https://doi.org/10.1214/009053605000000787