We study the problem of learning a node-labeled tree given independent traces from an appropriately defined deletion channel. This problem, tree trace reconstruction, generalizes string trace reconstruction, which corresponds to the tree being a path. For many classes of trees, including complete trees and spiders, we provide algorithms that reconstruct the labels using only a polynomial number of traces. This stands in stark contrast to string trace reconstruction, where the best known algorithms require exponentially many traces and where a central open problem is to determine whether a polynomial number of traces suffices. Our techniques combine novel combinatorial and complex analytic methods.
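To make the notion of a trace concrete, the following minimal sketch simulates the string special case only, that is, the case where the tree is a path. It assumes the standard string deletion channel, in which each label is deleted independently with probability q and the surviving labels keep their order. The tree deletion channel studied in the paper is not defined in this abstract and is not modeled here; the names `string_trace`, `labels`, `q`, and `rng` are illustrative and not taken from the paper.

```python
import random


def string_trace(labels, q, rng):
    """Return one trace of `labels` from a string deletion channel.

    Assumption: each label is deleted independently with probability q,
    and the surviving labels keep their original relative order.
    """
    return [x for x in labels if rng.random() >= q]


# Hypothetical example: a binary labeling of a path on 8 nodes.
rng = random.Random(0)
x = [1, 0, 1, 1, 0, 0, 1, 0]
traces = [string_trace(x, 0.3, rng) for _ in range(5)]
```

Each element of `traces` is a subsequence of `x`. Trace reconstruction asks how many such independent traces are needed to recover `x` with high probability.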
The research of S.D. was supported by NSF CAREER Grant 1651861 and the David & Lucile Packard Foundation. The research of M.Z.R. was supported in part by NSF Grant DMS-1811724.
We thank Nina Holden for helpful discussions relating to Lemma 5.3, and Bichlien Nguyen and Karin Strauss for pointing us to connections with branched DNA and recent work in this area. We also thank Alyshia Olsen for help designing the figures. Finally, we thank Tatiana Brailovskaya and an anonymous referee for their careful reading of the paper and for their numerous questions and suggestions, which helped improve it.
An extended abstract of this paper appears in the Proceedings of the 32nd Conference on Learning Theory (COLT), 2019.
"Reconstructing trees from traces." Ann. Appl. Probab. 31 (6), 2772–2810, December 2021. https://doi.org/10.1214/21-AAP1662