The Annals of Applied Statistics

Clonality: Point estimation

Lu Tian, Yi Liu, Andrew Z. Fire, Scott D. Boyd, and Richard A. Olshen

Assessing the biological complexity of mixed-species populations is central in many biological contexts, including microbiomes, tumor cell population structure, and immune cell populations. Here we address the problem of quantifying population diversity in experiments where high-throughput DNA sequencing is used to distinguish a large number of cell subpopulations. Our model assumes a list of clonal species and their observed frequencies in each of several replicate sequencing libraries. Although the underlying distribution of frequencies cannot be estimated well from data that come from only a small fraction of the total cell population, the population-level clonality can be estimated well. Clonality is defined as the sum of squared underlying fractions of the respective clones, that is, the complement of the Gini–Simpson index. Specifically, we propose to adaptively combine multiple unbiased estimators of clonality, each derived from a pair of replicates, into a single estimator that does not rely on the commonly used but restrictive multinomial assumption. The new estimator performs particularly well for replicates of unequal size. We further illustrate the proposed methods with extensive simulations and a small real data example.
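To make the definition concrete, the following is a minimal Python sketch of a pairwise clonality estimate. It rests on assumptions that the abstract does not spell out: each replicate is represented as a dictionary from a clone identifier to an observed count, replicates are treated as independent samples, and each clone's observed fraction is taken to be unbiased for its underlying fraction. Under those assumptions the cross-product of observed fractions from two different replicates has expectation equal to the sum of squared underlying fractions. The function names are hypothetical, and the equal-weight average over pairs is a simple stand-in for illustration only; it is not the adaptive combination proposed in the article.

```python
from itertools import combinations


def pairwise_clonality(counts_a, counts_b):
    """Estimate sum_k p_k^2 from two replicates.

    Each argument maps a clone identifier to its observed count in one
    replicate (an assumed data layout, not one taken from the article).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each replicate needs at least one observation")
    # Only clones seen in both replicates contribute a nonzero product.
    shared = counts_a.keys() & counts_b.keys()
    cross = sum(counts_a[k] * counts_b[k] for k in shared)
    return cross / (total_a * total_b)


def mean_pairwise_clonality(replicates):
    """Equal-weight average of all pairwise estimates.

    Illustrative assumption: the article combines the pairwise
    estimators adaptively rather than with equal weights.
    """
    pairs = list(combinations(replicates, 2))
    if not pairs:
        raise ValueError("need at least two replicates")
    return sum(pairwise_clonality(a, b) for a, b in pairs) / len(pairs)


# Toy replicates of unequal size (made-up counts for illustration).
replicates = [
    {"c1": 6, "c2": 4},
    {"c1": 5, "c3": 5},
    {"c1": 10, "c2": 10},
]
estimate = mean_pairwise_clonality(replicates)
```

With these toy counts the three pairwise estimates are 0.3, 0.5, and 0.25, so the equal-weight average is 0.35. A weighting that accounts for replicate size, as the abstract suggests, would generally give a different value.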

Article information

Ann. Appl. Stat., Volume 13, Number 1 (2019), 113-131.

Received: March 2017
Revised: June 2018
First available in Project Euclid: 10 April 2019

Keywords: Clonality; V(D)J rearrangements; richness; jackknife


Tian, Lu; Liu, Yi; Fire, Andrew Z.; Boyd, Scott D.; Olshen, Richard A. Clonality: Point estimation. Ann. Appl. Stat. 13 (2019), no. 1, 113–131. doi:10.1214/18-AOAS1197.

