Electronic Journal of Statistics

Familywise error rate control via knockoffs

Lucas Janson and Weijie Su

Full-text: Open access

Abstract

We present a novel method for controlling the $k$-familywise error rate ($k$-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Candès. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testing procedures that act directly on $p$-values, knockoffs is specifically tailored to linear regression and implicitly accounts for the statistical relationships between hypothesis tests of different coefficients. We prove that knockoffs controls the $k$-FWER exactly in finite samples and show in simulations that it provides superior power to alternative procedures over a range of linear regression problems. We also discuss extensions to controlling other Type I error rates such as the false exceedance rate, and we apply the method to identify candidates for mutations conferring drug resistance in HIV.
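For orientation, the $k$-FWER is the probability of making at least $k$ false rejections, $P(V \ge k)$, where $V$ counts the rejected true null hypotheses; a procedure controls it at level $\alpha$ when this probability is at most $\alpha$. The sketch below is not the knockoffs procedure of this paper. It illustrates the kind of $p$-value-based baseline the abstract contrasts against, namely the single-step generalized Bonferroni rule of Lehmann and Romano [20], which rejects $H_i$ when $p_i \le k\alpha/m$ for $m$ tests. The function name and the example $p$-values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def generalized_bonferroni(p_values: Sequence[float], k: int, alpha: float) -> List[int]:
    """Return indices rejected by the single-step generalized Bonferroni rule.

    Rejects H_i when p_i <= k * alpha / m, where m is the number of tests.
    With k = 1 this reduces to the ordinary Bonferroni correction.
    (Hypothetical helper for illustration; not the knockoffs procedure.)
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    m = len(p_values)
    if m == 0:
        return []
    threshold = k * alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Made-up p-values for five hypotheses.
example_p = [0.001, 0.012, 0.03, 0.2, 0.5]
print(generalized_bonferroni(example_p, k=1, alpha=0.05))  # threshold 0.01
print(generalized_bonferroni(example_p, k=2, alpha=0.05))  # threshold 0.02
```

Allowing $k = 2$ instead of $k = 1$ doubles the per-test threshold, which is how tolerating a few false rejections buys power. A rule of this form uses only the individual $p$-values and ignores the dependence between tests of different regression coefficients.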

Article information

Source
Electron. J. Statist. Volume 10, Number 1 (2016), 960-975.

Dates
Received: October 2015
First available in Project Euclid: 12 April 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1460463651

Digital Object Identifier
doi:10.1214/16-EJS1129

Mathematical Reviews number (MathSciNet)
MR3486422

Zentralblatt MATH identifier
1341.62245

Subjects
Primary: 62J15: Paired and multiple comparisons; 62F03: Hypothesis testing
Secondary: 62J05: Linear regression

Keywords
$k$-familywise error rate; knockoffs; multiple testing; linear regression; Lasso; negative binomial distribution

Citation

Janson, Lucas; Su, Weijie. Familywise error rate control via knockoffs. Electron. J. Statist. 10 (2016), no. 1, 960–975. doi:10.1214/16-EJS1129. https://projecteuclid.org/euclid.ejs/1460463651



References

  • [1] R. F. Barber and E. J. Candès. Controlling the false discovery rate via knockoffs. Ann. Statist., 43(5):2055–2085, 2015.
  • [2] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289–300, 1995.
  • [3] Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependency. Ann. Statist., 29(4):1165–1188, 2001.
  • [4] M. Bogdan, E. van den Berg, C. Sabatti, W. Su, and E. J. Candès. SLOPE—adaptive variable selection via convex optimization. The Annals of Applied Statistics, 9(3):1103, 2015.
  • [5] V. Chernozhukov, D. Chetverikov, and K. Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6):2786–2819, 2013.
  • [6] S. Delattre and E. Roquain. New procedures controlling the false discovery proportion via Romano-Wolf’s heuristic. arXiv preprint arXiv:1311.4030, 2013.
  • [7] O. J. Dunn. Multiple comparisons among means. Journal of the American Statistical Association, 56(293):52–64, 1961.
  • [8] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.
  • [9] C. Genovese and L. Wasserman. A stochastic process approach to false discovery control. The Annals of Statistics, pages 1035–1061, 2004.
  • [10] C. Genovese and L. Wasserman. Exceedance control of the false discovery proportion. Journal of the American Statistical Association, 101(476):1408–1417, 2006.
  • [11] A. Gordon, G. Glazko, X. Qiu, and A. Yakovlev. Control of the mean number of false discoveries, Bonferroni and stability of multiple testing. Ann. Appl. Stat., 1(1):179–190, 2007.
  • [12] W. Guo, L. He, and S. K. Sarkar. Further results on controlling the false discovery proportion. Ann. Statist., 42(3):1070–1101, 2014.
  • [13] W. Guo and J. P. Romano. A generalized Sidak-Holm procedure and control of generalized error rates under independence. Statistical Applications in Genetics and Molecular Biology, 6(1), 2007.
  • [14] Y. Hochberg. A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75(4):800–802, 1988.
  • [15] S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70, 1979.
  • [16] G. Hommel and T. Hoffmann. Controlled uncertainty. In P. Bauer, G. Hommel, and E. Sonnemann, editors, Multiple Hypothesenprüfung / Multiple Hypotheses Testing, volume 70 of Medizinische Informatik und Statistik, pages 154–161. Springer Berlin Heidelberg, 1988.
  • [17] A. Javanmard and A. Montanari. Confidence intervals and hypothesis testing for high-dimensional regression. The Journal of Machine Learning Research, 15(1):2869–2909, 2014.
  • [18] S. Karlin and Y. Rinott. Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. Journal of Multivariate Analysis, 10(4):467–498, 1980.
  • [19] J. D. Lee, D. L. Sun, Y. Sun, and J. E. Taylor. Exact post-selection inference with the Lasso. arXiv preprint arXiv:1311.6238, 2013.
  • [20] E. L. Lehmann and J. P. Romano. Generalizations of the familywise error rate. Ann. Statist., 33(3):1138–1154, 2005.
  • [21] R. Lockhart, J. E. Taylor, R. J. Tibshirani, and R. Tibshirani. A significance test for the Lasso. Ann. Statist., 42(2):413–468, 2014.
  • [22] R. Marcus, E. Peritz, and K. R. Gabriel. On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63(3):655–660, 1976.
  • [23] X. Meng, J. Wang, and X. Wu. Multiple comparisons controlling expected number of false discoveries. Communications in Statistics - Theory and Methods, 43(13):2830–2843, 2014.
  • [24] S.-Y. Rhee, W. J. Fessel, A. R. Zolopa, L. Hurley, T. Liu, J. Taylor, D. P. Nguyen, S. Slome, D. Klein, M. Horberg, J. Flamm, S. Follansbee, J. M. Schapiro, and R. W. Shafer. HIV-1 protease and reverse-transcriptase mutations: Correlations with antiretroviral therapy in subtype B isolates and implications for drug-resistance surveillance. Journal of Infectious Diseases, 192(3):456–465, 2005.
  • [25] S.-Y. Rhee, J. Taylor, G. Wadhera, A. Ben-Hur, D. L. Brutlag, and R. W. Shafer. Genotypic predictors of human immunodeficiency virus type 1 drug resistance. Proceedings of the National Academy of Sciences, 103(46):17355–17360, 2006.
  • [26] J. P. Romano and A. M. Shaikh. Stepup procedures for control of generalizations of the familywise error rate. Ann. Statist., 34(4):1850–1873, 2006.
  • [27] J. P. Romano and M. Wolf. Control of generalized error rates in multiple testing. The Annals of Statistics, 35(4):1378–1408, 2007.
  • [28] W. Su, M. Bogdan, and E. J. Candès. False discoveries occur early on the Lasso path. arXiv preprint arXiv:1511.01957, 2015.
  • [29] M. J. van der Laan, S. Dudoit, and K. S. Pollard. Augmentation procedures for control of the generalized family-wise error rate and tail probabilities for the proportion of false positives. Statistical Applications in Genetics and Molecular Biology, 3(1), 2004.
  • [30] Z. Šidák. Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318):626–633, 1967.
  • [31] P. H. Westfall and S. S. Young. $p$ value adjustments for multiple tests in multivariate binomial models. Journal of the American Statistical Association, 84(407):780–786, 1989.