Institute of Mathematical Statistics Collections

Kendall’s tau in high-dimensional genomic parsimony

Pranab K. Sen

Abstract

High-dimensional data models, often with low sample size, abound in many interdisciplinary studies, genomics and large biological systems being the most noteworthy. The conventional assumption of multinormality or linearity of regression may not be plausible for such models, which are likely to be statistically complex owing to a large number of parameters as well as various underlying restraints. As such, parametric approaches may not be very effective. Approaches beyond parametrics, although broader in scope and more robust, may generally be hampered by the low sample size and hence be unable to give reasonable margins of error. Kendall’s tau statistic is exploited in this context, with emphasis on dimensional rather than sample-size asymptotics. The Chen–Stein theorem is thoroughly appraised in this study. Applications of these findings to some microarray data models are illustrated.
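
As a rough illustration of the statistic named in the abstract, the sketch below computes Kendall's tau (the tau-a form, with no tie correction) separately for each of a few genes against an ordered covariate. It is not the paper's procedure: the function names, the `dose` covariate, and the toy expression values are all hypothetical, chosen only to show a small sample size n being reused across many coordinates.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau-a: mean of sign(x_i - x_j) * sign(y_i - y_j) over pairs i < j."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    total = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i, j in combinations(range(n), 2)
    )
    return total / (n * (n - 1) / 2)


# Hypothetical toy data: n = 4 ordered dose levels, K = 3 genes.
dose = [1, 2, 3, 4]
expression = {
    "gene_a": [0.1, 0.4, 0.9, 1.3],  # monotone increasing with dose
    "gene_b": [2.0, 1.5, 1.1, 0.2],  # monotone decreasing with dose
    "gene_c": [0.5, 0.9, 0.3, 0.7],  # no clear trend
}

# One tau per gene; in a genomic setting K would be in the thousands.
taus = {gene: kendall_tau(dose, values) for gene, values in expression.items()}
```

With n fixed and small, each per-gene tau takes only a few distinct values, which is one reason the abstract's emphasis falls on asymptotics in the number of genes rather than in n.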

Primary Subjects: 62G10, 62G99
Secondary Subjects: 62P99
Keywords: bioinformatics; Chen–Stein theorem; dimensional asymptotics; FDR; multiple hypotheses testing; nonparametrics; permutational invariance; U-statistics
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.imsc/1209398473
Digital Object Identifier: doi:10.1214/074921708000000183

2012 © Institute of Mathematical Statistics
