The Annals of Statistics

Successive normalization of rectangular arrays

Richard A. Olshen and Bala Rajaratnam

Full-text: Open access

Abstract

Standard statistical techniques often require transforming data to have mean 0 and standard deviation 1. Typically, this process of “standardization” or “normalization” is applied across subjects when each subject produces a single number. High-throughput genomic and financial data often come as rectangular arrays where each coordinate in one direction concerns subjects who might have different status (case or control, say), and each coordinate in the other designates “outcome” for a specific feature, for example, “gene,” “polymorphic site” or some aspect of financial profile. It may happen, when analyzing data that arrive as a rectangular array, that one requires both the subjects and the features to be “on the same footing.” Thus there may be a need to standardize across both the rows and the columns of the rectangular matrix. The question then arises of how to achieve this double normalization. We propose and investigate the convergence of what seems to us a natural approach to successive normalization, which we learned from our colleague Bradley Efron. We also study the implementation of the method on simulated data and on data that arose from scientific experimentation.
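
The idea of standardizing across rows and columns in turn can be illustrated with a minimal sketch. The abstract does not give the details of the procedure, so everything specific here is an assumption made for illustration: the order (rows first, then columns), the use of the population standard deviation (divisor n), the stopping rule based on the largest entrywise change, and the names `standardize_rows`, `transpose`, `successive_normalize`, `tol` and `max_iter`. It should not be read as the authors' exact algorithm.

```python
import math
import random


def standardize_rows(a):
    """Return a copy of `a` in which every row has mean 0 and SD 1."""
    out = []
    for row in a:
        mean = sum(row) / len(row)
        # Population SD (divide by n); this divisor is an assumption.
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
        if sd == 0.0:
            raise ValueError("a constant row or column cannot be standardized")
        out.append([(x - mean) / sd for x in row])
    return out


def transpose(a):
    """Swap rows and columns of a list-of-lists array."""
    return [list(col) for col in zip(*a)]


def successive_normalize(a, tol=1e-10, max_iter=10_000):
    """Alternate row and column standardization until the array stops changing.

    Returns the final array and the number of row-then-column passes used.
    """
    current = [[float(x) for x in row] for row in a]
    for iteration in range(1, max_iter + 1):
        by_rows = standardize_rows(current)
        # Standardizing columns is standardizing the rows of the transpose.
        by_cols = transpose(standardize_rows(transpose(by_rows)))
        # Largest entrywise change over one full pass (assumed stopping rule).
        change = max(
            abs(new - old)
            for new_row, old_row in zip(by_cols, current)
            for new, old in zip(new_row, old_row)
        )
        current = by_cols
        if change < tol:
            return current, iteration
    return current, max_iter


if __name__ == "__main__":
    random.seed(0)
    # Simulated 6 x 8 array of independent standard normal entries.
    data = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
    result, n_iter = successive_normalize(data)
    print("passes used:", n_iter)
```

Because the last step of each pass standardizes columns, the columns of the returned array are standardized up to rounding error, while the rows are standardized only approximately, to the extent that the iteration has converged.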

Article information

Source
Ann. Statist. Volume 38, Number 3 (2010), 1638–1664.

Dates
First available in Project Euclid: 24 March 2010

Permanent link to this document
http://projecteuclid.org/euclid.aos/1269452650

Digital Object Identifier
doi:10.1214/09-AOS743

Mathematical Reviews number (MathSciNet)
MR2662355

Zentralblatt MATH identifier
05712434

Subjects
Primary: 62H05: Characterization and structure theory; 60F15: Strong theorems; 60G46: Martingales and classical analysis

Keywords
Normalization; standardization; backwards martingale convergence theorem

Citation

Olshen, Richard A.; Rajaratnam, Bala. Successive normalization of rectangular arrays. Ann. Statist. 38 (2010), no. 3, 1638–1664. doi:10.1214/09-AOS743. http://projecteuclid.org/euclid.aos/1269452650.

