## The Annals of Applied Statistics

### Principal trend analysis for time-course data with applications in genomic medicine

#### Abstract

Time-course high-throughput gene expression data are emerging in genomic and translational medicine. Extracting interesting time-course patterns from a patient cohort can provide biological insights for further clinical research and patient treatment. We propose principal trend analysis (PTA) to extract principal trends of time-course gene expression data from a group of patients and to identify genes that make dominant contributions to the principal trends. Through simulations, we demonstrate the utility of PTA for dimension reduction, time-course signal recovery and feature selection with high-dimensional data. Moreover, PTA yields new insights in real biological and clinical studies. We demonstrate the usefulness of PTA by applying it to longitudinal gene expression data from a circadian regulation system and from burn patients. These applications show that PTA can extract interesting time-course trends with biological significance, which helps in understanding the biological mechanisms of circadian regulation systems as well as the recovery of burn patients. Overall, the proposed PTA approach will benefit genomic medicine research. Our method is implemented in an R package, PTA (Principal Trend Analysis).
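The abstract describes PTA only at a high level. To illustrate the core idea (a sparse set of genes sharing one smooth time trend across patients), here is a minimal NumPy sketch under stated assumptions, not the authors' algorithm: each patient contributes a genes × time matrix, all patients share a single rank-one trend u vᵀ, gene sparsity comes from lasso-style soft-thresholding (Tibshirani, 1996), and smoothness comes from a discrete second-difference penalty standing in for a spline roughness penalty (Wahba, 1990). The function `principal_trend_sketch` and its parameters `lam` and `alpha` are hypothetical and are not the API of the PTA R package; the trend update is a heuristic smoothing step, whereas the authors' own derivation is given in the supplement.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise lasso-style soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def principal_trend_sketch(X, lam, alpha, max_iter=100, tol=1e-8):
    """Hypothetical rank-one 'principal trend' sketch (not the PTA package API).

    X     : array of shape (n_patients, n_genes, n_times), one gene x time
            matrix per patient.
    lam   : soft-threshold level for the sparse gene loadings u.
    alpha : weight of the second-difference roughness penalty on the trend v.
    Returns (u, v) with ||v||_2 = 1, so each X_i is approximated by u v^T.
    """
    n, p, T = X.shape
    S = X.sum(axis=0)                            # pooled gene x time matrix
    D = np.diff(np.eye(T), n=2, axis=0)          # (T-2) x T second differences
    penalty_matrix = np.eye(T) + alpha * D.T @ D # I + alpha * Omega

    # Start the trend at the leading right singular vector of the pooled data.
    v = np.linalg.svd(S, full_matrices=False)[2][0]
    for _ in range(max_iter):
        # Gene step: for unit-norm v, this is the exact minimiser of
        # sum_i ||X_i - u v^T||_F^2 + 2 * lam * ||u||_1.
        u = soft_threshold(S @ v, lam) / n
        if not np.any(u):
            break                                # lam too large: no genes kept
        # Trend step (heuristic): smooth S^T u, then rescale to unit length.
        w = np.linalg.solve(penalty_matrix, S.T @ u)
        v_new = w / np.linalg.norm(w)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    u = soft_threshold(S @ v, lam) / n           # loadings matching the final v
    return u, v


# Usage on simulated data: 5 patients, 50 genes, 10 time points;
# only the first 5 genes follow a sinusoidal trend.
rng = np.random.default_rng(0)
n, p, T = 5, 50, 10
t = np.linspace(0.0, 1.0, T)
v_true = np.sin(2 * np.pi * t)
v_true /= np.linalg.norm(v_true)
u_true = np.zeros(p)
u_true[:5] = 3.0
X = u_true[None, :, None] * v_true[None, None, :] + 0.1 * rng.standard_normal((n, p, T))

u_hat, v_hat = principal_trend_sketch(X, lam=2.0, alpha=0.5)
print("selected genes:", np.flatnonzero(u_hat))
print("|cosine(v_hat, v_true)|:", abs(v_hat @ v_true))
```

Rescaling v to unit length after every trend step keeps `lam` on a fixed scale; without it, u and v could trade scale freely (u shrinking while v grows), which would weaken the soft-threshold. In practice `lam` and `alpha` would need tuning, for example by cross-validation.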

#### Article information

• Source: Ann. Appl. Stat., Volume 7, Number 4 (2013), 2205–2228.
• Dates: First available in Project Euclid: 23 December 2013.
• Permanent link: https://projecteuclid.org/euclid.aoas/1387823316
• Digital Object Identifier: doi:10.1214/13-AOAS659
• Mathematical Reviews number (MathSciNet): MR3161719
• Zentralblatt MATH identifier: 1283.62126

#### Citation

Zhang, Yuping; Davis, Ronald. Principal trend analysis for time-course data with applications in genomic medicine. Ann. Appl. Stat. 7 (2013), no. 4, 2205–2228. doi:10.1214/13-AOAS659. https://projecteuclid.org/euclid.aoas/1387823316

#### References

• Bläsing, O. E., Gibon, Y., Günther, M., Höhne, M., Morcuende, R., Osuna, D., Thimm, O., Usadel, B., Scheible, W. R. and Stitt, M. (2005). Sugars and circadian regulation make major contributions to the global regulation of diurnal gene expression in Arabidopsis. The Plant Cell Online 17 3257.
• Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge.
• Breiman, L. and Spector, P. (1992). Submodel selection and evaluation in regression. The $x$-random case. International Statistical Review 60 291–319.
• De Leeuw, J. and Michailidis, G. (1994). Block relaxation algorithms in statistics. In Information Systems and Data Analysis 308–325. Springer, Berlin.
• Finnerty, C. C., Herndon, D. N., Przkora, R., Pereira, C. T., Oliveira, H. M., Queiroz, D. M. M., Rocha, A. M. C. and Jeschke, M. G. (2006). Cytokine expression profile over time in severely burned pediatric patients. Shock 26 13–19.
• Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.
• Hubbell, E., Liu, W.-M. and Mei, R. (2002). Robust estimators for expression analysis. Bioinformatics 18 1585–1592.
• Kimeldorf, G. S. and Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Ann. Math. Statist. 41 495–502.
• Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, Vol. 14 1137–1145. Lawrence Erlbaum Associates, Mahwah, NJ.
• Li, C. and Wong, W. H. (2001). Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc. Natl. Acad. Sci. USA 98 31–36.
• Li, H. T., Su, Y. P., Cheng, T. M., Xu, J. M., Liao, J., Chen, J. C., Ji, C. Y., Ai, G. P. and Wang, J. P. (2010). The interaction between interferon-induced protein with tetratricopeptide repeats-1 and eukaryotic elongation factor-1A. Molecular and Cellular Biochemistry 337 101–110.
• Mairal, J., Bach, F., Ponce, J. and Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11 19–60.
• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267–288.
• Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics 59. SIAM, Philadelphia, PA.
• Witten, D. M. and Tibshirani, R. J. (2009). Extensions of sparse canonical correlation analysis with applications to genomic data. Stat. Appl. Genet. Mol. Biol. 8 Art. 28, 29.
• Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
• Zhang, Y. and Davis, R. (2013). Supplement to “Principal trend analysis for time-course data with applications in genomic medicine.” doi:10.1214/13-AOAS659SUPP.
• Zhang, Y., Tibshirani, R. J. and Davis, R. W. (2010). Predicting patient survival from longitudinal gene expression. Stat. Appl. Genet. Mol. Biol. 9 Art. 41, 23.
• Zhang, Y., Tibshirani, R. and Davis, R. (2013). Classification of patients from time-course gene expression. Biostatistics 14 87–98.
• Zhong, H. H., Young, J. C., Pease, E. A., Hangarter, R. P. and McClung, C. R. (1994). Interactions between light and the circadian clock in the regulation of CAT2 expression in Arabidopsis. Plant Physiology 104 889–898.
• Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301–320.
• Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.

#### Supplemental materials

• Supplementary material: Supplement to “Principal trend analysis for time-course data with applications in genomic medicine.” The supplementary material includes “Proof of biconvex property” and “Derivation of PTA algorithm”.