## The Annals of Statistics

### Principal component analysis for second-order stationary vector time series

#### Abstract

We extend principal component analysis (PCA) to second-order stationary vector time series in the following sense: we seek a contemporaneous linear transformation of a $p$-variate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially. Therefore, those lower-dimensional series can be analyzed separately as far as the linear dynamic structure is concerned. Technically, it boils down to an eigenanalysis of a positive definite matrix. When $p$ is large, an additional step is required to perform a permutation based on either maximum cross-correlations or the false discovery rate (FDR) based on multiple tests. The asymptotic theory is established for both fixed $p$ and diverging $p$ as the sample size $n$ tends to infinity. Numerical experiments with both simulated and real data sets indicate that the proposed method is an effective initial step in analyzing multiple time series data, leading to substantial dimension reduction in modelling and forecasting high-dimensional linear dynamical structures. Unlike PCA for independent data, there is no guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation, which still offers advantages in, for example, forecasting future values. The method can also be adapted to segment multiple volatility processes.
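The two steps named in the abstract, an eigenanalysis of a positive definite matrix followed by a permutation that groups the transformed components, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. We assume the series is first prewhitened by $\widehat\Sigma^{-1/2}$ and that the matrix analysed is $W=\sum_{k=0}^{k_0}\widehat\Sigma_x(k)\widehat\Sigma_x(k)^\top$; the function names, the lag bounds `k0` and `m`, and the fixed `threshold` used to link components are hypothetical choices for illustration. In particular, the fixed threshold stands in for the data-driven maximum cross-correlation and FDR rules described in the abstract, which are not reproduced here.

```python
import numpy as np


def sample_autocov(x, k):
    """Lag-k sample autocovariance of an (n, p) array x: (1/n) sum_t (x_{t+k} - xbar)(x_t - xbar)^T."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    return xc[k:].T @ xc[: n - k] / n


def pca_segment_transform(y, k0=5):
    """Eigenanalysis step (assumed form): returns S^{-1/2}, Gamma and z_t = Gamma^T S^{-1/2} (y_t - ybar)."""
    p = y.shape[1]
    # Prewhiten so that the sample covariance of x_t = S^{-1/2} (y_t - ybar) is the identity.
    vals, vecs = np.linalg.eigh(sample_autocov(y, 0))
    s_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
    x = (y - y.mean(axis=0)) @ s_inv_half
    # W = sum_{k=0}^{k0} S_x(k) S_x(k)^T; the k = 0 term is I, so W is positive definite.
    w = np.zeros((p, p))
    for k in range(k0 + 1):
        sk = sample_autocov(x, k)
        w += sk @ sk.T
    _, gamma = np.linalg.eigh(w)
    gamma = gamma[:, ::-1]  # columns ordered by decreasing eigenvalue
    z = x @ gamma  # row t is z_t^T = x_t^T Gamma
    return s_inv_half, gamma, z


def max_cross_corr(z, m=10):
    """T[i, j] = max over |k| <= m of the absolute sample cross-correlation of components i and j."""
    sd = z.std(axis=0)
    t = np.zeros((z.shape[1], z.shape[1]))
    for k in range(m + 1):
        c = np.abs(sample_autocov(z, k)) / np.outer(sd, sd)
        # c[i, j] covers lag +k; its transpose covers lag -k.
        t = np.maximum(t, np.maximum(c, c.T))
    return t


def group_components(t, threshold):
    """Link i and j when T[i, j] > threshold (hypothetical fixed rule); return connected components."""
    p = t.shape[0]
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if t[i, j] > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each list returned by `group_components(max_cross_corr(z), threshold)` indexes columns of `z` that would form one lower-dimensional subseries to be analyzed separately. Because the prewhitening makes the sample covariance of `x` the identity and `gamma` is orthogonal, the components of `z` are exactly uncorrelated at lag 0 in sample; only the serial cross-correlations are left for the grouping step to detect.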

#### Article information

Source
Ann. Statist., Volume 46, Number 5 (2018), 2094-2124.

Dates
Revised: July 2017
First available in Project Euclid: 17 August 2018

https://projecteuclid.org/euclid.aos/1534492830

Digital Object Identifier
doi:10.1214/17-AOS1613

Mathematical Reviews number (MathSciNet)
MR3845012

Zentralblatt MATH identifier
06964327

#### Citation

Chang, Jinyuan; Guo, Bin; Yao, Qiwei. Principal component analysis for second-order stationary vector time series. Ann. Statist. 46 (2018), no. 5, 2094–2124. doi:10.1214/17-AOS1613. https://projecteuclid.org/euclid.aos/1534492830

#### References

• Anderson, T. W. (1963). The use of factor analysis in the statistical analysis of multiple time series. Psychometrika 28 1–25.
• Back, A. D. and Weigend, A. S. (1997). A first application of independent component analysis to extracting structure from stock returns. Int. J. Neural Syst. 8 473–484.
• Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70 191–221.
• Belouchrani, A., Abed-Meraim, K., Cardoso, J.-F. and Moulines, E. (1997). A blind source separation technique using second-order statistics. IEEE Trans. Signal Process. 45 434–444.
• Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
• Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis. Forecasting and Control. Holden-Day, San Francisco, CA–London–Amsterdam.
• Box, G. E. P. and Tiao, G. C. (1977). A canonical analysis of multiple time series. Biometrika 64 355–365.
• Brillinger, D. R. (1975). Time Series: Data Analysis and Theory. Holt, Rinehart and Winston, Inc., New York–Montreal, QC–London.
• Brockwell, P. J. and Davis, R. A. (1996). Introduction to Time Series and Forecasting. Springer, New York.
• Cardoso, J.-F. (1998). Multidimensional independent component analysis. In Proceedings of the 1998 IEEE Int. Conf. Acoustics, Speech and Signal Processing 4 1941–1944.
• Chang, J., Guo, B. and Yao, Q. (2015). High dimensional stochastic regression with latent factors, endogeneity and nonlinearity. J. Econometrics 189 297–312.
• Chang, J., Guo, B. and Yao, Q. (2018). Supplement to “Principal component analysis for second-order stationary vector time series.” DOI:10.1214/17-AOS1613SUPP.
• Chang, J., Yao, Q. and Zhou, W. (2017). Testing for high-dimensional white noise using maximum cross-correlations. Biometrika 104 111–127.
• Davis, R. A., Zang, P. and Zheng, T. (2016). Sparse vector autoregressive modeling. J. Comput. Graph. Statist. 25 1077–1096.
• Fan, J., Wang, M. and Yao, Q. (2008). Modelling multivariate volatilities via conditionally uncorrelated components. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 679–702.
• Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York.
• Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2005). The generalized dynamic factor model: One-sided estimation and forecasting. J. Amer. Statist. Assoc. 100 830–840.
• Guo, S., Wang, Y. and Yao, Q. (2016). High-dimensional and banded vector autoregressions. Biometrika 103 889–903.
• Han, F., Lu, H. and Liu, H. (2015). A direct estimation of high dimensional stationary vector autoregressions. J. Mach. Learn. Res. 16 3115–3150.
• Huang, D. and Tsay, R. S. (2014). A refined scalar component approach to multivariate time series modeling. Manuscript.
• Hyvärinen, A., Karhunen, J. and Oja, E. (2001). Independent Component Analysis. Wiley, New York.
• Jakeman, A. J., Steele, L. P. and Young, P. C. (1980). Instrumental variable algorithms for multiple input systems described by multiple transfer functions. IEEE Trans. Syst. Man Cybern. 10 593–602.
• Lam, C. and Yao, Q. (2012). Factor modeling for high-dimensional time series: Inference for the number of factors. Ann. Statist. 40 694–726.
• Lam, C., Yao, Q. and Bathia, N. (2011). Estimation of latent factors for high-dimensional time series. Biometrika 98 901–918.
• Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88 365–411.
• Liu, W., Xiao, H. and Wu, W. B. (2013). Probability and moment inequalities under dependence. Statist. Sinica 23 1257–1272.
• Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer, Berlin.
• Matteson, D. S. and Tsay, R. S. (2011). Dynamic orthogonal components for multivariate time series. J. Amer. Statist. Assoc. 106 1450–1463.
• Pan, J. and Yao, Q. (2008). Modelling multiple time series via common factors. Biometrika 95 365–379.
• Paparoditis, E. and Politis, D. N. (2012). Nonlinear spectral density estimation: Thresholding the correlogram. J. Time Series Anal. 33 386–397.
• Peña, D. and Box, G. E. P. (1987). Identifying a simplifying structure in time series. J. Amer. Statist. Assoc. 82 836–843.
• Reinsel, G. C. (1993). Elements of Multivariate Time Series Analysis. Springer, New York.
• Rio, E. (2000). Théorie Asymptotique des Processus Aléatoires Faiblement Dépendants. Mathématiques & Applications (Berlin) [Mathematics & Applications] 31. Springer, Berlin.
• Sarkar, S. K. and Chang, C.-K. (1997). The Simes method for multiple hypothesis testing with positively dependent test statistics. J. Amer. Statist. Assoc. 92 1601–1608.
• Shojaie, A. and Michailidis, G. (2010). Discovering graphical Granger causality using the truncated lasso penalty. Bioinformatics 26 517–523.
• Simes, R. J. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73 751–754.
• Song, S. and Bickel, P. J. (2011). Large vector auto regressions. Available at arXiv:1106.3519.
• Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory. Academic Press, Boston, MA.
• Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc. 97 1167–1179.
• Stock, J. H. and Watson, M. W. (2005). Implications of dynamic factor models for VAR analysis. Available at www.nber.org/papers/w11467.
• Theis, F. J., Meyer-Baese, A. and Lang, E. W. (2004). Second-order blind source separation based on multi-dimensional autocovariances. In Independent Component Analysis and Blind Signal Separation (C. G. Puntonet and A. Prieto, eds.) 726–733. Springer, Berlin.
• Tiao, G. C. and Tsay, R. S. (1989). Model specification in multivariate time series. J. Roy. Statist. Soc. Ser. B 51 157–213. With discussion.
• Tong, L., Xu, G. and Kailath, T. (1994). Blind identification and equalization based on second-order statistics: A time domain approach. IEEE Trans. Inform. Theory 40 340–349.
• Tsay, R. S. (2014). Multivariate Time Series Analysis: With R and Financial Applications. Wiley, Hoboken, NJ.

#### Supplemental materials

• Supplement to “Principal component analysis for second-order stationary vector time series”. This supplement contains simulation studies and all technical proofs.