Communications in Applied Mathematics and Computational Science

Analysis of persistent nonstationary time series and applications

Philipp Metzner, Lars Putzig, and Illia Horenko

Full-text: Open access

Abstract

We give an alternative and unified derivation of the general framework developed in the last few years for analyzing nonstationary time series. A different approach for handling the resulting variational problem numerically is introduced. We further expand the framework by employing adaptive finite element algorithms and ideas from information theory to solve the problem of finding the most adequate model based on a maximum-entropy ansatz, thereby reducing the number of underlying probabilistic assumptions. In addition, we formulate and prove the result establishing the link between the optimal parametrizations of the direct and the inverse problems and compare the introduced algorithm to standard approaches like Gaussian mixture models, hidden Markov models, artificial neural networks and local kernel methods. Furthermore, based on the introduced general framework, we show how to create new data analysis methods for specific practical applications. We demonstrate the application of the framework to data samples from toy models as well as to real-world problems such as biomolecular dynamics, DNA sequence analysis and financial applications.

Article information

Source
Commun. Appl. Math. Comput. Sci., Volume 7, Number 2 (2012), 175-229.

Dates
Received: 29 July 2011
Revised: 23 March 2012
Accepted: 5 May 2012
First available in Project Euclid: 20 December 2017

Permanent link to this document
https://projecteuclid.org/euclid.camcos/1513732056

Digital Object Identifier
doi:10.2140/camcos.2012.7.175

Mathematical Reviews number (MathSciNet)
MR3005737

Zentralblatt MATH identifier
1275.62067

Subjects
Primary:
  • 60G20: Generalized stochastic processes
  • 62H25: Factor analysis and principal components; correspondence analysis
  • 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
  • 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]
  • 62M20: Prediction [See also 60G25]; filtering [See also 60G35, 93E10, 93E11]
Secondary:
  • 62M07: Non-Markovian processes: hypothesis testing
  • 62M09: Non-Markovian processes: estimation
  • 62M05: Markov processes: estimation
  • 62M02: Markov processes: hypothesis testing

Keywords
nonstationary time series analysis, nonstationary data analysis, clustering, finite element method

Citation

Metzner, Philipp; Putzig, Lars; Horenko, Illia. Analysis of persistent nonstationary time series and applications. Commun. Appl. Math. Comput. Sci. 7 (2012), no. 2, 175–229. doi:10.2140/camcos.2012.7.175. https://projecteuclid.org/euclid.camcos/1513732056


References

  • H. Akaike, A new look at the statistical model identification, IEEE Trans. Automat. Control 19 (1974), no. 6, 716–723.
  • A. N. Akansu and R. A. Haddad, Multiresolution signal decomposition: transforms, subbands, and wavelets, Academic Press, Boston, 1992.
  • L. E. Baum, An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, Inequalities, III (O. Shisha, ed.), Academic Press, New York, 1972, pp. 1–8.
  • L. E. Baum and T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, Ann. Math. Stat. 37 (1966), no. 6, 1554–1563.
  • C. M. Bishop, Neural networks for pattern recognition, Clarendon, Oxford, 1995. MR 97m:68172, http://www.ams.org/mathscinet-getitem?mr=97m:68172
  • F. Black, Studies of stock price volatility changes, Proceedings of the 1976 Meetings of the American Statistical Association, Business and Economics Statistics Section, American Statistical Association, Washington, DC, 1976, pp. 177–181.
  • T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econometrics 31 (1986), no. 3, 307–327.
  • D. Braess, Finite elements: theory, fast solvers, and applications in solid mechanics, 2nd ed., Cambridge University Press, Cambridge, 2001.
  • P. Brémaud, Markov chains: Gibbs fields, Monte Carlo simulation, and queues, Texts in Applied Mathematics, no. 31, Springer, New York, 1999.
  • U. Çelikyurt and S. Özekici, Multiperiod portfolio optimization models in stochastic markets using the mean-variance approach, Eur. J. Oper. Res. 179 (2007), no. 1, 186–202.
  • M. Dellnitz and O. Junge, On the approximation of complicated dynamical behavior, SIAM J. Numer. Anal. 36 (1999), no. 2, 491–515.
  • A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. Ser. B 39 (1977), no. 1, 1–38.
  • P. Deuflhard and C. Schütte, Molecular conformation dynamics and computational drug design, Applied mathematics entering the 21st century (J. M. Hill and R. Moore, eds.), SIAM, Philadelphia, 2004, pp. 91–119.
  • P. Deuflhard and M. Weber, Robust Perron cluster analysis in conformation dynamics, Linear Algebra Appl. 398 (2005), 161–184.
  • R. F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (1982), no. 4, 987–1007. Zbl 0491.62099, http://www.emis.de/cgi-bin/MATH-item?0491.62099
  • E. F. Fama, The behavior of stock-market prices, J. Bus. 38 (1965), no. 1, 34–105.
  • A. Fischer, S. Waldhausen, I. Horenko, E. Meerbach, and C. Schütte, Identification of biomolecular conformations from incomplete torsion angle observations by hidden Markov models, J. Comput. Chem. 28 (2007), no. 15, 2453–2464.
  • C. Franzke, D. Crommelin, A. Fischer, and A. J. Majda, A hidden Markov model perspective on regimes and metastability in atmospheric flows, J. Climate 21 (2008), no. 8, 1740–1757.
  • T. Gasser and H.-G. Müller, Kernel estimation of regression functions, Smoothing techniques for curve estimation (T. Gasser and M. Rosenblatt, eds.), Lecture Notes in Math., no. 757, Springer, Berlin, 1979, pp. 23–68.
  • ––––, Estimating regression functions and their derivatives by the kernel method, Scand. J. Stat. 11 (1984), no. 3, 171–185.
  • A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian data analysis, 2nd ed., Chapman & Hall/CRC, Boca Raton, FL, 2004.
  • D. Giannakis and A. J. Majda, Quantifying the predictive skill in long-range forecasting, I: Coarse-grained predictions in a simple ocean model, J. Climate 25 (2011), 1793–1813.
  • ––––, Quantifying the predictive skill in long-range forecasting, II: Model error in coarse-grained Markov models with application to ocean-circulation regimes, J. Climate 25 (2011), 1814–1826.
  • D. Giannakis, A. J. Majda, and I. Horenko, Information theory, model error, and predictive skill of stochastic models for complex nonlinear systems, preprint, CIMS/NYU and University of Lugano, 2011, Submitted to Physica D.
  • A. L. Gibbs and F. E. Su, On choosing and bounding probability metrics, Int. Stat. Rev. 70 (2002), no. 3, 419–435.
  • J. Hadamard, Sur les problèmes aux dérivées partielles et leur signification physique, Princeton Univ. Bull. 13 (1902), 49–52.
  • J. D. Hamilton, A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica 57 (1989), no. 2, 357–384.
  • A. Hoerl, Application of ridge analysis to regression problems, Chem. Eng. Prog. 58 (1962), no. 3, 54–59.
  • I. Horenko, On simultaneous data-based dimension reduction and hidden phase identification, J. Atmos. Sci. 65 (2008), no. 6, 1941–1954.
  • ––––, On robust estimation of low-frequency variability trends in discrete Markovian sequences of Atmospherical Circulation Patterns, J. Atmos. Sci. 66 (2009), no. 7, 2059–2072.
  • ––––, Finite element approach to clustering of multidimensional time series, SIAM J. Sci. Comput. 32 (2010), no. 1, 62–83.
  • ––––, On clustering of non-stationary meteorological time series, Dyn. of Atmos. and Oceans 49 (2010), no. 2-3, 164–187.
  • ––––, On identification of non-stationary factor models and its application to atmospherical data analysis, J. Atmos. Sci. 67 (2010), no. 5, 1559–1574.
  • ––––, Nonstationarity in multifactor models of discrete jump processes, memory and application to cloud modeling, J. Atmos. Sci. 68 (2011), no. 7, 1493–1506.
  • ––––, On analysis of nonstationary categorical data time series: dynamical dimension reduction, model selection, and applications to computational sociology, Multiscale Model. Simul. 9 (2011), no. 4, 1700–1726.
  • I. Horenko, E. Dittmer, A. Fischer, and C. Schütte, Automated model reduction for complex systems exhibiting metastability, Multiscale Model. Simul. 5 (2006), no. 3, 802–827.
  • I. Horenko, E. Dittmer, and C. Schütte, Reduced stochastic models for complex molecular systems, Comput. Vis. Sci. 9 (2006), no. 2, 89–102.
  • I. Horenko, S. Dolaptchiev, A. Eliseev, I. Mokhov, and R. Klein, Metastable decomposition of high-dimensional meteorological data with gaps, J. Atmos. Sci. 65 (2008), no. 11, 3479–3496.
  • I. Horenko, R. Klein, S. Dolaptchiev, and C. Schütte, Automated generation of reduced stochastic weather models, I: Simultaneous dimension and model reduction for time series analysis, Multiscale Model. Simul. 6 (2007), no. 4, 1125–1145.
  • I. Horenko, J. Schmidt-Ehrenberg, and C. Schütte, Set-oriented dimension reduction: localizing principal component analysis via hidden Markov models, Computational life sciences II (M. R. Berthold, R. Glen, and I. Fischer, eds.), Lecture Notes in Comput. Sci., no. 4216, Springer, Berlin, 2006, pp. 74–85.
  • I. Horenko and C. Schütte, Likelihood-based estimation of multidimensional Langevin models and its application to biomolecular dynamics, Multiscale Model. Simul. 7 (2008), no. 2, 731–773.
  • W. Huisinga, Metastability of Markovian systems: a transfer operator approach in application to molecular dynamics, Ph.D. thesis, Free University Berlin, 2001.
  • E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev. (2) 106 (1957), 620–630.
  • ––––, Information theory and statistical mechanics, II, Phys. Rev. (2) 108 (1957), 171–190.
  • R. Kalman, A new approach to linear filtering and prediction problems, J. Basic Eng. 82 (1960), no. 1, 35–45.
  • J. N. Kapur, Maximum-entropy models in science and engineering, Wiley, New York, 1989.
  • Koninklijk Nederlands Meteorologisch Instituut.
  • H.-M. Krolzig, Predicting Markov-switching vector autoregressive processes, preprint 2000-W31, University of Oxford, 2000.
  • D. Kulp, D. Haussler, M. G. Reese, and F. H. Eeckman, A generalized hidden Markov model for the recognition of human genes in DNA, Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology (D. J. States, P. Agarwal, T. Gaasterland, L. Hunter, and R. Smith, eds.), AAAI Press, Menlo Park, CA, 1996, pp. 134–142.
  • C. L. Lawson and R. J. Hanson, Solving least squares problems, Prentice-Hall, Englewood Cliffs, NJ, 1974.
  • C. Lee and M.-H. Yu, Protein folding and disease, J. Biochem. Molec. Biol. 38 (2005), no. 3, 275–280.
  • C. Loader, Local regression and likelihood, Springer, New York, 1999.
  • J. B. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, I: Statistics (L. M. Le Cam and J. Neyman, eds.), University of California Press, Berkeley, CA, 1967, pp. 281–297.
  • A. J. Majda, C. L. Franzke, A. Fischer, and D. T. Crommelin, Distinct metastable atmospheric regimes despite nearly Gaussian statistics: a paradigm model, Proc. Natl. Acad. Sci. USA 103 (2006), no. 22, 8309–8314.
  • A. J. Majda and X. Wang, Non-linear dynamics and statistical theories for basic geophysical flows, Cambridge University Press, Cambridge, 2006.
  • G. McLachlan and D. Peel, Finite mixture models, Wiley, New York, 2000.
  • P. Metzner, M. Weber, and C. Schütte, Observation uncertainty in reversible Markov chains, Phys. Rev. E (3) 82 (2010), no. 3, Paper #031114.
  • J.-J. Moreau, P. D. Panagiotopoulos, and G. Strang (eds.), Topics in nonsmooth mechanics, Birkhäuser, Basel, 1988.
  • National Center for Biotechnology Information, Saccharomyces cerevisiae chromosome I, complete sequence.
  • R. Preis, M. Dellnitz, M. Hessel, C. Schütte, and E. Meerbach, Dominant paths between almost invariant sets of dynamical systems, preprint 154, Deutsche Forschungsgemeinschaft Schwerpunktprogramm, 2004.
  • W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical recipes: the art of scientific computing, 3rd ed., Cambridge University Press, Cambridge, 2007.
  • J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte, and F. Noé, Markov models of molecular kinetics: generation and validation, J. Chem. Phys. 134 (2011), no. 17, Paper #174105.
  • L. Putzig, D. Becherer, and I. Horenko, Optimal allocation of a futures portfolio utilizing numerical market phase detection, SIAM J. Financial Math. 1 (2010), 752–779.
  • L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (1989), no. 2, 257–286.
  • C. Schütte, Conformational dynamics: modelling, theory, algorithm, and application to biomolecules, preprint SC 99-18, Konrad-Zuse-Zentrum für Informationstechnik, Berlin, 1999.
  • C. Schütte, A. Fischer, W. Huisinga, and P. Deuflhard, A direct approach to conformational dynamics based on hybrid Monte Carlo, J. Comput. Phys. 151 (1999), no. 1, 146–168. MR 2000d:92004, http://www.ams.org/mathscinet-getitem?mr=2000d:92004
  • C. Schütte and W. Huisinga, Biomolecular conformations can be identified as metastable sets of molecular dynamics, Handbook of numerical analysis, 10: Computational chemistry (C. Le Bris, ed.), North-Holland, Amsterdam, 2003, pp. 699–744.
  • C. Schütte, F. Noé, J. Lu, M. Sarich, and E. Vanden-Eijnden, Markov state models based on milestoning, J. Chem. Phys. 134 (2011), no. 20, Paper #204105.
  • C. Schütte, F. Noé, E. Meerbach, P. Metzner, and C. Hartmann, Conformation dynamics, ICIAM 07: 6th International Congress on Industrial and Applied Mathematics (R. Jeltsch and G. Wanner, eds.), European Mathematical Society, Zürich, 2009, pp. 297–335.
  • F. Takens, Detecting strange attractors in turbulence, Dynamical systems and turbulence, Warwick 1980 (D. A. Rand and L.-S. Young, eds.), Lecture Notes in Math., no. 898, Springer, Berlin, 1981, pp. 366–381.
  • A. N. Tikhonov, On the stability of inverse problems, Dokl. Akad. Nauk SSSR 39 (1943), no. 5, 195–198, in Russian; translated in C. R. (Doklady) Acad. Sci. URSS (N.S.) 39 (1943), no. 5, 176–179.
  • K. E. Trenberth, The definition of El Niño, Bull. Amer. Meteorol. Soc. 78 (1997), no. 12, 2771–2777.
  • U.S. Energy Information Administration, NYMEX futures prices, http://tonto.eia.doe.gov/dnav/pet/pet_pri_fut_s1_d.htm.
  • G. Wahba, Spline models for observational data, CBMS-NSF Regional Conference Series in Applied Mathematics, no. 59, SIAM, Philadelphia, 1990.
  • Yahoo Finance, Dow Jones industrial average: historical prices, http://finance.yahoo.com/q/hp?s=^DJI+Historical+Prices.
  • A. Zellner and R. A. Highfield, Calculation of maximum entropy distributions and approximation of marginal posterior distributions, J. Econometrics 37 (1988), no. 2, 195–209.