The Annals of Applied Statistics

An MDL approach to the climate segmentation problem

QiQi Lu, Robert Lund, and Thomas C. M. Lee

Full-text: Open access

Abstract

This paper proposes an information theory approach to estimate the number of changepoints and their locations in a climatic time series. A model is introduced that has an unknown number of changepoints and allows for series autocorrelations, periodic dynamics, and a mean shift at each changepoint time. An objective function gauging the number of changepoints and their locations, based on a minimum description length (MDL) information criterion, is derived. A genetic algorithm is then developed to optimize the objective function. The methods are applied in the analysis of a century of monthly temperatures from Tuscaloosa, Alabama.

Article information

Source
Ann. Appl. Stat. Volume 4, Number 1 (2010), 299-319.

Dates
First available in Project Euclid: 11 May 2010

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1273584456

Digital Object Identifier
doi:10.1214/09-AOAS289

Zentralblatt MATH identifier
1189.62180

Mathematical Reviews number (MathSciNet)
MR2758173

Citation

Lu, QiQi; Lund, Robert; Lee, Thomas C. M. An MDL approach to the climate segmentation problem. The Annals of Applied Statistics 4 (2010), no. 1, 299–319. doi:10.1214/09-AOAS289. http://projecteuclid.org/euclid.aoas/1273584456.


References

  • Alba, E. and Troya, J. M. (1999). A survey of parallel-distributed genetic algorithms. Complexity 4 31–52.
  • Beasley, D., Bull, D. R. and Martin, R. R. (1993). An overview of genetic algorithms: Part 1, fundamentals. University Computing 15 58–69.
  • Berkes, I., Horvath, L., Kokoszka, P. and Shao, Q. M. (2006). On discriminating between long-range dependence and changes in mean. Ann. Statist. 34 1140–1165.
  • Braun, J. V. and Müller, H. G. (1998). Statistical methods for DNA sequence segmentation. Statist. Sci. 13 142–162.
  • Caussinus, H. and Mestre, O. (2004). Detection and correction of artificial shifts in climate series. J. Roy. Statist. Soc. Ser. C 53 405–425.
  • Chen, J. and Gupta, A. K. (1997). Testing and locating variance changepoints with application to stock prices. J. Amer. Statist. Assoc. 92 739–747.
  • Cochrane, D. and Orcutt, G. H. (1949). Application of least squares regression to relationships containing autocorrelated error terms. J. Amer. Statist. Assoc. 44 32–61.
  • Davis, L. (1991). Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York.
  • Davis, R. A., Lee, T. C. M. and Rodriguez-Yam, G. A. (2006). Structural break estimation for nonstationary time series models. J. Amer. Statist. Assoc. 101 223–239.
  • Fearnhead, P. (2006). Exact and efficient Bayesian inference for multiple changepoint problems. Statist. Comput. 16 203–213.
  • Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA.
  • Handcock, M. S. and Wallis, J. R. (1994). An approach to statistical spatial-temporal modeling of meteorological fields. J. Amer. Statist. Assoc. 89 368–378.
  • Hansen, M. H. and Yu, B. (2001). Model selection and the principle of minimum description length. J. Amer. Statist. Assoc. 96 746–774.
  • Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA.
  • Inclan, C. and Tiao, G. C. (1994). Use of cumulative sums of squares for retrospective detection of changes of variance. J. Amer. Statist. Assoc. 89 913–923.
  • Lee, T. C. M. (2000). A minimum description length based image segmentation procedure, and its comparison with a cross-validation based segmentation procedure. J. Amer. Statist. Assoc. 95 259–270.
  • Lee, T. C. M. (2001). An introduction to coding theory and the two-part minimum description length principle. International Statistical Review 69 169–183.
  • Lee, T. C. M. (2002). Automatic smoothing for discontinuous regression functions. Statist. Sinica 12 823–842.
  • Lu, Q. and Lund, R. B. (2007). Simple linear regression with multiple level shifts. Canad. J. Statist. 35 447–458.
  • Lund, R. B., Shao, Q. and Basawa, I. V. (2006). Parsimonious periodic time series modeling. Aust. N. Z. J. Statist. 48 33–47.
  • Lund, R. B., Wang, X. L., Lu, Q., Reeves, J., Gallagher, C. and Feng, Y. (2007). Changepoint detection in periodic and autocorrelated time series. Journal of Climate 20 5178–5190.
  • Mitchell, J. M., Jr. (1953). On the causes of instrumentally observed secular temperature trends. Journal of Meteorology 10 244–261.
  • Menne, M. J. and Williams, C. N., Jr. (2005). Detection of undocumented changepoints using multiple test statistics and composite reference series. Journal of Climate 18 4271–4286.
  • Menne, M. J. and Williams, C. N., Jr. (2009). Homogenization of temperature series via pairwise comparisons. Journal of Climate 22 1700–1717.
  • Pagano, M. (1978). On periodic and multiple autoregressions. Ann. Statist. 6 1310–1317.
  • Reeves, C. (1993). Modern Heuristic Techniques for Combinatorial Problems. Wiley, New York.
  • Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore.
  • Rissanen, J. (2007). Information and Complexity in Statistical Modeling. Springer, New York.
  • Shao, Q. and Lund, R. B. (2004). Computation and characterization of autocorrelations and partial autocorrelations in periodic ARMA models. J. Time Ser. Anal. 25 359–372.
  • Vincent, L. A. (1998). A technique for the identification of inhomogeneities in Canadian temperature series. Journal of Climate 11 1094–1104.