The Annals of Applied Statistics

An MDL approach to the climate segmentation problem

QiQi Lu, Robert Lund, and Thomas C. M. Lee

Full-text: Open access


This paper proposes an information theory approach to estimate the number of changepoints and their locations in a climatic time series. A model is introduced that has an unknown number of changepoints and allows for series autocorrelations, periodic dynamics, and a mean shift at each changepoint time. An objective function gauging the number of changepoints and their locations, based on a minimum description length (MDL) information criterion, is derived. A genetic algorithm is then developed to optimize the objective function. The methods are applied in the analysis of a century of monthly temperatures from Tuscaloosa, Alabama.
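As a rough illustration of the idea in the abstract, the sketch below scores candidate changepoint configurations with a simplified two-part MDL criterion and keeps the configuration with the lowest score. It is not the objective function derived in the paper: it assumes independent Gaussian errors with one common variance, ignores autocorrelation and periodic dynamics, and uses illustrative code-length penalties. It also swaps the genetic algorithm for an exhaustive search over a small number of changepoints, which is only feasible for short series. The names `mdl_score`, `best_segmentation`, `max_changepoints`, and `min_segment` are hypothetical.

```python
import itertools
import math


def mdl_score(series, changepoints):
    """Simplified two-part MDL score for a piecewise-constant mean model.

    Assumption: independent Gaussian noise with a common variance.
    Autocorrelation and seasonal terms are NOT modeled here.
    `changepoints` holds the sorted start indices of each new segment.
    """
    n = len(series)
    bounds = [0, *changepoints, n]
    rss = 0.0
    penalty = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = series[start:end]
        mean = sum(segment) / len(segment)
        rss += sum((x - mean) ** 2 for x in segment)
        # Illustrative cost of encoding one segment mean.
        penalty += 0.5 * math.log(len(segment))
    # Illustrative cost of encoding the number of changepoints.
    penalty += math.log(len(changepoints) + 1)
    # Illustrative cost of encoding each changepoint location.
    penalty += sum(math.log(t) for t in changepoints)
    sigma2 = max(rss / n, 1e-12)  # guard against log(0) on a perfect fit
    return 0.5 * n * math.log(sigma2) + penalty


def best_segmentation(series, max_changepoints=2, min_segment=2):
    """Exhaustive search (a stand-in for the genetic algorithm)."""
    n = len(series)
    best_score, best_cps = mdl_score(series, []), []
    candidates = range(min_segment, n - min_segment + 1)
    for m in range(1, max_changepoints + 1):
        for cps in itertools.combinations(candidates, m):
            bounds = [0, *cps, n]
            lengths = [b - a for a, b in zip(bounds[:-1], bounds[1:])]
            if min(lengths) < min_segment:
                continue  # skip configurations with too-short segments
            score = mdl_score(series, list(cps))
            if score < best_score:
                best_score, best_cps = score, list(cps)
    return best_score, best_cps
```

Calling `best_segmentation` on a short series with one clear level shift returns the lowest score found together with the list of segment start indices. A stochastic search such as a genetic algorithm becomes necessary once the series is long enough that enumerating every configuration is impractical.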

Article information

Ann. Appl. Stat. Volume 4, Number 1 (2010), 299-319.

First available in Project Euclid: 11 May 2010


Keywords: Changepoints; genetic algorithm; level shifts; minimum description length; periodic autoregression; time series


Lu, QiQi; Lund, Robert; Lee, Thomas C. M. An MDL approach to the climate segmentation problem. Ann. Appl. Stat. 4 (2010), no. 1, 299–319. doi:10.1214/09-AOAS289.


