## Communications in Applied Mathematics and Computational Science

### On inference of statistical regression models for extreme events based on incomplete observation data

#### Abstract

We present a computationally efficient, semiparametric, nonstationary framework, based on the generalized extreme value (GEV) distribution, for statistical regression analysis of extremes with systematically missing covariates. We show that the regression model becomes nonstationary if some of the relevant model covariates are systematically missing. The resulting nonstationarity and the ill-posedness of the inverse problem are resolved by deploying the recently introduced finite-element time-series analysis methodology with bounded variation of model parameters (FEM-BV). The proposed FEM-BV-GEV approach allows a well-posed problem formulation and goes beyond the probabilistic a priori assumptions underlying methods for the analysis of extremes based on, e.g., nonstationary Bayesian mixture models, smoothing kernel methods or neural networks. FEM-BV-GEV determines the significant resolved covariates, directly reveals their influence on the trend behavior of the probabilities of extremes, and reflects the implicit impact of the missing covariates. We compare the FEM-BV-GEV approach with the state-of-the-art GEV-CDN methodology (based on artificial neural networks) on test cases and real data according to four criteria: (1) information content of the models, (2) robustness with respect to the systematically missing information, (3) computational complexity and (4) interpretability of the models.
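The two ingredients named in the abstract, a GEV likelihood whose parameters depend on resolved covariates and a FEM-BV-style switching between local models, can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on stated assumptions. Each local model $i$ is assumed to have a location $\mu_i(t) = \mu_{i,0} + \sum_j a_{i,j} u_j(t)$ that is linear in the covariates $u(t)$, with constant scale $\sigma_i$ and shape $\xi_i$, and it scores an observation by the GEV negative log-density $\log\sigma + (1 + 1/\xi)\log s + s^{-1/\xi}$ with $s = 1 + \xi (y - \mu)/\sigma > 0$ (Gumbel limit for $\xi \to 0$). The affiliations are restricted to hard 0/1 values, and the bounded-variation constraint is replaced by a simpler cap on the total number of model switches, which makes the affiliation step solvable by dynamic programming. The parameter-fitting half of the alternating minimization is omitted, and the names `gev_nll`, `local_loss_matrix`, `persistent_affiliations` and `LocalGEV` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical local-model parameterization: (mu0, mu_coefs, sigma, xi), with
# location mu(t) = mu0 + sum_j mu_coefs[j] * u_j(t) and constant sigma, xi.
LocalGEV = Tuple[float, Sequence[float], float, float]


def gev_nll(y: float, mu: float, sigma: float, xi: float) -> float:
    """Negative log-density of one observation under GEV(mu, sigma, xi).

    Returns +inf for sigma <= 0 or outside the support 1 + xi*(y - mu)/sigma > 0.
    """
    if sigma <= 0.0:
        return math.inf
    z = (y - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit xi -> 0
        return math.log(sigma) + z + math.exp(-z)
    s = 1.0 + xi * z
    if s <= 0.0:
        return math.inf
    return math.log(sigma) + (1.0 + 1.0 / xi) * math.log(s) + s ** (-1.0 / xi)


def local_loss_matrix(y: Sequence[float], u: Sequence[Sequence[float]],
                      models: Sequence[LocalGEV]) -> List[List[float]]:
    """loss[t][i] = GEV negative log-likelihood of y[t] under local model i."""
    loss = []
    for y_t, u_t in zip(y, u):
        row = []
        for mu0, coefs, sigma, xi in models:
            mu_t = mu0 + sum(a * v for a, v in zip(coefs, u_t))
            row.append(gev_nll(y_t, mu_t, sigma, xi))
        loss.append(row)
    return loss


def persistent_affiliations(loss: Sequence[Sequence[float]],
                            max_switches: int) -> Tuple[List[int], float]:
    """Affiliation step with hard 0/1 affiliations (a simplification of FEM-BV).

    Minimises sum_t loss[t][k_t] over model sequences k_0..k_{T-1} that change
    model at most max_switches times, by dynamic programming in O(T K^2 C).
    """
    if not loss:
        return [], 0.0
    T, K, C = len(loss), len(loss[0]), max_switches
    # cost[i][c]: best cost of a sequence ending in model i after c switches
    cost = [[math.inf] * (C + 1) for _ in range(K)]
    for i in range(K):
        cost[i][0] = loss[0][i]
    back = []  # back[t - 1][i][c] = (model, switches) at time t - 1
    for t in range(1, T):
        new = [[math.inf] * (C + 1) for _ in range(K)]
        ptr = [[None] * (C + 1) for _ in range(K)]
        for i in range(K):
            for c in range(C + 1):
                best, arg = cost[i][c], (i, c)  # stay in model i
                if c > 0:  # or switch in from another model, using one switch
                    for j in range(K):
                        if j != i and cost[j][c - 1] < best:
                            best, arg = cost[j][c - 1], (j, c - 1)
                if best < math.inf:
                    new[i][c] = best + loss[t][i]
                    ptr[i][c] = arg
        cost = new
        back.append(ptr)
    end = min(((i, c) for i in range(K) for c in range(C + 1)),
              key=lambda ic: cost[ic[0]][ic[1]])
    total = cost[end[0]][end[1]]
    if math.isinf(total):
        raise ValueError("every model sequence leaves the GEV support")
    path, (i, c) = [end[0]], end
    for t in range(T - 1, 0, -1):  # walk the back-pointers to time 0
        i, c = back[t - 1][i][c]
        path.append(i)
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Toy series: three values near 0, then three near 5; one constant covariate.
    y = [0.1, -0.2, 0.3, 5.2, 4.9, 5.1]
    u = [(1.0,)] * 6
    models = [(-0.5, (0.5,), 1.0, 0.1), (4.0, (1.0,), 1.0, 0.1)]
    path, total = persistent_affiliations(local_loss_matrix(y, u, models), 1)
    print(path)  # [0, 0, 0, 1, 1, 1]
```

With one allowed switch, the sketch assigns the first three observations to the low-location model and the rest to the high-location model; with `max_switches=0` it must keep a single model for the whole series and pays a higher total loss. A full fit would presumably alternate this step with re-estimating each local model's parameters on the data assigned to it.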

#### Article information

Source: Commun. Appl. Math. Comput. Sci., Volume 9, Number 1 (2014), 143–174.

Dates: Revised 28 November 2013; accepted 31 March 2014; first available in Project Euclid 20 December 2017.

URL: https://projecteuclid.org/euclid.camcos/1513732107

Digital Object Identifier: doi:10.2140/camcos.2014.9.143

Mathematical Reviews number (MathSciNet): MR3212869

Zentralblatt MATH identifier: 1328.62209

#### Citation

Kaiser, Olga; Horenko, Illia. On inference of statistical regression models for extreme events based on incomplete observation data. Commun. Appl. Math. Comput. Sci. 9 (2014), no. 1, 143–174. doi:10.2140/camcos.2014.9.143. https://projecteuclid.org/euclid.camcos/1513732107
