The Annals of Applied Statistics

Zero-inflated truncated generalized Pareto distribution for the analysis of radio audience data

Dominique-Laurent Couturier and Maria-Pia Victoria-Feser


Abstract

Extreme value data with a high clump-at-zero occur in many domains. Moreover, the observed data may be truncated below a given threshold, or may not be reliable enough below that threshold because of the recording devices. These situations occur, in particular, with radio audience data measured using personal meters that record environmental noise every minute; the recorded noise is then matched to one of several radio programs. There are therefore genuine zeros for respondents not listening to the radio, but also zeros corresponding to real listeners for whom the match between the recorded noise and the radio program could not be achieved. Because audience figures matter to radio broadcasters, for example to set advertising prices, possibly according to the type of audience at different time points, it is essential to explain, by means of the characteristics of the listeners, not only the probability of listening to the radio but also the average time spent listening. In this paper we propose a generalized linear model for the zero-inflated truncated Pareto distribution (ZITPo), which we use to fit radio audience data. Because it is based on the generalized Pareto distribution, the ZITPo model has convenient properties, such as invariance to the choice of the threshold, and a natural residual measure can be derived from it to assess the model fit to the data. From a general formulation of the most popular models for zero-inflated data, we derive our model by considering successively the truncated case, the generalized Pareto distribution and then the inclusion of covariates to explain the nonzero proportion of listeners and their average listening time. By means of simulations, we study the performance of the maximum likelihood estimator (and derived inference) and use the model to fully analyze the audience data of a radio station in a certain area of Switzerland.
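
The threshold invariance mentioned in the abstract can be checked numerically with a minimal sketch. It assumes the standard generalized Pareto survival function with scale `sigma` and shape `xi`, and a plain zero-inflated mixture in which a listening time is zero with probability `1 - p_nonzero`. The function names, parameter names and numeric values are hypothetical illustrations; they are not taken from the paper or from its supplementary R code, and the paper's exact parametrization may differ.

```python
import math


def gpd_survival(y, sigma, xi):
    """P(Y > y) for a generalized Pareto variable with scale sigma > 0 and shape xi."""
    if y <= 0:
        return 1.0
    if abs(xi) < 1e-12:
        # Limiting case xi -> 0: exponential distribution with mean sigma.
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0:
        # Beyond the finite upper endpoint that exists when xi < 0.
        return 0.0
    return base ** (-1.0 / xi)


def excess_survival(x, u, sigma, xi):
    """P(Y - u > x | Y > u): survival of the excess over a threshold u."""
    return gpd_survival(u + x, sigma, xi) / gpd_survival(u, sigma, xi)


def zero_inflated_survival(y, p_nonzero, sigma, xi):
    """P(Y > y), y >= 0, when Y = 0 with probability 1 - p_nonzero and is GPD otherwise."""
    return p_nonzero * gpd_survival(y, sigma, xi)


# Illustrative values only (assumed, not estimates from the radio data).
sigma, xi, u = 30.0, 0.2, 5.0
for x in (1.0, 10.0, 60.0):
    lhs = excess_survival(x, u, sigma, xi)
    # Excesses over u are again generalized Pareto: same shape, scale sigma + xi * u.
    rhs = gpd_survival(x, sigma + xi * u, xi)
    print(f"x={x:5.1f}  excess={lhs:.6f}  rescaled GPD={rhs:.6f}")
```

The two printed columns agree because excesses over any threshold `u` keep the shape `xi` and only shift the scale to `sigma + xi * u`, which is the sense in which a model built on the generalized Pareto distribution does not depend on where the threshold is placed.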

Article information

Source
Ann. Appl. Stat., Volume 4, Number 4 (2010), 1824-1846.

Dates
First available in Project Euclid: 4 January 2011

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1294167800

Digital Object Identifier
doi:10.1214/10-AOAS358

Mathematical Reviews number (MathSciNet)
MR2829937

Zentralblatt MATH identifier
1220.62168

Keywords
Extreme values; logistic regression; generalized linear models; residual analysis; model fit

Citation

Couturier, Dominique-Laurent; Victoria-Feser, Maria-Pia. Zero-inflated truncated generalized Pareto distribution for the analysis of radio audience data. Ann. Appl. Stat. 4 (2010), no. 4, 1824–1846. doi:10.1214/10-AOAS358. https://projecteuclid.org/euclid.aoas/1294167800




Supplemental materials

  • Supplementary material: Radio data set and R code. The file “data_ZITPo.csv” contains the data set analyzed in Section 5, with observations in rows and variables in columns. The file “functions_ZITPo.r” contains R functions for fitting and analyzing ZITPo models. These functions produce objects of class “zipto,” for which the usual generic functions are available. The file “script_ZITPo.r” contains the R code used to produce the results of Tables 1 and 2 and the plots of Figure 7.