Annals of Applied Statistics

Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption

Xia Wang and Dipak K. Dey

Full-text: Open access


Abstract

In information systems research, a question of particular interest is how to interpret and predict the probability that a firm adopts a new technology, so that market promotions can be targeted only at those firms that are more likely to adopt it. Typically, there is a significant difference between the observed numbers of “adopters” and “nonadopters,” which are usually coded as a binary response. A critical issue in modeling such binary response data is the appropriate choice of link function in the regression model. In this paper we introduce a new flexible skewed link function for modeling binary response data, based on the generalized extreme value (GEV) distribution. We show how the proposed GEV links provide more flexible and improved skewed link regression models than the existing skewed links, especially when the observed numbers of 0’s and 1’s in a data set are imbalanced. The flexibility of the proposed model is illustrated through simulated data sets and a billing data set on electronic payments system adoption from a Fortune 100 company in 2005.
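To make the idea of a GEV-based skewed link concrete, the following is a minimal Python sketch, not the authors' implementation (their code is in R and is not reproduced here). It assumes the standard GEV distribution function with location 0 and scale 1, G(x; ξ) = exp{−(1 + ξx)^(−1/ξ)} where 1 + ξx > 0, and assumes the response probability is taken as p = 1 − G(−η; ξ) for a linear predictor η. The abstract does not state this formula, so the exact form, along with the names `gev_cdf` and `gev_link_prob`, is an assumption made for illustration. Under this assumption the shape parameter ξ controls the skewness of the response curve, and ξ → 0 recovers the complementary log-log link.

```python
import math


def gev_cdf(x, xi):
    """Standard GEV cdf (location 0, scale 1) with shape parameter xi."""
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0.
        return math.exp(-math.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: below the lower bound when xi > 0 (cdf = 0),
        # above the upper bound when xi < 0 (cdf = 1).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_link_prob(eta, xi):
    """Assumed response probability P(y = 1) = 1 - G(-eta; xi).

    eta is the linear predictor (x' beta); xi is the GEV shape parameter.
    Extreme inputs are not guarded against floating-point overflow.
    """
    return 1.0 - gev_cdf(-eta, xi)


if __name__ == "__main__":
    # Compare response curves for a few hypothetical shape values.
    for xi in (-0.5, 0.0, 0.5):
        probs = [gev_link_prob(eta, xi) for eta in (-1.0, 0.0, 1.0)]
        print(xi, [round(p, 4) for p in probs])
```

Unlike the logit or probit link, this curve is not symmetric: p(η) + p(−η) generally differs from 1, which is the property that makes skewed links attractive when 0's and 1's are heavily imbalanced.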

Article information

Ann. Appl. Stat., Volume 4, Number 4 (2010), 2000-2023.

First available in Project Euclid: 4 January 2011

Keywords

Generalized extreme value distribution; latent variable; Markov chain Monte Carlo; posterior distribution; skewness


Citation

Wang, Xia; Dey, Dipak K. Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption. Ann. Appl. Stat. 4 (2010), no. 4, 2000–2023. doi:10.1214/10-AOAS354.




Supplemental materials

  • Supplementary material: R code for GEV models with covariates. The computation for the GEV link described in this paper has been implemented in R; the code is available in this supplementary material.