Statistical Science

Design Issues for Generalized Linear Models: A Review

Abstract

Generalized linear models (GLMs) have been used effectively to model a mean response under nonstandard conditions, accommodating discrete as well as continuous data distributions. The choice of design is an important step in building an adequate GLM. However, one major problem handicaps the construction of a GLM design: the design depends on the unknown parameters of the fitted model. Several approaches have been proposed in the past 25 years to solve this problem. These approaches, however, have provided only partial solutions that apply in some special cases, and the problem, in general, remains largely unresolved. This article focuses on this dependence problem and surveys the existing techniques that deal with it. The survey covers locally optimal designs, sequential designs, Bayesian designs and the quantile dispersion graph approach for comparing designs for GLMs.
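
The dependence problem described in the abstract can be illustrated with a small numerical sketch. The example below is not taken from the article: it assumes a two-parameter logistic model, logit(p) = b0 + b1*x, and uses the standard result that the locally D-optimal design for this model puts equal weight on the two points where the linear predictor equals about ±1.5434. The function names and the parameter values `guess` and `truth` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumed constant: linear-predictor value at the support points of the
# locally D-optimal design for the two-parameter logistic model.
C_OPT = 1.5434


def information(points, weights, beta):
    """Per-observation Fisher information for logit(p) = b0 + b1*x."""
    b0, b1 = beta
    info = np.zeros((2, 2))
    for x, w in zip(points, weights):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
        f = np.array([1.0, x])
        # The GLM weight p*(1-p) depends on beta, so the information does too.
        info += w * p * (1.0 - p) * np.outer(f, f)
    return info


def locally_d_optimal(beta):
    """Two equally weighted points that are D-optimal if beta is correct."""
    b0, b1 = beta
    return [(-C_OPT - b0) / b1, (C_OPT - b0) / b1], [0.5, 0.5]


def d_efficiency(design, beta):
    """D-efficiency of `design` relative to the design optimal at `beta`."""
    det_design = np.linalg.det(information(*design, beta))
    det_best = np.linalg.det(information(*locally_d_optimal(beta), beta))
    return (det_design / det_best) ** 0.5  # two parameters, hence the 1/2 power


guess = (0.0, 1.0)   # hypothetical parameter values used to plan the experiment
truth = (0.0, 3.0)   # hypothetical true parameter values
design = locally_d_optimal(guess)

print("efficiency if the guess is right:", d_efficiency(design, guess))
print("efficiency if the slope is 3x larger:", d_efficiency(design, truth))
```

Under these assumptions the design is fully efficient when the guessed parameters are correct and loses most of its efficiency when the true slope is three times larger, which is the sense in which a GLM design depends on the unknown parameters.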

Article information

Source
Statist. Sci., Volume 21, Number 3 (2006), 376-399.

Dates
First available in Project Euclid: 20 December 2006

https://projecteuclid.org/euclid.ss/1166642442

Digital Object Identifier
doi:10.1214/088342306000000105

Mathematical Reviews number (MathSciNet)
MR2339137

Zentralblatt MATH identifier
1246.62168

Citation

Khuri, André I.; Mukherjee, Bhramar; Sinha, Bikas K.; Ghosh, Malay. Design Issues for Generalized Linear Models: A Review. Statist. Sci. 21 (2006), no. 3, 376-399. doi:10.1214/088342306000000105. https://projecteuclid.org/euclid.ss/1166642442
