Electronic Journal of Statistics

Confidence intervals for the means of the selected populations

Claudio Fuentes, George Casella, and Martin T. Wells

Full-text: Open access

Abstract

Consider an experiment in which $p$ independent populations $\pi_{i}$ with corresponding unknown means $\theta_{i}$ are available, and suppose that for every $1\leq i\leq p$, we can obtain a sample $X_{i1},\ldots,X_{in}$ from $\pi_{i}$. In this context, researchers are sometimes interested in selecting the populations that yield the largest sample means as a result of the experiment, and then in estimating the corresponding population means $\theta_{i}$. In this paper, we present a frequentist approach to the problem and discuss how to construct simultaneous confidence intervals for the means of the $k$ selected populations, assuming that the populations $\pi_{i}$ are independent and normally distributed with a common variance $\sigma^{2}$. The method, which is based on minimizing the coverage probability, yields confidence intervals that attain the nominal coverage probability for any $p$ and $k$ while taking the selection procedure into account.
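
Why the selection procedure has to be taken into account can be seen with a small simulation. The Python sketch below is not the authors' procedure. It assumes $k = 1$, a known $\sigma$, and the naive interval $\bar X_{\text{sel}} \pm z\,\sigma/\sqrt{n}$ built around the largest sample mean. All function names are hypothetical. When all $p$ means are equal, $(\bar X_{\text{sel}} - \theta_{\text{sel}})/(\sigma/\sqrt{n})$ is the maximum of $p$ iid standard normals, so the naive coverage is $\Phi(z)^p - \Phi(-z)^p$. The last helper recalibrates a symmetric half-width at that single configuration only. It is a toy illustration of calibrating against a coverage probability and does not claim nominal coverage over the whole parameter space.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal CDF, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def naive_coverage_equal_means(p: int, z: float) -> float:
    """Exact coverage of the naive interval xbar_sel +/- z * sigma / sqrt(n)
    when all p population means are equal and k = 1.

    Under equal means, (xbar_sel - theta_sel) / (sigma / sqrt(n)) is the
    maximum M of p iid N(0, 1) variables, so
    P(-z <= M <= z) = Phi(z)^p - Phi(-z)^p.
    """
    return norm_cdf(z) ** p - norm_cdf(-z) ** p


def simulate_naive_coverage(theta, n, sigma, z, reps=20000, seed=0):
    """Monte Carlo coverage of the naive interval for the mean of the
    population with the largest sample mean (k = 1, sigma assumed known)."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)  # standard error of each sample mean
    hits = 0
    for _ in range(reps):
        # The sample mean of pi_i is distributed N(theta_i, sigma^2 / n).
        xbar = [rng.gauss(t, se) for t in theta]
        sel = max(range(len(theta)), key=lambda i: xbar[i])
        if abs(xbar[sel] - theta[sel]) <= z * se:
            hits += 1
    return hits / reps


def equal_means_half_width(p: int, alpha: float = 0.05, tol: float = 1e-10) -> float:
    """Bisection for the c with Phi(c)^p - Phi(-c)^p = 1 - alpha.

    Coverage is increasing in c, so bisection applies. This calibrates a
    symmetric interval at the equal-means configuration only; it is NOT the
    paper's construction.
    """
    lo, hi = 0.0, 10.0  # coverage is 0 at c = 0 and essentially 1 at c = 10
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if naive_coverage_equal_means(p, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    z = 1.959964  # two-sided 95% standard normal quantile
    print(naive_coverage_equal_means(10, z))  # about 0.776
    print(simulate_naive_coverage([0.0] * 10, n=5, sigma=1.0, z=z))
    print(equal_means_half_width(10))  # about 2.57
```

With $p = 10$ equal means, the nominal 95% naive interval covers the selected mean only about 78% of the time. When one mean is far above the others, selection is essentially deterministic and coverage returns to about 95%. The keywords mention asymmetric intervals, which suggests that the paper's intervals differ in shape from this symmetric toy.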

Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 58-79.

Dates
Received: February 2016
First available in Project Euclid: 5 January 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1515142842

Digital Object Identifier
doi:10.1214/17-EJS1374

Mathematical Reviews number (MathSciNet)
MR3743737

Zentralblatt MATH identifier
1384.62058

Keywords
Confidence intervals; selected means; selected populations; asymmetric intervals; simultaneous inference; frequentist estimation

Rights
Creative Commons Attribution 4.0 International License.

Citation

Fuentes, Claudio; Casella, George; Wells, Martin T. Confidence intervals for the means of the selected populations. Electron. J. Statist. 12 (2018), no. 1, 58-79. doi:10.1214/17-EJS1374. https://projecteuclid.org/euclid.ejs/1515142842

