## Electronic Journal of Statistics

### Confidence intervals for the means of the selected populations

#### Abstract

Consider an experiment in which $p$ independent populations $\pi_{i}$ with corresponding unknown means $\theta_{i}$ are available, and suppose that for every $1\leq i\leq p$, we can obtain a sample $X_{i1},\ldots,X_{in}$ from $\pi_{i}$. In this context, researchers are sometimes interested in selecting the $k$ populations that yield the largest sample means as a result of the experiment, and then estimating the corresponding population means $\theta_{i}$. In this paper, we present a frequentist approach to the problem and discuss how to construct simultaneous confidence intervals for the means of the $k$ selected populations, assuming that the populations $\pi_{i}$ are independent and normally distributed with a common variance $\sigma^{2}$. The method, based on the minimization of the coverage probability, yields confidence intervals that attain the nominal coverage probability for any $p$ and $k$, taking into account the selection procedure.
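To illustrate why the selection step matters, the following Monte Carlo sketch estimates the simultaneous coverage of intervals centered at the $k$ largest sample means. It is not the method of the paper: it assumes a known $\sigma$, sets all $\theta_{i}$ equal (an illustrative configuration chosen here), and compares the usual unadjusted normal interval with a conservative Bonferroni-type interval adjusted over all $p$ populations. The function name `simultaneous_coverage` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def simultaneous_coverage(p, k, n, sigma, half_width, reps=4000, seed=0):
    """Estimate P(all k selected intervals cover their own means).

    Assumption (illustrative only): all p true means are equal to 0
    and sigma is known, so each sample mean is N(0, sigma^2 / n).
    """
    rng = random.Random(seed)
    theta = [0.0] * p
    se = sigma / n ** 0.5  # standard error of each sample mean
    hits = 0
    for _ in range(reps):
        # Draw the p sample means directly from their sampling distribution.
        xbar = [rng.gauss(theta[i], se) for i in range(p)]
        # Select the k populations with the largest sample means.
        selected = sorted(range(p), key=lambda i: xbar[i], reverse=True)[:k]
        # Count the replication only if every selected interval covers.
        if all(abs(xbar[i] - theta[i]) <= half_width for i in selected):
            hits += 1
    return hits / reps


p, k, n, sigma, alpha = 10, 2, 5, 1.0, 0.05
se = sigma / n ** 0.5
z = NormalDist().inv_cdf

# Unadjusted interval: ignores that the populations were selected.
naive = simultaneous_coverage(p, k, n, sigma, z(1 - alpha / 2) * se)

# Bonferroni over all p populations: conservative, valid for any selection rule.
conservative = simultaneous_coverage(p, k, n, sigma, z(1 - alpha / (2 * p)) * se)

print(f"unadjusted coverage:  {naive:.3f}")
print(f"Bonferroni coverage:  {conservative:.3f}")
```

Under these assumptions the unadjusted intervals fall well short of the nominal 95% level, while the Bonferroni-type intervals exceed it at the cost of extra width. The approach described in the abstract aims to attain the nominal level for any $p$ and $k$ while accounting for the selection.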

#### Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 58-79.

Dates
First available in Project Euclid: 5 January 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1515142842

Digital Object Identifier
doi:10.1214/17-EJS1374

Mathematical Reviews number (MathSciNet)
MR3743737

Zentralblatt MATH identifier
1384.62058

#### Citation

Fuentes, Claudio; Casella, George; Wells, Martin T. Confidence intervals for the means of the selected populations. Electron. J. Statist. 12 (2018), no. 1, 58–79. doi:10.1214/17-EJS1374. https://projecteuclid.org/euclid.ejs/1515142842
