Open Access
Semi-supervised inference: General theory and estimation of means
Anru Zhang, Lawrence D. Brown, T. Tony Cai
Ann. Statist. 47(5): 2538-2566 (October 2019). DOI: 10.1214/18-AOS1756
Abstract

We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses (“labels”). Otherwise, the formulation is “assumption-lean” in that no major conditions are imposed on the statistical or functional form of the data. We consider both the ideal semi-supervised setting, where infinitely many unlabeled samples are available, and the ordinary semi-supervised setting, in which only a finite number of unlabeled samples is available.

Estimators are proposed along with corresponding confidence intervals for the population mean. Theoretical analyses of both the asymptotic distribution and the $\ell_{2}$-risk of the proposed procedures are given. Surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. The simple, transparent form of the estimator supports the expectation that its asymptotic improvement over the ordinary sample mean nearly holds even for moderate-size samples. The method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. The proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population.
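The least-squares construction the abstract alludes to is developed in the paper itself; as a rough, non-authoritative sketch of the general idea (a regression-adjusted mean, not necessarily the authors' exact procedure), one can fit a linear regression of the responses on the labeled covariates and shift the ordinary sample mean by the estimated covariate-mean difference between the unlabeled pool and the labeled sample. The simulation design and all variable names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: labeled covariates X (n x p) with responses y,
# plus a larger unlabeled pool X_u drawn from the same distribution.
n, m, p = 200, 5000, 3
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)
X_u = rng.normal(size=(m, p))  # unlabeled covariates only

# Ordinary sample mean of the responses.
ybar = y.mean()

# Regression-adjusted mean: fit y ~ X by least squares on the labeled data,
# then correct ybar by the estimated slope times the covariate-mean shift
# between the unlabeled pool and the labeled sample.
X1 = np.column_stack([np.ones(n), X])          # design matrix with intercept
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
slope = coef[1:]                               # drop the intercept term
theta_hat = ybar + slope @ (X_u.mean(axis=0) - X.mean(axis=0))
```

When the covariates explain much of the response's variance (as in this toy design), the adjustment removes the component of the sampling error of `ybar` that is attributable to chance imbalance in the labeled covariates, which is the intuition behind the improvement over the ordinary sample mean.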

Copyright © 2019 Institute of Mathematical Statistics
Received: 1 August 2017; Published: October 2019