Density-sensitive semisupervised inference

Martin Azizyan; Aarti Singh; Larry Wasserman

doi:10.1214/13-AOS1092

Ann. Statist. 41(2): 751-771 (April 2013). DOI: 10.1214/13-AOS1092

Abstract

Semisupervised methods are techniques for using labeled data $(X_{1},Y_{1}),\ldots,(X_{n},Y_{n})$ together with unlabeled data $X_{n+1},\ldots,X_{N}$ to make predictions. These methods invoke some assumptions that link the marginal distribution $P_{X}$ of $X$ to the regression function $f(x)$. For example, it is common to assume that $f$ is very smooth over high density regions of $P_{X}$. Many of the methods are ad-hoc and have been shown to work in specific examples but are lacking a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution $P_{X}$. Our model includes a parameter $\alpha$ that controls the strength of the semisupervised assumption. We then use the data to adapt to $\alpha$.
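As a rough illustration of what a metric sensitive to $P_{X}$ might look like, the sketch below computes shortest-path distances in which each step is made more expensive where the estimated density is low. It is not the paper's estimator. The edge cost (length divided by estimated density raised to the power $\alpha$), the Gaussian kernel, the bandwidth `h`, the restriction to one-dimensional data, and the function names `kde` and `density_sensitive_distances` are all assumptions made for illustration.

```python
import heapq
import math


def kde(points, x, h):
    """Gaussian kernel density estimate at x for 1-D data (bandwidth h is assumed)."""
    norm = len(points) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points) / norm


def density_sensitive_distances(points, source, alpha, h=0.3):
    """Shortest-path distances from points[source] over the complete graph.

    Hypothetical edge cost: |x_i - x_j| / p_hat(midpoint) ** alpha, so that
    alpha = 0 gives Euclidean distance and larger alpha penalizes paths
    through low-density regions more heavily.
    """
    n = len(points)

    def cost(i, j):
        mid = 0.5 * (points[i] + points[j])
        p_hat = max(kde(points, mid, h), 1e-12)  # floor avoids division by zero
        return abs(points[i] - points[j]) / p_hat ** alpha

    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:  # Dijkstra's algorithm
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j in range(n):
            if j == i:
                continue
            nd = d + cost(i, j)
            if nd < dist[j]:
                dist[j] = nd
                heapq.heappush(heap, (nd, j))
    return dist


# Two well-separated clusters; labeled and unlabeled points both feed the KDE.
pts = [0.0, 0.1, 0.2, 0.3, 2.0, 2.1, 2.2, 2.3]
for alpha in (0.0, 1.0):
    d = density_sensitive_distances(pts, 0, alpha)
    print(alpha, d[3], d[4])
```

With `alpha = 0` the distances reduce to ordinary Euclidean distances. With `alpha = 1` the point in the other cluster becomes much farther away, relative to a point in the same cluster, because every path between the clusters must cross the low-density gap. In this sketch, $\alpha$ plays the role of a knob for how strongly the geometry is tied to $P_{X}$.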

Martin Azizyan, Aarti Singh, and Larry Wasserman "Density-sensitive semisupervised inference," The Annals of Statistics 41(2), 751-771, (April 2013). https://doi.org/10.1214/13-AOS1092

Published: April 2013
