## The Annals of Statistics

### Density-sensitive semisupervised inference

#### Abstract

Semisupervised methods are techniques for using labeled data $(X_{1},Y_{1}),\ldots,(X_{n},Y_{n})$ together with unlabeled data $X_{n+1},\ldots,X_{N}$ to make predictions. These methods rely on assumptions that link the marginal distribution $P_{X}$ of $X$ to the regression function $f(x)$. For example, it is common to assume that $f$ is very smooth over high-density regions of $P_{X}$. Many of these methods are ad hoc: they have been shown to work in specific examples but lack a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution $P_{X}$. Our model includes a parameter $\alpha$ that controls the strength of the semisupervised assumption. We then use the data to adapt to $\alpha$.

#### Article information

Source
Ann. Statist. Volume 41, Number 2 (2013), 751–771.

Dates
First available in Project Euclid: 8 May 2013

Permanent link to this document
http://projecteuclid.org/euclid.aos/1368018172

Digital Object Identifier
doi:10.1214/13-AOS1092

Zentralblatt MATH identifier
1267.62057

Mathematical Reviews number (MathSciNet)
MR3099120

Subjects
Primary: 62G15: Tolerance and confidence regions
Secondary: 62G07: Density estimation

#### Citation

Azizyan, Martin; Singh, Aarti; Wasserman, Larry. Density-sensitive semisupervised inference. Ann. Statist. 41 (2013), no. 2, 751–771. doi:10.1214/13-AOS1092. http://projecteuclid.org/euclid.aos/1368018172.

#### Supplemental materials

• Supplementary material: Supplement to “Density-sensitive semisupervised inference” (DOI:10.1214/13-AOS1092SUPP). Contains technical details, proofs and extensions.