Open Access
2010 Regularization Techniques for Machine Learning on Graphs and Networks with Biological Applications
Charles DeLisi, Yue Fan, Shinuk Kim, Mark Kon, Louise Raphael
Commun. Math. Anal. 8(3): 136-145 (2010).
Abstract

The representation of a high-dimensional machine learning (ML) feature space $F$ as a function space for the purpose of denoising data is introduced. We illustrate an application of such a representation of feature vectors by applying a local averaging denoising method for functions on Euclidean and metric spaces (together with its graph generalization) to the regularization of feature vectors in ML. We first discuss this technique for noisy functions on $\mathbb{R}$, and then extend it to functions defined on graphs and networks. This method exhibits a paradoxical property of the bias-variance problem in machine learning, namely, that as the scale over which averages are taken decreases, the error rate for classification first decreases and then increases. This approach is tested on two benchmark DNA microarray data sets used for classification of breast tumors based on predicted metastasis.
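
As a rough illustration of local averaging on a graph, the sketch below replaces each node's feature value with the mean over all nodes within a fixed number of hops. It is not the authors' implementation: the unweighted graph, the hop-distance neighborhoods, the uniform weights, and the names `ball`, `local_average`, `adj`, `values`, and `radius` are all assumptions made for this example.

```python
from collections import deque


def ball(adj, start, radius):
    """Return the set of nodes within `radius` hops of `start` (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the averaging scale
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def local_average(adj, values, radius):
    """Replace each node's value by the uniform mean over its hop-ball.

    Assumption: unweighted graph and uniform averaging weights; the
    scheme in the paper may weight neighbors differently.
    """
    smoothed = {}
    for node in adj:
        nbrs = ball(adj, node, radius)
        smoothed[node] = sum(values[n] for n in nbrs) / len(nbrs)
    return smoothed


# Hypothetical toy data: a path graph 0-1-2-3 with one feature value per node.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = {0: 1.0, 1: 3.0, 2: 5.0, 3: 7.0}
smoothed = local_average(adj, values, radius=1)
```

Here `radius` plays the role of the averaging scale: `radius=0` leaves the feature vector unchanged, while larger values average over more of the graph, which reduces noise at the cost of blurring differences between nodes.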

Copyright © 2010 Mathematical Research Publishers
Charles DeLisi, Yue Fan, Shinuk Kim, Mark Kon, and Louise Raphael "Regularization Techniques for Machine Learning on Graphs and Networks with Biological Applications," Communications in Mathematical Analysis 8(3), 136-145, (2010). https://doi.org/