## Electronic Journal of Statistics

### Nonparametric clustering of functional data using pseudo-densities

#### Abstract

We study nonparametric clustering of smooth random curves on the basis of the $L^{2}$ gradient flow associated with a pseudo-density functional, and we discuss the conditions under which the clustering is well-defined at both the population and the sample level. We provide an algorithm to identify significant local modes of the estimated pseudo-density, which are associated with informative sample clusters, and we prove its consistency and other statistical properties. Our theory is developed under weak assumptions, which essentially reduce to the integrability of the random curves. If the underlying probability distribution is supported on a finite-dimensional subspace, we show that the proposed pseudo-density functional and the expectation of a kernel density estimator induce the same gradient flow, and hence the same population clustering. Although our theory is developed for smooth curves that belong to a potentially infinite-dimensional functional space, we provide consistent procedures that can be used with real functional data (discretized and noisy curves). We illustrate these procedures by means of applications to both simulated and real datasets.
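To make the idea of clustering curves by the mode they flow to more concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes curves observed without noise on a common uniform grid, approximates the $L^{2}$ norm by a Riemann sum, and replaces the continuous gradient flow with a standard Gaussian mean-shift iteration. The function names (`l2_sq_dist`, `mean_shift_curves`, `label_by_mode`), the bandwidth `h`, and the merging tolerance `merge_tol` are all hypothetical choices made for illustration, and the step that tests whether a mode is significant is omitted.

```python
import numpy as np

def l2_sq_dist(a, b, dt):
    """Squared L2 distance between curves on a uniform grid (Riemann sum)."""
    return np.sum((a - b) ** 2, axis=-1) * dt

def mean_shift_curves(curves, dt, h, n_iter=200, tol=1e-8):
    """Move each curve uphill on a Gaussian kernel pseudo-density estimate.

    Assumption: plain mean shift stands in for the continuous gradient flow.
    """
    points = curves.copy()
    for _ in range(n_iter):
        # w[i, j] = Gaussian kernel weight between moving point i and datum j
        d2 = l2_sq_dist(points[:, None, :], curves[None, :, :], dt)
        w = np.exp(-0.5 * d2 / h**2)
        new_points = (w @ curves) / w.sum(axis=1, keepdims=True)
        shift = np.sqrt(l2_sq_dist(new_points, points, dt)).max()
        points = new_points
        if shift < tol:
            break
    return points

def label_by_mode(modes, dt, merge_tol):
    """Give the same label to curves whose limit points nearly coincide."""
    labels = -np.ones(len(modes), dtype=int)
    reps = []  # one representative limit point per cluster found so far
    for i, m in enumerate(modes):
        for k, r in enumerate(reps):
            if np.sqrt(l2_sq_dist(m, r, dt)) < merge_tol:
                labels[i] = k
                break
        else:
            reps.append(m)
            labels[i] = len(reps) - 1
    return labels

# Toy data (hypothetical): two groups of sine curves with opposite sign.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
dt = t[1] - t[0]
amps = np.concatenate([rng.normal(1.0, 0.05, 20), rng.normal(-1.0, 0.05, 20)])
curves = amps[:, None] * np.sin(2 * np.pi * t)[None, :]

modes = mean_shift_curves(curves, dt, h=0.3)
labels = label_by_mode(modes, dt, merge_tol=0.1)
print(labels)
```

In this toy setting the bandwidth is small relative to the distance between the two groups and large relative to the spread within each group, so each group is expected to collapse onto its own mode. With real functional data the curves would first need to be smoothed, and the choice of `h` matters considerably.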

#### Article information

Source
Electron. J. Statist. Volume 10, Number 2 (2016), 2922-2972.

Dates
First available in Project Euclid: 31 October 2016

https://projecteuclid.org/euclid.ejs/1477918853

Digital Object Identifier
doi:10.1214/16-EJS1198

#### Citation

Ciollaro, Mattia; Genovese, Christopher R.; Wang, Daren. Nonparametric clustering of functional data using pseudo-densities. Electron. J. Statist. 10 (2016), no. 2, 2922--2972. doi:10.1214/16-EJS1198. https://projecteuclid.org/euclid.ejs/1477918853
