The Annals of Applied Statistics

The discriminative functional mixture model for a comparative analysis of bike sharing systems

Charles Bouveyron, Etienne Côme, and Julien Jacques

Full-text: Open access


Bike sharing systems (BSSs) have become a means of sustainable intermodal transport and are now offered in many cities worldwide. Most BSSs also provide open access to their data, particularly to real-time status reports on their bike stations. Analyzing the mass of data generated by such systems is of particular interest to BSS providers seeking to update system structures and policies. This work was motivated by interest in analyzing and comparing several European BSSs to identify common operating patterns and to propose practical solutions to avoid potential issues. Our approach relies on the identification of common patterns between and within systems. To this end, a model-based clustering method for time series (or, more generally, functional data), called FunFEM, is developed. It is based on a functional mixture model that allows the data to be clustered in a discriminative functional subspace. In this context, the model has the advantage of being parsimonious and of allowing visualization of the clustered systems. Numerical experiments confirm the good behavior of FunFEM, particularly compared to state-of-the-art methods. The application of FunFEM to BSS data from JCDecaux and the Transport for London Initiative allows us to identify 10 general patterns, including pathological ones, and to propose practical improvement strategies based on the system comparison. The visualization of the clustered data within the discriminative subspace turns out to be particularly informative regarding system efficiency. The proposed methodology is implemented in an R package named funFEM, which is available on CRAN. The package also provides a subset of the data analyzed in this work.
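
To make the idea of clustering curves in a discriminative subspace more concrete, the following is a minimal Python sketch. It is not the funFEM implementation and does not reproduce the FunFEM mixture model: it replaces the probabilistic EM steps with hard, k-means-style assignments, and it assumes that no cluster becomes empty. The data are synthetic, and all function names (`fourier_design`, `basis_coefficients`, `fisher_subspace`, `cluster_curves`) are hypothetical, introduced only for illustration. The sketch smooths each curve on a Fourier basis, then alternates between estimating a Fisher-type discriminative subspace from the current partition and reassigning curves within that subspace.

```python
import numpy as np


def fourier_design(t, n_harmonics, period):
    """Design matrix: a constant column plus sine/cosine pairs."""
    cols = [np.ones_like(t)]
    for h in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * h / period
        cols.append(np.sin(w * t))
        cols.append(np.cos(w * t))
    return np.column_stack(cols)


def basis_coefficients(curves, t, n_harmonics=3, period=24.0):
    """Least-squares Fourier coefficients, one row per observed curve."""
    phi = fourier_design(t, n_harmonics, period)
    coef, *_ = np.linalg.lstsq(phi, curves.T, rcond=None)
    return coef.T


def nearest(points, centers):
    """Index of the closest center for each point (Euclidean distance)."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def farthest_point_init(points, n_clusters):
    """Deterministic initial partition: spread the initial centers apart."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    return nearest(points, np.stack(centers))


def fisher_subspace(coef, labels, n_clusters):
    """Orthonormal basis (K-1 columns) of the range of S_T^{-1} S_B."""
    mean = coef.mean(axis=0)
    centered = coef - mean
    total = centered.T @ centered / len(coef)
    total += 1e-8 * np.eye(total.shape[0])  # small ridge for numerical stability
    # The centered cluster means span the range of the between-cluster scatter;
    # the first K-1 of them suffice because their weighted sum is zero.
    diffs = np.stack([coef[labels == k].mean(axis=0) - mean
                      for k in range(n_clusters - 1)], axis=1)
    u, _ = np.linalg.qr(np.linalg.solve(total, diffs))
    return u


def cluster_curves(curves, t, n_clusters=2, n_iter=20):
    """Alternate subspace estimation and hard reassignment until stable."""
    coef = basis_coefficients(curves, t)
    labels = farthest_point_init(coef, n_clusters)
    for _ in range(n_iter):
        u = fisher_subspace(coef, labels, n_clusters)
        proj = coef @ u
        # Assumption: every cluster keeps at least one member.
        centers = np.stack([proj[labels == k].mean(axis=0)
                            for k in range(n_clusters)])
        new_labels = nearest(proj, centers)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, proj


# Synthetic hourly profiles with two opposite daily patterns (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(24.0)
base = 0.3 * np.sin(2.0 * np.pi * t / 24.0)
truth = np.repeat([0, 1], 20)
curves = 0.5 + np.where(truth[:, None] == 0, base, -base)
curves = curves + 0.05 * rng.standard_normal((40, 24))

labels, proj = cluster_curves(curves, t)
```

Here `proj` holds the coordinates of each curve in the estimated subspace (one dimension for two clusters), which is the kind of low-dimensional representation that can be plotted to inspect the clustering. A full mixture-model treatment would replace the hard assignments with posterior probabilities and would estimate the covariance parameters within the subspace.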

Article information

Ann. Appl. Stat., Volume 9, Number 4 (2015), 1726-1760.

Received: July 2014
Revised: April 2015
First available in Project Euclid: 28 January 2016

Keywords: model-based clustering; functional data; dimension reduction; open data; bike sharing systems


Bouveyron, Charles; Côme, Etienne; Jacques, Julien. The discriminative functional mixture model for a comparative analysis of bike sharing systems. Ann. Appl. Stat. 9 (2015), no. 4, 1726–1760. doi:10.1214/15-AOAS861.

